Creating a SQLite database#
2024 NHERI Computational Academy
Scott J. Brandenberg sjbrandenberg@g.ucla.edu
Jupyter Notebooks#
About SQLite#
This script will create a SQLite database, which is a lightweight file-based SQL database engine. It comes installed with Python, so you don’t need to install any special packages or set up a SQL server on your computer to use it. Details can be found here: https://www.sqlite.org/. There are, of course, other relational database management systems, like MySQL, Postgres, etc. However, to use these database engines, you either need to set up those databases on a server, or use software to make your computer simulate a server (e.g., XAMPP). So we’ll use SQLite here.
Learning outcomes#
This notebook will demonstrate how to:
Create a SQLite database
Add tables to a SQLite database using CREATE TABLE statements, including data types and key constraints
Use INSERT INTO statements to add data to tables
Query individual tables using SELECT statements
Update table entries using UPDATE statements
Query multiple tables using SELECT and JOIN statements
Example database#
We will create a very simple database consisting of three tables related to earthquake ground motions. The tables are specified below.
Events table#
event_id |
event_name |
magnitude |
epicenter_latitude [deg] |
epicentrer_longitude [deg] |
---|---|---|---|---|
1 |
Woodland Hills |
6.3 |
34.0689 |
-118.4452 |
2 |
Hollywood Valley |
7.2 |
34.1027 |
-118.3404 |
Stations table#
station_id |
station_name |
station_latitude [deg] |
station_longitude [deg] |
\(V_{S30}\) [m/s] |
---|---|---|---|---|
1 |
Factor Building |
34.06693 |
-118.44203 |
380 |
2 |
Santa Monica Courthouse |
34.10978 |
-118.49027 |
215 |
Motions table#
motion_id |
event_id |
station_id |
\(R_{JB}\) [km] |
\(PGA\) [\(m/s^2\)] |
---|---|---|---|---|
1 |
1 |
1 |
2.0 |
0.84 |
2 |
1 |
2 |
14.0 |
0.28 |
3 |
2 |
1 |
20.0 |
0.61 |
4 |
2 |
2 |
30.0 |
0.32 |
import sqlite3
import os
import pandas as pd
# Create an in-memory SQLite database. You can also create a SQLite database stored in
# a file by replacing ':memory:' with 'filename.db'.
con = sqlite3.connect(':memory:')
# Create a cursor object. A cursor is a middleware object that
cur = con.cursor()
# create event table
res = cur.execute(
"""
CREATE TABLE events(
event_id INTEGER PRIMARY KEY,
event_name TEXT NOT NULL,
magnitude REAL NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL
)
"""
)
# Put some data into the events table
sql = """
INSERT INTO events VALUES
(1, 'Woodland Hill', 6.3, 34.0689, 118.4452),
(2, 'Hollywood Valley', 7.2, 34.1027, 118.3404)
"""
res = cur.execute(sql)
# Query event table to verify we can retrieve data from it.
res = cur.execute('SELECT * FROM events')
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results
event_id | event_name | magnitude | latitude | longitude | |
---|---|---|---|---|---|
0 | 1 | Woodland Hill | 6.3 | 34.0689 | 118.4452 |
1 | 2 | Hollywood Valley | 7.2 | 34.1027 | 118.3404 |
# Aww snap, looks like I made a mistake entering the event_name for the Woodland Hills event.
# Let's fix that using an UPDATE statement
sql = """
UPDATE events
SET event_name = 'Woodland Hills'
WHERE event_id = 1
"""
res = cur.execute(sql)
# Let's make sure the table has been updated
res = cur.execute('SELECT * FROM events')
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results
event_id | event_name | magnitude | latitude | longitude | |
---|---|---|---|---|---|
0 | 1 | Woodland Hills | 6.3 | 34.0689 | 118.4452 |
1 | 2 | Hollywood Valley | 7.2 | 34.1027 | 118.3404 |
# Now create stations table
sql = """
CREATE TABLE stations(
station_id INTEGER PRIMARY KEY,
station_name TEXT NOT NULL,
station_latitude REAL NOT NULL,
station_longitude REAL NOT NULL,
VS30 REAL
)
"""
res = cur.execute(sql)
sql = """
INSERT INTO stations VALUES
(1,'Factor Building', 34.06693, -118.44203, 380),
(2, 'Santa Monica Courthouse', 34.10978, -118.49027, 215)
"""
res = cur.execute(sql)
res = cur.execute('SELECT * FROM stations')
res.fetchall()
[(1, 'Factor Building', 34.06693, -118.44203, 380.0),
(2, 'Santa Monica Courthouse', 34.10978, -118.49027, 215.0)]
# Create motions table
sql = """
CREATE TABLE motions (
motion_id INTEGER PRIMARY KEY,
event_id INTEGER NOT NULL,
station_id INTEGER NOT NULL,
rjb REAL NOT NULL,
pga REAL NOT NULL,
FOREIGN KEY (station_id) REFERENCES stations (station_id),
FOREIGN KEY (event_id) REFERENCES events (event_id)
)
"""
res = cur.execute(sql)
res = cur.execute("""
INSERT INTO motions VALUES
(1, 1, 1, 2.0, 0.84),
(2, 1, 2, 14.0, 0.28),
(3, 2, 1, 20.0, 0.61),
(4, 2, 2, 30.0, 0.32)
""")
# Run a SELECT query with JOIN statements to flatten all three tables into a
# single Pandas Dataframe
sql = """
SELECT * FROM motions
JOIN events ON motions.event_id = events.event_id
JOIN stations ON motions.station_id = stations.station_id
"""
res = cur.execute(sql)
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results
motion_id | event_id | station_id | rjb | pga | event_id | event_name | magnitude | latitude | longitude | station_id | station_name | station_latitude | station_longitude | VS30 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 2.0 | 0.84 | 1 | Woodland Hills | 6.3 | 34.0689 | 118.4452 | 1 | Factor Building | 34.06693 | -118.44203 | 380.0 |
1 | 2 | 1 | 2 | 14.0 | 0.28 | 1 | Woodland Hills | 6.3 | 34.0689 | 118.4452 | 2 | Santa Monica Courthouse | 34.10978 | -118.49027 | 215.0 |
2 | 3 | 2 | 1 | 20.0 | 0.61 | 2 | Hollywood Valley | 7.2 | 34.1027 | 118.3404 | 1 | Factor Building | 34.06693 | -118.44203 | 380.0 |
3 | 4 | 2 | 2 | 30.0 | 0.32 | 2 | Hollywood Valley | 7.2 | 34.1027 | 118.3404 | 2 | Santa Monica Courthouse | 34.10978 | -118.49027 | 215.0 |