Creating a SQLite database#

2024 NHERI Computational Academy
Scott J. Brandenberg sjbrandenberg@g.ucla.edu

Jupyter Notebooks#

Complete Notebook
Try on DesignSafe
Template
Try on DesignSafe

About SQLite#

This script will create a SQLite database, which is a lightweight file-based SQL database engine. It comes installed with Python, so you don’t need to install any special packages or set up a SQL server on your computer to use it. Details can be found here: https://www.sqlite.org/. There are, of course, other relational database management systems, like MySQL, Postgres, etc. However, to use these database engines, you either need to set up those databases on a server, or use software to make your computer simulate a server (e.g., XAMPP). So we’ll use SQLite here.

Learning outcomes#

This notebook will demonstrate how to:

  1. Create a SQLite database

  2. Add tables to a SQLite database using CREATE TABLE statements, including data types and key constraints

  3. Use INSERT INTO statements to add data to tables

  4. Query individual tables using SELECT statements

  5. Update table entries using UPDATE statements

  6. Query multiple tables using SELECT and JOIN statements

Example database#

We will create a very simple database consisting of three tables related to earthquake ground motions. The tables are specified below.

Events table#

event_id

event_name

magnitude

epicenter_latitude [deg]

epicentrer_longitude [deg]

1

Woodland Hills

6.3

34.0689

-118.4452

2

Hollywood Valley

7.2

34.1027

-118.3404

Stations table#

station_id

station_name

station_latitude [deg]

station_longitude [deg]

\(V_{S30}\) [m/s]

1

Factor Building

34.06693

-118.44203

380

2

Santa Monica Courthouse

34.10978

-118.49027

215

Motions table#

motion_id

event_id

station_id

\(R_{JB}\) [km]

\(PGA\) [\(m/s^2\)]

1

1

1

2.0

0.84

2

1

2

14.0

0.28

3

2

1

20.0

0.61

4

2

2

30.0

0.32

import sqlite3
import os
import pandas as pd

# Create an in-memory SQLite database. You can also create a SQLite database stored in 
# a file by replacing ':memory:' with 'filename.db'.
con = sqlite3.connect(':memory:')

# Create a cursor object. A cursor is a middleware object that 
cur = con.cursor()

# create event table
res = cur.execute(
    """
    CREATE TABLE events(
        event_id INTEGER PRIMARY KEY, 
        event_name TEXT NOT NULL, 
        magnitude REAL NOT NULL, 
        latitude REAL NOT NULL, 
        longitude REAL NOT NULL
    )
    """
)
# Put some data into the events table

sql = """
    INSERT INTO events VALUES
    (1, 'Woodland Hill', 6.3, 34.0689, 118.4452),
    (2, 'Hollywood Valley', 7.2, 34.1027, 118.3404)
"""
res = cur.execute(sql)
# Query event table to verify we can retrieve data from it.

res = cur.execute('SELECT * FROM events')
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results
event_id event_name magnitude latitude longitude
0 1 Woodland Hill 6.3 34.0689 118.4452
1 2 Hollywood Valley 7.2 34.1027 118.3404
# Aww snap, looks like I made a mistake entering the event_name for the Woodland Hills event.
# Let's fix that using an UPDATE statement

sql = """
UPDATE events 
SET event_name = 'Woodland Hills' 
WHERE event_id = 1
"""
res = cur.execute(sql)
# Let's make sure the table has been updated

res = cur.execute('SELECT * FROM events')
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results
event_id event_name magnitude latitude longitude
0 1 Woodland Hills 6.3 34.0689 118.4452
1 2 Hollywood Valley 7.2 34.1027 118.3404
# Now create stations table

sql = """
CREATE TABLE stations(
    station_id INTEGER PRIMARY KEY,
    station_name TEXT NOT NULL,
    station_latitude REAL NOT NULL,
    station_longitude REAL NOT NULL,
    VS30 REAL
)
"""

res = cur.execute(sql)

sql = """
INSERT INTO stations VALUES
    (1,'Factor Building', 34.06693, -118.44203, 380),
    (2, 'Santa Monica Courthouse', 34.10978, -118.49027, 215)
"""

res = cur.execute(sql)
res = cur.execute('SELECT * FROM stations')
res.fetchall()
[(1, 'Factor Building', 34.06693, -118.44203, 380.0),
 (2, 'Santa Monica Courthouse', 34.10978, -118.49027, 215.0)]
# Create motions table

sql = """
CREATE TABLE motions (
    motion_id   INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL,
    station_id INTEGER NOT NULL,
    rjb REAL NOT NULL,
    pga REAL NOT NULL,
    FOREIGN KEY (station_id) REFERENCES stations (station_id),
    FOREIGN KEY (event_id) REFERENCES events (event_id)
    )
"""
res = cur.execute(sql)
res = cur.execute("""
    INSERT INTO motions VALUES
        (1, 1, 1, 2.0, 0.84),
        (2, 1, 2, 14.0, 0.28),
        (3, 2, 1, 20.0, 0.61),
        (4, 2, 2, 30.0, 0.32)
""")
# Run a SELECT query with JOIN statements to flatten all three tables into a 
# single Pandas Dataframe

sql = """
SELECT * FROM motions 
JOIN events ON motions.event_id = events.event_id 
JOIN stations ON motions.station_id = stations.station_id
"""
res = cur.execute(sql)
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results
motion_id event_id station_id rjb pga event_id event_name magnitude latitude longitude station_id station_name station_latitude station_longitude VS30
0 1 1 1 2.0 0.84 1 Woodland Hills 6.3 34.0689 118.4452 1 Factor Building 34.06693 -118.44203 380.0
1 2 1 2 14.0 0.28 1 Woodland Hills 6.3 34.0689 118.4452 2 Santa Monica Courthouse 34.10978 -118.49027 215.0
2 3 2 1 20.0 0.61 2 Hollywood Valley 7.2 34.1027 118.3404 1 Factor Building 34.06693 -118.44203 380.0
3 4 2 2 30.0 0.32 2 Hollywood Valley 7.2 34.1027 118.3404 2 Santa Monica Courthouse 34.10978 -118.49027 215.0