Creating a SQLite database

Creating a SQLite database#

2024 NHERI Computational Academy
Scott J. Brandenberg sjbrandenberg@g.ucla.edu

Jupyter Notebooks#

Complete Notebook

Template

About SQLite#

This script will create a SQLite database, which is a lightweight file-based SQL database engine. It comes installed with Python, so you don’t need to install any special packages or set up a SQL server on your computer to use it. Details can be found here: https://www.sqlite.org/. There are, of course, other relational database management systems, like MySQL, Postgres, etc. However, to use these database engines, you either need to set up those databases on a server, or use software to make your computer simulate a server (e.g., XAMPP). So we’ll use SQLite here.

Learning outcomes#

This notebook will demonstrate how to:

Create a SQLite database
Add tables to a SQLite database using CREATE TABLE statements, including data types and key constraints
Use INSERT INTO statements to add data to tables
Query individual tables using SELECT statements
Update table entries using UPDATE statements
Query multiple tables using SELECT and JOIN statements

Example database#

We will create a very simple database consisting of three tables related to earthquake ground motions. The tables are specified below.

Events table#

event_id	event_name	magnitude	epicenter_latitude [deg]	epicentrer_longitude [deg]
1	Woodland Hills	6.3	34.0689	-118.4452
2	Hollywood Valley	7.2	34.1027	-118.3404

Stations table#

station_id	station_name	station_latitude [deg]	station_longitude [deg]	\(V_{S30}\) [m/s]
1	Factor Building	34.06693	-118.44203	380
2	Santa Monica Courthouse	34.10978	-118.49027	215

Motions table#

motion_id	event_id	station_id	\(R_{JB}\) [km]	\(PGA\) [\(m/s^2\)]
1	1	1	2.0	0.84
2	1	2	14.0	0.28
3	2	1	20.0	0.61
4	2	2	30.0	0.32

import sqlite3
import os
import pandas as pd

# Create an in-memory SQLite database. You can also create a SQLite database stored in 
# a file by replacing ':memory:' with 'filename.db'.
con = sqlite3.connect(':memory:')

# Create a cursor object. A cursor is a middleware object that 
cur = con.cursor()

# create event table
res = cur.execute(
    """
    CREATE TABLE events(
        event_id INTEGER PRIMARY KEY, 
        event_name TEXT NOT NULL, 
        magnitude REAL NOT NULL, 
        latitude REAL NOT NULL, 
        longitude REAL NOT NULL
    )
    """
)

# Put some data into the events table

sql = """
    INSERT INTO events VALUES
    (1, 'Woodland Hill', 6.3, 34.0689, 118.4452),
    (2, 'Hollywood Valley', 7.2, 34.1027, 118.3404)
"""
res = cur.execute(sql)

# Query event table to verify we can retrieve data from it.

res = cur.execute('SELECT * FROM events')
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results

	event_id	event_name	magnitude	latitude	longitude
0	1	Woodland Hill	6.3	34.0689	118.4452
1	2	Hollywood Valley	7.2	34.1027	118.3404

# Aww snap, looks like I made a mistake entering the event_name for the Woodland Hills event.
# Let's fix that using an UPDATE statement

sql = """
UPDATE events 
SET event_name = 'Woodland Hills' 
WHERE event_id = 1
"""
res = cur.execute(sql)

# Let's make sure the table has been updated

res = cur.execute('SELECT * FROM events')
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results

	event_id	event_name	magnitude	latitude	longitude
0	1	Woodland Hills	6.3	34.0689	118.4452
1	2	Hollywood Valley	7.2	34.1027	118.3404

# Now create stations table

sql = """
CREATE TABLE stations(
    station_id INTEGER PRIMARY KEY,
    station_name TEXT NOT NULL,
    station_latitude REAL NOT NULL,
    station_longitude REAL NOT NULL,
    VS30 REAL
)
"""

res = cur.execute(sql)

sql = """
INSERT INTO stations VALUES
    (1,'Factor Building', 34.06693, -118.44203, 380),
    (2, 'Santa Monica Courthouse', 34.10978, -118.49027, 215)
"""

res = cur.execute(sql)

res = cur.execute('SELECT * FROM stations')
res.fetchall()

[(1, 'Factor Building', 34.06693, -118.44203, 380.0),
 (2, 'Santa Monica Courthouse', 34.10978, -118.49027, 215.0)]

# Create motions table

sql = """
CREATE TABLE motions (
    motion_id   INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL,
    station_id INTEGER NOT NULL,
    rjb REAL NOT NULL,
    pga REAL NOT NULL,
    FOREIGN KEY (station_id) REFERENCES stations (station_id),
    FOREIGN KEY (event_id) REFERENCES events (event_id)
    )
"""
res = cur.execute(sql)

res = cur.execute("""
    INSERT INTO motions VALUES
        (1, 1, 1, 2.0, 0.84),
        (2, 1, 2, 14.0, 0.28),
        (3, 2, 1, 20.0, 0.61),
        (4, 2, 2, 30.0, 0.32)
""")

# Run a SELECT query with JOIN statements to flatten all three tables into a 
# single Pandas Dataframe

sql = """
SELECT * FROM motions 
JOIN events ON motions.event_id = events.event_id 
JOIN stations ON motions.station_id = stations.station_id
"""
res = cur.execute(sql)
cols = [column[0] for column in res.description]
results= pd.DataFrame.from_records(data = res.fetchall(), columns = cols)
results

	motion_id	event_id	station_id	rjb	pga	event_id	event_name	magnitude	latitude	longitude	station_id	station_name	station_latitude	station_longitude	VS30
0	1	1	1	2.0	0.84	1	Woodland Hills	6.3	34.0689	118.4452	1	Factor Building	34.06693	-118.44203	380.0
1	2	1	2	14.0	0.28	1	Woodland Hills	6.3	34.0689	118.4452	2	Santa Monica Courthouse	34.10978	-118.49027	215.0
2	3	2	1	20.0	0.61	2	Hollywood Valley	7.2	34.1027	118.3404	1	Factor Building	34.06693	-118.44203	380.0
3	4	2	2	30.0	0.32	2	Hollywood Valley	7.2	34.1027	118.3404	2	Santa Monica Courthouse	34.10978	-118.49027	215.0