Principles of Relational Database Design and Normalization Theory
Relational database design and normalization theory sit at the boundary between data engineering and analytics. Whether your data lives in Postgres, Snowflake or a data lake, the concepts in this lesson help you ingest, query and reshape it efficiently at scale.
Why Relational Database Design Matters
A data scientist fluent in SQL can handle most data access without waiting on analysts or engineers. That independence lets you iterate faster and ask sharper, better-informed questions of the data.
- Keep analytical queries declarative: describe what, not how.
- Lean on window functions and CTEs for readability.
- Design indexes based on query patterns, not table structure.
- Know when to push computation to the database versus to Python.
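As a rough illustration of the last bullet, the sketch below computes the same per-region total twice: once by pulling every row into Python, and once with a GROUP BY inside the database. The `sales` table and its values are invented for this example. On a table this small the two approaches are interchangeable; the difference only starts to matter as row counts grow, because the second approach returns one row per region instead of one row per sale.

```python
import sqlite3
from collections import defaultdict

# Hypothetical table and values, invented for this illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("NA", 100.0), ("EU", 50.0), ("NA", 25.0), ("EU", 75.0)],
)

# Option A: fetch every row and aggregate in Python.
totals_python = defaultdict(float)
for region, amount in con.execute("SELECT region, amount FROM sales"):
    totals_python[region] += amount

# Option B: let the database aggregate and return one row per region.
totals_sql = dict(
    con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(dict(totals_python))
print(totals_sql)
```

Both dictionaries hold the same totals. Which option is preferable depends on data volume and on whether the logic is easy to express in SQL.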
How Relational Database Design Shows Up in Practice
In a typical project, the principles of relational database design and normalization theory are combined with the rest of the SQL & Databases toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
Expect to use this when pulling training data, shipping a dashboard, optimising a slow query or architecting a new analytical table.
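When architecting a new table, the core normalization idea is to store each fact once and reference it by key. The sketch below is one minimal way to show that, assuming a made-up `orders_flat` table in which customer details repeat on every order; the table names, columns and email addresses are all hypothetical. It splits the flat table into `customers` and `orders`, then compares how many rows an email change touches in each design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    -- Denormalized: customer details repeat on every order row.
    CREATE TABLE orders_flat (
        order_id       INTEGER PRIMARY KEY,
        customer_name  TEXT NOT NULL,
        customer_email TEXT NOT NULL,
        amount         REAL NOT NULL
    );
    INSERT INTO orders_flat VALUES
        (1, 'Ana', 'ana@example.com', 42.0),
        (2, 'Bob', 'bob@example.com', 99.5),
        (3, 'Ana', 'ana@example.com', 17.3);

    -- Normalized: each customer is stored once and referenced by key.
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (name, email)
        SELECT DISTINCT customer_name, customer_email FROM orders_flat;
    INSERT INTO orders (id, customer_id, amount)
        SELECT f.order_id, c.id, f.amount
        FROM orders_flat f JOIN customers c ON c.email = f.customer_email;
""")

# A join reconstructs the original flat view from the normalized tables.
rows = con.execute("""
    SELECT o.id, c.name, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)

# Changing Ana's email touches one row in the normalized design...
cur = con.execute(
    "UPDATE customers SET email = ? WHERE name = ?", ("ana@new.example", "Ana")
)
normalized_rows_changed = cur.rowcount

# ...but every one of her order rows in the flat design.
cur = con.execute(
    "UPDATE orders_flat SET customer_email = ? WHERE customer_name = ?",
    ("ana@new.example", "Ana"),
)
flat_rows_changed = cur.rowcount

print(normalized_rows_changed, flat_rows_changed)
```

The trade-off is that reads now need a join. Analytical tables are sometimes deliberately denormalized for that reason, so treat this as an illustration of the principle and not as a rule.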
Code Examples: Relational Database Design and Normalization Theory (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end; each example stands alone. Example 1 needs SQLAlchemy installed; the others use only the Python standard library.
Example 1: SQLAlchemy Core: typed schema + bulk insert
# Example 1: SQLAlchemy Core: typed schema + bulk insert
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Float, select, func)

# In-memory SQLite database; nothing is written to disk.
engine = create_engine("sqlite:///:memory:", future=True)
meta = MetaData()

# Typed table definition: explicit column types and NOT NULL constraints.
orders = Table("orders", meta,
    Column("id", Integer, primary_key=True),
    Column("customer", String(64), nullable=False),
    Column("amount", Float, nullable=False),
)
meta.create_all(engine)

# engine.begin() opens a transaction and commits when the block exits cleanly.
with engine.begin() as conn:
    # A list of dicts is sent as a single bulk (executemany) insert.
    conn.execute(orders.insert(), [
        {"customer": "Ana", "amount": 42.0},
        {"customer": "Bob", "amount": 99.5},
        {"customer": "Ana", "amount": 17.3},
    ])
    # Aggregate in the database: total amount per customer.
    stmt = (select(orders.c.customer, func.sum(orders.c.amount).label("total"))
            .group_by(orders.c.customer))
    for row in conn.execute(stmt):
        print(row.customer, round(row.total, 2))
Example 2: Recursive CTE for hierarchical data
# Example 2: Recursive CTE for hierarchical data
import sqlite3

con = sqlite3.connect(":memory:")

# Adjacency list: each row points at its manager (NULL for the root).
con.executescript("""
    CREATE TABLE org (id INT PRIMARY KEY, name TEXT, manager_id INT);
    INSERT INTO org VALUES
        (1,'Ada', NULL),
        (2,'Ben', 1),
        (3,'Cai', 1),
        (4,'Dee', 2),
        (5,'Eli', 2),
        (6,'Fay', 3),
        (7,'Gio', 5);
""")

query = """
WITH RECURSIVE chain(id, name, level, path) AS (
    -- Anchor member: the root of the hierarchy.
    SELECT id, name, 0, name
    FROM org
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach each employee to an already-visited manager.
    SELECT o.id, o.name, c.level + 1, c.path || ' / ' || o.name
    FROM org o JOIN chain c ON o.manager_id = c.id
)
-- Indent each name according to its depth to draw the tree.
SELECT level, id, printf('%s%s', replace(hex(zeroblob(level*2)),'00',' '),
                         name) AS tree
FROM chain
ORDER BY path;
"""
for row in con.execute(query):
    print(row)
Example 3: Window functions and common table expressions
# Example 3: Window functions and common table expressions
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (day DATE, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2026-01-01','NA', 1200), ('2026-01-02','NA', 1550),
        ('2026-01-01','EU',  900), ('2026-01-02','EU', 1180),
        ('2026-01-03','NA', 1700), ('2026-01-03','EU', 1220);
""")

query = """
-- The CTE collapses sales to one row per region and day.
WITH daily AS (
    SELECT region, day, SUM(amount) AS revenue
    FROM sales GROUP BY region, day
)
SELECT region, day, revenue,
       -- Cumulative revenue within each region, ordered by day.
       SUM(revenue) OVER (PARTITION BY region ORDER BY day) AS running_total,
       -- Rank of each region within a given day, highest revenue first.
       RANK() OVER (PARTITION BY day ORDER BY revenue DESC) AS day_rank
FROM daily
ORDER BY day, region;
"""
for row in con.execute(query):
    print(row)
Example 4: Parameterised upsert against an indexed table
# Example 4: Parameterised upsert against an indexed table
# Note: ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
import sqlite3

con = sqlite3.connect(":memory:")

# The UNIQUE constraint on email creates the index the upsert relies on.
con.execute("""
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        email  TEXT UNIQUE NOT NULL,
        visits INTEGER NOT NULL DEFAULT 0
    );
""")

def record_visit(email: str) -> None:
    """Insert a new user with one visit, or increment an existing user's count."""
    # The ? placeholder keeps the value out of the SQL text (no string formatting).
    con.execute("""
        INSERT INTO users (email, visits) VALUES (?, 1)
        ON CONFLICT(email) DO UPDATE
        SET visits = visits + 1;
    """, (email,))

for e in ["a@x.com", "b@x.com", "a@x.com", "a@x.com", "b@x.com"]:
    record_visit(e)

for row in con.execute("SELECT email, visits FROM users ORDER BY visits DESC"):
    print(row)
Example 5: EXPLAIN QUERY PLAN before and after indexing
# Example 5: EXPLAIN QUERY PLAN before and after indexing
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INT, kind TEXT);")

# Seeded generator so the synthetic data is the same on every run.
rng = random.Random(0)
rows = [(i, rng.randint(1, 10_000), rng.choice(["click", "view", "buy"]))
        for i in range(200_000)]
con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

sql = "SELECT COUNT(*) FROM events WHERE user_id = 4242 AND kind = 'buy'"

# Without an index on the filtered columns, the plan is a table scan.
print("BEFORE index:")
for r in con.execute("EXPLAIN QUERY PLAN " + sql):
    print(" ", r)

# Composite index that matches both equality predicates in the WHERE clause.
con.execute("CREATE INDEX idx_events_user_kind ON events(user_id, kind);")

print("AFTER index:")
for r in con.execute("EXPLAIN QUERY PLAN " + sql):
    print(" ", r)

print("count =", con.execute(sql).fetchone()[0])