Principles of Object-oriented Design for Data Scientists

Object-oriented design is a practical programming skill, something you'll reach for on almost every data-science project in Python. This guide focuses on the idiomatic patterns professional engineers actually use, not textbook toy examples.

Why Principles of Object-oriented Design Matter

Data scientists who write clean, testable, well-structured Python ship faster, re-use more and collaborate better. Craftsmanship here pays dividends on every subsequent project.

Write small, composable functions with explicit inputs and outputs.
Prefer built-in data structures and the standard library where they fit.
Handle failure with narrow, named exceptions instead of bare except.
Measure before you optimise: always profile first.
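
As a minimal sketch of two of these habits (small functions with explicit inputs and outputs, and narrow, named exceptions), consider reading a setting from a JSON string. The names `ConfigError`, `parse_config` and `learning_rate` are hypothetical, chosen only for illustration; `json.loads` and `json.JSONDecodeError` are from the standard library.

```python
import json


class ConfigError(ValueError):
    """Raised when a config string cannot be turned into usable settings."""


def parse_config(text: str) -> dict:
    """Parse a JSON object, converting low-level failures into one named error."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:  # narrow: only malformed JSON is caught
        raise ConfigError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ConfigError("top-level value must be an object")
    return data


def learning_rate(config: dict, default: float = 0.01) -> float:
    """Read one value and validate it; nothing else happens here."""
    try:
        return float(config.get("learning_rate", default))
    except (TypeError, ValueError) as exc:
        raise ConfigError("learning_rate must be numeric") from exc


print(learning_rate(parse_config('{"learning_rate": "0.1"}')))  # prints 0.1
```

Because each function does one thing, each can be tested on its own, and callers only need to catch `ConfigError` rather than guess which low-level exception might escape.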

How Principles of Object-oriented Design Show Up in Practice

In a typical project, the principles of object-oriented design are combined with the rest of the Python programming toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.

This shows up every day: building pipelines, writing analysis notebooks, packaging reusable utilities and reviewing a teammate's pull request.
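
One way this can look in a pipeline is sketched below: small classes that share a single method, composed into a larger object. All class names here (`Step`, `DropNaN`, `Clip`, `Pipeline`) are hypothetical and exist only for this illustration; `typing.Protocol` and `dataclasses` are standard library.

```python
import math
from dataclasses import dataclass
from typing import Protocol


class Step(Protocol):
    """Any object with an apply() method of this shape counts as a step."""

    def apply(self, values: list[float]) -> list[float]: ...


@dataclass(frozen=True)
class DropNaN:
    def apply(self, values: list[float]) -> list[float]:
        return [v for v in values if not math.isnan(v)]


@dataclass(frozen=True)
class Clip:
    low: float
    high: float

    def apply(self, values: list[float]) -> list[float]:
        return [min(max(v, self.low), self.high) for v in values]


@dataclass(frozen=True)
class Pipeline:
    steps: tuple[Step, ...]

    def apply(self, values: list[float]) -> list[float]:
        # Feed the output of each step into the next one.
        for step in self.steps:
            values = step.apply(values)
        return values


clean = Pipeline((DropNaN(), Clip(0.0, 1.0)))
print(clean.apply([0.2, float("nan"), 1.7, -0.4]))  # prints [0.2, 1.0, 0.0]
```

`Pipeline` has the same `apply` method as its steps, so pipelines can be nested inside other pipelines, and a new step can be added without changing any existing class.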

Code Examples: Principles of Object-oriented Design for Data (5 runnable snippets)

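Copy any block into a file or notebook and run it end-to-end; each example stands alone. Example 1 needs the third-party aiohttp package and network access; the others use only the standard library.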

Example 1: Concurrent I/O with asyncio + aiohttp

# Example 1: Concurrent I/O with asyncio + aiohttp -- Principles of Object-oriented Design for Data
import asyncio
import aiohttp

URLS = [
    "https://httpbin.org/uuid",
    "https://httpbin.org/user-agent",
    "https://httpbin.org/ip",
    "https://httpbin.org/headers",
]

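async def fetch(session, url):
    # ClientTimeout is the documented way to set timeouts in aiohttp;
    # total=10 caps the whole request at 10 seconds.
    timeout = aiohttp.ClientTimeout(total=10)
    async with session.get(url, timeout=timeout) as resp:
        return url, resp.status, len(await resp.text())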

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch(session, u) for u in URLS))
    for url, status, size in results:
        print(f"{status}  {size:>5} bytes  {url}")

# In a notebook an event loop is already running, so use `await main()` there instead.
asyncio.run(main())

Example 2: Typed dataclass with custom methods

# Example 2: Typed dataclass with custom methods -- Principles of Object-oriented Design for Data
from dataclasses import dataclass, field
from typing import Iterable

@dataclass(slots=True)  # slots=True requires Python 3.10+
class Sample:
    id: int
    features: list[float] = field(default_factory=list)
    label: str | None = None

    def norm(self) -> float:
        return sum(x * x for x in self.features) ** 0.5

    def scaled(self, factor: float) -> "Sample":
        return Sample(self.id, [x * factor for x in self.features], self.label)

def build(rows: Iterable[tuple[int, list[float], str]]) -> list[Sample]:
    return [Sample(i, f, y) for i, f, y in rows]

batch = build([(1, [1.0, 2.0], "A"), (2, [-3.0, 4.0], "B")])
for s in batch:
    print(s.id, round(s.norm(), 3), s.label)

Example 3: Generators, itertools and lazy pipelines

# Example 3: Generators, itertools and lazy pipelines -- Principles of Object-oriented Design for Data
from itertools import islice, accumulate

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def running_stats(seq):
    total, n = 0, 0
    for x in seq:
        total += x
        n += 1
        yield x, total / n, total

first10 = list(islice(fibonacci(), 10))
print("fib(0..9)   :", first10)
for x, avg, cum in islice(running_stats(first10), 10):
    print(f"  x={x:>3}  mean={avg:>6.2f}  cumulative={cum:>4}")

partial_sums = list(accumulate(first10))
print("partial sums:", partial_sums)

Example 4: Context manager with timing and error handling

# Example 4: Context manager with timing and error handling -- Principles of Object-oriented Design for Data
import time
import traceback
from contextlib import contextmanager

@contextmanager
def timed(name: str):
    t0 = time.perf_counter()
    try:
        yield
    except Exception as exc:
        print(f"[{name}] failed: {exc!r}")
        traceback.print_exc()
        raise
    finally:
        dt_ms = (time.perf_counter() - t0) * 1_000
        print(f"[{name}] took {dt_ms:.2f} ms")

with timed("hash 1M ints"):
    total = sum(hash(i) for i in range(1_000_000))
print("result:", total % 9_973)

Example 5: Decorator for memoised pure functions

# Example 5: Decorator for memoised pure functions -- Principles of Object-oriented Design for Data
from functools import wraps

def memoise(fn):
    cache: dict = {}
    @wraps(fn)
    def inner(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    inner.cache = cache
    return inner

@memoise
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print([fib(i) for i in range(15)])
print("cache entries:", len(fib.cache))
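
For comparison, the standard library already provides a memoising decorator, `functools.lru_cache`. The sketch below is a hedged alternative to the hand-written `memoise` above, not a replacement for understanding it; `fib_cached` is a hypothetical name used only here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def fib_cached(n: int) -> int:
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print([fib_cached(i) for i in range(15)])
print(fib_cached.cache_info())  # reports hits, misses and current cache size
```

Unlike the hand-written version, `lru_cache` also accepts keyword arguments and exposes `cache_clear()`, which is convenient in tests.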