Principles of Object-oriented Design for Data Scientists
Object-oriented design is a practical programming skill, something you'll reach for on almost every data-science project in Python. This guide focuses on the idiomatic patterns professional engineers actually use, not textbook toy examples.
Why the Principles of Object-oriented Design Matter
Data scientists who write clean, testable, well-structured Python ship faster, reuse more code and collaborate better. Craftsmanship here pays dividends on every subsequent project.
- Write small, composable functions with explicit inputs and outputs.
- Prefer built-in data structures and the standard library where they fit.
- Handle failure with narrow, named exceptions instead of bare except.
- Measure before you optimise — always profile first.
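As a minimal sketch of the first and third points above, the snippet below splits a small cleaning task into functions with explicit inputs and outputs, and catches only the named exception it expects. The function names (`parse_reading`, `clean_readings`, `mean`) and the sample data are hypothetical, chosen only for illustration.

```python
# Hypothetical helpers illustrating small functions and narrow exception handling.
def parse_reading(raw: str) -> float:
    """Convert one raw string to a float; raises ValueError on bad input."""
    return float(raw.strip())


def clean_readings(raw_values: list[str]) -> tuple[list[float], list[str]]:
    """Return (parsed values, rejected inputs) without hiding unexpected errors."""
    parsed: list[float] = []
    rejected: list[str] = []
    for raw in raw_values:
        try:
            parsed.append(parse_reading(raw))
        except ValueError:  # narrow: only the failure we expect from float()
            rejected.append(raw)
    return parsed, rejected


def mean(values: list[float]) -> float:
    """Arithmetic mean; raises ValueError on an empty list."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


values, bad = clean_readings(["1.5", " 2.5 ", "n/a", "4.0"])
print(values, bad, mean(values))
```

Because each function does one thing, each can be tested on its own, and a bare `except` is never needed: any error other than `ValueError` still surfaces to the caller.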
How the Principles of Object-oriented Design Show Up in Practice
In a typical project, object-oriented design is combined with the rest of the Python programming toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
This shows up every day: building pipelines, writing analysis notebooks, packaging reusable utilities and reviewing a teammate's pull request.
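One way to picture this combination is a small pipeline whose steps share a common interface, so each step can be tested alone and swapped without touching the rest. The sketch below is illustrative only: `Step`, `Scale`, `Clip` and `Pipeline` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Protocol


class Step(Protocol):
    """Anything with an apply() method that maps a list of floats to a new list."""

    def apply(self, values: list[float]) -> list[float]: ...


@dataclass(frozen=True)
class Scale:
    factor: float

    def apply(self, values: list[float]) -> list[float]:
        return [v * self.factor for v in values]


@dataclass(frozen=True)
class Clip:
    low: float
    high: float

    def apply(self, values: list[float]) -> list[float]:
        return [min(max(v, self.low), self.high) for v in values]


@dataclass
class Pipeline:
    steps: list[Step]

    def run(self, values: list[float]) -> list[float]:
        # Each step receives the previous step's output; the input list is never mutated.
        for step in self.steps:
            values = step.apply(values)
        return values


pipeline = Pipeline([Scale(2.0), Clip(0.0, 5.0)])
result = pipeline.run([-1.0, 1.5, 4.0])
print(result)
```

The design choice here is composition over inheritance: `Pipeline` depends only on the `apply` method, so a new step needs no base class, just the same method signature.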
- Management of Professional Development Environments and
- Pythonic Code Idiomatic Expressions Adherence Pep
- Advanced Data Structures and Algorithmic Complexity
- Control Flow Iterators and Generators in
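Code Examples: Principles of Object-oriented Design for Data Scientists (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end; each example stands alone. Example 1 needs the third-party aiohttp package and network access, and Example 2 needs Python 3.10 or later.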
Example 1: Concurrent I/O with asyncio + aiohttp
# Example 1: Concurrent I/O with asyncio + aiohttp -- Principles of Object-oriented Design for Data
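import asyncio

import aiohttp  # third-party: pip install aiohttp

URLS = [
    "https://httpbin.org/uuid",
    "https://httpbin.org/user-agent",
    "https://httpbin.org/ip",
    "https://httpbin.org/headers",
]

# ClientTimeout is aiohttp's explicit timeout object (total budget in seconds).
TIMEOUT = aiohttp.ClientTimeout(total=10)

async def fetch(session, url):
    # Return the URL, HTTP status and body length for one request.
    async with session.get(url, timeout=TIMEOUT) as resp:
        return url, resp.status, len(await resp.text())

async def main():
    # One shared session; gather() runs all requests concurrently.
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(fetch(session, u) for u in URLS))
    for url, status, size in results:
        print(f"{status} {size:>5} bytes {url}")

# In a notebook, where an event loop is already running, use `await main()` instead.
asyncio.run(main())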
Example 2: Typed dataclass with custom methods
# Example 2: Typed dataclass with custom methods -- Principles of Object-oriented Design for Data
# Requires Python 3.10+ (dataclass slots=True and the `str | None` syntax).
from dataclasses import dataclass, field
from typing import Iterable

@dataclass(slots=True)
class Sample:
    id: int
    features: list[float] = field(default_factory=list)
    label: str | None = None

    def norm(self) -> float:
        # Euclidean (L2) norm of the feature vector.
        return sum(x * x for x in self.features) ** 0.5

    def scaled(self, factor: float) -> "Sample":
        # Return a new Sample; the original is left unchanged.
        return Sample(self.id, [x * factor for x in self.features], self.label)

def build(rows: Iterable[tuple[int, list[float], str]]) -> list[Sample]:
    return [Sample(i, f, y) for i, f, y in rows]

batch = build([(1, [1.0, 2.0], "A"), (2, [-3.0, 4.0], "B")])
for s in batch:
    print(s.id, round(s.norm(), 3), s.label)
Example 3: Generators, itertools and lazy pipelines
# Example 3: Generators, itertools and lazy pipelines -- Principles of Object-oriented Design for Data
from itertools import islice, accumulate

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def running_stats(seq):
    total, n = 0, 0
    for x in seq:
        total += x
        n += 1
        yield x, total / n, total

first10 = list(islice(fibonacci(), 10))
print("fib(0..9) :", first10)
for x, avg, cum in islice(running_stats(first10), 10):
    print(f" x={x:>3} mean={avg:>6.2f} cumulative={cum:>4}")
partial_sums = list(accumulate(first10))
print("partial sums:", partial_sums)
Example 4: Context manager with timing and error handling
# Example 4: Context manager with timing and error handling -- Principles of Object-oriented Design for Data
from contextlib import contextmanager
import time
import traceback

@contextmanager
def timed(name: str):
    t0 = time.perf_counter()
    try:
        yield
    except Exception as exc:
        print(f"[{name}] failed: {exc!r}")
        traceback.print_exc()
        raise
    finally:
        dt_ms = (time.perf_counter() - t0) * 1_000
        print(f"[{name}] took {dt_ms:.2f} ms")

with timed("hash 1M ints"):
    total = sum(hash(i) for i in range(1_000_000))
print("result:", total % 9_973)
Example 5: Decorator for memoised pure functions
# Example 5: Decorator for memoised pure functions -- Principles of Object-oriented Design for Data
from functools import wraps

def memoise(fn):
    cache: dict = {}

    @wraps(fn)
    def inner(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    inner.cache = cache
    return inner

@memoise
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print([fib(i) for i in range(15)])
print("cache entries:", len(fib.cache))
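Where a hand-written cache is not needed, the standard library covers the same ground. As a hedged alternative sketch, `functools.lru_cache` with `maxsize=None` memoises the same recursive function and exposes its statistics through `cache_info()`. The name `fib_cached` is hypothetical and used only to keep it distinct from the example above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache, keyed on the (hashable) arguments
def fib_cached(n: int) -> int:
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


print([fib_cached(i) for i in range(15)])
print("cache entries:", fib_cached.cache_info().currsize)
```

Unlike the hand-rolled decorator, `lru_cache` also handles keyword arguments and can bound memory use by setting `maxsize` to a number.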