Cloud Data Warehousing Solutions: Snowflake, BigQuery, Redshift
Cloud data warehousing with Snowflake, BigQuery, and Redshift is about doing data science at serious scale without drowning in plumbing. Cloud platforms and managed ML services let small teams ship capabilities that only the largest companies could build in-house a decade ago.
Why Cloud Data Warehousing Matters
Managed platforms let teams trade toil for velocity. Skill with one of the major clouds is effectively mandatory for senior data and ML roles today.
- Use managed services until the boundary becomes painful.
- Cost-model every long-running job before you scale it.
- Isolate environments — dev, staging and prod are not negotiable.
- Design for multi-region failure from day one for critical workloads.
How Cloud Data Warehousing Shows Up in Practice
In a typical project, a cloud warehouse such as Snowflake, BigQuery, or Redshift is combined with the rest of the Cloud & Platforms toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
Cloud warehouses are the default deployment target for modern data and ML systems, from start-ups to regulated enterprises.
Code Examples: Cloud Data Warehousing with Snowflake, BigQuery, and Redshift (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end — each example stands alone.
Example 1: S3 upload with retries and listing
# Example 1: S3 upload with retries and listing
import boto3
from botocore.config import Config

# Retry transient failures up to 5 times using the standard retry mode.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 5, "mode": "standard"}),
)

bucket = "my-datalake-staging"
prefix = "exports/2026/"

# Upload with server-side encryption enabled.
s3.upload_file(
    "features.parquet", bucket, prefix + "features.parquet",
    ExtraArgs={"ServerSideEncryption": "AES256"},
)

# Paginate through every object under the prefix and sum the sizes.
total = 0
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        total += obj["Size"]
        print(obj["Key"], obj["Size"])
print(f"total bytes in {prefix}: {total:,}")
Example 2: BigQuery aggregation with cost awareness
# Example 2: BigQuery aggregation with cost awareness
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")

query = """
SELECT
    DATE(event_time) AS day,
    COUNTIF(event = 'signup') AS signups,
    COUNTIF(event = 'purchase') AS purchases,
    SAFE_DIVIDE(COUNTIF(event = 'purchase'),
                COUNTIF(event = 'signup')) AS conversion
FROM `my_project.analytics.events`
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day
ORDER BY day;
"""

# Cap how many bytes the job may bill so a runaway query fails fast
# instead of running up the invoice.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)  # 10 GiB
job = client.query(query, job_config=job_config)
df = job.to_dataframe()

print(df.head(10))
print(f"bytes scanned: {job.total_bytes_processed / 1e9:.2f} GB")
Example 3: Azure blob download with managed identity
# Example 3: Azure blob download with managed identity
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_url = "https://mydatalake.blob.core.windows.net"

# DefaultAzureCredential resolves managed identity, environment variables,
# or an Azure CLI login -- no connection string in code.
credential = DefaultAzureCredential()
service = BlobServiceClient(account_url, credential=credential)
container = service.get_container_client("raw-events")

# Download every blob under the April 2026 prefix to /tmp.
for blob in container.list_blobs(name_starts_with="2026/04/"):
    print(blob.name, blob.size)
    blob_client = container.get_blob_client(blob.name)
    with open(f"/tmp/{blob.name.rsplit('/', 1)[-1]}", "wb") as f:
        f.write(blob_client.download_blob().readall())
Example 4: Kubernetes job manifest for a batch-scoring run
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-scoring
  labels:
    app: risk-scorer
spec:
  backoffLimit: 2
  ttlSecondsAfterFinished: 3600   # clean up the Job an hour after it finishes
  template:
    spec:
      restartPolicy: OnFailure
      serviceAccountName: risk-scorer-sa
      containers:
        - name: scorer
          image: ghcr.io/example/risk-scorer:1.14.0
          # Kubernetes treats $(...) in args as env-var substitution, not
          # shell expansion, so compute the date through a shell instead.
          # Assumes the image ships a `scorer` binary on PATH.
          command: ["/bin/sh", "-c"]
          args: ["exec scorer --date $(date +%F) --output s3://ml-outputs/"]
          resources:
            requests: { cpu: "1", memory: "2Gi" }
            limits: { cpu: "4", memory: "8Gi" }
          env:
            - name: MODEL_URI
              value: "s3://ml-registry/risk/v3.2.1/model.joblib"
            - name: LOG_LEVEL
              value: "INFO"
Example 5: Terraform module for a managed Postgres database
terraform {
  required_version = ">= 1.6"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.50" }
  }
}

resource "aws_db_subnet_group" "analytics" {
  name       = "analytics-db-subnets"
  subnet_ids = var.private_subnet_ids
}

resource "aws_db_instance" "analytics" {
  identifier     = "analytics-warehouse"
  engine         = "postgres"
  engine_version = "16.2"
  instance_class = "db.m6g.large"

  allocated_storage = 100
  storage_type      = "gp3"
  storage_encrypted = true

  db_name  = "analytics"
  username = var.db_username
  password = var.db_password

  db_subnet_group_name = aws_db_subnet_group.analytics.name
  # References an aws_security_group.db resource defined elsewhere in the module.
  vpc_security_group_ids = [aws_security_group.db.id]

  backup_retention_period      = 14
  deletion_protection          = true
  performance_insights_enabled = true

  tags = { Environment = "prod", Team = "data" }
}

output "db_endpoint" { value = aws_db_instance.analytics.endpoint }