Ensemble Methods III: Model Stacking and Blending for Performance Optimization

Model stacking and blending are core techniques in the machine-learning toolkit: both combine the predictions of several base models into a stronger final model. This lesson walks through the intuition behind the methods, the math that underpins them, and the practical decisions (features, hyperparameters, evaluation) that separate a naive model from a production-grade one.

Why Stacking and Blending Matter

Machine learning is a general-purpose technology: the same core techniques power recommendation engines, fraud detection, medical diagnostics and scientific research. Mastering the fundamentals unlocks them all.

  • Define a single north-star evaluation metric up front.
  • Build a trivial baseline before reaching for anything fancy.
  • Use cross-validation that respects your data's real structure.
  • Tune hyperparameters on held-out data, never on the test set.
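The checklist above can be sketched in a few lines. This sketch assumes a binary-classification task; the synthetic `make_classification` data, the `DummyClassifier` baseline and the logistic-regression model are illustrative choices, not prescriptions:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data, so the choice of metric matters
X, y = make_classification(n_samples=1_000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# North-star metric: ROC-AUC; CV that respects the class imbalance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Trivial baseline first, real model second
baseline = DummyClassifier(strategy="prior")
model    = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

base_auc  = cross_val_score(baseline, X, y, cv=cv, scoring="roc_auc").mean()
model_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
print(f"baseline ROC-AUC: {base_auc:.3f}")   # a constant predictor scores 0.5
print(f"model    ROC-AUC: {model_auc:.3f}")
```

Any model that fails to clear the trivial baseline by a meaningful margin is not worth tuning further.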

How Stacking and Blending Show Up in Practice

In a typical project, stacking and blending are combined with the rest of the machine-learning toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
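As one concrete combination, scikit-learn's `StackingClassifier` fits several base learners and then trains a meta-learner (here a logistic regression) on their cross-validated predictions. The base models and synthetic data below are illustrative choices, a minimal sketch rather than a recipe:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2_000, n_features=20,
                           n_informative=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,
                                      stratify=y, random_state=0)

# Diverse base learners: a tree ensemble plus a scaled kernel SVM
base_learners = [
    ("rf",  RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# cv=5 means the meta-learner sees only out-of-fold base predictions,
# which is what protects stacking from leaking the training labels
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5, n_jobs=-1)
stack.fit(Xtr, ytr)

auc = roc_auc_score(yte, stack.predict_proba(Xte)[:, 1])
print(f"stacked ROC-AUC: {auc:.3f}")
```

The key design choice is diversity: base learners that make different kinds of errors give the meta-learner something to exploit.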

These techniques are relevant for churn prediction, demand forecasting, fraud detection, anomaly monitoring, ranking, personalisation and scientific modelling.
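Blending is the hold-out variant of the same idea: instead of cross-validated predictions, the meta-learner is fit on base-model predictions over a separate blend split. A minimal sketch on synthetic data follows; the split sizes and base models are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3_000, n_features=20,
                           n_informative=10, random_state=0)

# Three disjoint splits: base models see only `tr`,
# the meta-learner sees only `bl`, evaluation uses only `te`
Xtr, Xtmp, ytr, ytmp = train_test_split(X, y, test_size=0.4,
                                        stratify=y, random_state=0)
Xbl, Xte, ybl, yte = train_test_split(Xtmp, ytmp, test_size=0.5,
                                      stratify=ytmp, random_state=0)

bases = [RandomForestClassifier(n_estimators=200, random_state=0),
         GradientBoostingClassifier(random_state=0)]
for m in bases:
    m.fit(Xtr, ytr)

# Base-model probabilities on the blend split become the meta-features
Zbl = np.column_stack([m.predict_proba(Xbl)[:, 1] for m in bases])
Zte = np.column_stack([m.predict_proba(Xte)[:, 1] for m in bases])

meta = LogisticRegression().fit(Zbl, ybl)
auc  = roc_auc_score(yte, meta.predict_proba(Zte)[:, 1])
print(f"blended ROC-AUC: {auc:.3f}")
```

Blending is simpler and cheaper than full stacking but spends data on the blend split, so it tends to suit larger datasets.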


Code Examples: Model Stacking and Blending (5 runnable snippets)

Copy any block into a file or notebook and run it end-to-end — each example stands alone.

Example 1: Grid search over an SVM pipeline

# Example 1: Grid search over an SVM pipeline -- Ensemble Methods III: Model Stacking and Blending
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([("sc", StandardScaler()), ("svc", SVC())])
grid = {
    "svc__C":      [0.1, 1, 10, 100],
    "svc__gamma":  ["scale", 0.01, 0.001],
    "svc__kernel": ["rbf"],
}

search = GridSearchCV(pipe, grid, cv=5, scoring="f1", n_jobs=-1)
search.fit(X, y)

print("best f1     :", round(search.best_score_, 3))
print("best params :", search.best_params_)

Example 2: Gradient-boosted trees with early stopping

# Example 2: Gradient-boosted trees with early stopping -- Ensemble Methods III: Model Stacking and Blending
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import classification_report, roc_auc_score

X, y = fetch_openml("credit-g", version=1, as_frame=True, return_X_y=True)
y    = (y == "good").astype(int)
X    = X.apply(lambda c: c.astype("category").cat.codes
               if c.dtype.name in ("object", "category") else c)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2,
                                      stratify=y, random_state=0)

model = HistGradientBoostingClassifier(
    learning_rate=0.05, max_iter=400,
    early_stopping=True, validation_fraction=0.15,
    random_state=0,
)
model.fit(Xtr, ytr)

proba = model.predict_proba(Xte)[:, 1]
print("AUC:", round(roc_auc_score(yte, proba), 3))
print(classification_report(yte, model.predict(Xte), digits=3))

Example 3: K-Means clustering with silhouette score

# Example 3: K-Means clustering with silhouette score -- Ensemble Methods III: Model Stacking and Blending
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, y_true = make_blobs(n_samples=1_500, centers=5, cluster_std=0.9,
                       random_state=0)

for k in range(2, 8):
    km    = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    score = silhouette_score(X, km.labels_)
    print(f"k={k}  inertia={km.inertia_:>8.1f}  silhouette={score:.3f}")

Example 4: End-to-end pipeline with cross-validated ROC-AUC

# Example 4: End-to-end pipeline with cross-validated ROC-AUC -- Ensemble Methods III: Model Stacking and Blending
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=2_000, n_features=20, n_informative=10,
    n_redundant=5, weights=[0.7, 0.3], random_state=0,
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf",    LogisticRegression(max_iter=1000, C=1.0)),
])

cv     = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc", n_jobs=-1)

print(f"ROC-AUC : {scores.mean():.3f} +/- {scores.std():.3f}")
print(f"folds   : {scores.round(3).tolist()}")

Example 5: Random forest regression + feature importances

# Example 5: Random forest regression + feature importances -- Ensemble Methods III: Model Stacking and Blending
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

data           = fetch_california_housing(as_frame=True)
X, y           = data.data, data.target
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=0)
rf.fit(Xtr, ytr)
yhat = rf.predict(Xte)

print(f"R^2 : {r2_score(yte, yhat):.3f}")
print(f"MAE : {mean_absolute_error(yte, yhat):.3f}")

order = np.argsort(rf.feature_importances_)[::-1]
for i in order[:5]:
    print(f"  {X.columns[i]:<12}  {rf.feature_importances_[i]:.3f}")