Advanced Optimization Algorithms and Learning Rate Schedules
Advanced optimization algorithms and learning rate schedules belong to the mathematical machinery that makes modern data science possible. Linear algebra, calculus and optimisation are the languages in which machine-learning models are written; this lesson translates the key ideas into practical intuition.
Why Advanced Optimization Algorithms Matter
Every gradient step, every matrix factorisation, every regulariser comes from this toolkit. Intuition here turns black-box libraries into glass-box tools you can debug and extend. In practice, that means learning to:
- Reformulate the problem as an optimisation of a clear objective.
- Reach for linear-algebra building blocks (dot products, projections, decompositions).
- Track how gradients flow through your computations.
- Recognise when convexity or sparsity buys you huge algorithmic wins (a quick numerical check follows this list).
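The convexity point is easy to check numerically: a twice-differentiable function is convex wherever its Hessian is positive semidefinite. A minimal sketch, using the same quadratic objective that Example 1 below minimises:
import numpy as np
# f(x, y) = (x - 3)**2 + (y + 2)**2 + x*y has the constant Hessian [[2, 1], [1, 2]]
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, ascending
print("eigenvalues:", eigvals)                    # [1. 3.]
print("convex?    :", bool((eigvals > 0).all()))  # all positive => strictly convex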
How Advanced Optimization Algorithms Shows Up in Practice
In a typical project, advanced optimization algorithms and learning rate schedules are combined with the rest of the Mathematics & Optimisation toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
You will lean on this toolkit whenever you debug a model, derive a new loss function or reason about why a convex optimiser converges and a non-convex one does not, as the short sketch below illustrates.
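To make that last point concrete, here is a small sketch with toy 1-D objectives (chosen purely for illustration): gradient descent on a convex function reaches the unique minimum from any start, while on a non-convex function the starting point decides which local minimum you end up in.
# Sketch: gradient descent on a convex vs a non-convex 1-D objective (toy functions)
def descend(grad, x0, lr, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x
# Convex: f(x) = (x - 2)**2 has one minimum at x = 2; any start converges to it
print("convex, start -5 :", round(descend(lambda x: 2 * (x - 2), x0=-5.0, lr=0.1), 4))
# Non-convex: f(x) = x**4 - 3*x**2 has two minima at +/- sqrt(1.5)
g = lambda x: 4 * x**3 - 6 * x
print("non-convex, -2   :", round(descend(g, x0=-2.0, lr=0.01), 4))
print("non-convex, +2   :", round(descend(g, x0=+2.0, lr=0.01), 4))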
Related lessons in this toolkit:
- Discerning Signal from Noise in High-Dimensional Data
- Formulating Optimisation Objectives from Key Performance Indicators (KPIs)
- Linear Algebraic Structures for Vector Data Analysis
- Multivariate Calculus for Gradient-Based Optimization
Code Examples: Advanced Optimization Algorithms and Learning Rate Schedules (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end — each example stands alone.
Example 1: Quasi-Newton optimisation with scipy
# Example 1: Quasi-Newton optimisation with scipy (BFGS)
import numpy as np
from scipy.optimize import minimize
def f(x):
    # Convex quadratic with a cross term; the minimum solves grad(x) = 0
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2 + x[0] * x[1]

def grad(x):
    # Analytic gradient; passing it via jac= spares BFGS finite-difference calls
    return np.array([2 * (x[0] - 3) + x[1],
                     2 * (x[1] + 2) + x[0]])
res = minimize(f, x0=[0.0, 0.0], jac=grad, method="BFGS")
print("optimum :", np.round(res.x, 4))
print("f(x*) :", round(res.fun, 4))
print("grad(x*) :", np.round(grad(res.x), 4))
print("iters :", res.nit, "status:", res.success)
Example 2: Constrained optimisation with SLSQP
# Example 2: Constrained optimisation with SLSQP (maximum-Sharpe portfolio)
import numpy as np
from scipy.optimize import minimize
# Portfolio of 4 assets with expected returns and a covariance matrix
mu = np.array([0.12, 0.10, 0.07, 0.03])
Sigma = np.array([
[0.10, 0.02, 0.04, 0.00],
[0.02, 0.08, 0.02, 0.01],
[0.04, 0.02, 0.09, 0.00],
[0.00, 0.01, 0.00, 0.02],
])
def neg_sharpe(w, rf=0.02):
    # Negative Sharpe ratio: minimising it maximises (return - rf) / volatility
    ret = w @ mu
    vol = np.sqrt(w @ Sigma @ w)
    return -(ret - rf) / vol
w0 = np.ones(4) / 4
bounds = [(0.0, 1.0)] * 4
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
res = minimize(neg_sharpe, w0, method="SLSQP",
bounds=bounds, constraints=constraints)
print("optimal weights :", np.round(res.x, 3))
print("expected return :", round(res.x @ mu, 4))
print("sharpe ratio :", round(-res.fun, 3))
Example 3: Newton-Raphson root finding
# Example 3: Newton-Raphson root finding
import numpy as np
def newton(f, fp, x0, tol=1e-10, max_iter=50):
    # Classic Newton-Raphson: iterate x <- x - f(x)/f'(x) until |f(x)| < tol
    x = float(x0)
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, i
        x -= fx / fp(x)
    raise RuntimeError("did not converge")
# Find the root of f(x) = x**3 - 2x - 5 near x=2
root, iters = newton(lambda x: x**3 - 2*x - 5,
lambda x: 3*x**2 - 2,
x0=2.0)
print(f"root : {root:.12f}")
print(f"iters : {iters}")
print(f"check : {root**3 - 2*root - 5:.2e}")
Example 4: SVD-based PCA from scratch
# Example 4: SVD-based PCA from scratch
import numpy as np
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1] # correlated columns
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = (s ** 2) / (s ** 2).sum()
X_projected = Xc @ Vt[:2].T # first two PCs
print("singular values :", np.round(s, 3))
print("variance ratio :", np.round(var_ratio, 3))
print("projected shape :", X_projected.shape)
Example 5: Batch gradient descent for linear regression
# Example 5: Batch gradient descent for linear regression
import numpy as np
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.3, n)
w, lr = np.zeros(3), 0.05  # zero initialisation, fixed learning rate
for epoch in range(200):
    grad = (2 / n) * X.T @ (X @ w - y)  # gradient of the mean squared error
    w -= lr * grad
if epoch % 40 == 0:
mse = ((X @ w - y) ** 2).mean()
print(f"epoch {epoch:3d}: w = {np.round(w, 3)}, mse = {mse:.4f}")
print("recovered weights:", np.round(w, 3))