Advanced Optimization Algorithms and Learning Rate Schedules
Advanced optimization algorithms and learning rate schedules belong to the mathematical machinery that makes modern data science possible. Linear algebra, calculus and optimisation are the languages in which machine-learning models are written; this lesson translates the key ideas into practical intuition.
Why Advanced Optimization Algorithms Matter
Every gradient step, every matrix factorisation, every regulariser comes from this toolkit. Intuition here turns black-box libraries into glass-box tools you can debug and extend. In practice, that means learning to:
- Reformulate the problem as an optimisation of a clear objective.
- Reach for linear-algebra building blocks (dot products, projections, decompositions).
- Track how gradients flow through your computations.
- Recognise when convexity or sparsity buys you huge algorithmic wins (a quick numerical check follows this list).
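The convexity point is easy to check numerically: a twice-differentiable function is convex wherever its Hessian is positive semidefinite. A minimal sketch, using the same quadratic objective that Example 1 below minimises:
import numpy as np
# f(x, y) = (x - 3)**2 + (y + 2)**2 + x*y has the constant Hessian [[2, 1], [1, 2]]
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, ascending
print("eigenvalues:", eigvals)                    # [1. 3.]
print("convex?    :", bool((eigvals > 0).all()))  # all positive => strictly convex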
How Advanced Optimization Algorithms Shows Up in Practice
In a typical project, advanced optimization algorithms and learning rate schedules are combined with the rest of the Mathematics & Optimisation toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
You will lean on this toolkit whenever you debug a model, derive a new loss function or reason about why a convex optimiser converges and a non-convex one does not, as the short sketch below illustrates.
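To make that last point concrete, here is a small sketch with toy 1-D objectives (chosen purely for illustration): gradient descent on a convex function reaches the unique minimum from any start, while on a non-convex function the starting point decides which local minimum you end up in.
# Sketch: gradient descent on a convex vs a non-convex 1-D objective (toy functions)
def descend(grad, x0, lr, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x
# Convex: f(x) = (x - 2)**2 has one minimum at x = 2; any start converges to it
print("convex, start -5 :", round(descend(lambda x: 2 * (x - 2), x0=-5.0, lr=0.1), 4))
# Non-convex: f(x) = x**4 - 3*x**2 has two minima at +/- sqrt(1.5)
g = lambda x: 4 * x**3 - 6 * x
print("non-convex, -2   :", round(descend(g, x0=-2.0, lr=0.01), 4))
print("non-convex, +2   :", round(descend(g, x0=+2.0, lr=0.01), 4))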
Related lessons in this toolkit:
- Discerning Signal from Noise in High-Dimensional Data
- Formulating Optimisation Objectives from Key Performance Indicators (KPIs)
- Linear Algebraic Structures for Vector Data Analysis
- Multivariate Calculus for Gradient-Based Optimization
Code Examples: Advanced Optimization Algorithms and Learning Rate Schedules (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end — each example stands alone.
Example 1: Quasi-Newton optimisation with scipy
# Example 1: Quasi-Newton optimisation with scipy (BFGS)
import numpy as np
from scipy.optimize import minimize
def f(x):
    # Convex quadratic with a cross term; the minimum solves grad(x) = 0
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2 + x[0] * x[1]

def grad(x):
    # Analytic gradient; passing it via jac= spares BFGS finite-difference calls
    return np.array([2 * (x[0] - 3) + x[1],
                     2 * (x[1] + 2) + x[0]])
res = minimize(f, x0=[0.0, 0.0], jac=grad, method="BFGS")
print("optimum :", np.round(res.x, 4))
print("f(x*) :", round(res.fun, 4))
print("grad(x*) :", np.round(grad(res.x), 4))
print("iters :", res.nit, "status:", res.success)
Example 2: Constrained optimisation with SLSQP
# Example 2: Constrained optimisation with SLSQP (maximum-Sharpe portfolio)
import numpy as np
from scipy.optimize import minimize
# Portfolio of 4 assets with expected returns and a covariance matrix
mu = np.array([0.12, 0.10, 0.07, 0.03])
Sigma = np.array([
[0.10, 0.02, 0.04, 0.00],
[0.02, 0.08, 0.02, 0.01],
[0.04, 0.02, 0.09, 0.00],
[0.00, 0.01, 0.00, 0.02],
])
def neg_sharpe(w, rf=0.02):
    # Negative Sharpe ratio: minimising it maximises (return - rf) / volatility
    ret = w @ mu
    vol = np.sqrt(w @ Sigma @ w)
    return -(ret - rf) / vol
w0 = np.ones(4) / 4
bounds = [(0.0, 1.0)] * 4
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
res = minimize(neg_sharpe, w0, method="SLSQP",
bounds=bounds, constraints=constraints)
print("optimal weights :", np.round(res.x, 3))
print("expected return :", round(res.x @ mu, 4))
print("sharpe ratio :", round(-res.fun, 3))
Example 3: Newton-Raphson root finding
# Example 3: Newton-Raphson root finding
import numpy as np
def newton(f, fp, x0, tol=1e-10, max_iter=50):
    # Classic Newton-Raphson: iterate x <- x - f(x)/f'(x) until |f(x)| < tol
    x = float(x0)
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, i
        x -= fx / fp(x)
    raise RuntimeError("did not converge")
# Find the root of f(x) = x**3 - 2x - 5 near x=2
root, iters = newton(lambda x: x**3 - 2*x - 5,
lambda x: 3*x**2 - 2,
x0=2.0)
print(f"root : {root:.12f}")
print(f"iters : {iters}")
print(f"check : {root**3 - 2*root - 5:.2e}")
Example 4: SVD-based PCA from scratch
# Example 4: SVD-based PCA from scratch
import numpy as np
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1] # correlated columns
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = (s ** 2) / (s ** 2).sum()
X_projected = Xc @ Vt[:2].T # first two PCs
print("singular values :", np.round(s, 3))
print("variance ratio :", np.round(var_ratio, 3))
print("projected shape :", X_projected.shape)
Example 5: Batch gradient descent for linear regression
# Example 5: Batch gradient descent for linear regression
import numpy as np
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.3, n)
w, lr = np.zeros(3), 0.05  # zero initialisation, fixed learning rate
for epoch in range(200):
    grad = (2 / n) * X.T @ (X @ w - y)  # gradient of the mean squared error
    w -= lr * grad
if epoch % 40 == 0:
mse = ((X @ w - y) ** 2).mean()
print(f"epoch {epoch:3d}: w = {np.round(w, 3)}, mse = {mse:.4f}")
print("recovered weights:", np.round(w, 3))