Multivariate Calculus for Gradient-based Optimization
Multivariate Calculus for Gradient-based Optimization belongs to the mathematical machinery that makes modern data science possible. Linear algebra, calculus and optimisation are the languages in which machine-learning models are written; this lesson translates the key ideas into practical intuition.
Why Multivariate Calculus for Gradient-based Optimization Matters
Every gradient step, every matrix factorisation, every regulariser comes from this toolkit. Intuition here turns black-box libraries into glass-box tools you can debug and extend. When you face a new problem, the habits to build are:
- Reformulate the problem as an optimisation of a clear objective.
- Reach for linear-algebra building blocks (dot products, projections, decompositions).
- Track how gradients flow through your computations (a quick gradient check is sketched just after this list).
- Recognise when convexity or sparsity buys you huge algorithmic wins.
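As a first taste of the gradient-flow habit, here is a minimal sketch that compares an analytic gradient with a central finite-difference estimate; the least-squares objective, the step size h and the test point are illustrative assumptions, not something prescribed by the lesson.
# Sketch: sanity-checking an analytic gradient with central finite differences
# (the objective, the step size h and the test point are illustrative assumptions)
import numpy as np
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, -2.0])
def f(x):
    # Least-squares objective: 0.5 * ||A x - b||^2
    return 0.5 * np.sum((A @ x - b) ** 2)
def analytic_grad(x):
    # Gradient derived by the chain rule: A^T (A x - b)
    return A.T @ (A @ x - b)
def numerical_grad(func, x, h=1e-6):
    # Central differences: (f(x + h e_i) - f(x - h e_i)) / (2h) for each coordinate
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g
x0 = np.array([0.5, -1.0])
print("analytic :", np.round(analytic_grad(x0), 6))
print("numerical:", np.round(numerical_grad(f, x0), 6))
print("max abs difference:", np.max(np.abs(analytic_grad(x0) - numerical_grad(f, x0))))
If the two vectors disagree by more than a few multiples of h squared, the analytic derivation (not the optimiser) is usually the culprit.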
How Multivariate Calculus for Gradient-based Optimization Shows Up in Practice
In a typical project, multivariate calculus for gradient-based optimization is combined with the rest of the Mathematics & Optimisation toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
You will lean on this toolkit whenever you debug a model, derive a new loss function, or reason about why a convex optimiser converges while a non-convex one stalls in a local minimum; a small illustration of that last point follows the list below. Related topics in the Mathematics & Optimisation toolkit:
- Discerning Signal from Noise in High-dimensional Data
- Formulating Optimisation Objectives from Key Performance Indicators (KPIs)
- Linear Algebraic Structures for Data Analysis: Vectors and Matrices
- Principles of Convex and Non-convex Optimization
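To make the convex versus non-convex contrast concrete, here is a minimal sketch; the two objective functions, the learning rate and the starting points are illustrative choices, not part of the lesson. Plain gradient descent reaches the unique minimum of a convex quadratic from either start, while on a non-convex quartic it settles into whichever local minimum owns the basin it started in.
# Sketch: gradient descent on a convex vs a non-convex 1-D objective
# (functions, learning rate and starting points are illustrative assumptions)
def gradient_descent(grad, x0, lr=0.02, steps=500):
    x = float(x0)
    for _ in range(steps):
        x -= lr * grad(x)
    return x
def convex_grad(x):
    # f(x) = (x - 2)^2 has a single global minimum at x = 2
    return 2 * (x - 2)
def nonconvex_grad(x):
    # g(x) = x**4 - 3*x**2 + x has local minima near x = -1.30 and x = 1.13
    return 4 * x**3 - 6 * x + 1
for start in (-2.0, 2.0):
    print(f"start {start:+.1f}: convex -> {gradient_descent(convex_grad, start):.4f}, "
          f"non-convex -> {gradient_descent(nonconvex_grad, start):.4f}")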
Code Examples: Multivariate Calculus for Gradient-based Optimization (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end — each example stands alone.
Example 1: Quasi-Newton optimisation with scipy
# Example 1: Quasi-Newton optimisation with scipy -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
from scipy.optimize import minimize
def f(x):
    # Convex quadratic with a cross term; its Hessian [[2, 1], [1, 2]] is positive definite
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2 + x[0] * x[1]
def grad(x):
    # Analytic gradient of f, supplied to BFGS via jac=
    return np.array([2 * (x[0] - 3) + x[1],
                     2 * (x[1] + 2) + x[0]])
res = minimize(f, x0=[0.0, 0.0], jac=grad, method="BFGS")
print("optimum :", np.round(res.x, 4))
print("f(x*) :", round(res.fun, 4))
print("grad(x*) :", np.round(grad(res.x), 4))
print("iters :", res.nit, "status:", res.success)
Example 2: Constrained optimisation with SLSQP
# Example 2: Constrained optimisation with SLSQP -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
from scipy.optimize import minimize
# Portfolio of 4 assets with expected returns and a covariance matrix
mu = np.array([0.12, 0.10, 0.07, 0.03])
Sigma = np.array([
[0.10, 0.02, 0.04, 0.00],
[0.02, 0.08, 0.02, 0.01],
[0.04, 0.02, 0.09, 0.00],
[0.00, 0.01, 0.00, 0.02],
])
def neg_sharpe(w, rf=0.02):
    # Negative Sharpe ratio: minimising this maximises (return - rf) / volatility
    ret = w @ mu
    vol = np.sqrt(w @ Sigma @ w)
    return -(ret - rf) / vol
w0 = np.ones(4) / 4
bounds = [(0.0, 1.0)] * 4
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
res = minimize(neg_sharpe, w0, method="SLSQP",
               bounds=bounds, constraints=constraints)
print("optimal weights :", np.round(res.x, 3))
print("expected return :", round(res.x @ mu, 4))
print("sharpe ratio :", round(-res.fun, 3))
Example 3: Newton-Raphson root finding
# Example 3: Newton-Raphson root finding -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
def newton(f, fp, x0, tol=1e-10, max_iter=50):
    x = float(x0)
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, i
        x = x - fx / fp(x)
    raise RuntimeError("did not converge")
# Find the root of f(x) = x**3 - 2x - 5 near x=2
root, iters = newton(lambda x: x**3 - 2*x - 5,
                     lambda x: 3*x**2 - 2,
                     x0=2.0)
print(f"root : {root:.12f}")
print(f"iters : {iters}")
print(f"check : {root**3 - 2*root - 5:.2e}")
Example 4: SVD-based PCA from scratch
# Example 4: SVD-based PCA from scratch -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1] # correlated columns
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = (s ** 2) / (s ** 2).sum()
X_projected = Xc @ Vt[:2].T # first two PCs
print("singular values :", np.round(s, 3))
print("variance ratio :", np.round(var_ratio, 3))
print("projected shape :", X_projected.shape)
Example 5: Batch gradient descent for linear regression
# Example 5: Batch gradient descent for linear regression -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.3, n)
w, lr = np.zeros(3), 0.05
for epoch in range(200):
    grad = (2 / n) * X.T @ (X @ w - y)
    w -= lr * grad
    if epoch % 40 == 0:
        mse = ((X @ w - y) ** 2).mean()
        print(f"epoch {epoch:3d}: w = {np.round(w, 3)}, mse = {mse:.4f}")
print("recovered weights:", np.round(w, 3))