Multivariate Calculus for Gradient-based Optimization
Multivariate Calculus for Gradient-based Optimization belongs to the mathematical machinery that makes modern data science possible. Linear algebra, calculus and optimisation are the languages in which machine-learning models are written; this lesson translates the key ideas into practical intuition.
Why Multivariate Calculus for Gradient-based Optimization Matters
Every gradient step, every matrix factorisation, every regulariser comes from this toolkit. Intuition here turns black-box libraries into glass-box tools you can debug and extend. When you face a new problem, the habits to build are:
- Reformulate the problem as an optimisation of a clear objective.
- Reach for linear-algebra building blocks (dot products, projections, decompositions).
- Track how gradients flow through your computations (a quick gradient check is sketched just after this list).
- Recognise when convexity or sparsity buys you huge algorithmic wins.
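As a first taste of the gradient-flow habit, here is a minimal sketch that compares an analytic gradient with a central finite-difference estimate; the least-squares objective, the step size h and the test point are illustrative assumptions, not something prescribed by the lesson.
# Sketch: sanity-checking an analytic gradient with central finite differences
# (the objective, the step size h and the test point are illustrative assumptions)
import numpy as np
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, -2.0])
def f(x):
    # Least-squares objective: 0.5 * ||A x - b||^2
    return 0.5 * np.sum((A @ x - b) ** 2)
def analytic_grad(x):
    # Gradient derived by the chain rule: A^T (A x - b)
    return A.T @ (A @ x - b)
def numerical_grad(func, x, h=1e-6):
    # Central differences: (f(x + h e_i) - f(x - h e_i)) / (2h) for each coordinate
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g
x0 = np.array([0.5, -1.0])
print("analytic :", np.round(analytic_grad(x0), 6))
print("numerical:", np.round(numerical_grad(f, x0), 6))
print("max abs difference:", np.max(np.abs(analytic_grad(x0) - numerical_grad(f, x0))))
If the two vectors disagree by more than a few multiples of h squared, the analytic derivation (not the optimiser) is usually the culprit.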
How Multivariate Calculus for Gradient-based Optimization Shows Up in Practice
In a typical project, multivariate calculus for gradient-based optimization is combined with the rest of the Mathematics & Optimisation toolkit. You rarely use any one technique in isolation; the real skill is knowing which combination fits the problem you are trying to solve, and being able to explain that choice to a non-technical stakeholder.
You will lean on this toolkit whenever you debug a model, derive a new loss function, or reason about why a convex optimiser converges while a non-convex one stalls in a local minimum; a small illustration of that last point follows the list below. Related topics in the Mathematics & Optimisation toolkit:
- Discerning Signal from Noise in High-dimensional Data
- Formulating Optimisation Objectives from Key Performance Indicators (KPIs)
- Linear Algebraic Structures for Data Analysis: Vectors and Matrices
- Principles of Convex and Non-convex Optimization
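To make the convex versus non-convex contrast concrete, here is a minimal sketch; the two objective functions, the learning rate and the starting points are illustrative choices, not part of the lesson. Plain gradient descent reaches the unique minimum of a convex quadratic from either start, while on a non-convex quartic it settles into whichever local minimum owns the basin it started in.
# Sketch: gradient descent on a convex vs a non-convex 1-D objective
# (functions, learning rate and starting points are illustrative assumptions)
def gradient_descent(grad, x0, lr=0.02, steps=500):
    x = float(x0)
    for _ in range(steps):
        x -= lr * grad(x)
    return x
def convex_grad(x):
    # f(x) = (x - 2)^2 has a single global minimum at x = 2
    return 2 * (x - 2)
def nonconvex_grad(x):
    # g(x) = x**4 - 3*x**2 + x has local minima near x = -1.30 and x = 1.13
    return 4 * x**3 - 6 * x + 1
for start in (-2.0, 2.0):
    print(f"start {start:+.1f}: convex -> {gradient_descent(convex_grad, start):.4f}, "
          f"non-convex -> {gradient_descent(nonconvex_grad, start):.4f}")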
Code Examples: Multivariate Calculus for Gradient-based Optimization (5 runnable snippets)
Copy any block into a file or notebook and run it end-to-end — each example stands alone.
Example 1: Quasi-Newton optimisation with scipy
# Example 1: Quasi-Newton optimisation with scipy -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
from scipy.optimize import minimize
def f(x):
    # Convex quadratic with a cross term; its Hessian [[2, 1], [1, 2]] is positive definite
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2 + x[0] * x[1]
def grad(x):
    # Analytic gradient of f, supplied to BFGS via jac=
    return np.array([2 * (x[0] - 3) + x[1],
                     2 * (x[1] + 2) + x[0]])
res = minimize(f, x0=[0.0, 0.0], jac=grad, method="BFGS")
print("optimum :", np.round(res.x, 4))
print("f(x*) :", round(res.fun, 4))
print("grad(x*) :", np.round(grad(res.x), 4))
print("iters :", res.nit, "status:", res.success)
Example 2: Constrained optimisation with SLSQP
# Example 2: Constrained optimisation with SLSQP -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
from scipy.optimize import minimize
# Portfolio of 4 assets with expected returns and a covariance matrix
mu = np.array([0.12, 0.10, 0.07, 0.03])
Sigma = np.array([
[0.10, 0.02, 0.04, 0.00],
[0.02, 0.08, 0.02, 0.01],
[0.04, 0.02, 0.09, 0.00],
[0.00, 0.01, 0.00, 0.02],
])
def neg_sharpe(w, rf=0.02):
    # Negative Sharpe ratio: minimising this maximises (return - rf) / volatility
    ret = w @ mu
    vol = np.sqrt(w @ Sigma @ w)
    return -(ret - rf) / vol
w0 = np.ones(4) / 4
bounds = [(0.0, 1.0)] * 4
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
res = minimize(neg_sharpe, w0, method="SLSQP",
               bounds=bounds, constraints=constraints)
print("optimal weights :", np.round(res.x, 3))
print("expected return :", round(res.x @ mu, 4))
print("sharpe ratio :", round(-res.fun, 3))
Example 3: Newton-Raphson root finding
# Example 3: Newton-Raphson root finding -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
def newton(f, fp, x0, tol=1e-10, max_iter=50):
    x = float(x0)
    for i in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, i
        x = x - fx / fp(x)
    raise RuntimeError("did not converge")
# Find the root of f(x) = x**3 - 2x - 5 near x=2
root, iters = newton(lambda x: x**3 - 2*x - 5,
                     lambda x: 3*x**2 - 2,
                     x0=2.0)
print(f"root : {root:.12f}")
print(f"iters : {iters}")
print(f"check : {root**3 - 2*root - 5:.2e}")
Example 4: SVD-based PCA from scratch
# Example 4: SVD-based PCA from scratch -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1] # correlated columns
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = (s ** 2) / (s ** 2).sum()
X_projected = Xc @ Vt[:2].T # first two PCs
print("singular values :", np.round(s, 3))
print("variance ratio :", np.round(var_ratio, 3))
print("projected shape :", X_projected.shape)
Example 5: Batch gradient descent for linear regression
# Example 5: Batch gradient descent for linear regression -- Multivariate Calculus for Gradient-based Optimization
import numpy as np
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.3, n)
w, lr = np.zeros(3), 0.05
for epoch in range(200):
    grad = (2 / n) * X.T @ (X @ w - y)
    w -= lr * grad
    if epoch % 40 == 0:
        mse = ((X @ w - y) ** 2).mean()
        print(f"epoch {epoch:3d}: w = {np.round(w, 3)}, mse = {mse:.4f}")
print("recovered weights:", np.round(w, 3))