Model Predictive Control.

A reference for the working researcher. Every section opens with a one-sentence definition and a runnable picture. Each closes with optional deeper material in the form of expandable derivations and code. The running example is a four-tank process.

finite horizon OCP receding horizon surrogate dynamics robust MPC
Scroll · §01 begins below
§ 01 The idea FIG. 01

What model predictive control is.

Def · MPC

Model predictive control is a feedback policy that, at every sampling time, uses a dynamics model to predict the system over a finite future window, solves an optimization problem to choose a sequence of inputs, applies only the first input, and then repeats with a fresh measurement.

MPC turns control into a question we know how to answer: optimization. The cost of a candidate input sequence is computed against a forecast, the optimizer picks the sequence with lowest cost, only the first action is applied, and the procedure restarts at the next time step.

Three ingredients define every MPC controller:

  • A model that predicts the state evolution given a candidate control sequence. The model can be a first-principles ODE, a linear state-space realisation, a neural network, or a Gaussian process.
  • A cost that scores predicted trajectories against a goal, usually a quadratic penalty on tracking error plus a penalty on input effort.
  • A repeated solve. The optimization is solved from scratch at each step using the latest state estimate. This is the receding horizon.
Static · one shot

Choose once, commit

Pick the action that minimises a fixed objective. No measurement is taken after the decision. Works when the model is exact and the environment is benign.

u* = argmin f(u)
Sequential · feedback

Decide, observe, redecide

The optimal action at time k depends on what will be possible at k+1, which depends on a state that has not yet been measured. The loop closes by measurement.

uₖ* = argmin Σ J(xₖ, uₖ)

Where MPC sits among controllers.

The four methods below all map a measured state to a control input. They differ in how much structure they impose on the problem and how that structure is exploited online.

Method Model Constraints Online cost Typical use
PID none, tuned gains handled via anti-windup heuristics µs single-loop regulation
LQR linear, fixed not represented µs (precomputed gain) linear regulation around an operating point
MPC nonlinear, possibly learned explicit, hard and soft ms per step multivariable, constrained, anticipative
RL policy implicit (in the policy network) via reward shaping, not native µs (forward pass) large-scale or hard-to-model dynamics
Key insight

The argument for MPC is structural. Constraints, anticipation of future setpoints, and multivariable coupling enter the formulation directly rather than through heuristic tuning. The price is an optimization at every sample.

DEEPER · 1A A short history [ ▼ ]

The earliest practical MPC was developed inside Shell in the late 1970s under the name Dynamic Matrix Control. By the mid 1980s, model-based predictive controllers were standard practice in refineries because they could enforce operating constraints that PID layers could only approximate. The theoretical foundation, in particular stability via terminal cost and terminal constraint, matured through the 1990s in the work of Mayne, Rawlings, Rakovic, and others.

Two shifts changed the field after 2010. Fast convex solvers and code generation tools (CVXGEN, qpOASES, HPIPM, acados) brought MPC into the millisecond regime, opening up automotive and robotic applications. Learning-based components (Gaussian processes, neural surrogates, policies that imitate the optimizer) closed the remaining gap to systems whose first-principles models are unavailable or too slow to integrate online.

DEEPER · 1B When is MPC the right tool? [ ▼ ]

Use MPC when at least two of the following are true: the system has multiple coupled inputs and outputs, operating constraints are tight and cannot be slack, future setpoints or disturbances are known in advance, or the plant is genuinely nonlinear in the operating range.

Avoid MPC when the sampling time is below what the solver can return reliably (typically below a few hundred microseconds for nonlinear NLPs), when the model is unreliable and no honest uncertainty representation is available, or when the cost of an occasional infeasible solve cannot be tolerated and no fallback policy is in place.

§ 02 The OCP EQ. 01

The optimal control problem.

Def · OCP

At every sampling step k, MPC solves a finite-horizon constrained optimization over a sequence of inputs u₀, …, uN−1. The cost is a sum of stage costs plus a terminal cost; the dynamics enter as equality constraints; bounds enter as inequalities.

The problem below is the discrete-time form used by almost every MPC implementation. Continuous-time dynamics are first discretised, then this same template applies.

State
x ∈ ℝnx
Control
u ∈ ℝnu
Horizon
N steps
Dynamics
xk+1 = f(xk, uk)
Constraints
g(x,u) ≤ 0
EQ. 01 Discrete-time MPC problem [ ▼ ]

The standard formulation, after Rawlings and Mayne[1]. m is the terminal (Mayer) cost; is the stage (Lagrange) cost.

minu₀,…,uN-1 m(xN) + Σk=0N-1 ℓ(xk, uk) subject to x0 = # current measurement xk+1 = f(xk, uk) # dynamics g(xk, uk) ≤ 0 # path constraints uk ∈ 𝒰 # input set xk ∈ 𝒳 # state set xN ∈ 𝒳f # terminal set Apply u0* to the plant. Discard u1*, …, uN-1*. At k+1, re-solve.

The tracking case sets (x, u) = ‖x − xref2Q + ‖u − uref2R with positive definite weighting matrices Q and R. Economic MPC drops the reference and lets be any user-defined performance metric (cost, throughput, energy).

DEEPER · 2A Continuous-time form and discretisation [ ▼ ]

Most physical systems are described in continuous time: ẋ(t) = fc(x(t), u(t)). The OCP in continuous time replaces the sum with an integral and the recurrence with an ODE constraint. Direct methods (single shooting, multiple shooting, collocation) reduce it to a finite-dimensional NLP. §07 covers the trade-offs between these.

# Continuous-time OCP min m(x(T)) + ∫0T ℓ(x(t), u(t)) dt s.t. ẋ(t) = fc(x(t), u(t)), x(0) = x̂ g(x(t), u(t)) ≤ 0 # Time-discretised (Δt = T / N) xk+1 = xk + ∫kΔt(k+1)Δt fc(x(τ), uk) dτ
DEEPER · 2B Terminal cost and stability [ ▼ ]

A finite-horizon optimum need not approach the infinite-horizon optimum. The standard fix is to choose m(x) so that it approximates the cost-to-go beyond the horizon, and to add a terminal set 𝒳f in which a local stabilizing control law is known to exist.

For linear systems with a quadratic cost, this is automatic: take m(x) = xᵀPx where P solves the discrete-time algebraic Riccati equation, and let 𝒳f be the largest invariant set under the unconstrained LQR gain. The resulting MPC is a stabilising policy on the feasible region.

For nonlinear systems, a sufficient condition is the existence of a control-Lyapunov function on 𝒳f. The Borrelli, Bemporad, and Morari textbook (2017)[2] gives the full account.

DEEPER · 2C Soft constraints (slacks) [ ▼ ]

Hard state constraints can render the OCP infeasible after a large disturbance. The standard remedy is to relax them with a non-negative slack variable s and to penalise s heavily in the cost.

# Hard: g(x, u) ≤ 0 # Softened: g(x, u) ≤ s, s ≥ 0, cost += ρ · ‖s‖1

The choice ρ ≫ 0 ensures that s = 0 whenever the hard constraint is feasible. L1 penalisation is preferred over L2 because it preserves the property that the constraint is exactly satisfied when possible.

DEEPER · 2D Tracking vs. economic objectives [ ▼ ]

Tracking MPC penalises deviation from a reference. Economic MPC optimises a performance index directly, with no reference. A power plant that minimises fuel cost or a batch reactor that maximises product yield is naturally economic. Stability theory is more delicate in the economic case because the optimal steady state is not known in advance and may not coincide with the minimiser of the stage cost.

§ 03 The receding horizon FIG. 02 · live

Plan five, do one, repeat.

Def · receding horizon

Receding horizon control means: solve the OCP from the current measurement, apply only u0*, discard the rest of the optimal sequence, take a new measurement, and re-solve. The horizon shifts forward by one sample at every step.

This is the closed-loop ingredient. The first action is always the one that meets reality; everything past u0* is plan, not commitment. Re-solving with a fresh measurement is how the controller absorbs model error and unmodelled disturbances.

Prediction horizon
Np = 5 (this fig.)
Control horizon
Nc = Np
Applied each step
u0* only
Solver budget
one NLP / sample
FIG. 02 Receding horizon, h₁ tracking the setpoint stepk = 0 J*·
MPC receding horizon chart
setpoint candidates best plan actual past u₁ applied now
k=0. The optimiser evaluates many input sequences. Only the vermillion segment is applied.
Why discard the plan

The plan is computed against a forecast that drifts from reality. Replanning at every sample from the measured state collapses that drift before it accumulates. The only sample that ever touches the plant is u0*.

Two horizons can in principle differ. The prediction horizon Np is how far ahead the model is integrated. The control horizon Nc ≤ Np is how many of those steps the optimiser is free to choose; beyond Nc, the input is held constant or computed by a known feedback law. Shorter Nc reduces solver work; equal horizons are most common.

DEEPER · 3A Closed-loop stability of the receding-horizon law [ ▼ ]

Let VN(x) be the optimal value of the N-step OCP from state x. Under the terminal ingredients of §02·2B, one can show that VN is a Lyapunov function for the closed-loop system: VN(f(x, u0*(x))) ≤ VN(x) − ℓ(x, u0*(x)). This decrease implies asymptotic stability of the desired equilibrium on the region where the OCP remains feasible.

Without a terminal cost or set, only suboptimality bounds are available. They are often satisfactory in practice but no longer give a clean stability certificate.

DEEPER · 3B Warm starting between samples [ ▼ ]

Two consecutive OCPs differ by one time step. The cheapest warm start is the shift initialisation: take the previous optimal sequence, drop u0*, and append a guess for uN-1 (often the steady-state input or the terminal feedback evaluated at xN*).

With Newton-type solvers, this gives most of the convergence in one or two iterations. Real-time iteration (Diehl et al., 2002)[4] takes this further: do exactly one Newton step per sample. Optimality is sacrificed for a hard real-time guarantee that the controller always produces an output on time.

DEEPER · 3C Choosing Np [ ▼ ]

Three rules of thumb. First, Np should be long enough to cover the slowest dominant time constant of the system; otherwise the controller cannot see the consequence of its actions. Second, doubling Np typically doubles solver cost, so the practical range is dictated by the available compute budget. Third, in the presence of a good terminal cost, surprisingly short horizons (Np = 5 to 10) often suffice even for systems with much longer settling times.

§ 04 The running example FIG. 03 · live

The four-tank process.

Def · plant

Two pumps fill four interconnected tanks. The lower tanks T1 and T3 are the controlled variables. The upper tanks T2 and T4 drain into them with state-dependent flow splits. The plant is nonlinear, multivariable, and (for certain split values) non-minimum phase.

We use the quadruple-tank rig of Johansson (2000)[3] as the running example. It is small enough to follow by hand and rich enough to make every later concept visible: coupling, nonlinearity, learned surrogates, uncertainty.

FIG. 03 4-tank rig, closed-loop step stepk = 0 / 10
4-tank system animation
h₁ · target 28 cm
20.0 cm
h₂ · uncontrolled
25.0 cm
h₃ · target 27 cm
18.0 cm
h₄ · uncontrolled
35.0 cm
Step forward to advance the closed-loop simulation.
Control u
[u₁, u₂] ∈ [0, 25]²
State x
[h₁, h₂, h₃, h₄]
Outputs y
[h₁, h₃]
Objective
‖y − yref‖²

The true dynamics follow four coupled ODEs with square-root outflow terms (Torricelli's law). Each pump drives one upper tank directly and routes a state-dependent fraction γ to the diagonal lower tank. The fractions γ1, γ2 are slowly time-varying, which makes the system non-stationary by construction.

# Continuous-time dynamics. What the simulator integrates. def four_tank(x, u, t): h1, h2, h3, h4 = x u1, u2 = u v1, v2 = 20*u1, 20*u2 # pump flows gamma1 = (1 + np.sin(t/15)) * 0.1 * h1/45 gamma2 = (1 + np.cos(t/15)) * 0.1 * h2/45 return np.array([ (1 - gamma2) * c * v2 - c2 * np.sqrt(h1), (1 - gamma1) * c * v1 - c2 * np.sqrt(h2), gamma1 * c * v1 + c2 * (np.sqrt(h1) - np.sqrt(h3)), gamma2 * c * v2 + c2 * (np.sqrt(h2) - np.sqrt(h4)), ]) # c = 0.003488, c2 = 0.15440
Why a learned surrogate

Integrating this ODE inside an optimiser that evaluates hundreds of candidate sequences per sample is too slow for real-time control. The fix in §05 is to replace f with a cheap regression model M fitted to data from this ODE.

DEEPER · 4A Mass balance and Torricelli's law [ ▼ ]

Each tank has cross-section A. Conservation of mass on tank i gives A · dhi/dt = qin,i − qout,i. Outflow through a small hole at the bottom of cross-section ai obeys Torricelli's law, qout,i = ai √(2g hi). After lumping constants, the outflow term takes the form −c2√hi seen in the code above.

For the upper tanks, inflow is the fraction of pump flow not diverted: (1 − γ)·c·v. For the lower tanks, inflow has two contributions: the diverted fraction γ·c·v from the opposite pump, and the gravity drain from the tank directly above, +c2√hupper.

DEEPER · 4B Non-minimum phase regime [ ▼ ]

Linearise the system at an operating point and examine the transfer function from u to y. The numerator polynomial depends on γ1 + γ2. When the sum exceeds 1, the linearisation has no right-half-plane zeros; when it falls below 1, a right-half-plane zero appears and the response goes the wrong way before settling.

In the wrong-way regime, PID tuning is brittle and gain scheduling becomes necessary. MPC handles both regimes with the same controller because the model carries the structural information directly.

DEEPER · 4C Discretisation choice [ ▼ ]

The simulator uses RK4 with Δt = 5 s for the closed-loop animation. The controller's internal model is one-step: a learned map from (xk, uk) to xk+1. The mismatch between integrator step and controller step is part of the plant-model gap that receding-horizon feedback is supposed to absorb.

§ 05 Predictive models FIG. 04

Predicting the next state with a surrogate.

Def · surrogate model

A surrogate is a fast approximation to the true dynamics, fit from training data {(xk, uk), xk+1}. Any regression family works. They differ in accuracy, sample efficiency, and whether they expose a usable notion of uncertainty.

MPC needs f. The cheapest f is the one that runs in microseconds and matches the plant well enough on the operating envelope. Below are four common choices; all of them plug into the same optimisation loop through the same predict interface.

M.01 · baseline M / 01 β

Linear regression

Δx ≈ β · [1, x, u]. Interpretable, fast, and analytic. Adding √x features makes it surprisingly close to the true Torricelli outflow.

M.02 · better M / 02

Quadratic features

Adds xixj, xi², uj². Captures pairwise nonlinearity with the same OLS machinery. The quiet workhorse.

M.03 · flexible M / 03

Neural network

A small MLP (32 hidden units) trained with Adam. Universal approximator. Data-hungrier than the polynomial models, harder to certify.

M.04 · principled M / 04 𝒩

Gaussian process

Probabilistic regression with calibrated uncertainty. Tells the optimiser where it does not know. The next section unpacks it.

The unified interface looks like this:

# Same call site for every surrogate. def predict_next(model, x_k, u_k): z = np.hstack([x_k, u_k]).reshape(1, -1) dx = model.predict(z)[0] return x_k + dx # model = LinearRegression(...) → fast, analytical # model = QuadraticRegression(...) → polynomial features # model = MLPRegressor((32,)) → flexible, data-hungry # model = GaussianProcessRegressor(...) → probabilistic
Physics-informed features

The true outflow term is c2√h. Augmenting the linear feature vector with √h recovers the dominant nonlinearity for free. Whenever first-principles knowledge is partial, encoding it as features is cheaper and safer than learning it from scratch.

EQ. 02 Linear and physics-informed surrogate [ ▼ ]

Predict the per-step change as a linear function of features φ:

Δxk = xk+1 − xkφ(xk, uk) · β # Baseline features φ(x, u) = [1, x1, …, x4, u1, u2] # Physics-informed features φ(x, u) = [1, x, u, √x1, √x2, √x3, √x4] # Least-squares fit β = (XT X)-1 XT Y
EQ. 03 Quadratic surrogate [ ▼ ]

Extend the feature vector with squared and pairwise cross-product terms. For dimension d = nx + nu, the expansion has 1 + d + d(d+1)/2 terms.

φ(x, u) = [1, x, u, x12, …, x42, x1x2, …, x3x4, u12, u22, u1u2] # Ridge regression: needed to keep the Gram matrix conditioned β = (XT X + λI)-1 XT Y with λ ≈ 10-6

Quadratic features change the cost surface but not the call site. The optimiser still sees a smooth, twice-differentiable model.

DEEPER · 5A Neural network surrogate [ ▼ ]

A scalar-output MLP with one hidden layer and ReLU activation is universal in the sense that, for enough hidden units, it can approximate any continuous function on a compact set. For the 4-tank problem, 32 hidden units fit the per-step Δx well from a few hundred samples.

MNN(z) = W2 · ReLU(W1 z + b1) + b2

The trade-off is differentiability and conditioning, not expressivity. Optimising through a deep network at every MPC sample is feasible if (a) gradients are propagated analytically and (b) the input space is normalised so that the network's local Lipschitz constant is bounded.

DEEPER · 5B Koopman lifting [ ▼ ]

Koopman theory replaces the nonlinear state with a (usually higher-dimensional) lifted state ψ(x) on which the dynamics are linear: ψk+1 = K · ψk + B uk. The choice of lifting function ψ matters more than the algorithm.

In practice, the lift is learned (auto-encoder, dictionary of polynomials and trig functions, or eigenfunctions of K). The reward is a globally linear surrogate that admits convex QP for the resulting MPC, at the cost of a possibly large lifted dimension.

DEEPER · 5C How much data is enough? [ ▼ ]

For the linear model on the 4-tank rig, around 50 (state, input) transitions sampled from a chirped pump signal already give one-step prediction error well below the closed-loop tolerance. The quadratic model needs roughly 200. The neural network needs 400 to 1000. The Gaussian process can match the linear baseline on 50 points but then exposes its uncertainty grow in regions the chirp did not cover.

These numbers are problem-specific. The right way to set the data budget is to plot a learning curve: closed-loop tracking cost versus number of training transitions, separately for each surrogate family.

§ 06 Gaussian processes FIG. 05 · live

A surrogate that knows
what it does not know.

Def · GP

A Gaussian process places a probability distribution over functions. Conditioning on observations gives a posterior with a closed-form predictive mean and variance at every test point. The variance is what makes GPs useful for control: it tells the optimiser how trustworthy the prediction is.

Click on the chart to add a (noisy) observation. The GP posterior mean curves toward the points; the shaded band narrows around them and fans out in unexplored regions. That fanning is the model admitting it has no data.

FIG. 05 GP fit, click to add observations
Gaussian Process demo
±2σ band GP mean observations
Click 3 to 8 points on the chart. Uncertainty collapses near observations and grows in the gaps.

Closed-form posterior

Given n noisy observations {(xi, yi)}, the GP gives the mean and variance at any test point x*:

Kernel · squared exponential
k(x, x') = σf2 · exp(− ‖x − x'‖2 / 2ℓ2)
Nearby inputs are correlated. σf = output scale, ℓ = length scale.
Posterior · closed-form Bayes update
μ* = k*T (K + σn2I)-1 y
σ*2 = k** − k*T (K + σn2I)-1 k*
k* = kernel between x* and training X. K = n × n training kernel matrix.
Why this matters for control

Adding a penalty on σ*(x) inside the MPC cost discourages the controller from venturing into regions the model has not seen. The same idea underpins Bayesian optimisation and safe exploration.

EQ. 04 Derivation of the posterior [ ▼ ]

Stack training values y ∈ ℝn and a single test value y*. Under the GP prior, the joint is Gaussian:

[ y ] ~ 𝒩( 0, [ K + σn2I k* ] ) [ y* ] [ k*T k** ]

Conditioning y* on the observed y gives a Gaussian whose mean and variance are the standard Schur-complement formulas, namely the posterior moments shown above. No iteration, no MCMC.

DEEPER · 6A Choosing the kernel [ ▼ ]

The kernel encodes prior beliefs about smoothness, periodicity, and length scales.

  • Squared exponential. Infinitely differentiable sample paths. Often too smooth for physical signals.
  • Matérn ν = 3/2 or 5/2. Sample paths that are once or twice differentiable. A better default for control because real plants are not analytic.
  • Periodic. k(x,x') = σ2 exp(−2 sin2(π(x−x')/p) / ℓ2). Useful when the disturbance is daily or weekly.
  • Composite. Kernels add and multiply. k = kSE + kperiodic captures a smooth trend plus a periodic ripple.
DEEPER · 6B Hyperparameters and marginal likelihood [ ▼ ]

The kernel has hyperparameters θ = (σf, ℓ, σn). They are fit by maximising the log marginal likelihood, the probability of the observed data integrating out the latent function values:

log p(y | X, θ) = −½ yT (Kθ + σn2I)-1 y − ½ log|Kθ + σn2I| − (n/2) log 2π

Three terms with three meanings: data fit, model complexity, normalisation. Maximising this objective trades off the first two. Gradient-based optimisation (L-BFGS) converges in a few hundred iterations for the kernel sizes typical of control problems.

DEEPER · 6C GP-MPC and uncertainty in the cost [ ▼ ]

Replace the deterministic prediction with the GP posterior. Two common ways to use the variance σ*2:

  • Penalty. Add λ · σ*2(xk, uk) to the stage cost. Larger λ means more cautious behaviour.
  • Chance constraint. Require P(g(x, u) ≤ 0) ≥ 1 − α. Under a Gaussian state, this is equivalent to tightening the deterministic constraint by a multiple of σ.

The cost of propagating uncertainty through N steps is the cost of integrating the posterior mean and variance jointly. Linearising the dynamics around the posterior mean gives a closed-form update (Hewing et al., 2020)[5]. Otherwise sampling is the fallback.

DEEPER · 6D Scaling beyond a few hundred points [ ▼ ]

Naive GP inference is O(n3) because of the matrix factorisation. Fine for n ≲ 500. Beyond that, two common remedies: inducing points (FITC, VFE, SVGP) that summarise the data with mn pseudo-inputs at cost O(nm2); and structured kernels (KISS-GP, SKI) that exploit Toeplitz or Kronecker structure for O(n log n) inference. Both are off-the-shelf in GPyTorch.

§ 07 Solving the NLP EQ. 05

From OCP to solved sequence.

Def · NLP

Discretising the OCP turns it into a finite-dimensional nonlinear program (NLP): a smooth cost plus equality constraints (the dynamics) plus inequality constraints (bounds, path constraints). Two design axes: how to encode states, and how to find the minimum.

The OCP is infinite-dimensional in continuous time. The first design decision is how to discretise it; the second is which numerical algorithm to apply to the resulting NLP. The two decisions interact, especially when the dynamics are stiff or the constraints are tight.

Three ways to discretise.

01

Single shooting

Decision variables: inputs u0, …, uN-1 only. States are propagated forward by the integrator. Cheap NLP, but ill-conditioned for unstable systems.

02

Multiple shooting

Decision variables include states at each shooting node. Dynamics enter as equality constraints xk+1 = f(xk, uk). Larger NLP, far better conditioning, the workhorse of fast NMPC.

03

Direct collocation

States and their derivatives at collocation points are decision variables. Dynamics become algebraic equations. Largest NLP, best for stiff ODEs and DAEs, default in CasADi[7] and do-mpc[8].

Solvers.

Two algorithms dominate practical MPC. Sequential quadratic programming (SQP) builds a quadratic model of the Lagrangian and a linear model of the constraints, then solves a QP subproblem at each iteration. Interior-point methods (IPOPT) handle inequalities by adding a logarithmic barrier and applying Newton steps to the perturbed KKT system. SQP is fast when warm-started; interior-point is more robust on tight constraints.

A typical setup in do-mpc[8] looks like the snippet below. The toolbox handles collocation, automatic differentiation through the dynamics, and IPOPT interfacing.

# A minimal NMPC controller with do-mpc. import do_mpc from casadi import SX, vertcat model = do_mpc.model.Model('continuous') h = [model.set_variable('_x', f'h{i}') for i in range(1, 5)] u = [model.set_variable('_u', f'u{i}') for i in range(1, 3)] # right-hand side from §04 (truncated for brevity) model.set_rhs('h1', (1 - g2)*c*v2 - c2*sqrt(h[0])) ... model.setup() mpc = do_mpc.controller.MPC(model) mpc.set_param(n_horizon=5, t_step=5.0, store_full_solution=True) mpc.set_objective( lterm = (h[0] - 28)**2 + (h[2] - 27)**2, # stage cost ℓ mterm = (h[0] - 28)**2 + (h[2] - 27)**2, # terminal cost m ) mpc.bounds['lower','_u','u1'] = 0; mpc.bounds['upper','_u','u1'] = 25 mpc.bounds['lower','_u','u2'] = 0; mpc.bounds['upper','_u','u2'] = 25 mpc.setup() # Closed loop x = x0 for step in range(200): u_apply = mpc.make_step(x) # solves the NLP, returns u0* x = simulator.step(x, u_apply)
EQ. 05 KKT conditions, the optimality certificate [ ▼ ]

For the NLP min f(z) s.t. h(z) = 0, g(z) ≤ 0, a primal point z* with multipliers λ, μ ≥ 0 is a local minimum if the Karush-Kuhn-Tucker conditions hold:

stationarity: ∇f(z*) + λT∇h(z*) + μT∇g(z*) = 0 primal feasibility: h(z*) = 0, g(z*) ≤ 0 dual feasibility: μ ≥ 0 complementarity: μi · gi(z*) = 0, ∀ i

SQP solves a sequence of QPs whose KKT systems converge to these conditions. Interior-point smooths the complementarity condition with a barrier parameter μbar and drives it to zero.

DEEPER · 7A Automatic differentiation [ ▼ ]

Every NLP solver needs gradients of the objective and the constraints. Finite differences are too noisy and too slow for control. The two robust options:

  • Symbolic AD (CasADi[7], JAX). The dynamics are expressed in a symbolic graph; the solver pulls exact Jacobians and Hessians from it. Best for systems where the dynamics can be written in code.
  • Tape-based AD (PyTorch, autograd). Records the operations of a forward pass, then walks the tape backwards to compute gradients. Native for neural-network surrogates.

For learned surrogates, exposing the model through the same AD machinery used by the solver removes the largest source of bugs in NMPC pipelines.

DEEPER · 7B Real-time iteration [ ▼ ]

The real-time iteration scheme (Diehl, Bock, Schlöder, 2002)[4] accepts that the NLP at sample k is only one Newton step away from the NLP at k−1, given a shifted warm start. It performs exactly one SQP iteration per sample. The resulting controller is no longer the exact MPC law but a tracking approximation. In experiments on automotive and aerospace systems, the closed-loop performance gap is small while the worst-case timing is bounded.

The acados toolbox is the modern reference implementation.

DEEPER · 7C When the QP is convex [ ▼ ]

For linear dynamics with a quadratic cost and linear constraints, the resulting program is a quadratic program. Active-set methods (qpOASES) excel under warm starting, interior-point methods (HPIPM) excel on cold starts. Both reach kilohertz rates on modest hardware. This is why linear MPC dominates production deployments even when the plant is mildly nonlinear: the speedup of staying in QP land is often worth the modelling compromise.

§ 08 Robust and stochastic FIG. 06 · comparison

What happens when the model is wrong.

Def · robust MPC

A controller is robust if it satisfies its constraints for every realisation of an uncertain parameter or disturbance in a specified set. A stochastic controller goes further: it controls the probability of violating each constraint to a chosen level.

Nominal MPC plans against a single forecast. Real systems disagree with the forecast in ways that are sometimes bounded (parameter mismatch) and sometimes random (process noise). Three families of methods exist for handling this gap; the right choice depends on what the uncertainty looks like.

R.01 · pessimistic R / 01

Min-max MPC

Plan for the worst case over the uncertainty set. Strong guarantees, often conservative. Solver is a saddle-point problem.

R.02 · scenario R / 02

Multi-stage MPC

Build a scenario tree of plausible futures[6]. Optimise over the tree with non-anticipativity constraints. The default in do-mpc[8].

R.03 · geometric R / 03

Tube MPC

Decompose the trajectory into a nominal part and an error tube. Tighten constraints by the tube; control the nominal part only.

R.04 · probabilistic R / 04

Chance constraints

Require P(g ≤ 0) ≥ 1 − α. Under Gaussian state, this is constraint tightening by k·σ.

Scenario tree, intuitively.

At each decision point, several discrete realisations of the uncertain parameter p are considered. The tree branches into all of them. The optimiser chooses one control trajectory per branch, with the constraint that branches sharing the same parent node must share the same control input (information cannot be used before it arrives).

Scenario tree, two uncertainty realisations branching over two stages k = 0 k = 1 k = 2 k = 3 x₀ p⁺ p⁻ N_robust measured branching held (p frozen)
Scenario tree for a 3-stage horizon with two uncertainty realisations p± and a robust horizon Nrobust = 2. After the robust horizon, the uncertainty is frozen to keep the tree finite.

Surrogate quality and closed-loop spread.

Robustness is not only a matter of the controller. It also depends on the surrogate. The histogram below shows the closed-loop cost over 100 runs per model on the 4-tank rig. Distributions concentrated to the left and narrow indicate a surrogate-controller pair that is reliably good. Wide tails are the warning sign that the closed loop has worst-case behaviour the mean does not show.

FIG. 06 Cost distribution, 100 runs per surrogate N500 runs
Model performance comparison
Lower cost means better tracking. Toggle to compare. The GP and the NN concentrate to the left; the decision tree has a heavy right tail.
EQ. 06 Multi-stage NMPC formulation [ ▼ ]

Index every node of the scenario tree by (k, j). Let p(j) denote the parent and r(j) the realisation of p at node j. The robust NLP is:

minxkj, ukj Σj ωj · Jj(𝐱j, 𝐮j) subject to x0 = x̂ xk+1j = f(xkp(j), ukj, pr(j)) uki = ukj # non-anticipativity: identical inputs share parents g(xkj, ukj) ≤ 0 bounds on xkj, ukj

The scenario weights ωj can be set to equal probability or to a chosen worst-case bias. Branching is usually limited to Nrobust ≤ N to keep the tree size manageable; after the robust horizon, the uncertainty is held fixed.

DEEPER · 8A Tube MPC [ ▼ ]

Decompose the true state as xk = zk + ek, with z the nominal state and e the error driven by the bounded disturbance. A local feedback gain K drives e to a robustly positively invariant set ℰ. The MPC then controls only z through a nominal model, with constraints tightened by ℰ (Pontryagin difference).

Tube MPC gives clean theoretical guarantees and modest online cost (a nominal NLP plus a precomputed local gain). Computing ℰ exactly is hard in high dimensions; approximations (homothetic, elastic, parameterised tubes) are an active area.

DEEPER · 8B Chance constraints under Gaussian state [ ▼ ]

Require P(aTx ≤ b) ≥ 1 − α. If x is Gaussian with mean μ and covariance Σ, the constraint becomes a deterministic, tightened linear inequality:

aT μ + Φ-1(1 − α) · √(aT Σ a) ≤ b

Φ-1 is the standard normal inverse CDF; α = 0.05 gives Φ-1(0.95) ≈ 1.645. Larger σ means a larger tightening, exactly the behaviour we want when the GP variance is high.

DEEPER · 8C When in practice does this matter? [ ▼ ]

For the 4-tank rig with a well-identified linear surrogate, nominal MPC already gives tight closed-loop tracking. Robust methods become essential when (a) the constraints are operationally tight (a tank that must not overflow, a vehicle that must not leave a lane), (b) the disturbance has a known but persistent magnitude (a wind gust, an inlet temperature drift), or (c) the learned model is honest about its uncertainty and the controller should respect that honesty rather than overrule it.

§ 09 Where to go next REFERENCES

What to read, what to try.

Outlook

The classical theory of MPC is settled. The interesting research directions are at the seams with machine learning: learned models, learned solvers, learned policies that imitate the optimiser. Each entry below is a self-contained rabbit hole.

A · LEARNING

Learning-based MPC

Replace or augment the dynamics model with a learned component (GP, NN, Koopman). Open questions: safe exploration, statistical guarantees, distributional shift between training and deployment. See Hewing et al. (2020)[5] for the survey.

B · RL

MPC ↔ reinforcement learning

Either direction works. MPC as a teacher: generate demonstrations, distill via behaviour cloning into a fast policy network (approximate MPC, see acados-amg, do-mpc.approximateMPC). MPC as a stabilising prior inside RL: actor-critic with a built-in safety filter.

C · OPTIMIZATION

Faster solvers

Code-generated NMPC (acados, FORCES Pro) reaches kilohertz on embedded targets. Differentiable optimisation (cvxpylayers, theseus) makes the solver a layer in a larger gradient pipeline. ProxQP and OSQP for linear MPC at microsecond latency.

D · APPLICATIONS

Economic and hierarchical MPC

Drop the reference and optimise a performance index directly. Cascade slow economic decisions and fast tracking layers on the same plant. Active in process industries, autonomous racing, building energy management, and grid balancing.

Selected references

  • [1] Rawlings, J. B., Mayne, D. Q., Diehl, M. Model Predictive Control: Theory, Computation, and Design. 2nd ed., Nob Hill, 2020. The standard graduate textbook.
  • [2] Borrelli, F., Bemporad, A., Morari, M. Predictive Control for Linear and Hybrid Systems. Cambridge, 2017. Linear MPC done rigorously, plus hybrid extensions.
  • [3] Johansson, K. H. The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE TCST, 2000. The original 4-tank rig used here.
  • [4] Diehl, M., Bock, H. G., Schlöder, J. P. A real-time iteration scheme for nonlinear optimization in optimal feedback control. SIAM J. Control Optim., 2002. The one-Newton-step trick.
  • [5] Hewing, L., Wabersich, K. P., Menner, M., Zeilinger, M. N. Learning-based model predictive control: toward safe learning in control. Annu. Rev. Control Robot. Auton. Syst., 2020. The reference survey.
  • [6] Lucia, S., Finkler, T., Engell, S. Multi-stage nonlinear model predictive control applied to a semi-batch polymerization reactor under uncertainty. J. Process Control, 2013. The multi-stage approach in detail.
  • [7] Andersson, J. A. E., Gillis, J., Horn, G., Rawlings, J. B., Diehl, M. CasADi: a software framework for nonlinear optimization and optimal control. Math. Program. Comput., 2019. The AD and modelling layer underneath most modern MPC stacks.
  • [8] Fiedler, F., Karg, B., Lüken, L., Brandner, D., Heinlein, M., Brabender, F., Lucia, S. do-mpc: towards FAIR nonlinear and robust model predictive control. Control Eng. Pract., 2023. The toolbox used in §07's code snippet.
r/cavemanscience · posted by u/Grug_Big_Brain · 40,012 sun-times ago IRL: Parastoo Semnani, TU Berlin · Imperial College London ▮ LIVE
G
u/Grug_Big_Brain
posted in r/cavemanscience · 40,012 sun-times ago · edited (add more rock) · ★★★★★
+1,204
▲ 1,204 ROCK 💬 89 REPLY ↗ SHARE WITH CAVE 🦴 SAVE FOR LATER 🔥 TRENDING #1
PINNED BY MOD u/Ugg_The_Wise · "this what we need." · 87 sun-times ago

OOGA BOOGA
MPC.

Grug learn MPC in 5 sun-time. Maybe 4.

U HIT HEAD ON ROCK? Same. Now Grug teach. We learn how big-brain robot make smart move.

Real word: Model Predictive Control. Grug call it: think then throw stick.

Five chunk. Eat slow. No swallow whole. 5/5 ROCKS

be Grug
see mammoth far away
belly empty
cannot just yeet spear and pray
must THINK first
think then throw
mfw I just invented control theory

1THE WANT.

Every smart thing need want. Grug want mammoth meat. Robot want car stay on road. Rocket want go up sky, not down sky.

grug want mammoth in belly. mammoth over there. grug here. need close that space. 🦣
The reference trajectory (or setpoint) is where you want your system's state to go over time. The controller's job is to push the system toward this target.
GRUG MAMMOTH THIS = THE WANT
fig 1 — grug → mammoth · arrow = direction of want

2BRAIN-PICTURE OF FUTURE.

Before throw spear, Grug close eye. Grug see spear in head. Spear go up, spear come down, spear maybe miss. Grug see this before it happen. PSYCHIC????

if grug throw spear THIS hard, THIS angle... where spear go? grug see in head-cave. 🧠
This is the model. A set of equations (or rules) that predict how the system will behave when you apply a control input. Without a model, you cannot look into the future.

Robot brain do same. It have math-rule: "if I push gas pedal this much, car go this fast in two heartbeats." Math-rule called dynamics model.

no model = no future-see
no future-see = throw spear blind
blind throw = no mammoth
no mammoth = grug cry

3TRY MANY PLAN IN HEAD.

Grug not throw first thing pop in head. Grug think: throw soft? throw hard? throw left? throw right? Grug imagine MANY MANY spear-flight. Pick best one.

throw 50 spear in head. only one go in real world. save many spear that way. 🪨
Optimization over a sequence of future control inputs. The controller searches many candidate action sequences and picks the one that minimizes a cost function — a number that says "how bad is this plan?"
throw_many_spear.bmp - Paint _ ×
FileEditViewImageColorsHelp
GRUG (BAD PLAN) (ALSO BAD) (VERY BAD) ★ BEST SPEAR-PATH ★ FOOD
For Help, click Help Topics on the Help Menu. 412, 158
fig 2 — blue = throws in head · red = the one grug pick

cost = how-much-bad number. Big cost mean: spear miss, OR Grug arm fall off, OR spear hit own foot. Small cost mean: spear hit mammoth, Grug not break. Pick plan with smallest bad-number.

4RULES. NO CHEAT.

Grug arm not bend backward. Grug not throw spear faster than wind-god allow. Spear not poke own eye. These rules. CANNOT BREAK

arm bend only one way. cannot throw harder than grug arm. these rule cannot ignore. even in head.
Constraints. Hard limits on what the controller can do (max motor torque, valve fully open, no negative speed) or what the system must stay within (don't crash, don't overheat, stay on road). MPC handles these directly — that's its superpower over older control methods.

This where MPC beat other think-tools. Old way: just push pedal, hope no crash. MPC way: plan whole future, refuse plan that crash. Safety in the plan.

5THROW ONE. THEN LOOK AGAIN.

Big trick. Listen close. Grug make plan for many step. But Grug only do FIRST step. Then Grug LOOK AGAIN. Make NEW plan. Do first step of NEW plan. Look again. forever.

world change fast. wind blow. mammoth run. grug not commit to old plan. look, think, move ONCE. 🔁 repeat.
This is the receding horizon. At every timestep: (1) measure the current state, (2) solve the optimization for the next N steps, (3) apply only the first control input, (4) throw the rest of the plan away, (5) repeat. This gives closed-loop feedback — the controller constantly re-plans using fresh information.
look_loop_forever_v2_FINAL.bmp - Paint _ ×
FileEditViewImageColorsHelp
LOOK PLAN 5 DO 1 LOOK PLAN 5 DO 1 REPEAT FOREVER. ALWAYS LOOK FRESH.
Selection: 600 × 200 320, 96
fig 3 — loop forever · plan 5 · do 1 · throw away the rest (screenshot lost left-corner, sry)

Why? Because world lie. Wind change. Mammoth move. Grug brain-picture not perfect. By always look-again, MPC fix own mistake every heartbeat.

▶ WATCH GRUG HUNT (LIVE)

orange dot = grug. brown blob = mammoth (it run away). yellow dots = grug's plan in head. solid orange line = where grug actually go. watch plan change every step.

PLAN LENGTH (HORIZON): 15 STEPS

BIG-BRAIN RECAP.

Grug tired. Grug list it. Five thing. Remember five thing, Grug know MPC.

  1. the want. where thing should be. (reference / setpoint)
  2. brain-picture. math that predict future if you push thing. (model / dynamics)
  3. try many, pick best. search for action sequence with lowest pain number. (optimization / cost function)
  4. cannot break rule. honor limits on action and state. (constraints)
  5. do one, look again. apply first action, re-plan from new measurement. (receding horizon, closed-loop)

where MPC live in real world:

self-drive car (steer + brake). rocket land on legs. big factory tower keep temperature good. insulin pump for sick people. drone fly through trees. power grid balance. even your phone battery charge plan.

anywhere there is: goal + things-that-must-not-happen + future-can-be-predicted. MPC sit there. FACT

Q1 / WHY MPC ONLY DO FIRST STEP OF PLAN, THEN RE-PLAN?
grug lazy. grug forget rest.
reality differ from prediction; fresh measurement fix the drift
computer too slow for one plan
Q2 / WHAT IS COST FUNCTION?
how many rock grug pay shaman
how fast computer run
a number that scores how bad a candidate plan is; we minimize it
Q3 / WHAT MPC DO THAT SIMPLE PID CANNOT?
handle constraints (limits) directly inside the plan
use electricity
move fast