Model Predictive Control.
A reference for the working researcher. Every section opens with a one-sentence definition and a runnable picture. Each closes with optional deeper material in the form of expandable derivations and code. The running example is a four-tank process.
What model predictive control is.
Model predictive control is a feedback policy that, at every sampling time, uses a dynamics model to predict the system over a finite future window, solves an optimization problem to choose a sequence of inputs, applies only the first input, and then repeats with a fresh measurement.
MPC turns control into a question we know how to answer: optimization. The cost of a candidate input sequence is computed against a forecast, the optimizer picks the sequence with lowest cost, only the first action is applied, and the procedure restarts at the next time step.
Three ingredients define every MPC controller:
- A model that predicts the state evolution given a candidate control sequence. The model can be a first-principles ODE, a linear state-space realisation, a neural network, or a Gaussian process.
- A cost that scores predicted trajectories against a goal, usually a quadratic penalty on tracking error plus a penalty on input effort.
- A repeated solve. The optimization is solved from scratch at each step using the latest state estimate. This is the receding horizon.
Choose once, commit
Pick the action that minimises a fixed objective. No measurement is taken after the decision. Works when the model is exact and the environment is benign.
Decide, observe, redecide
The optimal action at time k depends on what will be possible at k+1, which depends on a state that has not yet been measured. The loop closes by measurement.
Where MPC sits among controllers.
The four methods below all map a measured state to a control input. They differ in how much structure they impose on the problem and how that structure is exploited online.
| Method | Model | Constraints | Online cost | Typical use |
|---|---|---|---|---|
| PID | none, tuned gains | handled via anti-windup heuristics | µs | single-loop regulation |
| LQR | linear, fixed | not represented | µs (precomputed gain) | linear regulation around an operating point |
| MPC | nonlinear, possibly learned | explicit, hard and soft | ms per step | multivariable, constrained, anticipative |
| RL policy | implicit (in the policy network) | via reward shaping, not native | µs (forward pass) | large-scale or hard-to-model dynamics |
The argument for MPC is structural. Constraints, anticipation of future setpoints, and multivariable coupling enter the formulation directly rather than through heuristic tuning. The price is an optimization at every sample.
The earliest practical MPC was developed inside Shell in the late 1970s under the name Dynamic Matrix Control. By the mid 1980s, model-based predictive controllers were standard practice in refineries because they could enforce operating constraints that PID layers could only approximate. The theoretical foundation, in particular stability via terminal cost and terminal constraint, matured through the 1990s in the work of Mayne, Rawlings, Rakovic, and others.
Two shifts changed the field after 2010. Fast convex solvers and code generation tools (CVXGEN, qpOASES, HPIPM, acados) brought MPC into the millisecond regime, opening up automotive and robotic applications. Learning-based components (Gaussian processes, neural surrogates, policies that imitate the optimizer) closed the remaining gap to systems whose first-principles models are unavailable or too slow to integrate online.
Use MPC when at least two of the following are true: the system has multiple coupled inputs and outputs, operating constraints are tight and cannot be slack, future setpoints or disturbances are known in advance, or the plant is genuinely nonlinear in the operating range.
Avoid MPC when the sampling time is below what the solver can return reliably (typically below a few hundred microseconds for nonlinear NLPs), when the model is unreliable and no honest uncertainty representation is available, or when the cost of an occasional infeasible solve cannot be tolerated and no fallback policy is in place.
The optimal control problem.
At every sampling step k, MPC solves a finite-horizon constrained optimization over a sequence of inputs u₀, …, uN−1. The cost is a sum of stage costs plus a terminal cost; the dynamics enter as equality constraints; bounds enter as inequalities.
The problem below is the discrete-time form used by almost every MPC implementation. Continuous-time dynamics are first discretised, then this same template applies.
The standard formulation, after Rawlings and Mayne[1]. m is the terminal (Mayer) cost; ℓ is the stage (Lagrange) cost.
The tracking case sets ℓ(x, u) = ‖x − xref‖2Q + ‖u − uref‖2R with positive definite weighting matrices Q and R. Economic MPC drops the reference and lets ℓ be any user-defined performance metric (cost, throughput, energy).
Most physical systems are described in continuous time: ẋ(t) = fc(x(t), u(t)). The OCP in continuous time replaces the sum with an integral and the recurrence with an ODE constraint. Direct methods (single shooting, multiple shooting, collocation) reduce it to a finite-dimensional NLP. §07 covers the trade-offs between these.
A finite-horizon optimum need not approach the infinite-horizon optimum. The standard fix is to choose m(x) so that it approximates the cost-to-go beyond the horizon, and to add a terminal set 𝒳f in which a local stabilizing control law is known to exist.
For linear systems with a quadratic cost, this is automatic: take m(x) = xᵀPx where P solves the discrete-time algebraic Riccati equation, and let 𝒳f be the largest invariant set under the unconstrained LQR gain. The resulting MPC is a stabilising policy on the feasible region.
For nonlinear systems, a sufficient condition is the existence of a control-Lyapunov function on 𝒳f. The Borrelli, Bemporad, and Morari textbook (2017)[2] gives the full account.
Hard state constraints can render the OCP infeasible after a large disturbance. The standard remedy is to relax them with a non-negative slack variable s and to penalise s heavily in the cost.
The choice ρ ≫ 0 ensures that s = 0 whenever the hard constraint is feasible. L1 penalisation is preferred over L2 because it preserves the property that the constraint is exactly satisfied when possible.
Tracking MPC penalises deviation from a reference. Economic MPC optimises a performance index directly, with no reference. A power plant that minimises fuel cost or a batch reactor that maximises product yield is naturally economic. Stability theory is more delicate in the economic case because the optimal steady state is not known in advance and may not coincide with the minimiser of the stage cost.
Plan five, do one, repeat.
Receding horizon control means: solve the OCP from the current measurement, apply only u0*, discard the rest of the optimal sequence, take a new measurement, and re-solve. The horizon shifts forward by one sample at every step.
This is the closed-loop ingredient. The first action is always the one that meets reality; everything past u0* is plan, not commitment. Re-solving with a fresh measurement is how the controller absorbs model error and unmodelled disturbances.
The plan is computed against a forecast that drifts from reality. Replanning at every sample from the measured state collapses that drift before it accumulates. The only sample that ever touches the plant is u0*.
Two horizons can in principle differ. The prediction horizon Np is how far ahead the model is integrated. The control horizon Nc ≤ Np is how many of those steps the optimiser is free to choose; beyond Nc, the input is held constant or computed by a known feedback law. Shorter Nc reduces solver work; equal horizons are most common.
Let VN(x) be the optimal value of the N-step OCP from state x. Under the terminal ingredients of §02·2B, one can show that VN is a Lyapunov function for the closed-loop system: VN(f(x, u0*(x))) ≤ VN(x) − ℓ(x, u0*(x)). This decrease implies asymptotic stability of the desired equilibrium on the region where the OCP remains feasible.
Without a terminal cost or set, only suboptimality bounds are available. They are often satisfactory in practice but no longer give a clean stability certificate.
Two consecutive OCPs differ by one time step. The cheapest warm start is the shift initialisation: take the previous optimal sequence, drop u0*, and append a guess for uN-1 (often the steady-state input or the terminal feedback evaluated at xN*).
With Newton-type solvers, this gives most of the convergence in one or two iterations. Real-time iteration (Diehl et al., 2002)[4] takes this further: do exactly one Newton step per sample. Optimality is sacrificed for a hard real-time guarantee that the controller always produces an output on time.
Three rules of thumb. First, Np should be long enough to cover the slowest dominant time constant of the system; otherwise the controller cannot see the consequence of its actions. Second, doubling Np typically doubles solver cost, so the practical range is dictated by the available compute budget. Third, in the presence of a good terminal cost, surprisingly short horizons (Np = 5 to 10) often suffice even for systems with much longer settling times.
The four-tank process.
Two pumps fill four interconnected tanks. The lower tanks T1 and T3 are the controlled variables. The upper tanks T2 and T4 drain into them with state-dependent flow splits. The plant is nonlinear, multivariable, and (for certain split values) non-minimum phase.
We use the quadruple-tank rig of Johansson (2000)[3] as the running example. It is small enough to follow by hand and rich enough to make every later concept visible: coupling, nonlinearity, learned surrogates, uncertainty.
The true dynamics follow four coupled ODEs with square-root outflow terms (Torricelli's law). Each pump drives one upper tank directly and routes a state-dependent fraction γ to the diagonal lower tank. The fractions γ1, γ2 are slowly time-varying, which makes the system non-stationary by construction.
# Continuous-time dynamics. What the simulator integrates.
def four_tank(x, u, t):
h1, h2, h3, h4 = x
u1, u2 = u
v1, v2 = 20*u1, 20*u2 # pump flows
gamma1 = (1 + np.sin(t/15)) * 0.1 * h1/45
gamma2 = (1 + np.cos(t/15)) * 0.1 * h2/45
return np.array([
(1 - gamma2) * c * v2 - c2 * np.sqrt(h1),
(1 - gamma1) * c * v1 - c2 * np.sqrt(h2),
gamma1 * c * v1 + c2 * (np.sqrt(h1) - np.sqrt(h3)),
gamma2 * c * v2 + c2 * (np.sqrt(h2) - np.sqrt(h4)),
])
# c = 0.003488, c2 = 0.15440
Integrating this ODE inside an optimiser that evaluates hundreds of candidate sequences per sample is too slow for real-time control. The fix in §05 is to replace f with a cheap regression model M fitted to data from this ODE.
Each tank has cross-section A. Conservation of mass on tank i gives A · dhi/dt = qin,i − qout,i. Outflow through a small hole at the bottom of cross-section ai obeys Torricelli's law, qout,i = ai √(2g hi). After lumping constants, the outflow term takes the form −c2√hi seen in the code above.
For the upper tanks, inflow is the fraction of pump flow not diverted: (1 − γ)·c·v. For the lower tanks, inflow has two contributions: the diverted fraction γ·c·v from the opposite pump, and the gravity drain from the tank directly above, +c2√hupper.
Linearise the system at an operating point and examine the transfer function from u to y. The numerator polynomial depends on γ1 + γ2. When the sum exceeds 1, the linearisation has no right-half-plane zeros; when it falls below 1, a right-half-plane zero appears and the response goes the wrong way before settling.
In the wrong-way regime, PID tuning is brittle and gain scheduling becomes necessary. MPC handles both regimes with the same controller because the model carries the structural information directly.
The simulator uses RK4 with Δt = 5 s for the closed-loop animation. The controller's internal model is one-step: a learned map from (xk, uk) to xk+1. The mismatch between integrator step and controller step is part of the plant-model gap that receding-horizon feedback is supposed to absorb.
Predicting the next state with a surrogate.
A surrogate is a fast approximation to the true dynamics, fit from training data {(xk, uk), xk+1}. Any regression family works. They differ in accuracy, sample efficiency, and whether they expose a usable notion of uncertainty.
MPC needs f. The cheapest f is the one that runs in microseconds and matches the plant well enough on the operating envelope. Below are four common choices; all of them plug into the same optimisation loop through the same predict interface.
Linear regression
Δx ≈ β · [1, x, u]. Interpretable, fast, and analytic. Adding √x features makes it surprisingly close to the true Torricelli outflow.
Quadratic features
Adds xixj, xi², uj². Captures pairwise nonlinearity with the same OLS machinery. The quiet workhorse.
Neural network
A small MLP (32 hidden units) trained with Adam. Universal approximator. Data-hungrier than the polynomial models, harder to certify.
Gaussian process
Probabilistic regression with calibrated uncertainty. Tells the optimiser where it does not know. The next section unpacks it.
The unified interface looks like this:
# Same call site for every surrogate.
def predict_next(model, x_k, u_k):
z = np.hstack([x_k, u_k]).reshape(1, -1)
dx = model.predict(z)[0]
return x_k + dx
# model = LinearRegression(...) → fast, analytical
# model = QuadraticRegression(...) → polynomial features
# model = MLPRegressor((32,)) → flexible, data-hungry
# model = GaussianProcessRegressor(...) → probabilistic
The true outflow term is c2√h. Augmenting the linear feature vector with √h recovers the dominant nonlinearity for free. Whenever first-principles knowledge is partial, encoding it as features is cheaper and safer than learning it from scratch.
Predict the per-step change as a linear function of features φ:
Extend the feature vector with squared and pairwise cross-product terms. For dimension d = nx + nu, the expansion has 1 + d + d(d+1)/2 terms.
Quadratic features change the cost surface but not the call site. The optimiser still sees a smooth, twice-differentiable model.
A scalar-output MLP with one hidden layer and ReLU activation is universal in the sense that, for enough hidden units, it can approximate any continuous function on a compact set. For the 4-tank problem, 32 hidden units fit the per-step Δx well from a few hundred samples.
The trade-off is differentiability and conditioning, not expressivity. Optimising through a deep network at every MPC sample is feasible if (a) gradients are propagated analytically and (b) the input space is normalised so that the network's local Lipschitz constant is bounded.
Koopman theory replaces the nonlinear state with a (usually higher-dimensional) lifted state ψ(x) on which the dynamics are linear: ψk+1 = K · ψk + B uk. The choice of lifting function ψ matters more than the algorithm.
In practice, the lift is learned (auto-encoder, dictionary of polynomials and trig functions, or eigenfunctions of K). The reward is a globally linear surrogate that admits convex QP for the resulting MPC, at the cost of a possibly large lifted dimension.
For the linear model on the 4-tank rig, around 50 (state, input) transitions sampled from a chirped pump signal already give one-step prediction error well below the closed-loop tolerance. The quadratic model needs roughly 200. The neural network needs 400 to 1000. The Gaussian process can match the linear baseline on 50 points but then exposes its uncertainty grow in regions the chirp did not cover.
These numbers are problem-specific. The right way to set the data budget is to plot a learning curve: closed-loop tracking cost versus number of training transitions, separately for each surrogate family.
A surrogate that knows
what it does not know.
A Gaussian process places a probability distribution over functions. Conditioning on observations gives a posterior with a closed-form predictive mean and variance at every test point. The variance is what makes GPs useful for control: it tells the optimiser how trustworthy the prediction is.
Click on the chart to add a (noisy) observation. The GP posterior mean curves toward the points; the shaded band narrows around them and fans out in unexplored regions. That fanning is the model admitting it has no data.
Closed-form posterior
Given n noisy observations {(xi, yi)}, the GP gives the mean and variance at any test point x*:
σ*2 = k** − k*T (K + σn2I)-1 k*
Adding a penalty on σ*(x) inside the MPC cost discourages the controller from venturing into regions the model has not seen. The same idea underpins Bayesian optimisation and safe exploration.
Stack training values y ∈ ℝn and a single test value y*. Under the GP prior, the joint is Gaussian:
Conditioning y* on the observed y gives a Gaussian whose mean and variance are the standard Schur-complement formulas, namely the posterior moments shown above. No iteration, no MCMC.
The kernel encodes prior beliefs about smoothness, periodicity, and length scales.
- Squared exponential. Infinitely differentiable sample paths. Often too smooth for physical signals.
- Matérn ν = 3/2 or 5/2. Sample paths that are once or twice differentiable. A better default for control because real plants are not analytic.
- Periodic. k(x,x') = σ2 exp(−2 sin2(π(x−x')/p) / ℓ2). Useful when the disturbance is daily or weekly.
- Composite. Kernels add and multiply. k = kSE + kperiodic captures a smooth trend plus a periodic ripple.
The kernel has hyperparameters θ = (σf, ℓ, σn). They are fit by maximising the log marginal likelihood, the probability of the observed data integrating out the latent function values:
Three terms with three meanings: data fit, model complexity, normalisation. Maximising this objective trades off the first two. Gradient-based optimisation (L-BFGS) converges in a few hundred iterations for the kernel sizes typical of control problems.
Replace the deterministic prediction with the GP posterior. Two common ways to use the variance σ*2:
- Penalty. Add λ · σ*2(xk, uk) to the stage cost. Larger λ means more cautious behaviour.
- Chance constraint. Require P(g(x, u) ≤ 0) ≥ 1 − α. Under a Gaussian state, this is equivalent to tightening the deterministic constraint by a multiple of σ.
The cost of propagating uncertainty through N steps is the cost of integrating the posterior mean and variance jointly. Linearising the dynamics around the posterior mean gives a closed-form update (Hewing et al., 2020)[5]. Otherwise sampling is the fallback.
Naive GP inference is O(n3) because of the matrix factorisation. Fine for n ≲ 500. Beyond that, two common remedies: inducing points (FITC, VFE, SVGP) that summarise the data with m ≪ n pseudo-inputs at cost O(nm2); and structured kernels (KISS-GP, SKI) that exploit Toeplitz or Kronecker structure for O(n log n) inference. Both are off-the-shelf in GPyTorch.
From OCP to solved sequence.
Discretising the OCP turns it into a finite-dimensional nonlinear program (NLP): a smooth cost plus equality constraints (the dynamics) plus inequality constraints (bounds, path constraints). Two design axes: how to encode states, and how to find the minimum.
The OCP is infinite-dimensional in continuous time. The first design decision is how to discretise it; the second is which numerical algorithm to apply to the resulting NLP. The two decisions interact, especially when the dynamics are stiff or the constraints are tight.
Three ways to discretise.
Single shooting
Decision variables: inputs u0, …, uN-1 only. States are propagated forward by the integrator. Cheap NLP, but ill-conditioned for unstable systems.
Multiple shooting
Decision variables include states at each shooting node. Dynamics enter as equality constraints xk+1 = f(xk, uk). Larger NLP, far better conditioning, the workhorse of fast NMPC.
Solvers.
Two algorithms dominate practical MPC. Sequential quadratic programming (SQP) builds a quadratic model of the Lagrangian and a linear model of the constraints, then solves a QP subproblem at each iteration. Interior-point methods (IPOPT) handle inequalities by adding a logarithmic barrier and applying Newton steps to the perturbed KKT system. SQP is fast when warm-started; interior-point is more robust on tight constraints.
A typical setup in do-mpc[8] looks like the snippet below. The toolbox handles collocation, automatic differentiation through the dynamics, and IPOPT interfacing.
# A minimal NMPC controller with do-mpc.
import do_mpc
from casadi import SX, vertcat
model = do_mpc.model.Model('continuous')
h = [model.set_variable('_x', f'h{i}') for i in range(1, 5)]
u = [model.set_variable('_u', f'u{i}') for i in range(1, 3)]
# right-hand side from §04 (truncated for brevity)
model.set_rhs('h1', (1 - g2)*c*v2 - c2*sqrt(h[0]))
...
model.setup()
mpc = do_mpc.controller.MPC(model)
mpc.set_param(n_horizon=5, t_step=5.0, store_full_solution=True)
mpc.set_objective(
lterm = (h[0] - 28)**2 + (h[2] - 27)**2, # stage cost ℓ
mterm = (h[0] - 28)**2 + (h[2] - 27)**2, # terminal cost m
)
mpc.bounds['lower','_u','u1'] = 0; mpc.bounds['upper','_u','u1'] = 25
mpc.bounds['lower','_u','u2'] = 0; mpc.bounds['upper','_u','u2'] = 25
mpc.setup()
# Closed loop
x = x0
for step in range(200):
u_apply = mpc.make_step(x) # solves the NLP, returns u0*
x = simulator.step(x, u_apply)
For the NLP min f(z) s.t. h(z) = 0, g(z) ≤ 0, a primal point z* with multipliers λ, μ ≥ 0 is a local minimum if the Karush-Kuhn-Tucker conditions hold:
SQP solves a sequence of QPs whose KKT systems converge to these conditions. Interior-point smooths the complementarity condition with a barrier parameter μbar and drives it to zero.
Every NLP solver needs gradients of the objective and the constraints. Finite differences are too noisy and too slow for control. The two robust options:
- Symbolic AD (CasADi[7], JAX). The dynamics are expressed in a symbolic graph; the solver pulls exact Jacobians and Hessians from it. Best for systems where the dynamics can be written in code.
- Tape-based AD (PyTorch, autograd). Records the operations of a forward pass, then walks the tape backwards to compute gradients. Native for neural-network surrogates.
For learned surrogates, exposing the model through the same AD machinery used by the solver removes the largest source of bugs in NMPC pipelines.
The real-time iteration scheme (Diehl, Bock, Schlöder, 2002)[4] accepts that the NLP at sample k is only one Newton step away from the NLP at k−1, given a shifted warm start. It performs exactly one SQP iteration per sample. The resulting controller is no longer the exact MPC law but a tracking approximation. In experiments on automotive and aerospace systems, the closed-loop performance gap is small while the worst-case timing is bounded.
The acados toolbox is the modern reference implementation.
For linear dynamics with a quadratic cost and linear constraints, the resulting program is a quadratic program. Active-set methods (qpOASES) excel under warm starting, interior-point methods (HPIPM) excel on cold starts. Both reach kilohertz rates on modest hardware. This is why linear MPC dominates production deployments even when the plant is mildly nonlinear: the speedup of staying in QP land is often worth the modelling compromise.
What happens when the model is wrong.
A controller is robust if it satisfies its constraints for every realisation of an uncertain parameter or disturbance in a specified set. A stochastic controller goes further: it controls the probability of violating each constraint to a chosen level.
Nominal MPC plans against a single forecast. Real systems disagree with the forecast in ways that are sometimes bounded (parameter mismatch) and sometimes random (process noise). Three families of methods exist for handling this gap; the right choice depends on what the uncertainty looks like.
Min-max MPC
Plan for the worst case over the uncertainty set. Strong guarantees, often conservative. Solver is a saddle-point problem.
Multi-stage MPC
Build a scenario tree of plausible futures[6]. Optimise over the tree with non-anticipativity constraints. The default in do-mpc[8].
Tube MPC
Decompose the trajectory into a nominal part and an error tube. Tighten constraints by the tube; control the nominal part only.
Chance constraints
Require P(g ≤ 0) ≥ 1 − α. Under Gaussian state, this is constraint tightening by k·σ.
Scenario tree, intuitively.
At each decision point, several discrete realisations of the uncertain parameter p are considered. The tree branches into all of them. The optimiser chooses one control trajectory per branch, with the constraint that branches sharing the same parent node must share the same control input (information cannot be used before it arrives).
Surrogate quality and closed-loop spread.
Robustness is not only a matter of the controller. It also depends on the surrogate. The histogram below shows the closed-loop cost over 100 runs per model on the 4-tank rig. Distributions concentrated to the left and narrow indicate a surrogate-controller pair that is reliably good. Wide tails are the warning sign that the closed loop has worst-case behaviour the mean does not show.
Index every node of the scenario tree by (k, j). Let p(j) denote the parent and r(j) the realisation of p at node j. The robust NLP is:
The scenario weights ωj can be set to equal probability or to a chosen worst-case bias. Branching is usually limited to Nrobust ≤ N to keep the tree size manageable; after the robust horizon, the uncertainty is held fixed.
Decompose the true state as xk = zk + ek, with z the nominal state and e the error driven by the bounded disturbance. A local feedback gain K drives e to a robustly positively invariant set ℰ. The MPC then controls only z through a nominal model, with constraints tightened by ℰ (Pontryagin difference).
Tube MPC gives clean theoretical guarantees and modest online cost (a nominal NLP plus a precomputed local gain). Computing ℰ exactly is hard in high dimensions; approximations (homothetic, elastic, parameterised tubes) are an active area.
Require P(aTx ≤ b) ≥ 1 − α. If x is Gaussian with mean μ and covariance Σ, the constraint becomes a deterministic, tightened linear inequality:
Φ-1 is the standard normal inverse CDF; α = 0.05 gives Φ-1(0.95) ≈ 1.645. Larger σ means a larger tightening, exactly the behaviour we want when the GP variance is high.
For the 4-tank rig with a well-identified linear surrogate, nominal MPC already gives tight closed-loop tracking. Robust methods become essential when (a) the constraints are operationally tight (a tank that must not overflow, a vehicle that must not leave a lane), (b) the disturbance has a known but persistent magnitude (a wind gust, an inlet temperature drift), or (c) the learned model is honest about its uncertainty and the controller should respect that honesty rather than overrule it.
What to read, what to try.
The classical theory of MPC is settled. The interesting research directions are at the seams with machine learning: learned models, learned solvers, learned policies that imitate the optimiser. Each entry below is a self-contained rabbit hole.
Learning-based MPC
Replace or augment the dynamics model with a learned component (GP, NN, Koopman). Open questions: safe exploration, statistical guarantees, distributional shift between training and deployment. See Hewing et al. (2020)[5] for the survey.
MPC ↔ reinforcement learning
Either direction works. MPC as a teacher: generate demonstrations, distill via behaviour cloning into a fast policy network (approximate MPC, see acados-amg, do-mpc.approximateMPC). MPC as a stabilising prior inside RL: actor-critic with a built-in safety filter.
Faster solvers
Code-generated NMPC (acados, FORCES Pro) reaches kilohertz on embedded targets. Differentiable optimisation (cvxpylayers, theseus) makes the solver a layer in a larger gradient pipeline. ProxQP and OSQP for linear MPC at microsecond latency.
Economic and hierarchical MPC
Drop the reference and optimise a performance index directly. Cascade slow economic decisions and fast tracking layers on the same plant. Active in process industries, autonomous racing, building energy management, and grid balancing.
Selected references
- [1] Rawlings, J. B., Mayne, D. Q., Diehl, M. Model Predictive Control: Theory, Computation, and Design. 2nd ed., Nob Hill, 2020. The standard graduate textbook.
- [2] Borrelli, F., Bemporad, A., Morari, M. Predictive Control for Linear and Hybrid Systems. Cambridge, 2017. Linear MPC done rigorously, plus hybrid extensions.
- [3] Johansson, K. H. The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE TCST, 2000. The original 4-tank rig used here.
- [4] Diehl, M., Bock, H. G., Schlöder, J. P. A real-time iteration scheme for nonlinear optimization in optimal feedback control. SIAM J. Control Optim., 2002. The one-Newton-step trick.
- [5] Hewing, L., Wabersich, K. P., Menner, M., Zeilinger, M. N. Learning-based model predictive control: toward safe learning in control. Annu. Rev. Control Robot. Auton. Syst., 2020. The reference survey.
- [6] Lucia, S., Finkler, T., Engell, S. Multi-stage nonlinear model predictive control applied to a semi-batch polymerization reactor under uncertainty. J. Process Control, 2013. The multi-stage approach in detail.
- [7] Andersson, J. A. E., Gillis, J., Horn, G., Rawlings, J. B., Diehl, M. CasADi: a software framework for nonlinear optimization and optimal control. Math. Program. Comput., 2019. The AD and modelling layer underneath most modern MPC stacks.
- [8] Fiedler, F., Karg, B., Lüken, L., Brandner, D., Heinlein, M., Brabender, F., Lucia, S. do-mpc: towards FAIR nonlinear and robust model predictive control. Control Eng. Pract., 2023. The toolbox used in §07's code snippet.