In notebooks 02 and 03, we used ABC-SMC and Bayesian Synthetic Likelihood to infer the parameters of the Basener-Sanford mutation-selection model. Both methods work by running new simulations during inference — every time they evaluate a candidate parameter combination, they must simulate an entire population forward in time. ABC-SMC required hundreds of thousands of simulations; BSL required thousands per likelihood evaluation. Worse, this computational cost must be paid again from scratch for every new dataset.
This notebook takes a fundamentally different approach: Neural Posterior Estimation (NPE) and Flow-Matching Posterior Estimation (FMPE). These methods train a neural network once on many simulated datasets, then use the trained network to produce posterior distributions instantly for any new observation. This is called amortized inference — the heavy cost is paid once during training, and every subsequent inference is nearly free.
ABC-SMC and BSL are like identifying an unknown species by catching specimens one at a time, comparing each to a field guide, and gradually narrowing down the possibilities. Every new specimen requires the same laborious process.
NPE and FMPE are like building an AI identification app by first showing it thousands of labeled photographs. After that one-time training effort, identification of any new specimen is instant.
The training process has three steps:

1. **Sample** parameter combinations from the prior distribution.
2. **Simulate** a dataset from the model for each sampled parameter combination.
3. **Train** the neural network on the resulting (parameters, data) pairs to approximate the posterior.
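The sample-simulate-train loop can be sketched in miniature. The simulator below is a deliberately trivial one-dimensional stand-in (a noisy identity map), not the Basener-Sanford model, and the "network" is replaced by a least-squares fit, which plays the same role as a trained flow for this linear-Gaussian toy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: sample parameters from the prior.
# (Toy 1-D uniform prior -- purely illustrative.)
n_train = 5000
theta = rng.uniform(0.0, 1.0, size=n_train)

# Step 2: run the simulator once per parameter draw.
# A noisy observation stands in for a full population simulation.
def simulate(theta, rng):
    return theta + 0.1 * rng.normal(size=np.shape(theta))

x = simulate(theta, rng)

# Step 3: fit a conditional estimator of theta given x on the pairs.
# A real NPE run trains a normalizing flow here; for this toy problem
# a least-squares regression of theta on x serves the same purpose.
A = np.vstack([x, np.ones_like(x)]).T
coef, *_ = np.linalg.lstsq(A, theta, rcond=None)

# Amortized inference: any new observation is one cheap evaluation,
# with no further simulation.
x_obs = 0.42
theta_hat = coef[0] * x_obs + coef[1]
```

The expensive part (steps 1-2) happens once; the last two lines are all that a new dataset costs.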
Why does instant inference matter for studying mutation-selection dynamics? The extended Fisher's Fundamental Theorem tells us that a population's fate depends on whether Var(m) > μ|E_g[s]|b̅ — whether selection can overcome mutational load. To map this critical boundary across the full parameter space (notebook 06), we need to evaluate posteriors at thousands of different parameter combinations.
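Evaluating the criterion at one parameter combination is a single comparison. The values below are hypothetical placeholders chosen for illustration, not estimates from any notebook:

```python
# Hypothetical parameter values -- illustration only, not fitted estimates.
mu = 1.0            # genomic mutation rate (mutations per individual per generation)
mean_s = -0.001     # mean selection coefficient of the DFE, E_g[s]
b_bar = 1.0         # mean offspring number
var_m = 5e-4        # variance in fitness across the population, Var(m)

# Extended Fisher criterion: selection wins if Var(m) > mu * |E_g[s]| * b_bar.
mutational_pressure = mu * abs(mean_s) * b_bar
selection_wins = var_m > mutational_pressure
```

With these placeholder numbers the fitness variance (5e-4) falls short of the mutational pressure (1e-3), so the population sits on the meltdown side of the boundary. Mapping where this flips requires repeating the comparison across thousands of posterior draws, which is why millisecond-scale inference matters.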
This also enables "what-if" analyses: what if we observed a steeper fitness decline? A different population size? A longer time series? Each scenario gets an instant answer.
NPE uses a masked autoregressive flow (MAF) — a type of normalizing flow neural network. A normalizing flow learns an invertible transformation that warps a simple probability distribution (like a multidimensional bell curve) into the complex posterior distribution over model parameters. The "masked autoregressive" architecture processes parameters in sequence, each conditioned on the previous ones, which makes the transformation efficiently invertible.
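The triangular dependence structure is what makes the transformation exactly invertible. The two-dimensional sketch below hand-codes one autoregressive affine layer; in a real MAF the shift and scale functions are produced by a masked neural network and learned from data, and many such layers are stacked:

```python
import numpy as np

# One masked-autoregressive affine layer in two dimensions.
# Dimension 0 depends on nothing; dimension 1 depends only on dimension 0.
def shift_scale(x0):
    # Stand-in for the learned masked conditioner network.
    return np.tanh(x0), np.exp(0.5 * np.sin(x0))

def forward(z):
    # Base samples z -> warped samples x, one dimension at a time.
    x0 = z[:, 0]                       # first dimension passes through
    m, s = shift_scale(x0)
    x1 = z[:, 1] * s + m               # second dimension: affine in z1
    return np.stack([x0, x1], axis=1)

def inverse(x):
    # x -> z: exactly recoverable because the dependence is triangular.
    x0 = x[:, 0]
    m, s = shift_scale(x0)
    z1 = (x[:, 1] - m) / s
    return np.stack([x0, z1], axis=1)

rng = np.random.default_rng(1)
z = rng.normal(size=(100, 2))          # simple base distribution
x = forward(z)                         # warped, non-Gaussian samples
z_back = inverse(x)                    # round-trip recovers z exactly
```

The inverse never needs to invert a neural network, only to rerun the conditioner and undo an affine map, which is the efficiency the "masked autoregressive" design buys.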
After training on 10,000 simulated populations, the MAF has learned what fitness dynamics look like across the full range of plausible mutation rates, DFE shapes, and beneficial mutation fractions. Given our observed data, it can instantly tell us which parameter combinations are most likely.
FMPE uses a different training approach called flow matching, based on optimal transport theory. Where NPE's normalizing flow learns a fixed sequence of transformations (like folding a flat sheet into origami through prescribed steps), FMPE learns a continuous flow — a smooth path that gradually transports probability mass from a simple distribution to the posterior.
Think of optimal transport as finding the most efficient way to rearrange sand from one pile configuration into another. FMPE finds the most efficient way to transform a simple bell curve into the complex posterior shape, which often leads to more stable training and smoother posterior approximations.
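The continuous-flow idea can be demonstrated end to end for a one-dimensional Gaussian target, where the transporting vector field is known in closed form. FMPE would instead train a neural network to regress onto this field from interpolated samples; here we simply integrate it:

```python
import numpy as np

# Transport N(0,1) into N(mu, sigma) along the straight Gaussian path
# used in flow matching: mean_t = t*mu, std_t = (1-t) + t*sigma.
mu, sigma = 3.0, 0.5

def vector_field(x, t):
    # Closed-form field whose flow realizes the path above.
    # (An FMPE network is trained to approximate exactly this kind of field.)
    std_t = (1.0 - t) + t * sigma
    return (sigma - 1.0) * (x - t * mu) / std_t + mu

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)       # samples from the simple base distribution

# Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
n_steps = 200
dt = 1.0 / n_steps
for k in range(n_steps):
    x = x + dt * vector_field(x, k * dt)

# x now holds (approximate) samples from the target N(mu, sigma).
```

Sampling from the posterior in FMPE works the same way: draw from the base distribution, then integrate the learned, data-conditioned vector field.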
Practical advantages over NPE:

- **More stable training** — the network regresses a target vector field rather than carrying an invertibility constraint in its architecture.
- **Smoother posterior approximations**, since probability mass moves along a continuous path rather than through a fixed sequence of discrete transformations.
- **Better scalability** to higher-dimensional parameter spaces (see the comparison table below).
How do these neural methods compare with the approaches in notebooks 02 and 03?
| Property | ABC-SMC (nb 02) | BSL (nb 03) | NPE / FMPE (this nb) |
|---|---|---|---|
| Simulations during inference | ~100,000+ | ~1,000 per evaluation | 0 (pre-trained) |
| Training cost | None | None | 10,000 sims (one-time) |
| Time per new dataset | Hours | Hours | Milliseconds |
| Scalability | Poor (>5 params) | Moderate | Good (especially FMPE) |
| Assumptions | Minimal | Gaussian summary stats | Network capacity sufficient |
The formal comparison of posterior distributions across all methods is in notebook 05. The amortized methods here enable the boundary analysis in notebook 06, which maps the selection/meltdown boundary across parameter space — a computation that would be impractical with ABC-SMC or BSL.