04c: Neural Ratio Estimation (NRE)

What is NRE? Neural Ratio Estimation is the fifth inference method applied to the Basener-Sanford mutation-selection model. Like the other methods (ABC-SMC, BSL, NPE, FMPE), it answers the question "what parameters fit the data?", but it can also address a deeper one: "which model of mutation-selection best fits the data?"

How it works, for biologists: Imagine you observe a population's fitness trajectory declining over 200 generations. You want to know the mutation rate, the distribution of fitness effects (DFE), and the fraction of beneficial mutations. NRE approaches this by learning what makes a particular fitness trajectory more or less probable under different parameter values. Technically, it learns the likelihood-to-evidence ratio r(x,θ) = p(x|θ)/p(x) — a measure of how much more likely the observed data is under specific parameters compared to average. This is subtly different from NPE/FMPE, which directly learn the posterior (which parameters are most likely). The distinction matters because the likelihood ratio can be reused across different models, enabling model comparison.
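The ratio can be learned with an ordinary binary classifier: show it matched (θ, x) pairs drawn jointly from the simulator versus shuffled pairs drawn from the marginals, and its logit converges to log r(x, θ). A minimal sketch on a one-dimensional Gaussian toy problem (an illustrative stand-in, not the Basener-Sanford simulator):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Toy stand-in for the simulator (assumption): theta ~ N(0,1) prior,
# x ~ N(theta, 1) plays the role of a fitness-trajectory summary.
theta = rng.normal(0.0, 1.0, n)
x = theta + rng.normal(0.0, 1.0, n)

# Class 1: matched (theta, x) pairs drawn jointly.
# Class 0: theta shuffled against x, i.e. pairs from the marginals.
theta_shuffled = rng.permutation(theta)

def feats(t, s):
    # Quadratic features let a linear classifier represent the exact
    # log-ratio for this Gaussian toy problem.
    return np.column_stack([t, s, t * s, t**2, s**2])

X = np.vstack([feats(theta, x), feats(theta_shuffled, x)])
y = np.concatenate([np.ones(n), np.zeros(n)])
clf = LogisticRegression(max_iter=1000).fit(X, y)

def log_ratio(t, s):
    # With balanced classes, the classifier logit estimates
    # log r(x, theta) = log p(x|theta) - log p(x).
    return clf.decision_function(feats(np.atleast_1d(t), np.atleast_1d(s)))[0]

print(log_ratio(2.0, 2.0))   # positive: x = 2 is likely under theta = 2
print(log_ratio(-2.0, 2.0))  # negative: x = 2 is unlikely under theta = -2
```

In practice the classifier is a neural network and x is the full vector of trajectory summaries, but the training objective is exactly this binary discrimination task.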

After training, NRE draws posterior samples using MCMC (Markov Chain Monte Carlo) — a guided random walk through the five-dimensional parameter space. This is conceptually similar to BSL (notebook 03) but with a neural network replacing the expensive simulation-based likelihood, making sampling orders of magnitude faster.
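The sampling step can be sketched with random-walk Metropolis, using an analytic log-ratio as a stand-in for the trained network (the toy model and every number here are illustrative assumptions; for theta ~ N(0,1) and x ~ N(theta, 1), the true posterior given x_obs is N(x_obs/2, 1/2), so the output can be checked):

```python
import numpy as np

rng = np.random.default_rng(1)

x_obs = 1.5  # observed summary (assumption)

def log_ratio(theta):
    # Stand-in for the trained network: log p(x_obs|theta) - log p(x_obs),
    # up to an additive constant (constants cancel in the acceptance step).
    return -0.5 * (x_obs - theta) ** 2

def log_prior(theta):
    return -0.5 * theta ** 2          # standard normal prior, up to a constant

# Random-walk Metropolis on log posterior = log prior + log ratio.
theta = 0.0
lp = log_prior(theta) + log_ratio(theta)
samples = []
for _ in range(20_000):
    prop = theta + rng.normal(0.0, 0.8)
    lp_prop = log_prior(prop) + log_ratio(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
        theta, lp = prop, lp_prop
    samples.append(theta)

post = np.array(samples[2_000:])      # discard warmup draws
print(post.mean(), post.var())        # close to 0.75 and 0.5 for this toy
```

Each MCMC step costs only one forward pass through the ratio network, which is why sampling is so much cheaper than BSL's simulation-based likelihood.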

Why Model Comparison Matters for Basener-Sanford: The original model assumes mutations contribute independently to fitness (additive effects). But in real biology, mutations interact — this is epistasis. Synergistic epistasis (where harmful mutations are worse in combination) could accelerate meltdown, while antagonistic epistasis could slow it. Kondrashov (1995) and Butcher (1995) showed that epistasis does not halt Muller's ratchet, but the quantitative dynamics change significantly.
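The additive and synergistic cases can be made concrete with toy fitness maps (the functional forms and coefficients below are illustrative assumptions, not the model's actual equations):

```python
import numpy as np

# n counts deleterious mutations carried by an individual.

def w_additive(n, s=0.01):
    # Independent effects: each mutation multiplies fitness by (1 - s),
    # i.e. effects are additive on the log-fitness scale.
    return (1.0 - s) ** n

def w_synergistic(n, alpha=0.01, beta=0.001):
    # Synergistic epistasis: the quadratic term makes each additional
    # mutation more costly than the last.
    return np.exp(-(alpha * n + beta * n**2))

print(w_additive(50))      # ~0.605
print(w_synergistic(50))   # ~0.050: the same 50 mutations cut fitness far more
```

Distinguishing between such fitness maps from trajectory data alone is exactly the kind of model-comparison question NRE is suited for.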

NRE enables Bayesian model comparison: given two competing models (e.g., additive vs. epistatic fitness effects), the learned likelihood ratios can be combined to compute Bayes factors — a principled measure of which model the data favor — without retraining. This is a capability that NPE, FMPE, ABC-SMC, and BSL lack. It opens the door to testing whether the assumptions built into the extended FTNS equation (d(m̄)/dt ≈ Var(m) + μ·Eg[s]·b̄) actually hold for real populations.
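One concrete ratio-estimation route to a Bayes factor is to train a classifier on simulations labeled by model: with equal numbers of draws from each model, its logit estimates log[p(x|M1) / p(x|M2)], the log Bayes factor under equal model priors. A sketch with two hypothetical one-dimensional toy simulators (everything below is an illustrative assumption, not the actual Basener-Sanford variants):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical stand-ins for two competing simulators (e.g. additive vs.
# epistatic): each emits a one-dimensional summary of a fitness trajectory.
x_m1 = rng.normal(0.0, 1.0, n)   # simulations from "model 1"
x_m2 = rng.normal(0.0, 2.0, n)   # simulations from "model 2"

def feats(x):
    # x and x^2 let a linear classifier represent the exact log-ratio of
    # these two Gaussian marginals.
    return np.column_stack([x, x**2])

X = feats(np.concatenate([x_m1, x_m2]))
y = np.concatenate([np.ones(n), np.zeros(n)])
clf = LogisticRegression(max_iter=1000).fit(X, y)

def log_bayes_factor(x_obs):
    # With balanced training classes, the logit estimates
    # log[p(x|M1) / p(x|M2)], the log Bayes factor for equal model priors.
    return clf.decision_function(feats(np.atleast_1d(x_obs)))[0]

print(log_bayes_factor(0.0))   # positive: data near 0 favor model 1
print(log_bayes_factor(4.0))   # negative: extreme data favor model 2
```

A positive log Bayes factor favors model 1, a negative one favors model 2, and values near zero mean the data cannot distinguish the models.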

1. NRE Posterior

[Figure: NRE posterior distributions]
Figure 1: Marginal posteriors from NRE with MCMC sampling. Red dashed lines show the true parameter values used to generate the synthetic fitness trajectory. The NRE classifier was trained on 10,000 simulations drawn from the prior, then posterior samples were drawn via MCMC (4 chains, 500 warmup steps each).

Biological interpretation: Each panel shows the range of plausible values for one of the five mutation-selection parameters, given the observed fitness trajectory. Narrow, peaked distributions indicate parameters that the fitness data strongly constrains. For example, if the mutation rate (mu) posterior is narrow and centered on the true value, this means the shape of the fitness trajectory alone is sufficient to estimate the mutation rate — a powerful result for populations where direct mutation rate measurements are unavailable.

Technical note: Because NRE uses MCMC for sampling, the posterior comes with standard convergence diagnostics (R-hat, effective sample size) that verify the samples are trustworthy. This is an advantage over NPE/FMPE, which sample directly from a learned flow and lack such diagnostics.
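R-hat compares within-chain and between-chain variance; values near 1 indicate the chains explored the same distribution. A minimal sketch on synthetic, well-mixed chains (the chain values are simulated stand-ins for NRE's MCMC output, not real inference results):

```python
import numpy as np

rng = np.random.default_rng(3)

# Four synthetic chains, assumed to be independent draws from one posterior.
chains = rng.normal(0.75, np.sqrt(0.5), size=(4, 1500))

def r_hat(chains):
    """Gelman-Rubin potential scale reduction factor across chains."""
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W: within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # B: between-chain variance
    var_estimate = (n - 1) / n * within + between / n
    return np.sqrt(var_estimate / within)

print(r_hat(chains))   # close to 1.0: the chains agree
```

If one chain were stuck in a different mode, the between-chain term would inflate R-hat well above 1, flagging the run as untrustworthy.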

2. NRE vs NPE Comparison

[Figure: NRE vs NPE posterior comparison]
Figure 2: Head-to-head comparison of NRE and NPE posteriors. Both methods were trained on 10,000 simulations from the same prior distribution. NRE (blue) learns the likelihood ratio and samples via MCMC; NPE (orange) learns the posterior directly and samples from the learned flow. Red dashed lines mark the true parameter values.

What agreement means biologically: Where the blue and orange histograms overlap substantially, the parameter estimate is robust — it does not depend on which neural inference strategy was used. This gives us confidence that if we applied these methods to real fitness data from an evolving population, the inferred mutation rate, DFE parameters, and beneficial fraction would reflect genuine biological signal, not computational artifacts.

What disagreement means biologically: Where the histograms diverge (different peaks or different widths), the parameter is sensitive to the inference methodology. This signals that the fitness trajectory alone may not contain enough information to pin down that parameter. For the Basener-Sanford model, such parameters would need additional data — direct mutation rate measurements, fitness assays of individual mutants, or longer time series — before we could reliably determine whether the population sits in the selection-dominated or meltdown regime.

Method differences: