05: Inference Comparison — Mutation-Selection Model

Why Compare Methods?

In science, confidence comes from convergence: when independent approaches reach the same answer, we trust that answer far more than any single method could justify. This notebook loads saved results from notebooks 02–04c and compares all inference methods head-to-head. Each method makes different computational assumptions: ABC-SMC uses distance thresholds, BSL assumes Gaussian summary statistics, NPE learns a normalizing flow (amortized, or refined over sequential rounds as SNPE), and NRE trains a classifier to estimate the likelihood ratio. If these fundamentally different approaches agree on the mutation-selection parameters, we have strong evidence that those parameters are genuinely constrained by the fitness trajectory data and are not artifacts of any particular computational choice.

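This matters biologically because the five inferred parameters, namely mutation rate (μ), DFE shape (γ_shape), DFE scale (γ_scale), fraction beneficial (p_beneficial), and environmental noise (σ_env), jointly determine whether a population adapts or melts down under the extended Fisher's Fundamental Theorem of Natural Selection (FTNS): d(m̄)/dt ≈ Var(m) + μ·E_g[s]·b̄. Here DFE is the distribution of fitness effects of new mutations, Var(m) is the fitness variance, E_g[s] is the mean fitness effect of a new mutation, and b̄ is the mean birth rate. Because E_g[s] is negative when the DFE is predominantly deleterious, the second term acts as a mutational load of magnitude μ·|E_g[s]|·b̄ that opposes the variance term. Getting these parameters right is essential for predicting a population's evolutionary fate.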
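The balance between the two terms of the extended theorem can be checked numerically. The following is a minimal sketch, not the notebooks' own code: the function names are hypothetical, and the input values are illustrative rather than taken from any saved result.

```python
def fitness_change_rate(var_m, mu, mean_s, b_bar):
    """Right-hand side of d(m_bar)/dt ~ Var(m) + mu * E_g[s] * b_bar."""
    return var_m + mu * mean_s * b_bar


def critical_variance(mu, mean_s, b_bar):
    """Fitness variance at which selection exactly offsets the mutational load."""
    return mu * abs(mean_s) * b_bar


def regime(var_m, mu, mean_s, b_bar):
    """Label the sign of the predicted rate of change of mean fitness."""
    rate = fitness_change_rate(var_m, mu, mean_s, b_bar)
    if rate > 0:
        return "adapts"
    if rate < 0:
        return "declines"
    return "balanced"


# Illustrative inputs only (assumed, not measured): a deleterious mean effect
# makes the mutation term negative, so it works against Var(m).
mu, mean_s, b_bar = 0.1, -0.002, 1.0
print(critical_variance(mu, mean_s, b_bar))
print(regime(5e-4, mu, mean_s, b_bar))  # variance above the critical value
print(regime(1e-4, mu, mean_s, b_bar))  # variance below the critical value
```

The sign of `mean_s` carries the direction of the mutational effect, which is why the load appears with an absolute value whenever it is written as a separate positive quantity.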

1. Head-to-Head Posterior Comparison

Figure 1: Overlaid marginal posteriors from all inference methods. Each panel shows one of the five mutation-selection parameters, with histograms from ABC-SMC (blue), BSL (orange), NPE (green), and SNPE (purple). Red dashed lines mark the true values used to generate the synthetic fitness trajectory.

How to read this figure:

Accuracy — Methods whose distributions are centered on (or near) the red dashed line are accurately recovering the true parameter value. This tells us the method is not systematically biased.
Precision — Narrower distributions indicate less uncertainty. A tight distribution means the fitness trajectory data is highly informative about that parameter; a wide one means the data alone cannot pin it down.
Agreement across methods — When all four colored histograms overlap substantially, the parameter estimate is robust. This is the strongest form of evidence: fundamentally different computational approaches converge on the same answer.
Disagreement — When histograms diverge, the parameter is sensitive to inference assumptions. Biologically, this signals the need for additional data types (direct mutation rate measurements, fitness assays, longer time series) to reliably estimate that parameter in a real population.
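The reading guide above can also be expressed as numbers. The sketch below is one possible way to do so, under stated assumptions: the function names are hypothetical, the posterior draws are synthetic placeholders rather than the saved results, and the histogram overlap coefficient is just one of several reasonable agreement measures.

```python
import random
import statistics


def summarize(samples, true_value):
    """Accuracy (bias of the posterior mean) and precision (posterior SD)."""
    return {
        "bias": statistics.fmean(samples) - true_value,
        "sd": statistics.stdev(samples),
    }


def overlap_coefficient(a, b, bins=30):
    """Shared probability mass of two histograms on a common grid (0 to 1)."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # guard against identical constant samples

    def normalized_hist(x):
        counts = [0] * bins
        for v in x:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(x) for c in counts]

    return sum(min(p, q) for p, q in zip(normalized_hist(a), normalized_hist(b)))


# Placeholder draws standing in for saved posterior samples of mu
# (assumed true value 0.1); real samples would be loaded from disk instead.
rng = random.Random(0)
method_a = [rng.gauss(0.100, 0.010) for _ in range(1000)]
method_b = [rng.gauss(0.101, 0.012) for _ in range(1000)]
method_c = [rng.gauss(0.200, 0.010) for _ in range(1000)]  # a biased method

print(summarize(method_a, true_value=0.1))
print("a vs b:", overlap_coefficient(method_a, method_b))
print("a vs c:", overlap_coefficient(method_a, method_c))
```

An overlap near 1 corresponds to histograms that sit on top of each other, and an overlap near 0 to histograms that barely touch.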

Biological interpretation for each parameter:

μ (mutation rate) — Controls the total mutational input per generation. Higher μ means more mutations entering the population each generation, increasing the mutational load term μ·|E_g[s]|·b̄ in the extended FTNS equation.
γ_shape (DFE shape) — Controls whether most mutations have similar or highly variable fitness effects. Low values produce an L-shaped DFE where most mutations are nearly neutral but rare ones are severe.
γ_scale (DFE scale) — Controls the average magnitude of mutational fitness effects. Larger values mean mutations are more damaging on average, pushing the population toward the meltdown regime.
p_beneficial (fraction beneficial) — The proportion of mutations that increase fitness. This is typically very small in nature; even modest changes can shift the balance between adaptation and decline.
σ_env (environmental noise) — Fitness variation not caused by genetics. High environmental noise makes it harder to detect the genetic signal, widening the posteriors for all other parameters.
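To make the roles of γ_shape, γ_scale, and p_beneficial concrete, the sketch below draws mutational effects from a signed gamma distribution. This is an assumption about how the DFE could be parameterized (gamma-distributed magnitudes, with a fraction p_beneficial given a positive sign), not a description of the notebooks' simulator; the function names and numeric values are hypothetical.

```python
import random
import statistics


def sample_dfe(n, gamma_shape, gamma_scale, p_beneficial, rng):
    """Draw n signed fitness effects: gamma magnitudes, mostly negative signs."""
    effects = []
    for _ in range(n):
        magnitude = rng.gammavariate(gamma_shape, gamma_scale)  # mean = shape * scale
        sign = 1.0 if rng.random() < p_beneficial else -1.0
        effects.append(sign * magnitude)
    return effects


def expected_effect(gamma_shape, gamma_scale, p_beneficial):
    """Analytic mean of the signed gamma DFE assumed above."""
    return (2.0 * p_beneficial - 1.0) * gamma_shape * gamma_scale


# Illustrative parameter values (assumed, not the notebooks' true values).
rng = random.Random(1)
draws = sample_dfe(20000, gamma_shape=0.5, gamma_scale=0.003,
                   p_beneficial=0.01, rng=rng)

print("sample mean effect:  ", statistics.fmean(draws))
print("analytic mean effect:", expected_effect(0.5, 0.003, 0.01))
print("fraction beneficial: ", sum(s > 0 for s in draws) / len(draws))
```

Under this parameterization a shape below 1 gives the L-shaped distribution described above, and increasing the scale stretches every effect by the same factor.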

Method	Strengths	Best biological use case
ABC-SMC	Well-established, PyMC ecosystem	Initial parameter exploration with limited compute
BSL	Proper MCMC diagnostics, no epsilon	Publication-quality estimates with uncertainty
NPE (amortized)	Instant posteriors after training	Parameter space mapping across many conditions
SNPE (sequential)	Tighter posteriors, fewer sims	Precise estimation for a single population
NRE	Enables model comparison via Bayes factors	Testing additive vs. epistatic fitness models

2. Parameter Space: Selection vs. Meltdown Boundary

Figure 2: The mutation-selection phase boundary — the central scientific result. A 20×20 grid scans mutation rate (μ) vs. DFE scale (γ_scale), running 5 replicate simulations per grid point (2,000 total simulations). This map reveals the fundamental structure of the Basener-Sanford mutation-selection landscape.

Left panel — Net fitness change over 200 generations:

Blue region (positive fitness change): Selection dominates. Here, the fitness variance term Var(m) in the extended FTNS equation is large enough to drive adaptation despite the mutational load. This is the regime where Fisher's original theorem holds — populations adapt, and mean fitness increases over time. Blue regions correspond to low mutation rates and/or small mutational effect sizes, where the genetic load remains manageable.
Red region (negative fitness change): Mutational meltdown. The mutational load term μ·|E_g[s]|·b̄ overwhelms selection, and mean fitness declines generation after generation. This is the regime that Basener & Sanford's (2018) extension of FTNS predicts when realistic, predominantly deleterious DFEs are used. Red regions correspond to high mutation rates and/or large mutational effect sizes, where selection cannot purge deleterious mutations fast enough.
Black contour (zero fitness change line): The phase transition boundary — the empirical, computational analog of the critical condition Var(m) = μ·|E_g[s]|·b̄ from the extended FTNS. This contour separates populations that can sustain themselves from those heading toward extinction. Its shape encodes how the balance between selection and mutation depends on the specific combination of mutation rate and effect size — the central question of the Basener-Sanford framework.
Black star: The true parameters (μ=0.1, γ_scale=0.003) used for the inference test. Its position on the map reveals whether our synthetic population sits in the selection-dominated or meltdown-dominated regime, and how close it is to the critical boundary. With real data there is no known true value to mark; the inferred posterior would instead appear as an uncertainty cloud on the map, quantifying how confident we are about which side of the boundary the population occupies.
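The grid scan behind the left panel can be sketched as follows. Everything here is a stand-in under loud assumptions: `toy_net_fitness_change` is a hypothetical placeholder for the real simulator (it treats |E_g[s]| as γ_scale, fixes b̄ = 1 and Var(m) at an arbitrary constant, and adds Gaussian noise), and the axis ranges are invented for illustration. Only the 20×20 grid, the 5 replicates, and the 200 generations come from the text.

```python
import random
import statistics


def toy_net_fitness_change(mu, gamma_scale, rng, generations=200,
                           var_m=2e-4, noise_sd=0.002):
    """Hypothetical stand-in for one simulation run (not the real model)."""
    per_generation = var_m - mu * gamma_scale  # selection minus mutational load
    return generations * per_generation + rng.gauss(0.0, noise_sd)


def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def scan_grid(mu_values, scale_values, replicates=5, seed=0):
    """Mean net fitness change per grid point; rows are scales, columns are mu."""
    rng = random.Random(seed)
    return [
        [statistics.fmean(toy_net_fitness_change(mu, scale, rng)
                          for _ in range(replicates))
         for mu in mu_values]
        for scale in scale_values
    ]


def zero_crossings(grid):
    """Cells where the sign flips between horizontal neighbours (the contour)."""
    return [(i, j)
            for i, row in enumerate(grid)
            for j in range(len(row) - 1)
            if row[j] * row[j + 1] < 0]


mu_values = linspace(0.01, 0.5, 20)          # assumed range
scale_values = linspace(0.0005, 0.01, 20)    # assumed range
grid = scan_grid(mu_values, scale_values)
total_sims = len(mu_values) * len(scale_values) * 5

print("total simulations:", total_sims)
print("boundary cells found:", len(zero_crossings(grid)))
```

Averaging over replicates before looking for sign changes keeps single noisy runs from scattering spurious boundary cells across the map.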

Right panel — Discrete regime classification:

Green: Selection dominates — fitness increases by more than 0.005 over 200 generations
Yellow: Approximate balance — fitness changes by less than 0.005 in either direction (the population is near the phase boundary)
Red: Mutational meltdown — fitness decreases by more than 0.005 over 200 generations
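The three-way classification above amounts to a symmetric threshold on the net fitness change. A minimal sketch, assuming a hypothetical function name and assigning a change of exactly ±0.005 to the balanced class (the text does not say which side that case belongs to):

```python
from collections import Counter


def classify_regime(net_change, threshold=0.005):
    """Map a net fitness change over 200 generations to a regime label."""
    if net_change > threshold:
        return "selection"   # green
    if net_change < -threshold:
        return "meltdown"    # red
    return "balance"         # yellow, includes changes of exactly +/- threshold


# Illustrative net changes (made up for the example).
example_changes = [0.031, 0.004, -0.002, -0.012, -0.250]
labels = [classify_regime(x) for x in example_changes]

print(labels)
print(Counter(labels))
```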

The sharp transition from green to red, with only a narrow yellow band between them, indicates that the phase boundary is an abrupt feature of the model rather than a gradual continuum: nearly every grid point falls clearly on one side of the critical threshold.

Biological Significance: Tying It All Together

The parameter space map above is the culmination of this entire notebook series. It answers the question that Basener & Sanford (2018) raised with their extension of Fisher's Fundamental Theorem: under what conditions does natural selection maintain population fitness, and under what conditions does mutational meltdown occur?

The extended FTNS gives the theoretical answer: fitness increases when Var(m) > μ·|E_g[s]|·b̄ and decreases when the inequality is reversed. The parameter space map provides the computational answer, showing exactly where this transition occurs across realistic ranges of mutation rate and DFE scale.

Why Bayesian inference matters here: The theoretical condition involves population-level quantities (fitness variance, mean birth rate) that are difficult to measure directly. But the five parameters we infer — μ, γ_shape, γ_scale, p_beneficial, σ_env — determine these quantities. By inferring the parameters from an observed fitness trajectory, we can place a population on this map and determine probabilistically whether it sits in the selection-dominated or meltdown-dominated regime.
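Placing a population on the map "probabilistically" can be read as computing the fraction of posterior draws that fall on the meltdown side. The sketch below shows one way to do that under explicit assumptions: the posterior draws are synthetic placeholders, Var(m) and b̄ are supplied as known constants, E_g[s] is computed from a signed gamma DFE (an assumed parameterization), and σ_env is ignored because it does not enter the theoretical condition. All function names are hypothetical.

```python
import random


def mean_effect(gamma_shape, gamma_scale, p_beneficial):
    """E_g[s] under an assumed signed gamma DFE."""
    return (2.0 * p_beneficial - 1.0) * gamma_shape * gamma_scale


def meltdown_probability(posterior_draws, var_m, b_bar=1.0):
    """Fraction of draws for which Var(m) + mu * E_g[s] * b_bar is negative."""
    declining = 0
    for d in posterior_draws:
        s_bar = mean_effect(d["gamma_shape"], d["gamma_scale"], d["p_beneficial"])
        if var_m + d["mu"] * s_bar * b_bar < 0:
            declining += 1
    return declining / len(posterior_draws)


# Placeholder posterior draws (uniform boxes chosen for illustration only);
# real draws would come from one of the inference methods.
rng = random.Random(2)
posterior_draws = [
    {
        "mu": rng.uniform(0.08, 0.12),
        "gamma_shape": rng.uniform(0.4, 0.6),
        "gamma_scale": rng.uniform(0.002, 0.004),
        "p_beneficial": rng.uniform(0.0, 0.02),
    }
    for _ in range(2000)
]

for var_m in (0.0, 1.5e-4, 1.0):
    print(var_m, meltdown_probability(posterior_draws, var_m))
```

A probability near 0 or 1 means the posterior sits well inside one regime, while an intermediate value means the uncertainty cloud straddles the boundary.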

The role of each inference method:

ABC-SMC and BSL provide reliable, well-understood parameter estimates that serve as baselines.
NPE enables rapid parameter space mapping that complements this simulation-based map: once trained, it generates posteriors for thousands of parameter combinations in seconds rather than hours.
NRE opens the door to the next question: does an additive model or an epistatic model better explain real population data? Model comparison via Bayes factors could reveal whether the phase boundary shifts when mutational interactions are accounted for.

Together, these methods transform the Basener-Sanford model from a theoretical framework into a quantitative tool for analyzing real populations — one that provides not just point estimates but full probability distributions over the parameters that determine evolutionary fate.
