The Tyranny of the Mean: Population-Based Optimization in Healthcare and AI

Modern healthcare and artificial intelligence face a common challenge in how they handle individual variation. Both systems rely on population-level statistics to guide optimization, which can inadvertently push individuals toward averages that may not serve them well. More interesting still, both fields are independently discovering similar solutions—a shift from standardized targets to personalized approaches that preserve beneficial diversity.

Population Averages as Universal Targets

Healthcare’s Reference Ranges

Traditional medical practice establishes “normal” ranges by measuring population distributions. Blood pressure guidelines from the American Heart Association classify readings below 120/80 mmHg as normal. The World Health Organization sets body mass index between 18.5 and 24.9 as the normal range. The American Diabetes Association considers fasting glucose normal when it falls between 70 and 100 mg/dL. These ranges serve an essential function in identifying pathology, but their origin as population statistics rather than individual optima creates tension in clinical practice.

Elite endurance athletes routinely maintain resting heart rates between 40 and 50 beats per minute, well below the standard range of 60 to 100 bpm. This bradycardia reflects cardiac adaptation rather than dysfunction—their hearts pump more efficiently per beat, requiring fewer beats to maintain circulation. Treating such athletes to “normalize” their heart rates would be counterproductive, yet this scenario illustrates how population-derived ranges can mislead when applied universally.

The feedback mechanism compounds over time. When clinicians routinely intervene to move patients toward reference ranges, the population distribution narrows. Subsequent range calculations derive from this more homogeneous population, potentially tightening targets further. Natural variation that was once common becomes statistically rare, then clinically suspicious.
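This narrowing dynamic is easy to see in a toy simulation. The sketch below is schematic, not a clinical model: it defines a “reference range” as mean ± 2 standard deviations, “treats” out-of-range individuals partway toward the nearest boundary, and recomputes the range. The `pull` parameter and the Gaussian starting population are illustrative choices, not empirical values.

```python
import random
import statistics

def reference_range(values, k=2.0):
    """Define 'normal' as mean +/- k standard deviations of the population."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return mu - k * sigma, mu + k * sigma

def intervene(values, lo, hi, pull=0.5):
    """Move out-of-range individuals partway toward the nearest boundary."""
    adjusted = []
    for v in values:
        if v < lo:
            adjusted.append(v + pull * (lo - v))
        elif v > hi:
            adjusted.append(v - pull * (v - hi))
        else:
            adjusted.append(v)
    return adjusted

random.seed(0)
population = [random.gauss(100, 15) for _ in range(10_000)]

widths = []
for _ in range(5):
    lo, hi = reference_range(population)
    widths.append(hi - lo)
    population = intervene(population, lo, hi)

# Each round of "normalizing" outliers shrinks the next computed range.
assert all(later < earlier for earlier, later in zip(widths, widths[1:]))
```

Compressing the tails reduces the population’s standard deviation, so the next range, derived from the treated population, is strictly tighter: exactly the compounding described above.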

Language Models and Statistical Patterns

Large language models demonstrate a parallel phenomenon in their optimization behavior. These systems learn probability distributions over sequences of text, effectively encoding which expressions are most common for conveying particular meanings. When users request improvements to their writing, the model suggests revisions that shift the text toward higher-probability regions of this learned distribution—toward the statistical mode of how millions of other people have expressed similar ideas.

This process systematically replaces less common stylistic choices with more typical alternatives. Unusual metaphors get smoothed into familiar comparisons. Regional variations in vocabulary and grammar get normalized to a global standard. Deliberate syntactic choices that create specific rhetorical effects get “corrected” to conventional structures. The model isn’t making errors in this behavior—it’s doing exactly what training optimizes it to do: maximize the probability of generating text that resembles its training distribution.

Similar feedback dynamics appear here. Models train on diverse human writing and learn statistical patterns. People use these models to refine their prose, shifting it toward common patterns. That AI-influenced writing becomes training data for subsequent models. With each iteration, the style space that models learn contracts around increasingly dominant modes.
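The same contraction can be sketched for the model-side loop. In this deliberately schematic example, “style” is a single number, a “model” is just a fitted mean and standard deviation, and mode-seeking decoding is approximated by sampling at temperature below 1; real language models are vastly more complex, but the variance arithmetic is the same.

```python
import random
import statistics

TEMPERATURE = 0.8  # < 1.0 biases samples toward the mode

def train(corpus):
    """'Training' here is just fitting a mean and spread to the corpus."""
    return statistics.mean(corpus), statistics.stdev(corpus)

def generate(mu, sigma, n=5_000):
    """Mode-seeking decoding: sample with compressed variance."""
    return [random.gauss(mu, sigma * TEMPERATURE) for _ in range(n)]

random.seed(1)
corpus = [random.gauss(0.0, 1.0) for _ in range(5_000)]  # diverse human writing

spreads = []
for _ in range(6):
    mu, sigma = train(corpus)
    spreads.append(sigma)
    corpus = generate(mu, sigma)  # the next model trains on AI-influenced text

# The learned style space contracts with each model generation.
assert spreads[-1] < 0.5 * spreads[0]
```

Because each generation samples slightly closer to the mode than the distribution it learned, the spread decays geometrically across iterations rather than stabilizing.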

Precision Medicine’s Response

The healthcare industry has recognized that population averages make poor universal targets and developed precision medicine as an alternative framework. Rather than asking whether a patient’s metrics match population norms, precision medicine asks whether those metrics are optimal given that individual’s genetic makeup, microbiome composition, environmental context, and lifestyle factors.

Commercial genetic testing services like 23andMe and AncestryDNA have made personal genomic data accessible to millions of people. This genetic information reveals how individuals metabolize medications differently, process nutrients through distinct biochemical pathways, and carry polymorphisms that alter their baseline risk profiles. A cholesterol level that predicts cardiovascular risk in one genetic context may carry different implications in another.

Microbiome analysis adds another layer of personalization. Research published by Zeevi et al. in Cell (2015) demonstrated that individuals show dramatically different glycemic responses to identical foods based on their gut bacterial composition. Companies like Viome and DayTwo now offer commercial services that analyze personal microbiomes to generate nutrition recommendations tailored to individual metabolic responses rather than population averages.

Continuous monitoring technologies shift the focus from population comparison to individual trend analysis. Continuous glucose monitors from Dexcom and Abbott’s FreeStyle Libre track glucose dynamics throughout the day. Smartwatches monitor heart rate variability as an indicator of autonomic nervous system function. These devices establish personal baselines and detect deviations from an individual’s normal patterns rather than measuring deviation from population norms.
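A minimal sketch of that shift from population ranges to personal baselines: flag a reading only when it deviates from the individual’s own recent history. The window size, threshold, and heart-rate numbers below are illustrative, not taken from any device’s actual algorithm.

```python
import statistics

def flag_deviations(readings, window=7, k=2.0):
    """Flag readings that deviate from this individual's own recent
    baseline, rather than from a fixed population range."""
    flags = []
    for i, value in enumerate(readings):
        if i < window:
            flags.append(False)  # not enough personal history yet
            continue
        baseline = readings[i - window:i]
        mu = statistics.mean(baseline)
        sigma = statistics.stdev(baseline)
        flags.append(abs(value - mu) > k * max(sigma, 1e-9))
    return flags

# A resting heart rate near 48 bpm is "low" by population norms, but for
# an athlete whose baseline sits there, only the jump to 72 is anomalous.
athlete = [47, 48, 46, 49, 48, 47, 48, 48, 72]
flags = flag_deviations(athlete)
assert flags[-1] is True      # deviation from the personal baseline
assert not any(flags[:-1])    # low-but-stable readings raise no alarm
```

A population-range check would have flagged every one of this athlete’s readings; the personal-baseline check flags only the genuine change.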

Applying Precision Concepts to Language Models

The techniques that enable precision medicine suggest analogous approaches for language models. Current systems could be modified to learn and preserve individual stylistic signatures while still improving clarity and correctness. The technical foundations already exist in various forms across the machine learning literature.

Fine-tuning methodology, now standard for adapting models to specific domains, could be applied at the individual level. A model fine-tuned on a person’s past writing would learn their characteristic sentence rhythms, vocabulary preferences, and stylistic patterns. Rather than suggesting edits that move text toward a global statistical mode, such a model would optimize toward patterns characteristic of that individual writer.
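As a toy stand-in for individual-level fine-tuning (a real system would fine-tune a pretrained neural model, not count bigrams), the sketch below blends global word-transition statistics with statistics learned from one author’s corpus, so that suggestions shift toward that writer’s patterns. The corpora and the `personal_weight` parameter are invented for illustration.

```python
from collections import Counter, defaultdict

def bigram_counts(text):
    """Count word bigrams -- a toy stand-in for a learned language model."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def suggest_next(global_model, personal_model, prev, personal_weight=5.0):
    """Blend global statistics with the individual's own patterns, so the
    suggestion reflects how *this* writer continues, not the crowd."""
    scores = Counter()
    for word, count in global_model[prev].items():
        scores[word] += count
    for word, count in personal_model[prev].items():
        scores[word] += personal_weight * count
    return scores.most_common(1)[0][0] if scores else None

global_corpus = "the results were good . the results were good . the results were strong"
author_corpus = "the results were luminous . the results were luminous"

g = bigram_counts(global_corpus)
p = bigram_counts(author_corpus)

assert suggest_next(g, p, "were") == "luminous"                  # personalized
assert suggest_next(g, defaultdict(Counter), "were") == "good"   # global mode
```

Without the personal model, the suggestion collapses to the global mode; with it, the author’s characteristic (and statistically rare) choice wins.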

Research on style transfer, including work by Lample et al. (2019) on multiple-attribute text rewriting, has shown that writing style can be represented as vectors in latent space. Conditioning text generation on these style vectors enables controlled variation in output characteristics. A system that extracted style embeddings from an author’s corpus could use those embeddings to preserve stylistic consistency while making other improvements.
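To make the idea of a style vector concrete, here is a crude hand-built version: a few surface statistics gathered into a tuple. Learned embeddings in the style-transfer literature are far richer; these hand-picked features (sentence length, type-token ratio, comma rate) are assumptions chosen only to illustrate distance in a style space.

```python
import math
import re

def style_vector(text):
    """A crude hand-built style 'embedding': average sentence length,
    type-token ratio, and comma rate per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.lower().split()
    avg_sent_len = len(words) / max(len(sentences), 1)
    type_token = len(set(words)) / max(len(words), 1)
    comma_rate = text.count(",") / max(len(words), 1)
    return (avg_sent_len, type_token, comma_rate)

def distance(a, b):
    """Euclidean distance between two style vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

terse = "Ship it. Test it. Done."
ornate = ("The release, once tested, reviewed, and blessed by all parties, "
          "may finally, at long last, be shipped.")

# An edited draft should stay closer to its author's style vector than
# to a very different style.
edited_terse = "Ship the fix. Run the tests. Done."
assert distance(style_vector(edited_terse), style_vector(terse)) < \
       distance(style_vector(edited_terse), style_vector(ornate))
```

A system with access to such vectors can score candidate edits by how far they drift from the author’s corpus, not just by fluency.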

Constrained generation techniques allow models to optimize for multiple objectives simultaneously. Constraints could maintain statistical properties of an individual’s writing—their typical vocabulary distribution, sentence length patterns, or syntactic structures—while still optimizing for clarity within those boundaries. This approach parallels precision medicine’s goal of optimizing health outcomes within the constraints of an individual’s genetic and metabolic context.
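One simple way to realize such a constraint is filter-then-rank: generate candidate rewrites, discard any that violate a style bound derived from the author’s own samples, and rank the survivors. The sketch below uses a single constraint (average sentence length) and a trivial stand-in for a clarity score; a production system would use richer constraints and a learned ranker.

```python
def avg_sentence_length(text):
    """Words per sentence, splitting crudely on terminal punctuation."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return len(text.split()) / max(len(sentences), 1)

def within_style_bounds(candidate, author_samples, tolerance=0.3):
    """Constraint: the rewrite's average sentence length must stay within
    +/- tolerance of the author's own baseline."""
    baseline = sum(avg_sentence_length(s) for s in author_samples) / len(author_samples)
    return abs(avg_sentence_length(candidate) - baseline) <= tolerance * baseline

def pick_rewrite(candidates, author_samples):
    """Filter-then-rank: keep candidates that satisfy the style constraint,
    then choose among them (shortest, as a stand-in for a clarity score)."""
    allowed = [c for c in candidates if within_style_bounds(c, author_samples)]
    return min(allowed, key=len) if allowed else None

author = ["Ship it now. Keep it small.", "Cut the fluff. Say less."]
candidates = [
    "Deploy the change immediately, keeping the footprint of the release "
    "as small as is practical.",
    "Ship it today. Keep it lean.",
]
assert pick_rewrite(candidates, author) == "Ship it today. Keep it lean."
```

The long, conventional rewrite is rejected not because it is unclear but because it abandons this author’s clipped cadence; optimization proceeds only within the author’s stylistic envelope.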

Reinforcement learning from human feedback, as described by Ouyang et al. (2022), currently aggregates preferences across users to train generally applicable models. Implementing RLHF at the individual level would allow models to learn person-specific preferences about which edits preserve voice and which introduce unwanted homogenization. The system would learn not just what makes text “better” in general, but what makes this particular person’s writing more effective without losing its distinctive character.
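The per-user preference-learning step can be sketched with a Bradley-Terry-style reward model, the same family of objective used in RLHF reward modeling, here reduced to two hand-named features. The feature names (`fluency_gain`, `voice_preserved`), the comparison data, and the learning rate are all invented for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def update(weights, preferred, rejected, lr=0.5):
    """One Bradley-Terry gradient step: raise the score of the edit this
    user preferred relative to the one they rejected."""
    p = sigmoid(score(weights, preferred) - score(weights, rejected))
    grad = 1.0 - p
    return [w + lr * grad * (fp - fr)
            for w, fp, fr in zip(weights, preferred, rejected)]

# Features per candidate edit: (fluency_gain, voice_preserved).
# This user consistently prefers edits that keep their voice, even at a
# small cost in generic "fluency".
comparisons = [
    ((0.2, 1.0), (0.9, 0.0)),   # (preferred, rejected)
    ((0.1, 0.9), (0.8, 0.1)),
    ((0.3, 1.0), (1.0, 0.2)),
]
weights = [0.0, 0.0]
for _ in range(50):
    for preferred, rejected in comparisons:
        weights = update(weights, preferred, rejected)

# The learned per-user reward values voice preservation over raw fluency.
assert weights[1] > weights[0]
```

Aggregated across all users, these comparisons would wash out into a generic preference for fluency; kept per-user, the reward model learns that this writer trades fluency for voice.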

Training objectives could explicitly reward stylistic diversity rather than purely minimizing loss against a training distribution. Instead of convergence toward a single mode, such objectives would encourage models to maintain facility with a broad range of stylistic choices. This mirrors precision medicine’s recognition that healthy human variation spans a range rather than clustering around a single optimum.
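One standard way to express such an objective is an entropy bonus subtracted from the usual cross-entropy loss. The sketch below treats “style” as a three-way categorical distribution; the specific probabilities and the regularization weight `lam` are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a categorical distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def diversity_loss(probs, targets, lam=0.5):
    """Cross-entropy against observed style frequencies, minus an entropy
    bonus that rewards keeping probability mass spread across styles."""
    ce = -sum(t * math.log(p) for p, t in zip(probs, targets) if t > 0)
    return ce - lam * entropy(probs)

# Three "styles"; the data is dominated by style 0, but styles 1 and 2 exist.
targets = [0.8, 0.1, 0.1]

collapsed = [0.98, 0.01, 0.01]  # mode-seeking solution
spread = [0.8, 0.1, 0.1]        # matches the data, keeps diversity

assert diversity_loss(spread, targets) < diversity_loss(collapsed, targets)

# The entropy term widens the margin in favor of the diverse solution.
gap_regularized = diversity_loss(collapsed, targets) - diversity_loss(spread, targets)
gap_plain = diversity_loss(collapsed, targets, lam=0.0) - diversity_loss(spread, targets, lam=0.0)
assert gap_regularized > gap_plain
```

The entropy term adds explicit pressure against collapse on top of distribution matching, counteracting the mode-seeking tendency that decoding and feedback loops otherwise introduce.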

Implementation Challenges

Precision medicine didn’t emerge from purely technical innovation. It developed through sustained institutional commitment, including recognition that population-based approaches were failing certain patients, substantial investment in genomic infrastructure and data systems, regulatory frameworks for handling personal genetic data, and cultural shifts in how clinicians think about treatment targets. Building precision language systems faces analogous challenges beyond the purely technical.

Data requirements differ significantly from current practice. Personalized models need sufficient examples of an individual’s writing to learn their patterns, raising questions about privacy and data ownership. Training infrastructure would need to support many distinct model variants rather than a single universal system. Evaluation metrics would need to measure style preservation alongside traditional measures of fluency and correctness.

More fundamentally, building such systems demands a shift from treating diversity as noise to be averaged away toward treating it as signal to be preserved. This parallels the conceptual shift in medicine from viewing outliers as problems requiring correction toward understanding them as potentially healthy variations. The technical capabilities exist, but deploying them intentionally requires first recognizing that convergence toward statistical modes, while appearing optimal locally, may be problematic globally.

Both healthcare and AI have built optimization systems that push toward population averages. Healthcare recognized the limitations of this approach and developed precision medicine as an alternative. AI can learn from that trajectory, building systems that help individuals optimize for their own patterns rather than converging everyone toward a statistical mean.

References

  • American Heart Association. Blood pressure guidelines. https://www.heart.org
  • World Health Organization. BMI Classification. https://www.who.int
  • American Diabetes Association. Standards of Medical Care in Diabetes.
  • Zeevi, D., Korem, T., Zmora, N., et al. (2015). Personalized Nutrition by Prediction of Glycemic Responses. Cell, 163(5), 1079-1094. DOI: 10.1016/j.cell.2015.11.001
  • Lample, G., Subramanian, S., Smith, E. M., Denoyer, L., Ranzato, M., & Boureau, Y.-L. (2019). Multiple-Attribute Text Rewriting. International Conference on Learning Representations.
  • Ouyang, L., Wu, J., Jiang, X., et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155
