Part 1: The Landscape

Where AI Is Delivering Value — and Where It Isn't

AI's track record across biology and health follows a clear pattern: it excels where data is standardised, high-volume, and governed by stable rules, and it struggles where human behaviour, institutional complexity, and context-dependent judgement dominate. Rather than a binary works-or-doesn't picture, the evidence reveals a spectrum, from molecular biology, where AI has already produced Nobel Prize-winning results, to public health policy, where the most ambitious AI projects have collapsed. This report analyses AI applications in three domains of biology (molecular, diagnostic, and population-level) and takes a special focus on AI's progression in public health.

Section 01

Molecular Biology: AI's Strongest Domain

Key success: AlphaFold is the clearest AI success story in all of biology — predicting protein structures in minutes that previously took months to years, earning its creators the 2024 Nobel Prize in Chemistry.

AlphaFold, DeepMind's protein structure prediction system, can determine a protein's 3D shape in minutes, a task that previously took experimental researchers months to years and cost upwards of $100,000 per structure. Its database now contains over 200 million predicted structures and has been used by more than 3 million researchers across 190 countries. AlphaFold has become a standard research tool that fundamentally changes how structural biology operates. Its key limitation is that it predicts only static protein structures, not dynamic protein-drug interactions, so its outputs do not translate directly to pharmaceutical applications. When MIT researchers tested AlphaFold structures for drug-protein interaction prediction, the results performed little better than chance. Newer tools are now addressing this gap: Boltz-2, developed by MIT and Recursion in 2025, jointly models both structure and binding affinity, while Isomorphic Labs' Drug Design Engine (announced February 2026) more than doubles AlphaFold 3's accuracy on protein-ligand prediction. The field is progressing rapidly, but the gap between predicting structure and predicting function remains significant.

Drug discovery acceleration: Insilico Medicine took its idiopathic pulmonary fibrosis (IPF) drug from target identification to Phase I clinical trials in 30 months (versus 6-8 years traditionally) at a preclinical cost of approximately $2.6 million. Exscientia cut the number of candidate compounds needed from 2,500-5,000 to 150-250.

In drug discovery, AI has dramatically compressed early-stage timelines. As of early 2026, at least 173 AI-discovered drug programs are in clinical development, with 15-20 expected to enter pivotal Phase III trials during the year.

Reality check: No AI-discovered drug has yet received FDA approval. Around 90% of drugs that pass preclinical testing still fail in clinical trials. 2026-2027 will be the real test.

This is not an indictment of AI — drug development inherently takes 10-15 years, and the first AI-designed drug only entered clinical trials in 2020. The real test comes now: 2026-2027 will reveal whether AI-discovered drugs have better Phase III success rates than traditionally discovered ones, or whether AI simply gets drugs to the point of failure faster. Currently, the high failure rate at clinical trials is largely because cellular models can't accurately predict organ-level and whole-body responses. This cell-to-organ gap is a biology problem, not an AI problem — and until it is solved through advances like organ-on-a-chip technology, AI drug discovery will continue to hit this wall.

Section 02

Clinical Diagnostics: Volume Without Validation

By the numbers, clinical AI looks like a triumph. The FDA's list of AI-enabled medical devices surpassed 1,450 by the end of 2025, with 295 new clearances that year alone — a record. Radiology dominates, accounting for 76% of all approvals, followed by cardiovascular (9.8%) and neurology (3.7%). But these headline numbers mask a serious quality problem.

Metric | Value
FDA AI-enabled devices (end 2025) | 1,450+
New clearances in 2025 | 295 (record)
Radiology share of approvals | 76%
Cleared via 510(k) pathway | 96.4%
Devices with a clinical performance study | ~50%
Enterprise AI pilots failing on ROI | ~95%
Series B healthcare AI funding decline (Q4 2021 to Q4 2024) | -84%

Fully 96.4% of AI medical devices were cleared through the FDA's 510(k) pathway, which requires only a demonstration of "substantial equivalence" to an existing device, not independent clinical trials proving that the device improves patient outcomes. Only about half of approved devices had any clinical performance study reported. The 510(k) pathway was designed for static physical devices such as surgical instruments, where repeating clinical experiments for a near-identical product is genuinely unnecessary; applying it to dynamic, learning AI software creates a regulatory gap that the FDA is only beginning to address.

The real-world deployment picture is even more sobering. Most hospitals lack the GPU infrastructure to run AI tools; common radiology software cannot display AI results; and EHR integration requires expensive custom development. Roughly 95% of enterprise healthcare AI pilots fail to demonstrate measurable return on investment, and Series B healthcare AI funding declined 84% from Q4 2021 to Q4 2024. The most prominent cautionary tale is IBM Watson Health, which was marketed as a revolutionary oncology decision-support tool, consumed over $5 billion in acquisitions, and was eventually sold off for approximately $1 billion after its clinical recommendations proved wildly inconsistent.

Where clinical AI succeeds: AI-assisted breast cancer screening has the strongest real-world evidence base — 17.6% increase in detection (PRAIM study, 463,094 women) and 21.6% increase (ASSURE study, 579,583 women), with 63.6% reduction in radiologist workload.

Yet within this landscape there are genuine successes at the technical level. The PRAIM study in Germany (463,094 women, 119 radiologists, 12 sites) found that AI-supported mammography screening increased cancer detection by 17.6% without increasing the recall rate. The ASSURE study in the US (579,583 women, 109 sites) found a 21.6% increase in detection that was equitable across racial subgroups. A separate trial showed AI reduced radiologist workload by 63.6% while improving detection by 15.2%. Other proven applications include IDx-DR for autonomous diabetic retinopathy screening and AI-powered ECG analysis for atrial fibrillation. In short, clinical AI succeeds at well-defined imaging tasks with standardised data and clear endpoints.

Section 03

Population Health: Real Potential

At the population level, the evidence base is qualitatively different from the molecular and clinical domains: there is no simple measure of success. There is no regulatory framework requiring AI public health tools to demonstrate performance, no equivalent of FDA clearance, and far fewer controlled trials.

BlueDot: Detected unusual pneumonia cases in Wuhan on December 31, 2019 — nine days before the WHO issued its first warning — and correctly predicted 12 of 20 cities subsequently affected using airline ticketing data.

BlueDot had previously predicted Zika's spread to Florida six months in advance. The CDC has deployed AI across 54 operational use cases, including its National Syndromic Surveillance Program (ML-powered anomaly detection in emergency department data) and FluSight (AI-enhanced influenza forecasting). These are real, functioning systems.

Cautionary tale: Google Flu Trends overestimated flu incidence by up to 140% — the model could not distinguish people who were actually sick from those searching because of media coverage.

But the field's most instructive case is a failure. Google Flu Trends, launched in 2008, attempted to predict flu outbreaks from search query data. Initially celebrated as a breakthrough, it failed to predict the 2009 pandemic and was quietly retired in 2015. Human behaviour contaminated the data signal in ways that no algorithm could correct, a fundamentally different challenge from reading a mammogram or predicting a protein structure.

The evidence gap in population health reflects the nature of the domain, not the under-development of AI tools. Public health involves human behaviour, political decisions, institutional capacity, and cultural factors that are inherently harder to predict and optimise than molecular structures or medical images. However, public health is not monolithic. Within it, data-heavy tasks like surveillance and outbreak detection are genuinely AI-amenable, while policy formulation and behavioural interventions are not. AI's most productive role at the population level is as a data assistant, flagging signals and processing information at scale, rather than as a decision-maker.

AI and Epidemiology: A Pipeline Perspective

I view the operation of public health as a pipeline running from upstream to downstream: data collection, signal detection, epidemiological analysis, causal inference, interpretation, risk assessment, policy formulation, implementation, communication, and monitoring. To assess where AI offers the greatest growth potential, I evaluated each stage against five criteria drawn from a startup investment lens: whether clean data is available, whether current AI models can handle the task, whether outcomes are clearly measurable, how much room remains for further development, and the efficiency gain over existing methods.
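The stage-by-stage assessment can be sketched as a simple scoring rubric. The stages and criteria come from the text above; the 1-5 scores are illustrative placeholders invented to show the shape of the pattern, not the report's actual ratings.

```python
# Illustrative scoring of AI potential across the public health pipeline.
# Stages and criteria follow the text; the 1-5 scores are placeholders
# invented to show the U-shaped pattern, not the report's actual ratings.

CRITERIA = ["clean data", "model capability", "measurable outcomes",
            "room for development", "efficiency gain"]

# stage -> one 1-5 score per criterion, listed upstream to downstream
SCORES = {
    "data collection":          [5, 5, 4, 4, 5],
    "signal detection":         [4, 5, 4, 4, 5],
    "epidemiological analysis": [4, 4, 4, 3, 4],
    "causal inference":         [3, 3, 3, 4, 3],
    "interpretation":           [2, 2, 2, 2, 2],
    "risk assessment":          [2, 2, 2, 3, 2],
    "policy formulation":       [1, 1, 2, 2, 1],
    "implementation":           [4, 4, 4, 3, 4],
    "communication":            [3, 3, 2, 3, 3],
    "monitoring":               [5, 5, 5, 4, 5],
}

def total(stage: str) -> int:
    """Sum of the five criterion scores for one pipeline stage."""
    return sum(SCORES[stage])

# Crude text bar chart: totals dip in the human-judgement middle
for stage in SCORES:
    print(f"{stage:26s} {total(stage):2d}  {'#' * total(stage)}")
```

Printed in pipeline order, the totals trace the U described in the next paragraphs: high at the data-processing ends, low in the judgement-heavy middle.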

The U-shaped curve: AI potential is highest at the data-processing bookends (surveillance, logistics, monitoring) and lowest in the human-judgement middle (interpretation, risk assessment, policy formulation).

The data-processing bookends (surveillance, logistics, and monitoring) score highest across all criteria. These are tasks with standardised data, mature AI models, and quantifiable outcomes: the CDC processes 8,000 news articles and reports daily through natural language processing; machine learning models outperform conventional approaches in vaccine distribution simulations; and automated anomaly detection enables near-real-time outbreak monitoring. At the other end, implementation and logistics return to high AI potential because resource allocation and route optimisation are fundamentally data-driven problems.
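As one concrete instance of the surveillance bookend, anomaly detection over daily syndromic counts can be as simple as a trailing z-score: flag any day whose count sits more than a few standard deviations above the recent baseline. This is a minimal sketch with hypothetical emergency-department visit counts, not any agency's production method.

```python
import statistics

def flag_anomalies(counts, window=7, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by
    more than `threshold` standard deviations of that window."""
    flags = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = statistics.mean(baseline)
        sd = statistics.stdev(baseline)
        if sd > 0 and (counts[i] - mu) / sd > threshold:
            flags.append(i)
    return flags

# Hypothetical daily emergency-department visit counts; the final day spikes
daily_visits = [52, 48, 55, 50, 53, 49, 51, 54, 50, 52, 49, 53, 51, 95]
print(flag_anomalies(daily_visits))  # prints [13]
```

Production systems layer on seasonality adjustment, day-of-week effects, and multiple data streams, but the core idea of comparing today's signal to a recent baseline is the same.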

The human-judgement middle (interpretation, risk assessment, and policy formulation) scores lowest. These stages involve value trade-offs, political negotiation, and stakeholder dynamics that AI cannot navigate. No amount of better algorithms will make policy decisions less political. Causal inference sits in an interesting middle ground: AI can reduce statistical bias in high-dimensional settings through methods like Targeted Maximum Likelihood Estimation, but the causal inference roadmap cannot yet be fully automated — it still requires an epidemiologist's judgement about confounding, study design, and what the numbers actually mean.
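The division of labour in causal inference can be made concrete. TMLE itself is too involved for a short sketch, so here is the classical standardisation (g-computation) that such estimators build on, with entirely made-up cohort data: the arithmetic is trivially automated, but the crude and adjusted answers diverge, and deciding which confounders belong in the adjustment set is exactly the judgement call that stays with the epidemiologist.

```python
def standardised_risk(data, treated):
    """Confounder-standardised outcome risk under treatment level `treated`:
    stratum-specific risks weighted by the confounder's distribution."""
    n = len(data)
    risk = 0.0
    for z in (0, 1):
        stratum = [r for r in data if r[1] == z]
        exposed = [r for r in stratum if r[0] == treated]
        risk += (len(stratum) / n) * (sum(r[2] for r in exposed) / len(exposed))
    return risk

# Hypothetical cohort: confounder z drives both treatment uptake and outcome.
# Each tuple is (treated, z, outcome); the multipliers are record counts.
data = (
      [(1, 1, 1)] * 30 + [(1, 1, 0)] * 30 + [(0, 1, 1)] * 8  + [(0, 1, 0)] * 12
    + [(1, 0, 1)] * 5  + [(1, 0, 0)] * 15 + [(0, 0, 1)] * 10 + [(0, 0, 0)] * 90
)

treated_rows = [r for r in data if r[0] == 1]
control_rows = [r for r in data if r[0] == 0]
crude = (sum(r[2] for r in treated_rows) / len(treated_rows)
         - sum(r[2] for r in control_rows) / len(control_rows))
adjusted = standardised_risk(data, 1) - standardised_risk(data, 0)
print(f"crude risk difference:    {crude:.3f}")
print(f"adjusted risk difference: {adjusted:.3f}")
```

In this toy cohort the crude risk difference (about 0.29) more than doubles the confounder-adjusted one (0.13); no algorithm can tell you the adjustment was the right one, because that depends on whether z really is a confounder rather than, say, a mediator.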

The practical implication is that investing in AI for public health means investing differently at each stage. The bookends need better tools and infrastructure. The middle needs better integration — connecting AI outputs to human decision-makers and building institutional capacity to act on AI-generated signals.

Why Epidemiology + AI Literacy Is the Combination That Matters

The combination is more than additive. Someone who understands both the public health pipeline and how to direct AI tools can operate across the full workflow — using AI to accelerate the data-heavy stages while applying domain expertise where human judgement is irreplaceable.

A pure AI specialist can build technically impressive tools for health, but without domain knowledge they are flying blind: unable to spot when an algorithm produces clinically nonsensical outputs, vulnerable to following AI hallucinations down obviously wrong tracks, and prone to solving the wrong problems entirely. A pure epidemiologist understands the domain deeply but operates within human-scale constraints, unable to leverage AI's capacity to process data, synthesise literature, or run analyses at a speed and scope that manual methods cannot match. In a field where AI's role fluctuates from essential to irrelevant depending on where you are in the pipeline, the person who knows when to trust the machine and when to overrule it is the one who delivers.


References

  1. Jumper, J. et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583-589.
  2. Drug Target Review (Feb 2026). AI in drug discovery: 2025 in review.
  3. Insilico Medicine / Ren, F. et al. (2025). Phase IIa results for rentosertib. Nature Medicine.
  4. Eisemann, N. et al. (2025). Nationwide real-world implementation of AI for cancer detection in population-based mammography screening. Nature Medicine.
  5. Lotter, W. et al. (2025). How AI is used in FDA-authorized medical devices. npj Digital Medicine.
  6. MIT Technology Review (2020). AI could help with the next pandemic — but not with this one.
  7. Lazer, D. et al. (2014). The parable of Google Flu: traps in big data analysis. Science, 343, 1203-1205.
  8. Petersen, M. et al. (2025). Integrating AI into causal research in epidemiology. Current Epidemiology Reports, 12:6.