Part 2: AI-Augmented Capability Demo

The simulator introduced above models a specific case: H. pylori, a stomach bacterium carried by roughly half the world's population, responsible for around three quarters of all gastric cancer. It is one of the clearest cases in medicine of an infection we know how to treat causing a cancer we struggle to cure — and therefore an ideal test case for asking where in the disease timeline intervention actually pays off.

Try the simulator — pick a population, adjust three intervention levers, and watch how cancer cases, deaths, and DALYs change.

The demo is inspired by the Burden of Communicable Diseases in Europe toolkit, developed by the ECDC to calculate disease burden across 117 infectious diseases. Their tool answers "how bad is it?" Mine answers a different question: "how much of this could we prevent, and at what point in the disease is it worth intervening?" That shift, from measurement to counterfactual, is where the demo earns its keep.

Three Inspection Points

My contribution to this project was building structures that made the AI's work inspectable. Every number in the model has a source tag and a confidence level. Every transition in the outcome tree has a named reference behind it. The parameter tables separate what was sourced from the literature, what was derived by calibration, what was my own judgement call, and what was borrowed from the other geographic context. Three moments in particular stand out.
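Before turning to those moments, here is a minimal sketch of what such a tagged parameter table could look like. It is an illustration, not the demo's actual code: the class names, the confidence scale, and every value and reference string below are placeholders invented for the example. Only the four provenance categories come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    # The four categories named in the text above.
    LITERATURE = "sourced from the literature"
    CALIBRATED = "derived by calibration"
    JUDGEMENT = "author's judgement call"
    BORROWED = "borrowed from the other geographic context"


@dataclass(frozen=True)
class Parameter:
    name: str
    value: float
    provenance: Provenance
    confidence: str  # hypothetical scale: "high", "medium", "low"
    reference: str   # named source, or a note explaining the judgement


def group_by_provenance(params):
    """Split a flat parameter list into one table per provenance category."""
    tables = {category: [] for category in Provenance}
    for param in params:
        tables[param.provenance].append(param)
    return tables


# Placeholder entries: names, values, and references are invented.
params = [
    Parameter("gastritis_to_atrophy", 0.01, Provenance.LITERATURE,
              "high", "placeholder citation"),
    Parameter("atrophy_to_metaplasia", 0.02, Provenance.CALIBRATED,
              "medium", "placeholder calibration note"),
    Parameter("post_im_eradication_effect", 0.5, Provenance.JUDGEMENT,
              "low", "placeholder judgement note"),
]

tables = group_by_provenance(params)
for category, rows in tables.items():
    print(f"{category.value}: {[row.name for row in rows]}")
```

Keeping provenance as a required field means a number cannot enter the model without someone deciding which table it belongs in, and an empty category stays visible rather than silently missing.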

The two-cohort split. The European toolkit that inspired this demo uses age-stratified parameters, and for three planning sessions I carried the same structure forward, with a childhood cohort and an adult cohort. Before we started coding I worked through what the two cohorts would actually produce. Without age-dependent transition rates on either side, they progress identically and yield the same DALYs. The split was decorative. I asked Claude to confirm, then dropped it.
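The reasoning can be checked with a toy state-transition model. The sketch below is not the demo's code. The yearly rates, the single burden weight, and the cohort sizes are hypothetical, and the "burden" is a simple stand-in for DALYs. It shows that when both cohorts share the same transition rates, burden per person is identical and splitting the population changes nothing.

```python
import math


def run_cohort(size, transitions, weights, years):
    """Advance a cohort through a yearly state-transition model.

    Returns total burden: person-years spent in each state multiplied by
    that state's weight. Everyone starts in the first state of `weights`.
    """
    states = list(weights)
    occupancy = {state: 0.0 for state in states}
    occupancy[states[0]] = float(size)
    burden = 0.0
    for _ in range(years):
        following = {state: 0.0 for state in states}
        for src, count in occupancy.items():
            moved = 0.0
            for dst, prob in transitions.get(src, {}).items():
                following[dst] += count * prob
                moved += count * prob
            following[src] += count - moved  # the rest stay where they are
        occupancy = following
        burden += sum(occupancy[s] * weights[s] for s in states)
    return burden


# Hypothetical yearly rates, with no age dependence on either side.
transitions = {
    "gastritis": {"atrophy": 0.02},
    "atrophy": {"metaplasia": 0.03},
    "metaplasia": {"cancer": 0.005},
}
# Hypothetical weights: only cancer carries burden in this toy model.
weights = {"gastritis": 0.0, "atrophy": 0.0, "metaplasia": 0.0, "cancer": 0.5}

children = run_cohort(300, transitions, weights, years=40)
adults = run_cohort(700, transitions, weights, years=40)
pooled = run_cohort(1000, transitions, weights, years=40)

print(math.isclose(children / 300, adults / 700))  # same burden per person
print(math.isclose(children + adults, pooled))     # the split adds nothing
```

The model is linear in cohort size, so the two cohorts can only diverge if something in `transitions` or `weights` depends on age. With nothing of that kind in the inputs, one cohort is enough.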

The screening artefact. The Korean figure for intestinal metaplasia (IM) progressing to cancer is about half the Dutch figure. Taken at face value, this would suggest East Asians progress more slowly at that stage, which contradicts everything else in the literature: East Asia has higher H. pylori prevalence, faster gastritis-to-atrophy progression, faster atrophy-to-metaplasia progression, and several times the lifetime gastric cancer risk. The real explanation is that Korea's national endoscopic screening catches cancers early, so they never enter the progression count. The lower rate reflects more aggressive detection, not slower biology. Building it in as a biological fact would have been wrong.
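A toy calculation makes the artefact concrete. It assumes, purely for illustration, that screening removes a fixed share of progressing cases from the progression count. The rate and the share below are hypothetical numbers, not estimates from either country.

```python
def observed_rate(true_rate, share_intercepted):
    """Apparent progression rate when a share of progressing cases is
    picked up by screening and left out of the progression count."""
    if not 0.0 <= share_intercepted <= 1.0:
        raise ValueError("share_intercepted must be between 0 and 1")
    return true_rate * (1.0 - share_intercepted)


# Same hypothetical biology in both settings.
true_rate = 0.004

no_screening = observed_rate(true_rate, share_intercepted=0.0)
with_screening = observed_rate(true_rate, share_intercepted=0.5)

# The screened setting looks slower even though true_rate is unchanged.
print(with_screening / no_screening)
```

Under this assumption the observed rate halves while the underlying rate stays the same, which is why a lower published figure cannot be read directly as slower biology.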

The tree and the code drifting apart. V1 was built by vibe coding: I accepted what Claude produced, ran the demo, and checked the outputs for scientific plausibility. I never checked whether the code matched the outcome tree diagram, so a structural mismatch between the two went unnoticed. It only surfaced in v2, when Claude flagged that the post-IM eradication effect was a judgement call I needed to make. Deciding which transitions the effect should apply to forced me into the code, and that is when I saw transitions the diagram didn't have. Reviewing outputs alone would not have caught this; it was the judgement call that exposed the structural bug.
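A check of this kind can be automated once both the diagram and the code expose their transitions as lists of edges. The sketch below is one possible approach, not what the demo does. The function name is hypothetical, and the extra atrophy-to-cancer edge is an invented example of drift, not the actual bug.

```python
def compare_transitions(diagram_edges, code_edges):
    """Report transitions that exist on only one side."""
    diagram, code = set(diagram_edges), set(code_edges)
    return {
        "only_in_code": sorted(code - diagram),
        "only_in_diagram": sorted(diagram - code),
    }


# Edges as (from_state, to_state) pairs, following the stages named above.
diagram_edges = [
    ("gastritis", "atrophy"),
    ("atrophy", "metaplasia"),
    ("metaplasia", "cancer"),
]

# Hypothetical drift: the code has a shortcut the diagram never drew.
code_edges = diagram_edges + [("atrophy", "cancer")]

drift = compare_transitions(diagram_edges, code_edges)
print(drift)
```

Run as a test before each release, a non-empty result on either side would flag the mismatch directly, without depending on a judgement call to send someone into the code.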