Issue 01 · Data discipline

Eve-Genesis (Clinical Edition) — a summary of the methodology.

A founder-voice summary of the synthetic clinical reasoning corpus that conditions Phi-4 to reason the way clinicians actually reason. The procurement-grade whitepaper carries the full detail.

By Eve-Healthcare · May 7, 2026 · 8 min read

This is a summary essay. The full procurement-grade methodology lives in the Eve-Genesis (Clinical Edition) whitepaper. This essay names the commitments at the level a CMIO or CISO will actually be asked to sign for, and points to the whitepaper for the rest.

What Eve-Genesis (Clinical Edition) is

Eve-Genesis (Clinical Edition) is the proprietary synthetic clinical reasoning corpus that conditions the F5/reasoner. It contains between 100,000 and 250,000 structured reasoning traces, expanded on a quarterly cadence. Each trace is a complete reasoning path — not just an answer — anchored to canonical clinical guidelines and standard clinical taxonomies.

The trace is the unit of training. The conclusion is an artifact of the trace. That choice is what lets the reasoner produce structurally correct reasoning at decision time rather than fluent-sounding pattern matches.

The shape of the corpus

Nine domains span the corpus, weighted by how much of clinical practice they represent. Differential diagnosis is the largest share. Radiology reasoning, treatment planning, lab interpretation, pharmacology, occupational medicine (OM) regulatory reasoning, documentation, risk stratification, and ethics fill the rest. The relative proportions reflect the workloads ChironAI sees in production, not an abstract academic balance.

Twelve canonical clinical guidelines anchor the corpus — USPSTF, NCCN, ACOG, AAFP, AHA/ACC, IDSA, ADA, GOLD, KDIGO, AASLD, MTUS, ACOEM. Six standard clinical taxonomies validate it: ICD-10, SNOMED CT, LOINC, RxNorm, CPT, HCPCS. Every trace cites the guideline that governs its workflow. Every coded element resolves to a canonical taxonomy entry.

What every trace records

Each trace records: the clinical question being reasoned about; synthetic context (demographics, comorbidities, medications, presentation); numbered structured reasoning steps with claim, evidence, and inferential move per step; guideline anchors; differentials with prior probabilities where appropriate; calibrated confidence; named red-flag findings; and the must-review-before-final gate that closes every trace.
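The field list above can be pictured as a record type. The following is a minimal sketch, not the corpus's actual schema: every class name, field name, and check here is a hypothetical illustration, and the example values are placeholders rather than clinical content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReasoningStep:
    number: int
    claim: str
    evidence: str
    inferential_move: str


@dataclass
class Differential:
    label: str
    prior: Optional[float] = None  # priors are recorded only where appropriate


@dataclass
class Trace:
    """Hypothetical shape of one reasoning trace (names are assumptions)."""
    clinical_question: str
    synthetic_context: Dict[str, str]
    steps: List[ReasoningStep]
    guideline_anchors: List[str]
    confidence: float
    differentials: List[Differential] = field(default_factory=list)
    red_flags: List[str] = field(default_factory=list)
    must_review_before_final: bool = True  # the gate that closes every trace

    def problems(self) -> List[str]:
        """Return structural problems; an empty list means well-formed."""
        found: List[str] = []
        expected = list(range(1, len(self.steps) + 1))
        if not self.steps or [s.number for s in self.steps] != expected:
            found.append("steps must be numbered 1..n")
        if not self.guideline_anchors:
            found.append("no guideline anchor")
        if not 0.0 <= self.confidence <= 1.0:
            found.append("confidence outside [0, 1]")
        if any(d.prior is not None and not 0.0 <= d.prior <= 1.0
               for d in self.differentials):
            found.append("prior outside [0, 1]")
        if not self.must_review_before_final:
            found.append("review gate missing")
        return found


example = Trace(
    clinical_question="placeholder question",
    synthetic_context={"demographics": "synthetic adult"},
    steps=[ReasoningStep(1, "placeholder claim", "placeholder evidence",
                         "placeholder inference")],
    guideline_anchors=["USPSTF"],
    confidence=0.7,
    differentials=[Differential("placeholder diagnosis", prior=0.4)],
)
print(example.problems())  # []
```

Keeping the steps as first-class records, rather than a free-text rationale, is what makes the trace itself checkable: a trace with a missing anchor or an open gate fails before anyone reads the conclusion.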

Where guidelines disagree on a clinical question, the trace reflects the disagreement rather than collapsing it into a false consensus. Clinicians reason under guideline disagreement; the corpus does the same.

Why this is synthetic, not de-identified

The corpus is generated, not gathered. The generator takes published clinical guidelines, peer-reviewed literature, structured reasoning templates, and taxonomy ontologies as inputs. It synthesises reasoning traces by composing guideline-anchored decision trees with synthetic demographic and contextual variation. Validation runs against guideline citations and taxonomy correctness before any trace enters the training corpus.
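The validation gate described above can be sketched as a filter that runs before admission. This is an illustration under stated assumptions, not the production validator: the function names are hypothetical, and `TAXONOMY_INDEX` is a toy stand-in holding two illustrative codes where a real check would query full taxonomy releases.

```python
from typing import Dict, List, Set, Tuple

# The twelve guideline sources named in this essay.
KNOWN_GUIDELINES: Set[str] = {
    "USPSTF", "NCCN", "ACOG", "AAFP", "AHA/ACC", "IDSA",
    "ADA", "GOLD", "KDIGO", "AASLD", "MTUS", "ACOEM",
}

# Toy stand-in for taxonomy lookups (assumption: real tables are far larger).
TAXONOMY_INDEX: Dict[str, Set[str]] = {
    "ICD-10": {"E11.9"},
    "LOINC": {"4548-4"},
}

CodedElement = Tuple[str, str]  # (taxonomy system, code)


def validate_trace(guidelines: List[str],
                   coded: List[CodedElement]) -> List[str]:
    """Return validation errors; an empty list means the trace may enter."""
    errors: List[str] = []
    if not guidelines:
        errors.append("no guideline cited")
    for name in guidelines:
        if name not in KNOWN_GUIDELINES:
            errors.append(f"unknown guideline: {name}")
    for system, code in coded:
        if code not in TAXONOMY_INDEX.get(system, set()):
            errors.append(f"unresolved code: {system} {code}")
    return errors


def admit(candidates: List[Tuple[List[str], List[CodedElement]]]
          ) -> List[Tuple[List[str], List[CodedElement]]]:
    """Keep only candidates that pass every check."""
    return [c for c in candidates if not validate_trace(*c)]


print(validate_trace(["ADA"], [("ICD-10", "E11.9")]))  # []
```

The design point is ordering: validation sits between generation and the training corpus, so a trace that cites nothing or carries an unresolvable code never becomes training signal.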

This is a different category of data from a de-identified extract. No patient data is being scrubbed, because no patient data is in the loop at all. The architectural no-PHI-in-training claim follows from the construction, and it can be verified by inspecting the dataset.

How it composes with the F5/reasoner

Eve-Genesis (Clinical Edition) trains Phi-4 using LoRA: parameter-efficient adapters that learn small low-rank weight updates alongside the frozen base weights rather than retraining the full model. Phi-4 retains its general capabilities; the adapters add clinical-reasoning depth. A core clinical-reasoning adapter handles the bulk of workloads. Specialty adapters (radiology framework awareness, OM regulatory reasoning) layer on when the Phi-3 classifier routes to their domain.
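For readers who want the mechanics, here is a minimal sketch of the standard LoRA update on a single weight matrix, written in dependency-free Python. It shows the general technique only; the matrix sizes, rank, and scaling value are illustrative assumptions, not the settings used for these adapters.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix,
                 x: Vector, alpha: float) -> Vector:
    """Compute W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen base weight. A (r x d_in) and
    B (d_out x r) are the only trained parameters; r is the adapter rank.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Adapter parameter count, versus d_out * d_in for full fine-tuning."""
    return r * (d_in + d_out)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (toy 2x2)
A = [[1.0, 1.0]]               # rank r = 1
B_zero = [[0.0], [0.0]]        # B starts at zero, so the adapter is a no-op
B_trained = [[1.0], [0.0]]     # hypothetical value after training

x = [2.0, 3.0]
print(lora_forward(W, A, B_zero, x, alpha=2.0))     # [2.0, 3.0]
print(lora_forward(W, A, B_trained, x, alpha=2.0))  # [12.0, 3.0]
```

Because `W` is never modified, removing or swapping an adapter restores the base model's behaviour exactly, which is what lets specialty adapters layer on per domain.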

Adapters are re-fitted on the expanded corpus quarterly. Base Phi-4 weights stay stable. Every shipped adapter carries a corresponding training-corpus snapshot; the training lineage is auditable.
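One way to make that lineage checkable is to pin each adapter to a digest of the corpus snapshot it was fitted on. The sketch below assumes a snapshot can be serialised as an ordered sequence of text lines; the manifest fields, the adapter name, and the quarter label are hypothetical, not the shipped format.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


def snapshot_digest(trace_lines: Iterable[str]) -> str:
    """SHA-256 over the snapshot's serialised traces, in order."""
    h = hashlib.sha256()
    for line in trace_lines:
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


@dataclass(frozen=True)
class AdapterManifest:
    adapter_name: str    # hypothetical identifier
    base_model: str
    corpus_quarter: str  # hypothetical label for the quarterly snapshot
    corpus_sha256: str


def matches(manifest: AdapterManifest, trace_lines: Iterable[str]) -> bool:
    """True if the supplied snapshot is the one the adapter was fitted on."""
    return snapshot_digest(trace_lines) == manifest.corpus_sha256


snapshot = ['{"id": 1}', '{"id": 2}']  # placeholder serialised traces
manifest = AdapterManifest(
    adapter_name="core-clinical-reasoning",
    base_model="Phi-4",
    corpus_quarter="2026-Q2",
    corpus_sha256=snapshot_digest(snapshot),
)
print(matches(manifest, snapshot))                  # True
print(matches(manifest, snapshot + ['{"id": 3}']))  # False
```

An auditor holding the snapshot and the manifest can then confirm the pairing without access to the training run itself.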

What this is not

Eve-Genesis (Clinical Edition) is not a real-patient-data corpus stripped of identifiers. It is not de-identified PHI repackaged for training. It is not a benchmark dataset. It is not a public corpus.

It is proprietary, in-house, synthetic, and continuously validated against published clinical guidelines. The architecturally enforced no-customer-data-in-training claim can be verified by inspecting the dataset itself.

Where to read more

For the procurement-grade detail — corpus structure, generation methodology, LoRA adapter strategy, continuous expansion cadence, and the explicit list of what the corpus is and is not — read the Eve-Genesis (Clinical Edition) whitepaper. For the architecture commitments that follow at the trust layer, see the Eve-Genesis page. For the runtime that consumes the corpus, see the F5 architecture whitepaper.