The clinical reasoner is not a chatbot.
A reasoner that is structurally constrained to reason — not a conversational agent borrowed for clinical work. Why the distinction is architectural, not branding.
I think the most consequential branding decision we made on Chiron was refusing to call the F5/reasoner a chatbot. The reasoner has a chat surface, technically. You can type into it. It will type back. Surface-wise it is indistinguishable from any other generative-AI clinical product on the market. And yet calling it a chatbot would be misleading in a way that matters at decision time, so we refuse the term. This is the essay where I explain why.
The shape of the cognitive work
Differential diagnosis is not a conversation. It is a structured abductive chain. Observations are gathered. Hypotheses are proposed. Red flags are scanned for. A ranked differential is produced with evidence-for and evidence-against per item. The clinician evaluates the chain and decides.
Conversation is the wrapper around that work, not the work itself. A clinician walking a student through a case will speak in sentences and the sentences will sound like conversation. The cognition the clinician is performing is not conversation. It is structural reasoning that the conversational surface communicates after the fact.
A platform that trains on the conversational surface learns to reproduce sentences that sound like clinical reasoning. A platform that trains on the structural form of the reasoning itself learns to produce the reasoning, and then expresses it. These are not the same target.
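To make "the structural form of the reasoning" concrete, here is a minimal sketch of the chain described above: observations, red flags, and a ranked differential with evidence-for and evidence-against per item. Every name in it (`DifferentialItem`, `ReasoningArtifact`, `net_support`) is hypothetical, and the count-based score is a stand-in chosen for brevity, not a description of how the F5/reasoner actually ranks. The example case is illustrative only and is not clinical guidance.

```python
from dataclasses import dataclass, field


@dataclass
class DifferentialItem:
    """One hypothesis plus the observations that support or contradict it."""
    hypothesis: str
    evidence_for: list[str] = field(default_factory=list)
    evidence_against: list[str] = field(default_factory=list)

    def net_support(self) -> int:
        # Assumption: a naive illustrative score, supporting minus
        # contradicting observations. A real reasoner would weight evidence.
        return len(self.evidence_for) - len(self.evidence_against)


@dataclass
class ReasoningArtifact:
    """The structured object the conversational surface would render."""
    observations: list[str]
    red_flags: list[str]
    differential: list[DifferentialItem]

    def ranked(self) -> list[DifferentialItem]:
        # sorted() is stable, so tied hypotheses keep their original order.
        return sorted(self.differential, key=lambda d: d.net_support(), reverse=True)


# Illustrative case only; the evidence assignments are placeholders.
artifact = ReasoningArtifact(
    observations=["palpitations", "dizziness", "irregular rhythm"],
    red_flags=[],
    differential=[
        DifferentialItem(
            hypothesis="panic attack",
            evidence_for=["palpitations", "dizziness"],
            evidence_against=["irregular rhythm"],
        ),
        DifferentialItem(
            hypothesis="atrial fibrillation",
            evidence_for=["palpitations", "dizziness", "irregular rhythm"],
        ),
    ],
)

for item in artifact.ranked():
    print(item.hypothesis, item.net_support())
```

The point of the sketch is the shape, not the scoring: the sentences a clinician reads would be generated from an object like `artifact`, so every ranked item arrives with its evidence attached rather than reconstructed from prose.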
Why this distinction is architectural
If you train on conversation traces, you get a model that is fluent in clinical-sounding prose. The hard part of clinical work, the abductive move from observations to ranked hypotheses with discriminating features, is reconstructed by the model opportunistically, from whatever pattern in the trace happens to surface. Sometimes that works. Sometimes the model produces three sentences of fluent prose followed by a conclusion that is structurally wrong. The clinician sees the wrong conclusion, walks the reasoning back, and stops trusting the system.
If you train on structured reasoning traces — observations, hypotheses, red flags, ranked differentials with evidence anchors — the model learns the abductive move at the substrate. The conversational surface is something the model expresses on top, not something it has to recover from below. The clinician sees output that holds up under cross-examination because the underlying reasoning was actually performed.
The branding cost
Calling it a reasoner instead of a chatbot is not just nicer marketing. It is a binding commitment to a more expensive engineering choice. The conversational chatbot path is well understood, well capitalised, and easy to ship. The reasoner path requires a reasoning-trace corpus, a reasoning-style adapter, and a refusal posture in the product when the reasoner is asked to operate as a conversational agent on a question it has no substrate for.
We chose the more expensive path because the cheaper one fails at the moment that matters. At 11:23 on a Tuesday, when a clinician is six minutes into a fifteen-minute encounter and the patient has palpitations, dizziness, and an irregular rhythm, the platform either produces a structurally correct ranked differential or it does not. There is no recovery from a fluent-sounding wrong answer. The reasoner has to be right because of how it is built, not because the clinician noticed when it was wrong.
What the surface looks like, then
The surface still looks like chat. We did not invent a new interaction modality. The clinician asks the reasoner about a case; the reasoner responds. What is different is what sits underneath the response. The response is the rendering of a structured reasoning artifact. Every claim has a source. Every differential has evidence anchors. The non-dismissible disclosure carries the AB 489 / SB 1120 gate. The chat is the surface; the reasoning is the substrate.
That is the invariant we bind ourselves to: behind every conversational exchange the clinician has with the reasoner is a structured artifact that the clinician can audit. If the surface ever drifts away from the structured artifact, the platform is broken — not stylistically broken, architecturally broken — and we treat it as such.
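One way to picture that invariant is as a check that runs before anything is shown to the clinician. The sketch below is hypothetical and assumes each rendered sentence carries the identifier of the artifact entry it came from; `RenderedClaim`, `find_drift`, and `assert_surface_bound` are names invented for illustration, not part of the platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderedClaim:
    """A sentence on the chat surface and the artifact entry it renders."""
    text: str
    anchor_id: Optional[str]  # None means the sentence has no anchor at all


def find_drift(claims: list[RenderedClaim], artifact_ids: set[str]) -> list[RenderedClaim]:
    # A claim has drifted if it has no anchor, or if its anchor does not
    # exist in the structured artifact behind this exchange.
    return [c for c in claims if c.anchor_id is None or c.anchor_id not in artifact_ids]


def assert_surface_bound(claims: list[RenderedClaim], artifact_ids: set[str]) -> None:
    # Treat drift as a hard failure rather than a style warning.
    drifted = find_drift(claims, artifact_ids)
    if drifted:
        texts = "; ".join(c.text for c in drifted)
        raise ValueError(f"surface drifted from artifact: {texts}")
```

Raising instead of logging is the design choice that mirrors the paragraph above: an unanchored sentence is handled as an architectural fault, so it blocks the response rather than shipping with a caveat.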
Why I keep refusing the term
Because terminology compresses. If the market remembers the platform as a clinical chatbot, the market will procure it like a chatbot, deploy it like a chatbot, and trust it like a chatbot. None of those commitments fit a substrate-grade reasoner. So I keep using the longer phrase. The F5/reasoner architecture. The clinical reasoner. The reasoning substrate. None of those are as catchy as “clinical chatbot.” They are catchy enough.
The full architecture of the F5/reasoner is described on the technology page, and the substrate that conditions it is in the Eve-Genesis (Clinical Edition) whitepaper.