17 February 2026 - Xavier Flanagan, Founder

How exora’s AI pipeline works

What happens when you upload a document to exora?

From your perspective, it is straightforward: you drop in a PDF or photo of a medical document, wait a few moments, and your health data appears - structured, searchable, and linked back to the original. But behind that simplicity is a multi-pass AI pipeline that reads your documents the way a clinician would, not just scanning for keywords but understanding clinical meaning.

Here is how it works.

Why multiple passes?

The naive approach to document processing is to throw the whole thing at an AI model and say “extract everything.” That works poorly for medical documents. A discharge summary might contain medication lists, vital signs, lab results, diagnoses, procedure notes, and follow-up instructions all woven together across multiple pages. Asking one model to do everything at once leads to missed information, confused context, and lower accuracy.

Instead, exora breaks the work into focused stages. Each pass has a specific job, and each one builds on the results of the previous pass. Think of it like a team of specialists rather than one generalist - each one focuses on what they do best.
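
To make the idea of focused passes concrete, here is a minimal sketch, not exora's actual code. The names `PipelineState`, `classify`, and `detect_entities` are hypothetical, and simple string rules stand in for the model calls. The point is only the shape: each pass receives the results of the earlier passes and adds its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PipelineState:
    """Results accumulated so far; each pass reads earlier fields and fills in its own."""
    text: str
    document_type: Optional[str] = None
    entities: List[str] = field(default_factory=list)


def classify(state: PipelineState) -> PipelineState:
    # Hypothetical pass 1: a trivial rule stands in for a model call.
    lowered = state.text.lower()
    state.document_type = "discharge_summary" if "discharge" in lowered else "unknown"
    return state


def detect_entities(state: PipelineState) -> PipelineState:
    # Hypothetical pass 2: relies on the document type set by pass 1.
    if state.document_type != "unknown":
        state.entities = [w for w in state.text.split() if w.lower() == "metformin"]
    return state


# The order matters: later passes depend on what earlier passes produced.
PASSES: List[Callable[[PipelineState], PipelineState]] = [classify, detect_entities]


def run_pipeline(text: str) -> PipelineState:
    state = PipelineState(text=text)
    for run_pass in PASSES:
        state = run_pass(state)
    return state


result = run_pipeline("Discharge summary: continue metformin 500mg BD")
print(result.document_type, result.entities)
```

Because each pass is a separate function, one stage can be replaced or tested without touching the others.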

Stage 1: Document analysis and encounter discovery

The first thing the pipeline does is understand what it is looking at. Is this a pathology report? A discharge summary? A specialist note? A prescription? The document type determines how it should be read.

Then the pipeline identifies the healthcare encounters within the document. A single PDF might describe multiple visits: a hospital admission that included surgery, then a follow-up appointment and a series of blood tests. Each encounter is logged with its date, provider, and facility, building the chronological backbone of your health timeline.
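
As an illustration only, an encounter could be modelled as a small record and the timeline as those records sorted by date. The `Encounter` class, its field names, and the sample values below are assumptions made for this sketch, not exora's schema or real data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Encounter:
    """One healthcare encounter found in a document (hypothetical fields)."""
    encounter_date: date
    provider: str
    facility: str
    kind: str


def build_timeline(encounters: Iterable[Encounter]) -> List[Encounter]:
    # Documents rarely list visits in order, so sort by date to get the timeline.
    return sorted(encounters, key=lambda e: e.encounter_date)


# Made-up sample data for one PDF that describes three encounters.
found = [
    Encounter(date(2025, 3, 20), "Dr A", "Clinic X", "follow-up appointment"),
    Encounter(date(2025, 3, 2), "Dr B", "Hospital Y", "hospital admission"),
    Encounter(date(2025, 3, 10), "Lab C", "Pathology Z", "blood tests"),
]

timeline = build_timeline(found)
for encounter in timeline:
    print(encounter.encounter_date.isoformat(), encounter.kind)
```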

Stage 2: Entity detection

With the document structure understood, the pipeline scans for health entities - the individual facts that make up your medical record. This includes:

Conditions and diagnoses - from “Type 2 Diabetes Mellitus” to “mild osteoarthritis of the left knee”
Medications - drug names, doses, frequencies, routes of administration
Vital signs - blood pressure, heart rate, temperature, oxygen saturation
Laboratory results - blood tests, urine tests, pathology findings with reference ranges
Procedures - surgeries, imaging studies, biopsies
Allergies and adverse reactions
Immunizations

This is not simple keyword matching. When the pipeline sees “BP 120/80,” it understands that this represents two distinct measurements: a systolic blood pressure of 120 mmHg and a diastolic of 80 mmHg. When it sees “Amoxicillin 500mg TDS,” it knows that is amoxicillin, 500 milligrams, three times daily. Clinical context matters, and the pipeline is built to understand it.
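
The two examples above can be sketched in a few lines. This is a deliberately simplified, rule-based illustration with hypothetical function names. A regular expression is far cruder than a pipeline that reads clinical context, but it shows what the target output looks like.

```python
import re
from typing import Dict, List, Optional

# Matches shorthand such as "BP 120/80" (systolic over diastolic).
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b")

# Only the two abbreviations used in this post; a real table would be much longer.
FREQUENCY_MEANINGS = {"BD": "twice daily", "TDS": "three times daily"}


def split_blood_pressure(text: str) -> List[Dict[str, object]]:
    """Turn one 'BP 120/80' mention into two separate measurements."""
    match = BP_PATTERN.search(text)
    if match is None:
        return []
    systolic, diastolic = int(match.group(1)), int(match.group(2))
    return [
        {"measurement": "systolic blood pressure", "value": systolic, "unit": "mmHg"},
        {"measurement": "diastolic blood pressure", "value": diastolic, "unit": "mmHg"},
    ]


def expand_frequency(abbreviation: str) -> Optional[str]:
    """Expand a dosing abbreviation, or return None if it is not in the table."""
    return FREQUENCY_MEANINGS.get(abbreviation.upper())


readings = split_blood_pressure("On review: BP 120/80, afebrile.")
print(readings)
print(expand_frequency("TDS"))
```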

Stage 3: Clinical extraction and structuring

Detected entities are then extracted into structured clinical data. This is where a mention of “metformin 500mg BD” becomes a proper medication record, with the drug name (metformin), dose (500 mg), and frequency (twice daily) separated into discrete fields, along with the route of administration where the document states it. Lab results get structured with their test names, values, units, and reference ranges.

Every extracted fact is linked to the healthcare encounter it belongs to, building a complete clinical picture organized by time and context rather than by document.
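
A minimal sketch of such a record, under stated assumptions: `MedicationRecord`, its fields, and the `enc-001` identifier are hypothetical, and a regular expression stands in for the extraction model. The route is left empty here because this particular mention does not state one.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MedicationRecord:
    """A medication mention split into discrete fields (hypothetical schema)."""
    drug_name: str
    dose_value: float
    dose_unit: str
    frequency: str
    route: Optional[str]  # None when the source text does not state a route
    encounter_id: str     # links the fact back to its healthcare encounter


# Handles only the simple "<drug> <dose><unit> <frequency>" shape used in this post.
MENTION = re.compile(
    r"^\s*(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mcg|mg|g)"
    r"\s+(?P<freq>[A-Za-z]+)\s*$"
)


def structure_medication(mention: str, encounter_id: str) -> Optional[MedicationRecord]:
    match = MENTION.match(mention)
    if match is None:
        return None  # leave anything unrecognised for a more capable extractor
    return MedicationRecord(
        drug_name=match["drug"].lower(),
        dose_value=float(match["dose"]),
        dose_unit=match["unit"],
        frequency=match["freq"].upper(),
        route=None,
        encounter_id=encounter_id,
    )


record = structure_medication("metformin 500mg BD", encounter_id="enc-001")
print(record)
```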

Stage 4: Medical coding

The final processing stage assigns internationally recognized medical codes to your health data. This is what makes the data truly interoperable - usable across different health systems, not just readable by humans.

Three coding systems are used:

SNOMED CT - the global standard for clinical terminology. It gives every condition, procedure, and finding a unique code that means the same thing in any health system worldwide. “Type 2 Diabetes Mellitus” becomes SNOMED code 44054006, unambiguous regardless of language or country.
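RxNorm - the US National Library of Medicine's standard for medication names. It normalizes drug names across brands and generics, so “Panadol,” “Tylenol,” and “paracetamol 500mg tablet” all resolve to the same active ingredient (paracetamol, which RxNorm lists under its US name, acetaminophen).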
LOINC - the standard for laboratory and clinical observations. It ensures that a “fasting blood glucose” test means the same thing whether it was ordered in Melbourne or Montreal.

Medical coding matters because it turns human-readable notes into machine-comparable data. When you want to see all your blood glucose results over time - across different labs, different doctors, different years - coding is what makes that possible.
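
Here is a small sketch of why a shared code makes that possible. The code strings below are placeholders invented for the example, not real LOINC codes, and the results are made-up sample data. Two labs label the same test differently, but grouping by code still produces one series.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Placeholder identifiers for illustration only; these are NOT real LOINC codes.
GLUCOSE_CODE = "EXAMPLE-GLUCOSE"
OTHER_CODE = "EXAMPLE-OTHER"

# Made-up results: different labs, different labels, different years.
results = [
    {"code": GLUCOSE_CODE, "label": "Fasting blood glucose", "value": 5.4,
     "unit": "mmol/L", "date": date(2024, 3, 1), "lab": "Lab A"},
    {"code": OTHER_CODE, "label": "Some other test", "value": 1.0,
     "unit": "mmol/L", "date": date(2023, 1, 5), "lab": "Lab A"},
    {"code": GLUCOSE_CODE, "label": "FBG", "value": 5.9,
     "unit": "mmol/L", "date": date(2022, 7, 15), "lab": "Lab B"},
]


def series_by_code(rows: List[dict]) -> Dict[str, List[Tuple[date, float]]]:
    """Group results by code, oldest first, ignoring how each lab labelled the test."""
    grouped: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["date"]):
        grouped[row["code"]].append((row["date"], row["value"]))
    return dict(grouped)


series = series_by_code(results)
print(series[GLUCOSE_CODE])
```

Grouping on the free-text label instead would split “Fasting blood glucose” and “FBG” into two unrelated series.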

Source provenance: every fact has a receipt

Throughout every stage, the pipeline tracks exactly where each piece of information came from. Not just which document, but the specific page and location within that page.

When you see a medication in your exora record, you can tap it and be taken directly to the exact spot in the original document where that medication was mentioned. This is not a summary or a paraphrase - it is a direct link to the source.

In healthcare, this matters enormously. AI systems can make mistakes. Documents can contain errors. The ability to verify any fact against its source is not optional - it is essential. We call it “every fact has a receipt” because that is exactly what it is: proof.
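
One way to picture a receipt is as a required source location attached to every fact. The sketch below is an assumption about shape only: `SourceLocation`, `Fact`, the bounding-box convention, and the link format are hypothetical, not exora's data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceLocation:
    """Where a fact was found: which document, which page, and where on the page."""
    document_id: str
    page: int  # 1-based page number
    bbox: Tuple[float, float, float, float]  # assumed (left, top, right, bottom) as page fractions


@dataclass(frozen=True)
class Fact:
    """An extracted fact; the source field is required, so no fact exists without a receipt."""
    text: str
    source: SourceLocation


def source_link(fact: Fact) -> str:
    # Hypothetical link format that a viewer could use to jump to the original text.
    left, top, right, bottom = fact.source.bbox
    return (
        f"{fact.source.document_id}#page={fact.source.page}"
        f"&box={left},{top},{right},{bottom}"
    )


fact = Fact(
    text="metformin 500mg BD",
    source=SourceLocation(document_id="doc-42", page=3, bbox=(0.1, 0.25, 0.6, 0.28)),
)
print(source_link(fact))
```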

The AI providers

Our document processing AI runs primarily on Google Cloud Gemini Enterprise. We may also use OpenAI and Anthropic (Claude) for specific features. We continuously evaluate model performance and update our selections as providers release improvements.

Each provider handles your data differently:

Google Cloud Gemini Enterprise processes your documents under our signed Cloud Data Processing Addendum. We have explicitly disabled server-side data caching at the project level, so your inputs and outputs are not retained by Google after a request completes. We have also requested an opt-out from Google’s safety abuse-review logging.
OpenAI and Anthropic, where used, process your data under their commercial API terms. They may retain API data for up to 30 days for safety and abuse monitoring under those terms.

What is consistent across all providers: your data is not used to train their AI models, and it is processed and returned to us, not stored long-term for any other purpose.

Provider terms verified May 2026. We re-verify these policies quarterly and will update this post if anything changes. Full retention details, including which provider handles which feature, are in our Privacy Policy (sections 6, 8, and 9).

AI is a tool, not a doctor

The pipeline is powerful, but it is not infallible. AI-extracted information can contain errors or miss nuances that a human clinician would catch. Medical codes are assigned algorithmically and have not been verified by a healthcare professional.

That is why source provenance is so central to exora. We do not ask you to trust the AI blindly. We give you the tools to verify everything it produces. The AI does the heavy lifting of reading, extracting, and organizing. You and your healthcare team make the clinical decisions.

exora is a tool that helps you understand and manage your health information. It does not diagnose, does not recommend treatment, and does not replace professional medical advice. It helps you be a more informed participant in your own care.

Xavier Flanagan

Doctor and founder of exora. Previously a hospital doctor in Sydney and Medical Director at HealthMatch.
