Why we remove AI from the execution path
Here's an experiment. Ask GPT-4 to map the column "Nav Per Unit" to an Openfunds field three times. You'll get the right answer three times. Ask it thirty times. Somewhere around attempt 17, it'll give you a different answer. In fund data, that's not a quirk. That's a defect.
We use AI extensively in Kairo. It maps fields, suggests entity matches, classifies documents, detects anomalies. It's genuinely good at these tasks. But there's a critical distinction we enforce: AI advises, it doesn't execute.
The moment AI output flows directly into a production pipeline without a human confirmation step and a subsequent lock, you've introduced non-determinism into a system that absolutely requires determinism. Fund NAVs get published to Bloomberg. Performance numbers go into regulatory filings. This data cannot vary based on the stochastic whims of a language model.
The non-determinism problem
Language models are probabilistic. Given the same input, they produce slightly different outputs depending on temperature, sampling, context window state, and model version. For creative writing, that's a feature. For data pipelines, it's a catastrophe.
Consider a mapping pipeline that uses AI to classify each incoming field on every run:
- Monday: "Net Asset Value" maps to
OFST040010(NAV Per Share). Correct. - Tuesday: "Net Asset Value" maps to
OFST040010. Correct. - Wednesday: "Net Asset Value" maps to
OFST040020(Total Net Assets). Wrong. - Thursday: "Net Asset Value" maps to
OFST040010. Correct again.
Wednesday's NAV data just went into the wrong field. If nobody catches it, total net assets for 200 funds are wrong for a day. If the outbound pipe fires before anyone notices, that wrong data is now at Morningstar. Enjoy the correction process.
This isn't hypothetical. We tested it. Over 100 runs with identical input, even top-tier models produced inconsistent mappings 2-4% of the time. For a system processing thousands of fields daily, that's dozens of potential errors per day.
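To make the test concrete, here's a minimal sketch of the repeat-run check in Python. The suggest_mapping stub is a hypothetical stand-in for the model call; the prompt, model, and client behind it are assumptions, not our production code.

```python
from collections import Counter

def suggest_mapping(column_name: str) -> str:
    """Hypothetical stand-in for the LLM call that proposes an
    Openfunds field ID for a source column name."""
    ...

def measure_consistency(column_name: str, runs: int = 100) -> float:
    """Send the identical mapping question `runs` times and report
    the fraction of answers that deviate from the majority."""
    answers = Counter(suggest_mapping(column_name) for _ in range(runs))
    majority_count = answers.most_common(1)[0][1]
    return 1 - majority_count / runs

# A result around 0.02-0.04 matches the 2-4% drift we observed.
```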
The lock pattern
Our solution is what we call advise-confirm-lock:
- Advise: AI analyses the incoming schema and suggests mappings with confidence scores. This runs once, when a new provider or schema is first encountered.
- Confirm: A human reviews the suggestions. High-confidence mappings can be batch-approved. Low-confidence ones get individual attention. This takes minutes, not hours.
- Lock: Confirmed mappings are saved as a deterministic configuration. From this point forward, the pipeline uses the locked mapping. No AI involved. Column A always maps to field X. Every time. Forever, until a human unlocks it.
The locked mapping is just a JSON object. Column name to Openfunds field ID. Dead simple. Executes in microseconds. No API call to an LLM. No chance of variation.
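For illustration, a lock and its execution step might look like the sketch below. The lock-file layout and the helper name are assumptions; the field ID is the one from the example above.

```python
import json

# Illustrative lock file: source column name -> Openfunds field ID.
LOCKED_MAPPING = json.loads('{"Net Asset Value": "OFST040010"}')

def apply_mapping(row: dict) -> dict:
    """Pure dict rename. Same input, same output, every run.
    No model call, no network, nothing stochastic in the hot path."""
    return {LOCKED_MAPPING[col]: value for col, value in row.items()}
```

An unknown column raises a KeyError here, which is exactly the schema-drift signal the next section deals with.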
When does the lock break?
Schema changes. Provider adds a column, removes a column, renames a column. The ingestion pipeline detects the schema drift — the incoming columns don't match the locked mapping.
At this point, and only at this point, the AI re-engages. It analyses the delta: "Column 'Management Fee (%)' is new. Suggested mapping: OFST050110 (Management Fee Applied) with 0.92 confidence." The human confirms. The lock updates. Back to deterministic.
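Drift detection and re-engagement come down to plain set arithmetic plus one scoped advisory call. A sketch, reusing the illustrative LOCKED_MAPPING and suggest_mapping names from the earlier snippets:

```python
def detect_drift(incoming_columns: set, locked_mapping: dict) -> tuple:
    """Compare the incoming schema against the lock. The detection
    itself is deterministic; no AI is involved yet."""
    locked = set(locked_mapping)
    added = incoming_columns - locked      # columns the lock has never seen
    removed = locked - incoming_columns    # columns the provider dropped
    return added, removed

# A new file arrives with one extra column:
added, removed = detect_drift({"Net Asset Value", "Management Fee (%)"}, LOCKED_MAPPING)
for column in added:
    # Only now, and only for the delta, does the advisory AI run.
    suggestion = suggest_mapping(column)  # e.g. "OFST050110", 0.92 confidence
    # -> queued for human confirmation; the lock updates only after sign-off
```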
AI is brilliant at figuring things out the first time. It's terrible at doing the exact same thing the exact same way every subsequent time. So let it figure it out once, then take it out of the loop.
What this means for the architecture
This pattern — AI at configuration time, deterministic code at execution time — shapes our entire architecture. The ingestion pipeline has no LLM calls in its hot path. The validation engine runs rules, not prompts. The delivery pipes execute transforms, not inferences.
The benefits compound:
- Speed: No LLM latency in the processing pipeline. Files process in seconds, not minutes.
- Cost: No per-record API costs. AI costs are amortised across the lifetime of a mapping, not incurred on every file.
- Auditability: Every pipeline run produces identical output for identical input. You can reproduce any historical run exactly (see the sketch after this list). Try that with a probabilistic system.
- Reliability: No dependency on external AI API availability during pipeline execution. If OpenAI goes down at 17:00, your NAV files still process.
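The auditability claim can be made mechanical. A sketch of a replay check, with illustrative names throughout:

```python
import hashlib
import json

def run_fingerprint(output_rows: list) -> str:
    """Stable digest of a run's output. Because the locked mapping is
    deterministic, replaying the same input file must reproduce the
    same digest; any mismatch flags the run for investigation."""
    canonical = json.dumps(output_rows, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical replay check against a stored audit record:
# assert run_fingerprint(replay(input_file)) == audit_log[run_id]
```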
The uncomfortable truth about AI-native
Calling a platform "AI-native" is fashionable. Usually it means "we call GPT-4 on every request and hope for the best." That's not AI-native. That's AI-dependent. And dependency on a non-deterministic system is a risk, not a feature.
Genuinely AI-native means AI is embedded in the design process, not the runtime. The AI shaped the mappings. The AI suggested the rules. The AI identified the patterns. Then the AI stepped aside and let deterministic code do the repetitive work reliably.
Use AI where it's strong: understanding ambiguity, suggesting solutions, handling novelty. Remove it where reliability matters: execution, transformation, delivery. The fund data industry can't afford "usually correct." It needs "always correct, and provably so."