Three layers of guardrails for AI-generated mappings
Using an LLM to map incoming fund data columns to standardised field IDs is genuinely useful. It turns a 2-hour manual task into a 30-second suggestion. It also introduces a failure mode that didn't exist before: the AI will occasionally invent field IDs that look perfectly real but don't exist.
We learned this early. Our first prototype of the AI mapper was impressive in demos. It correctly mapped Fund Name to OFST005010, ISIN to OFST020010, and NAV per Share to OFST101010. Then it mapped Swing Factor to OFST340020. Confident. Formatted correctly. Completely fabricated.
If that mapping had gone into production, every swing factor value for every fund in that pipeline would have been tagged with a non-existent field ID. Downstream systems would either reject the data silently or, worse, store it against a meaningless key where nobody could find it.
Layer 1: The deterministic pre-pass
Before the AI touches anything, we run every column header through a dictionary of known mappings. This dictionary has been built over years of onboarding fund data providers. It knows that ISIN, Isin Code, ISIN_CODE, isinCode, and International Securities Identification Number all map to OFST020010.
The dictionary currently has about 800 entries covering roughly 150 unique Openfunds fields. It handles case normalisation, common abbreviations, and language variations (German providers love sending Fondsname instead of Fund Name).
In a typical fund data file with 30-40 columns, this pre-pass resolves about 60% of mappings instantly. These get a confidence score of 1.0 because they're deterministic. No model involved, no hallucination possible.
The AI only sees the remaining 40% - the columns the dictionary doesn't recognise.
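In code, the pre-pass is little more than a normalising lookup. Here's a minimal sketch in Python - the dictionary entries and the `normalise()` rules are illustrative, not our production table:

```python
KNOWN_MAPPINGS = {
    "isin": "OFST020010",
    "isincode": "OFST020010",
    "internationalsecuritiesidentificationnumber": "OFST020010",
    "fundname": "OFST005010",
    "fondsname": "OFST005010",  # German variant
    "navpershare": "OFST101010",
}

def normalise(header: str) -> str:
    """Collapse case, whitespace, and separators so spelling variants collide."""
    return "".join(ch for ch in header.lower() if ch.isalnum())

def dictionary_pass(headers):
    """Split headers into deterministic matches (confidence 1.0) and leftovers."""
    resolved, unresolved = {}, []
    for header in headers:
        field_id = KNOWN_MAPPINGS.get(normalise(header))
        if field_id:
            resolved[header] = field_id
        else:
            unresolved.append(header)  # only these columns go to the AI
    return resolved, unresolved
```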
Layer 2: Registry validation
When the AI returns a suggested mapping, we don't trust it. We validate the suggested field ID against our Openfunds field registry - a structured, machine-readable version of the full Openfunds specification.
The validation checks three things:
- Existence: Does this field ID actually exist in the registry? If the AI suggests OFST340020 and it's not in the registry, it's rejected immediately.
- Type compatibility: If the AI maps a column of numeric values to a field that expects an ISO country code, something is wrong.
- Entity level: If the data is at the share class level but the AI suggests a fund-level field, that's a mismatch worth flagging.
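A sketch of that check, assuming a registry keyed by field ID. The `RegistryField` shape and the type labels are illustrative, not the actual Openfunds schema:

```python
from dataclasses import dataclass

@dataclass
class RegistryField:
    field_id: str
    data_type: str     # e.g. "text", "number", "isoCountryCode"
    entity_level: str  # e.g. "fund", "shareClass"

def validate_mapping(field_id, column_type, data_level, registry):
    """Return a list of validation failures; an empty list means the mapping passes."""
    field = registry.get(field_id)
    if field is None:
        # Hallucinated ID: hard reject, no point checking anything else.
        return [f"{field_id} does not exist in the registry"]
    failures = []
    if field.data_type != column_type:
        failures.append(f"column looks like {column_type}, field expects {field.data_type}")
    if field.entity_level != data_level:
        failures.append(f"data is {data_level}-level, field is {field.entity_level}-level")
    return failures
```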
When a field ID fails validation, we don't just reject it. We give the AI a second chance with a constrained prompt: "The field ID you suggested doesn't exist. Here are the 5 closest valid fields based on the column name and sample data. Pick one or say you're unsure."
This retry step recovers about 70% of initial failures. The AI usually gets close on the first try - it picks up the right category but invents a specific ID within it. Given the actual options, it almost always picks correctly.
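The retry looks roughly like this. `nearest_valid_fields()` stands in for whatever similarity search ranks candidates by column name and sample data, and `llm` is a plain text-in, text-out call - both are placeholders, as is the prompt wording:

```python
UNSURE = "UNSURE"

def build_retry_prompt(column, samples, candidates):
    """Constrain the second attempt to known-valid field IDs."""
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        f"The field ID you suggested for column '{column}' doesn't exist.\n"
        f"Sample values: {samples[:3]}\n"
        f"Here are the {len(candidates)} closest valid fields:\n{options}\n"
        f"Pick exactly one, or reply {UNSURE} if none fit."
    )

def retry_mapping(column, samples, registry, llm, nearest_valid_fields):
    candidates = nearest_valid_fields(column, samples, registry, k=5)
    answer = llm(build_retry_prompt(column, samples, candidates)).strip()
    # Accept only answers drawn from the candidate set; anything else
    # (including UNSURE) falls through to human review.
    return answer if answer in candidates else None
```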
Layer 3: Confidence thresholds
Every mapping in the system carries a confidence score between 0 and 1:
- 1.0 - Deterministic dictionary match. No review needed.
- 0.85-0.99 - AI suggestion that passed registry validation and has strong semantic similarity. Auto-approved but logged.
- 0.70-0.84 - AI suggestion that passed validation but the match isn't obvious. Approved with a visual flag in the UI.
- Below 0.70 - Requires human review before the mapping is active. The pipeline won't process this field until someone confirms or corrects it.
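The routing itself is a few comparisons; the tier names here are illustrative labels, not our internal ones:

```python
def route_mapping(confidence: float) -> str:
    """Map a confidence score onto the review tiers described above."""
    if confidence == 1.0:
        return "auto"          # deterministic dictionary match
    if confidence >= 0.85:
        return "auto-logged"   # validated AI suggestion, logged for audit
    if confidence >= 0.70:
        return "flagged"       # approved, but visually flagged in the UI
    return "human-review"      # pipeline blocks on this field until confirmed
```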
The threshold of 0.70 isn't arbitrary. We calibrated it by running the mapper against 50 historical provider files where we already had confirmed mappings. At 0.70, the false positive rate dropped below 2%. Lowering it to 0.60 let more correct mappings through but also tripled the false positives. Not worth it.
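The calibration itself is a simple sweep. A sketch, with made-up triples standing in for the replayed files:

```python
def false_positive_rate(results, threshold):
    """Share of auto-accepted mappings (confidence >= threshold) that were wrong.

    `results` holds (suggested_field, confidence, confirmed_field) triples
    from replaying the mapper over files with known-good mappings.
    """
    accepted = [(s, t) for s, conf, t in results if conf >= threshold]
    if not accepted:
        return 0.0
    return sum(1 for s, t in accepted if s != t) / len(accepted)

# Illustrative triples; the real run used 50 historical provider files.
results = [
    ("OFST020010", 0.93, "OFST020010"),
    ("OFST101010", 0.71, "OFST101010"),
    ("OFST340010", 0.66, "OFST345010"),  # a miss that 0.70 correctly blocks
]

# Sweep candidate thresholds and pick the lowest one that keeps
# false positives under the tolerance (2% in our case).
for threshold in (0.60, 0.65, 0.70, 0.75, 0.80):
    print(threshold, false_positive_rate(results, threshold))
```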
The confidence score isn't about the AI's certainty. It's about how much evidence exists that this mapping is correct. A dictionary match has perfect evidence. An AI guess has probabilistic evidence. The score reflects the evidence, not the model's self-assessment.
Why defence in depth matters here
Any single layer would be insufficient. The dictionary alone can't handle novel column names. The registry check alone can't catch semantically wrong but structurally valid mappings. Confidence thresholds alone don't prevent hallucinated field IDs from entering the system.
Together, they form a pipeline where:
- 60% of mappings never touch AI at all
- 35% are AI-suggested, validated, and scored above threshold
- 5% require human review
That 5% is the right number. Low enough that it doesn't create a bottleneck. High enough that the system admits what it doesn't know.
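Stitched together, the three layers compose into one function. This sketch reuses the helpers from the earlier snippets; `suggest()` (the first-pass model call returning a field ID and a confidence) and `RETRY_CONFIDENCE` are assumptions for illustration:

```python
RETRY_CONFIDENCE = 0.75  # assumption: a recovered retry lands in the flagged tier

def map_columns(headers, samples, column_types, data_level,
                registry, suggest, llm, nearest_valid_fields):
    """End-to-end sketch: pre-pass, validated AI suggestions, threshold routing."""
    resolved, unresolved = dictionary_pass(headers)
    result = {h: (fid, 1.0, "auto") for h, fid in resolved.items()}

    for header in unresolved:
        field_id, confidence = suggest(header, samples[header])
        if validate_mapping(field_id, column_types[header], data_level, registry):
            # Non-empty failure list: one constrained retry, then give up.
            field_id = retry_mapping(header, samples[header], registry,
                                     llm, nearest_valid_fields)
            confidence = RETRY_CONFIDENCE if field_id else 0.0
        result[header] = (field_id, confidence, route_mapping(confidence))
    return result
```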
The worst thing an AI system can do in a data pipeline is be confidently wrong. Our guardrails are designed around one principle: uncertainty should be visible, not hidden. If the system isn't sure, it says so. If the mapping might be wrong, a human sees it before it touches production data.
That's not a limitation. That's the feature.