Standards
9 March 2026 · Atlas

Why we built our own Openfunds field registry

The Openfunds standard defines over 350 fields for describing funds and share classes. It's the closest thing our industry has to a universal schema. The spec is distributed as a PDF. That's the problem.

If you've ever tried to programmatically validate whether OFST010050 is a real Openfunds field ID, or whether it expects an ISO 4217 currency code or a free-text string, you'll understand the frustration. The official source of truth is a document you can read, but your code can't query.

From PDF to queryable registry

We extracted every field from the Openfunds specification and built a structured registry. Each entry records the field ID, its name, the expected data type, and any constraints, such as whether a value must be an ISO 4217 currency code or can be free text.

This registry is versioned. When the Openfunds Association releases an update, we diff it against our current version and apply changes. We can tell you exactly which fields were added, deprecated, or had their constraints modified between any two versions.
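Diffing two registry versions is a straightforward set comparison. The sketch below assumes each version is a `{field_id: field_spec}` mapping; the field IDs and specs shown are illustrative, not the actual registry schema.

```python
# Hypothetical sketch of a registry version diff. Field specs here are
# placeholders; the real registry carries more metadata per field.

def diff_registries(old: dict, new: dict) -> dict:
    """Compare two {field_id: field_spec} registry versions."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        # Present in both versions, but the spec (type, constraints, ...) changed
        "modified": sorted(
            fid for fid in old_ids & new_ids if old[fid] != new[fid]
        ),
    }

v1 = {"OFST010050": {"type": "string"}}
v2 = {
    "OFST010050": {"type": "string", "maxLength": 200},
    "OFST_NEW_FIELD": {"type": "string"},  # placeholder ID
}
print(diff_registries(v1, v2))
# {'added': ['OFST_NEW_FIELD'], 'removed': [], 'modified': ['OFST010050']}
```

Because the diff is computed rather than read out of a changelog, it works between any two versions, not just consecutive releases.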

The registry is stored as a JSON structure that any service in the pipeline can query at runtime. No PDF parsing. No guessing.

Why the AI mapper needs this

Here's where it gets interesting. When a new data provider sends us a file with column headers like Fund Name, ISIN Code, NAV per Share, and Management Fee %, our AI mapper suggests which Openfunds fields these should map to.

The problem is that LLMs are confidently creative. Ask GPT or Claude to suggest an Openfunds field ID for "Management Fee" and it might return OFST080010, which looks right and follows the naming pattern but doesn't exist. The model hallucinated a plausible-looking field ID.

An AI that confidently invents field IDs that look real is worse than an AI that says "I don't know." You can't validate data against a standard that doesn't exist.

The three-layer guardrail system

We don't trust the AI's field ID suggestions at face value. Every mapping goes through three layers of defence:

Layer 1: Deterministic pre-pass. Before the AI even sees a column header, we run it through a dictionary of known mappings. If the header is ISIN or Fund Currency or NAV Date, we already know the answer. No AI needed. About 60% of columns in a typical fund data file match known patterns. This is fast, cheap, and never hallucinates.
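A sketch of what that pre-pass could look like. The normalisation and the known-mapping table are illustrative, and the field IDs are placeholders, not real Openfunds IDs.

```python
# Hypothetical sketch of Layer 1. The field IDs below are placeholders;
# the real dictionary maps known headers to actual Openfunds field IDs.

KNOWN_MAPPINGS = {
    "isin": "OFST_FIELD_A",
    "fund currency": "OFST_FIELD_B",
    "nav date": "OFST_FIELD_C",
}

def normalise(header: str) -> str:
    return header.strip().lower()

def pre_pass(header: str):
    """Return (field_id, confidence) or None when the AI must take over."""
    field_id = KNOWN_MAPPINGS.get(normalise(header))
    # Dictionary hits are deterministic, hence the 1.0 confidence score.
    return (field_id, 1.0) if field_id else None

print(pre_pass("ISIN"))       # ('OFST_FIELD_A', 1.0)
print(pre_pass("Mystery %"))  # None -> fall through to the AI mapper
```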

Layer 2: Registry validation. When the AI does suggest a mapping, we validate the suggested field ID against the registry. If it returns OFST999999 and that ID doesn't exist in our registry, we reject it immediately. The AI gets another chance with an explicit prompt: "That field ID doesn't exist. Here are the closest valid options." Usually it self-corrects.
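The validate-then-retry step can be sketched with a simple membership check plus fuzzy matching for the retry prompt. This assumes the registry exposes its valid IDs as a set; the IDs other than OFST010050 are placeholders, and `difflib` stands in for whatever similarity measure is actually used.

```python
import difflib

# Sketch of Layer 2. Only OFST010050 appears in the post; the other
# IDs in this set are placeholders for illustration.
VALID_IDS = {"OFST010050", "OFST010110", "OFST020000"}

def validate_suggestion(field_id: str, n_alternatives: int = 3):
    """Accept a suggested ID, or reject it and return the closest valid
    options to feed back to the model in a retry prompt."""
    if field_id in VALID_IDS:
        return True, []
    closest = difflib.get_close_matches(
        field_id, VALID_IDS, n=n_alternatives, cutoff=0.0
    )
    return False, closest

ok, alternatives = validate_suggestion("OFST999999")
if not ok:
    retry_prompt = (
        "That field ID doesn't exist. "
        "Here are the closest valid options: " + ", ".join(alternatives)
    )
    print(retry_prompt)
```

The key property is that a hallucinated ID can never pass silently: it either exists in the registry or it triggers a constrained retry.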

Layer 3: Confidence thresholds. Every mapping gets a confidence score. The deterministic pre-pass scores 1.0. AI suggestions that pass registry validation typically score between 0.6 and 0.9 depending on the semantic similarity. Anything below 0.7 gets flagged for human review. It never goes into the pipeline silently.
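The routing rule in Layer 3 reduces to a single threshold comparison. The 0.7 cutoff and the 1.0 pre-pass score come from the description above; the `Mapping` structure itself is an illustrative assumption.

```python
from dataclasses import dataclass

# Threshold from the post: anything below 0.7 goes to a human.
REVIEW_THRESHOLD = 0.7

@dataclass
class Mapping:
    column: str       # source column header
    field_id: str     # suggested Openfunds field ID (placeholder values below)
    confidence: float # 1.0 for deterministic hits, lower for AI suggestions

def route(mapping: Mapping) -> str:
    """Decide whether a mapping enters the pipeline or waits for review."""
    if mapping.confidence >= REVIEW_THRESHOLD:
        return "pipeline"
    return "human_review"

print(route(Mapping("ISIN Code", "OFST_FIELD_A", 1.0)))    # pipeline
print(route(Mapping("Mgmt Fee %", "OFST_FIELD_B", 0.62)))  # human_review
```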

What the registry enables beyond mapping

Once you have a machine-readable field registry, other things become possible: validating incoming data against field constraints, diffing spec versions automatically, and enforcing the standard programmatically across the pipeline.

The Openfunds standard is genuinely good work. It represents years of industry consensus on how to describe fund data. But a standard locked in a PDF is a standard that can't be enforced programmatically. We just gave it an API.

