Building a rules engine for fund data validation
Every fund data platform starts with a handful of validation checks. Is the ISIN 12 characters? Is the NAV positive? Is the currency a valid ISO 4217 code? Then reality sets in, and that handful of checks becomes 50, then 200, each with exceptions that make your original if-statements look naive.
We built hardcoded validation first. It lasted about three months before we ripped it out and replaced it with a configurable rules engine. Here's why, and what we learned about rule design in fund data.
Why hardcoded checks don't scale
The first rule was simple: NAV must be greater than 0. Reasonable. Then a client onboarded a fund that was in the process of liquidating, and its NAV was legitimately 0.00. Exception added.
The second rule: ISIN must match the pattern [A-Z]{2}[A-Z0-9]{9}[0-9]. Then we received a file from a German provider where some share classes used WKN codes instead of ISINs. Same column, different identifier type. Exception added.
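When ISINs and WKNs arrive in the same column, a pure pass/fail regex is no longer enough. The first step is to work out which identifier type you are looking at. Below is a minimal sketch of that step. It assumes a simplified WKN format of six alphanumeric characters, and it does not verify the ISIN check digit. The function name `classify_identifier` is hypothetical.

```python
import re

# ISIN pattern from the rule above: 2-letter prefix, 9 alphanumerics, 1 check digit.
ISIN_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")
# Simplified WKN pattern (assumption): 6 alphanumeric characters.
WKN_RE = re.compile(r"[A-Z0-9]{6}")


def classify_identifier(value: str) -> str:
    """Return 'ISIN', 'WKN' or 'UNKNOWN' for a raw identifier string."""
    candidate = value.strip().upper()
    if ISIN_RE.fullmatch(candidate):
        return "ISIN"
    if WKN_RE.fullmatch(candidate):
        return "WKN"
    return "UNKNOWN"


print(classify_identifier("DE0005140008"))  # ISIN-shaped
print(classify_identifier("514000"))        # WKN-shaped
```

With this in place, a WKN can be routed to a lookup or mapping step. The alternative is rejecting it outright.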
The third rule: Fund domicile must be a valid ISO 3166-1 alpha-2 country code. Then a provider sent LUX instead of LU. Not technically wrong - it's the ISO 3166-1 alpha-3 code. Do you reject it or normalise it? Exception added, with a transformation step.
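A transformation step like this can sit in front of the domicile check, normalising alpha-3 codes to alpha-2 and rejecting anything it cannot map. The sketch below uses a deliberately partial, hypothetical mapping table; a real system would load the full ISO 3166-1 list. The function name `normalise_domicile` is also an assumption.

```python
from typing import Optional

# Hypothetical, partial alpha-3 -> alpha-2 mapping for illustration only.
ALPHA3_TO_ALPHA2 = {"LUX": "LU", "IRL": "IE", "GBR": "GB", "CHE": "CH", "CYM": "KY"}
KNOWN_ALPHA2 = set(ALPHA3_TO_ALPHA2.values())


def normalise_domicile(raw: str) -> tuple[Optional[str], str]:
    """Return (code, outcome); outcome is 'ok', 'normalised' or 'rejected'."""
    code = raw.strip().upper()
    if code in KNOWN_ALPHA2:
        return code, "ok"
    if code in ALPHA3_TO_ALPHA2:
        return ALPHA3_TO_ALPHA2[code], "normalised"
    return None, "rejected"
```

Returning the outcome alongside the code means a normalisation can still be logged, so "accepted as-is" and "silently fixed" stay distinguishable.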
After 20 of these, the validation code was more exception handling than validation. Every new edge case required a code change, a deployment, and a prayer that you didn't break an existing rule for another client.
Anatomy of a validation rule
In our rules engine, every rule is a configuration object with these components:
- Field: Which Openfunds field this rule applies to (e.g., OFST020010 for ISIN)
- Check type: The validation function (regex, range, enum, cross-field, temporal)
- Parameters: The check-specific configuration (pattern, min/max, allowed values)
- Severity: error (blocks processing), warning (flags but continues), info (logged only)
- Scope: Which entity types, domiciles, or provider contexts this rule applies to
- Exceptions: Specific conditions where the rule is suppressed
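Those components map naturally onto a small data structure that can be loaded from configuration instead of being written as code. Below is a minimal sketch, assuming the config arrives as a plain dict (for example parsed from JSON or YAML). The class names, the `rule_id` attribute, the config keys and the `min_exclusive` parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Severity(Enum):
    ERROR = "error"      # blocks processing
    WARNING = "warning"  # flags but continues
    INFO = "info"        # logged only


class CheckType(Enum):
    REGEX = "regex"
    RANGE = "range"
    ENUM = "enum"
    CROSS_FIELD = "cross-field"
    TEMPORAL = "temporal"


@dataclass
class Rule:
    rule_id: str            # hypothetical stable name for the rule
    field_code: str         # Openfunds field, e.g. "OFST020010" for ISIN
    check_type: CheckType
    params: dict[str, Any]  # check-specific: pattern, min/max, allowed values
    severity: Severity
    scope: dict[str, set[str]] = field(default_factory=dict)        # empty: applies everywhere
    exceptions: list[dict[str, str]] = field(default_factory=list)  # conditions that suppress it

    @classmethod
    def from_config(cls, cfg: dict[str, Any]) -> "Rule":
        """Build a rule from a plain dict, e.g. one parsed from a config file."""
        return cls(
            rule_id=cfg["id"],
            field_code=cfg["field"],
            check_type=CheckType(cfg["check"]),
            params=cfg.get("params", {}),
            severity=Severity(cfg["severity"]),
            scope={key: set(values) for key, values in cfg.get("scope", {}).items()},
            exceptions=cfg.get("exceptions", []),
        )


isin_rule = Rule.from_config({
    "id": "isin-format",
    "field": "OFST020010",
    "check": "regex",
    "params": {"pattern": r"[A-Z]{2}[A-Z0-9]{9}[0-9]"},
    "severity": "error",
})

# The liquidating-fund case, expressed as an exception rather than a code branch.
nav_positive = Rule.from_config({
    "id": "nav-positive",
    "field": "OFST101010",
    "check": "range",
    "params": {"min_exclusive": 0},
    "severity": "error",
    "scope": {"domicile": ["LU", "IE"]},
    "exceptions": [{"fund_status": "Liquidating"}],
})
```

In this shape, adding an exception or widening a scope is an edit to the config source, not to the engine.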
A real example. The NAV movement rule checks whether today's NAV deviates from yesterday's by more than a configurable threshold:
Field: OFST101010 (NAV per Share). Check: temporal_deviation. Threshold: 10%. Severity: warning at 5%, error at 10%. Exceptions: fund type = money market (thresholds tightened to 0.5% for a warning and 2% for an error). Scope: all share classes with daily pricing.
That single rule definition replaces about 40 lines of imperative code with nested conditions. And it can be modified without a deployment.
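The evaluation side of that rule is small. Below is a minimal sketch using the thresholds from the definition above, expressed as fractions of yesterday's NAV. The function name and the `"money_market"` fund-type label are hypothetical. The sketch also assumes that a zero or negative prior NAV is left to another rule, because a percentage move from zero is undefined.

```python
from typing import Optional

# Thresholds from the rule definition, as fractions of yesterday's NAV.
DEFAULT_THRESHOLDS = {"warning": 0.05, "error": 0.10}
MONEY_MARKET_THRESHOLDS = {"warning": 0.005, "error": 0.02}


def nav_movement_severity(today: float, yesterday: float, fund_type: str) -> Optional[str]:
    """Classify a day-on-day NAV move as 'error', 'warning' or None (no flag)."""
    if yesterday <= 0:
        # Percentage move is undefined; assume a separate rule handles this case.
        return None
    thresholds = MONEY_MARKET_THRESHOLDS if fund_type == "money_market" else DEFAULT_THRESHOLDS
    move = abs(today - yesterday) / yesterday
    if move > thresholds["error"]:
        return "error"
    if move > thresholds["warning"]:
        return "warning"
    return None
```

The same 1% move produces no flag for an equity fund but a warning for a money market fund. That difference is exactly what the exception clause encodes.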
Jurisdictional exceptions are the real enemy
Fund data validation would be straightforward if every fund operated under the same rules. They don't.
A UCITS fund domiciled in Luxembourg has different mandatory fields than an AIF domiciled in Ireland. A UK-authorised fund has reporting requirements that don't apply to a Cayman-domiciled fund. A Swiss-registered fund has specific requirements around total expense ratios that Luxembourg funds don't.
Our rules engine handles this through scope conditions. A rule can be scoped to:
- Specific domiciles (LU, IE, GB)
- Fund types (UCITS, AIF, ETF)
- Regulatory frameworks (PRIIPs, MiFID II)
- Specific providers (because Provider X always sends dates in DD/MM/YYYY)
- Specific clients (because Client Y considers a 3% NAV swing normal for their hedge fund range)
The scoping system means we can say: "This rule applies to all UCITS funds domiciled in Luxembourg, except when the data comes from Provider X, in which case use this modified threshold." Without scoping, every exception is a code branch. With scoping, it's a configuration entry.
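A minimal sketch of how that Luxembourg/Provider X example could resolve at runtime, assuming each record carries a flat context dict (fund type, domicile, provider). The class name, context keys and method names are hypothetical. The Provider X override values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ScopedRule:
    name: str
    params: dict[str, Any]
    # Scope: context key -> allowed values. An absent key imposes no restriction.
    scope: dict[str, set[str]] = field(default_factory=dict)
    # Overrides: (context key, value) -> params merged over the base params on a match.
    overrides: dict[tuple[str, str], dict[str, Any]] = field(default_factory=dict)

    def applies_to(self, ctx: dict[str, str]) -> bool:
        """True if every scope condition is satisfied by the record's context."""
        return all(ctx.get(key) in allowed for key, allowed in self.scope.items())

    def effective_params(self, ctx: dict[str, str]) -> Optional[dict[str, Any]]:
        """Return the parameters for this context, or None if it is out of scope."""
        if not self.applies_to(ctx):
            return None
        params = dict(self.params)
        for (key, value), patch in self.overrides.items():
            if ctx.get(key) == value:
                params.update(patch)
        return params


# UCITS funds domiciled in Luxembourg, with a modified threshold for Provider X.
nav_rule = ScopedRule(
    name="nav-movement",
    params={"warning": 0.05, "error": 0.10},
    scope={"fund_type": {"UCITS"}, "domicile": {"LU"}},
    overrides={("provider", "Provider X"): {"warning": 0.08, "error": 0.15}},
)
```

Overrides are applied in insertion order, so if two of them match, the later one wins. A production engine would probably want an explicit precedence, for example client over provider.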
Cross-field validation
The most valuable rules aren't single-field checks. They're cross-field validations that catch inconsistencies humans would spot but simple checks miss:
- If Fund Status is "Liquidated" but there's a NAV dated yesterday, something is wrong
- If Share Class Currency is USD but the NAV is 0.0043, it's probably a Japanese Yen fund with the wrong currency code
- If Launch Date is in the future but NAV History contains past data points, the launch date is likely wrong
- If Management Fee is 15%, either it's a hedge fund performance fee mislabelled, or someone moved a decimal point
Cross-field rules catch the errors that pass individual field validation. Each field is valid on its own - the currency code is real, the NAV is positive, the date is well-formatted. But together they tell a story that doesn't make sense.
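The four examples above can be sketched as checks over a whole record rather than over single fields. A few assumptions apply:

- Records are plain dicts with hypothetical keys.
- NAV history is a `{date: nav}` mapping.
- Fees are stored as fractions.
- The currency case is generalised into a per-currency plausibility floor.

The floor values and the 5% fee ceiling are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta
from typing import Any

# Hypothetical thresholds for illustration; not taken from a real rule set.
MIN_PLAUSIBLE_NAV = {"USD": 0.01, "EUR": 0.01, "GBP": 0.01}
MAX_PLAUSIBLE_MANAGEMENT_FEE = 0.05  # 5%, stored as a fraction


def cross_field_issues(rec: dict[str, Any], as_of: date) -> list[str]:
    """Return descriptions of cross-field inconsistencies found in one record."""
    issues: list[str] = []
    nav_dates = rec.get("nav_history", {}).keys()
    latest_nav = max(nav_dates, default=None)

    # Liquidated fund that still has a NAV dated yesterday or later.
    if (rec.get("fund_status") == "Liquidated" and latest_nav is not None
            and latest_nav >= as_of - timedelta(days=1)):
        issues.append("liquidated fund still has a recent NAV")

    # NAV far too small for the stated currency: possibly the wrong currency code.
    floor = MIN_PLAUSIBLE_NAV.get(rec.get("currency"))
    nav = rec.get("nav")
    if floor is not None and nav is not None and 0 < nav < floor:
        issues.append("NAV implausibly small for currency; check the currency code")

    # Launch date in the future, yet NAV history already has past points.
    launch = rec.get("launch_date")
    if launch is not None and launch > as_of and any(d < as_of for d in nav_dates):
        issues.append("launch date is in the future but past NAVs exist")

    # Management fee far above normal: mislabelled performance fee or decimal slip.
    fee = rec.get("management_fee")
    if fee is not None and fee > MAX_PLAUSIBLE_MANAGEMENT_FEE:
        issues.append("management fee implausibly high; check fee type and decimals")

    return issues
```

Each field in a record could pass its own single-field rule and still trip several of these checks together.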
The 80/20 of rule design
We currently have about 85 active validation rules. Twenty of them catch 90% of data quality issues. The remaining 65 exist for edge cases that occur maybe once a quarter but cause significant problems when they do.
The top five rules by issue volume, if you're curious:
- Missing mandatory fields (ISIN, NAV, currency) - 35% of all flags
- Date format inconsistencies - 20%
- NAV movement exceeding threshold - 15%
- Duplicate records in same file - 10%
- Stale data (no update in expected window) - 8%
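Percentages like these fall out of recording every failure as a structured finding and counting by rule. Below is a minimal sketch of that loop, assuming each check is a plain function that returns an error message or `None`. The names `Finding`, `evaluate` and `share_by_rule`, the field labels and the placeholder ISINs are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

# A check takes a record and returns an error message, or None if it passes.
Check = Callable[[dict], Optional[str]]


@dataclass(frozen=True)
class Finding:
    share_class: str  # e.g. the record's ISIN
    field: str
    rule: str
    message: str


def evaluate(records: list[dict], rules: dict[str, tuple[str, Check]]) -> list[Finding]:
    """Run every rule against every record and collect one Finding per failure."""
    findings = []
    for rec in records:
        for rule_name, (field_name, check) in rules.items():
            message = check(rec)
            if message is not None:
                findings.append(Finding(rec["isin"], field_name, rule_name, message))
    return findings


def share_by_rule(findings: list[Finding]) -> dict[str, float]:
    """Fraction of all findings attributable to each rule, largest first."""
    counts = Counter(f.rule for f in findings)
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.most_common()}


rules = {
    "missing-nav": ("NAV", lambda r: None if r.get("nav") is not None else "NAV is missing"),
    "missing-currency": ("Currency", lambda r: None if r.get("currency") else "currency is missing"),
}
records = [
    {"isin": "LU0000000001", "nav": 10.0, "currency": "EUR"},
    {"isin": "LU0000000002", "nav": None, "currency": "EUR"},
    {"isin": "LU0000000003", "nav": None, "currency": None},
]
findings = evaluate(records, rules)
shares = share_by_rule(findings)
```

Because each finding keeps the share class, field and reason, the same list answers both "which rules fire most" and "exactly what failed in this file".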
The rules engine isn't glamorous engineering. It's not AI, it's not real-time streaming, it's not distributed computing. It's a configuration-driven system that evaluates conditions against data. But it's the difference between "we think the data is probably fine" and "we know exactly which 14 fields across 3 share classes failed validation and why."
In fund data, probably fine isn't good enough.