Building a rules engine for fund data validation

Every fund data platform starts with a handful of validation checks. Is the ISIN 12 characters? Is the NAV positive? Is the currency a valid ISO 4217 code? Then reality sets in, and that handful of checks becomes 50, then 200, each with exceptions that make your original if-statements look naive.

We built hardcoded validation first. It lasted about three months before we ripped it out and replaced it with a configurable rules engine. Here's why, and what we learned about rule design in fund data.

Why hardcoded checks don't scale

The first rule was simple: NAV must be greater than 0. Reasonable. Then a client onboarded a fund that was in the process of liquidating, and its NAV was legitimately 0.00. Exception added.

The second rule: ISIN must match the pattern [A-Z]{2}[A-Z0-9]{9}[0-9]. Then we received a file from a German provider where some share classes used WKN codes instead of ISINs. Same column, different identifier type. Exception added.

The third rule: Fund domicile must be a valid ISO 3166-1 alpha-2 country code. Then a provider sent LUX instead of LU. Not technically wrong - it's the ISO 3166-1 alpha-3 code. Do you reject it or normalise it? Exception added, with a transformation step.

After 20 of these, the validation code was more exception handling than validation. Every new edge case required a code change, a deployment, and a prayer that you didn't break an existing rule for another client.
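The domicile case is a useful illustration of "normalise first, then validate". Below is a minimal sketch of that step, not our production code: the function name normalise_domicile and the five-country lookup table are assumptions made for the example, and a real table would cover every ISO 3166-1 entry.

```python
from typing import Optional

# Illustrative subset only (assumption); a real table covers all ISO 3166-1 entries.
ALPHA3_TO_ALPHA2 = {"LUX": "LU", "IRL": "IE", "GBR": "GB", "DEU": "DE", "CHE": "CH"}
VALID_ALPHA2 = set(ALPHA3_TO_ALPHA2.values())


def normalise_domicile(raw: str) -> tuple[Optional[str], Optional[str]]:
    """Return (alpha-2 code or None, note describing what happened or None)."""
    code = raw.strip().upper()
    if code in VALID_ALPHA2:
        # Already in the expected format: nothing to report.
        return code, None
    if code in ALPHA3_TO_ALPHA2:
        # Valid alpha-3 code: transform it and leave a note for the audit trail.
        alpha2 = ALPHA3_TO_ALPHA2[code]
        return alpha2, f"normalised {code} to {alpha2}"
    # Neither format matched: the caller decides whether this is an error.
    return None, f"unknown domicile code: {raw!r}"
```

Returning a note alongside the value keeps the transformation visible, so a normalised LUX can be logged as info rather than silently accepted.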

Anatomy of a validation rule

In our rules engine, every rule is a configuration object with these components:

Field: Which Openfunds field this rule applies to (e.g., OFST020010 for ISIN)
Check type: The validation function - regex, range, enum, cross-field, temporal
Parameters: The check-specific configuration (pattern, min/max, allowed values)
Severity: error (blocks processing), warning (flags but continues), info (logged only)
Scope: Which entity types, domiciles, or provider contexts this rule applies to
Exceptions: Specific conditions where the rule is suppressed
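As a rough sketch, the six components could be held in a structure like the one below. The class name Rule, its attribute names, and the run_regex helper are illustrative assumptions, not the engine's actual schema. The field ID and ISIN pattern are the ones quoted in this article.

```python
import re
from dataclasses import dataclass, field
from typing import Any

SEVERITIES = ("error", "warning", "info")


@dataclass(frozen=True)
class Rule:
    field_id: str                      # Openfunds field the rule applies to
    check_type: str                    # regex, range, enum, cross-field, temporal
    params: dict[str, Any]             # check-specific configuration
    severity: str                      # one of SEVERITIES
    scope: dict[str, Any] = field(default_factory=dict)   # empty = applies everywhere
    exceptions: tuple[dict[str, Any], ...] = ()            # conditions that suppress the rule

    def __post_init__(self) -> None:
        # Reject bad configuration at load time rather than at evaluation time.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def run_regex(rule: Rule, value: str) -> bool:
    """Evaluate a regex-type rule: True when the whole value matches the pattern."""
    return re.fullmatch(rule.params["pattern"], value) is not None


isin_rule = Rule(
    field_id="OFST020010",
    check_type="regex",
    params={"pattern": r"[A-Z]{2}[A-Z0-9]{9}[0-9]"},
    severity="error",
)
```

Because the rule is plain data, it can be loaded from a config store and changed without touching the code that evaluates it.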

A real example. The NAV movement rule checks whether today's NAV deviates from yesterday's by more than a configurable threshold:

Field: OFST101010 (NAV per Share). Check: temporal_deviation. Severity: warning at a 5% move, error at 10%. Exceptions: fund type = money market (thresholds tightened to 0.5% warning, 2% error). Scope: all share classes with daily pricing.

That single rule definition replaces about 40 lines of imperative code with nested conditions. And it can be modified without a deployment.
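To make the rule concrete, here is a minimal sketch of how a temporal deviation check might be evaluated. The function name check_nav_movement, the dictionary layout, and the money_market key are assumptions for illustration. The percentages are the ones from the rule definition above.

```python
from typing import Optional

# Default thresholds: relative day-on-day move that triggers each severity.
DEFAULT_THRESHOLDS = {"warning": 0.05, "error": 0.10}

# Per-fund-type overrides; money market funds get much tighter limits.
THRESHOLD_OVERRIDES = {"money_market": {"warning": 0.005, "error": 0.02}}


def check_nav_movement(
    prev_nav: float, nav: float, fund_type: Optional[str] = None
) -> Optional[str]:
    """Return 'error', 'warning', or None for a day-on-day NAV move."""
    if prev_nav <= 0:
        # No usable baseline; assume a separate rule handles missing or zero NAVs.
        return None
    thresholds = THRESHOLD_OVERRIDES.get(fund_type, DEFAULT_THRESHOLDS)
    deviation = abs(nav - prev_nav) / prev_nav
    # Check the stricter severity first so a 12% move is not reported as a warning.
    if deviation > thresholds["error"]:
        return "error"
    if deviation > thresholds["warning"]:
        return "warning"
    return None
```

A 3% move passes for an equity fund but is an error for a money market fund, and the only difference is which threshold entry gets picked up.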

Jurisdictional exceptions are the real enemy

Fund data validation would be straightforward if every fund operated under the same rules. They don't.

A UCITS fund domiciled in Luxembourg has different mandatory fields than an AIF domiciled in Ireland. A UK-authorised fund has reporting requirements that don't apply to a Cayman-domiciled fund. A Swiss-registered fund has specific requirements around total expense ratios that Luxembourg funds don't.

Our rules engine handles this through scope conditions. A rule can be scoped to:

Specific domiciles (LU, IE, GB)
Fund types (UCITS, AIF, ETF)
Regulatory frameworks (PRIIPs, MiFID II)
Specific providers (because Provider X always sends dates in DD/MM/YYYY)
Specific clients (because Client Y considers a 3% NAV swing normal for their hedge fund range)

The scoping system means we can say: "This rule applies to all UCITS funds domiciled in Luxembourg, except when the data comes from Provider X, in which case use this modified threshold." Without scoping, every exception is a code branch. With scoping, it's a configuration entry.
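The quoted example can be sketched as a scope match followed by an override lookup. Everything here is hypothetical: the key names (scope, overrides, when), the provider_x identifier, and the threshold values are chosen for the example rather than taken from our configuration.

```python
from typing import Any, Optional


def in_scope(scope: dict[str, set], context: dict[str, Any]) -> bool:
    """True when every scoped attribute of the record is in its allowed set."""
    return all(context.get(key) in allowed for key, allowed in scope.items())


def effective_threshold(rule: dict[str, Any], context: dict[str, Any]) -> Optional[float]:
    """Return the threshold to apply, or None when the rule does not apply at all."""
    if not in_scope(rule["scope"], context):
        return None
    # First matching override wins; otherwise fall back to the rule's default.
    for override in rule["overrides"]:
        if in_scope(override["when"], context):
            return override["threshold"]
    return rule["threshold"]


# "All UCITS funds domiciled in Luxembourg, except when the data comes from Provider X."
lux_ucits_rule = {
    "scope": {"fund_type": {"UCITS"}, "domicile": {"LU"}},
    "threshold": 0.05,
    "overrides": [{"when": {"provider": {"provider_x"}}, "threshold": 0.08}],
}
```

Adding another exception means appending one more entry to overrides, which is the configuration-entry-instead-of-code-branch trade described above.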

Cross-field validation

The most valuable rules aren't single-field checks. They're cross-field validations that catch inconsistencies humans would spot but simple checks miss:

If Fund Status is "Liquidated" but there's a NAV dated yesterday, something is wrong
If Share Class Currency is USD but the NAV is 0.0043, it's probably a Japanese Yen fund with the wrong currency code
If Launch Date is in the future but NAV History contains past data points, the launch date is likely wrong
If Management Fee is 15%, either it's a hedge fund performance fee mislabelled, or someone moved a decimal point

Cross-field rules catch the errors that pass individual field validation. Each field is valid on its own - the currency code is real, the NAV is positive, the date is well-formatted. But together they tell a story that doesn't make sense.
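A few of these checks can be sketched as predicates over a whole record rather than a single field. The record keys, the function names, and the 5% management fee ceiling are assumptions made for this example. Passing today in explicitly keeps the checks deterministic and easy to test.

```python
from datetime import date, timedelta
from typing import Any, Callable

Record = dict[str, Any]


def liquidated_with_fresh_nav(rec: Record, today: date) -> bool:
    # A liquidated fund should not still be publishing NAVs.
    return rec["fund_status"] == "Liquidated" and rec["nav_date"] >= today - timedelta(days=1)


def future_launch_with_history(rec: Record, today: date) -> bool:
    # A fund cannot have past NAV data points before it has launched.
    return rec["launch_date"] > today and any(d < today for d in rec["nav_history_dates"])


def implausible_management_fee(rec: Record, today: date, ceiling: float = 0.05) -> bool:
    # The ceiling is an illustrative assumption, not an industry constant.
    return rec["management_fee"] > ceiling


CROSS_FIELD_RULES: dict[str, Callable[[Record, date], bool]] = {
    "liquidated_with_fresh_nav": liquidated_with_fresh_nav,
    "future_launch_with_history": future_launch_with_history,
    "implausible_management_fee": implausible_management_fee,
}


def run_cross_field(rec: Record, today: date) -> list[str]:
    """Return the names of all cross-field rules the record violates."""
    return [name for name, check in CROSS_FIELD_RULES.items() if check(rec, today)]
```

Each predicate sees the full record, which is what lets it flag combinations that are individually valid.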

The 80/20 of rule design

We currently have about 85 active validation rules. Twenty of them catch 90% of data quality issues. The remaining 65 exist for edge cases that occur maybe once a quarter but cause significant problems when they do.

The top five rules by issue volume, if you're curious:

Missing mandatory fields (ISIN, NAV, currency) - 35% of all flags
Date format inconsistencies - 20%
NAV movement exceeding threshold - 15%
Duplicate records in same file - 10%
Stale data (no update in expected window) - 8%
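Two of the rules in this list, duplicates and stale data, are simple enough to sketch in a few lines. The choice of (ISIN, NAV date) as the duplicate key, the row layout, and the one-day default window are assumptions for illustration. In practice the key and the expected update window would be part of the rule's configuration.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Any


def find_duplicates(rows: list[dict[str, Any]]) -> list[tuple[str, date]]:
    """Return each (isin, nav_date) key that appears more than once in a file."""
    counts = Counter((row["isin"], row["nav_date"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def is_stale(last_update: date, today: date, max_age_days: int = 1) -> bool:
    """True when the last update is older than the expected window."""
    return (today - last_update) > timedelta(days=max_age_days)
```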

The rules engine isn't glamorous engineering. It's not AI, it's not real-time streaming, it's not distributed computing. It's a configuration-driven system that evaluates conditions against data. But it's the difference between "we think the data is probably fine" and "we know exactly which 14 fields across 3 share classes failed validation and why."

In fund data, probably fine isn't good enough.

Argus

Kairo Quality Agent
