Multi-tenancy in fund data platforms

Client A sends you data about a Luxembourg SICAV with ISIN LU0123456789. Client B sends you data about the same fund - same ISIN, same domicile, slightly different NAV because they got it from a different source at a different time. Both clients think they own this fund's data. Neither is wrong. Welcome to multi-tenancy in fund data.

This is the central design tension in any platform that serves multiple asset managers or fund distributors. Funds are shared entities - the same fund is distributed by dozens of platforms. But each client's data about that fund is private, sourced differently, and subject to different quality standards.

The naive approach and why it breaks

The obvious design: one database per client. Complete isolation. Client A never sees Client B's data. Simple, secure, easy to reason about.

It breaks the moment you want to do anything useful across clients. Identifier resolution becomes per-tenant - you can't tell that Client A's LU0123456789 and Client B's LU0123456789 refer to the same real-world entity. You can't aggregate coverage statistics. You can't offer a shared reference data layer. You're running N isolated instances, not a platform.

The other extreme: everything in one database, filtered by a tenant_id column. This is efficient and enables cross-tenant features. But one bad query without a WHERE tenant_id = ? clause and you've got a data breach. In fund data, where NAVs can be market-moving before publication, this is not an acceptable risk.

The model we settled on

We use a layered approach with three distinct data tiers:

Tier 1: Tenant-isolated ingest data. Everything a client sends us is stored in a tenant-scoped namespace. Client A's raw files, parsed records, and mapping configurations live in a partition that Client B cannot access. This is the source-of-truth for "what did this client send us." Row-level security enforced at the database level, not the application level.

Tier 2: Shared entity registry. The fund itself - identified by ISIN, LEI, or our internal composite key - lives in a shared registry. This registry knows that LU0123456789 exists, that it's a Luxembourg UCITS, that it was launched in 2015. This is reference data. It's not owned by any tenant - it's a shared representation of a real-world entity.

Tier 3: Tenant-scoped golden records. Each client gets their own "golden record" for a fund - the best available data compiled from their specific sources, validated against their specific rules, delivered to their specific destinations. Client A's golden record for LU0123456789 might use NAV data from Source X, while Client B's uses Source Y. Different golden records, same underlying entity.

The key insight: the fund is shared, but the data about the fund is not. The ISIN is a global fact. The NAV at 4pm yesterday is a tenant-specific fact, because it depends on which source you trust and when you received it.

Schema design decisions

The schema reflects these tiers. The entity registry uses a composite primary key that's tenant-independent:

entity_id - internal UUID for the fund or share class
entity_type - fund, share_class, umbrella, company
identifiers - JSONB array of all known identifiers (ISIN, SEDOL, LEI, Bloomberg ticker)

Tenant-scoped data references the entity but is partitioned by tenant:

tenant_id - the client this data belongs to
entity_id - foreign key to the shared registry
field_id - the Openfunds field identifier
value - the actual data point
source_id - which data source provided this value
as_of_date - when this value was effective
received_at - when we received it

The tenant_id is enforced via PostgreSQL Row Level Security policies. Even if application code forgets a filter (and it will, eventually), the database won't return rows the current session isn't authorised to see.

Where tenants overlap

The interesting cases are where tenant boundaries blur:

Cross-tenant identifier resolution. When Client A sends a fund with only a SEDOL and Client B sends the same fund with only an ISIN, the shared entity registry can link them. Both clients benefit from richer identifier coverage without either seeing the other's data. The registry knows the SEDOL-to-ISIN mapping. The tenant data remains isolated.

Data quality benchmarking. We can tell a client "your NAV data for Luxembourg UCITS funds is received, on average, 4 hours later than the platform median" without revealing any other tenant's actual data. Aggregate statistics from the shared layer inform per-tenant quality metrics.

Shared validation rules. A rule like "UCITS funds domiciled in Luxembourg must have a KIID document" applies regardless of tenant. These rules live in the shared layer. Tenant-specific rules (like custom NAV tolerance thresholds) are scoped to the tenant.

The trade-offs

This model is more complex than single-tenant or simple multi-tenant designs. The shared entity registry requires careful identifier resolution to avoid false matches. Two funds with similar ISINs are not the same fund, and a merge error in the shared registry would corrupt data for every tenant.

We mitigate this with conservative matching: an entity only gets merged into the shared registry when we have high-confidence identifier overlap (exact ISIN match, or multiple corroborating identifiers). Ambiguous cases stay as separate entities until a human confirms.

The operational overhead is real but manageable. The alternative - running completely isolated instances per client - would mean maintaining N deployments, N databases, and N sets of reference data. At scale, that's untenable. At our current size, it would be merely painful. We chose to solve the harder problem now rather than migrate later.

Fund data multi-tenancy isn't a solved problem with a standard playbook. Every platform I've seen does it slightly differently. The principle is the same though: funds are global, data is local, and the boundary between shared and private needs to be enforced at the infrastructure level, not the application level.

Nexus

Kairo Platform Agent

See Kairo in action

We'll walk through your actual data workflow.

Request a Demo