Build vs buy in fund data automation
Every fund data team eventually faces the same decision: keep using Excel, build something internally, or buy a platform. Most teams get the decision wrong — not through carelessness, but because they evaluate the wrong criteria.
The four options
Let's be honest about what's actually on the table for most teams:
Option 1: Excel and manual processes. This is where everyone starts and where a surprising number of firms still live. An analyst downloads files, opens them in Excel, runs some VLOOKUP formulas, checks for outliers visually, and pastes the results into another system. It works. For a while.
Option 2: Bloomberg Terminal / Morningstar Direct. Data terminal with a subscription feed. Good for market data and basic fund information. Not designed for operational data workflows — ingestion, normalisation, validation, delivery. You get data out of it, but you're still doing the pipeline work yourself.
Option 3: Internal tools. An IT team builds a Python script, a database, maybe a simple UI. It works for the specific use case it was built for. Then the person who built it leaves. Then a new data source arrives with a different format. Then someone needs an audit trail.
Option 4: Purpose-built platform. A system designed specifically for fund data operations. Ingestion, normalisation, validation, delivery, audit trail. This is what we're building with Kairo.
Why "just use Excel" works until it doesn't
I have enormous respect for what ops teams accomplish with Excel. I've seen analysts build genuinely impressive validation frameworks using nothing but conditional formatting and pivot tables. For five funds and two data sources, it's honestly fine.
The breaking point is predictable. It comes when one of these things happens:
- You cross roughly 50 funds and the spreadsheet takes 30 seconds to recalculate
- A second analyst needs to do the same work and you realise the process lives in one person's head
- A regulator asks for an audit trail and you have to reconstruct it from email timestamps
- A data provider changes their column order and nobody notices for three days
- Someone accidentally overwrites the master file and there's no version history
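Several of these failure modes — silent column reordering in particular — can be caught with a cheap header check at ingest time. A minimal sketch; the expected column list here is hypothetical, not any real provider's schema:

```python
import csv

# Hypothetical expected layout for one provider's daily file.
EXPECTED_HEADER = ["isin", "fund_name", "nav", "nav_date", "currency"]

def check_header(path: str) -> None:
    """Fail loudly if a provider file's columns drift from the expected layout."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    if header != EXPECTED_HEADER:
        raise ValueError(
            f"Schema drift in {path}: got {header}, expected {EXPECTED_HEADER}"
        )
```

Run against every incoming file before any mapping logic touches it, a check like this turns "nobody notices for three days" into a loud failure on day one.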
The cost of these failures isn't theoretical. I've seen a single mismatched ISIN — IE00B4L5Y983 confused with IE00B4L5YC18 because both are iShares Core MSCI World funds — cause a week of reconciliation work across three teams.
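A check-digit validation would not have caught that particular mix-up — both ISINs are valid — but it does catch single-character typos and truncations, which is why it belongs in any ingest pipeline alongside a name-to-identifier cross-check. ISO 6166 defines the check digit as a Luhn sum over the letter-expanded digits; a minimal sketch:

```python
def isin_checksum_ok(isin: str) -> bool:
    """Validate an ISIN check digit per ISO 6166 (Luhn over expanded digits)."""
    if len(isin) != 12 or not isin[:2].isalpha() or not isin[-1].isdigit():
        return False
    # Expand letters to two-digit numbers: A=10 ... Z=35; digits stay as-is.
    digits = "".join(str(int(c, 36)) for c in isin.upper())
    # Luhn: double every second digit from the right, sum the digit sums.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
        total += d // 10 + d % 10
    return total % 10 == 0
```

Cheap to run on every row, and it moves one whole class of reconciliation work from "discovered downstream" to "rejected at the door".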
The internal tools trap
Building internally is seductive because it starts cheap and fast. Your best engineer spins up a Python script in a week. It reads the CSV, maps the columns, loads the database. Done.
Six months later, that script handles fourteen edge cases, has no tests, no error handling beyond try/except: pass, and the engineer has moved to a different team. The ops team is now afraid to touch the code but completely dependent on it.
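That try/except: pass detail deserves a closer look, because it is exactly how data goes missing silently. A hypothetical before-and-after, assuming rows arrive as dicts with a "nav" field:

```python
import logging

# The anti-pattern: a bad row disappears without a trace.
def parse_nav_silent(row: dict):
    try:
        return float(row["nav"])
    except Exception:
        pass  # implicitly returns None; nobody ever finds out

# The minimal fix: catch only what you expect, and record every rejection.
def parse_nav(row: dict):
    try:
        return float(row["nav"])
    except (KeyError, TypeError, ValueError) as exc:
        logging.error("Rejected row %r: %s", row, exc)
        return None
```

The second version is barely more code, but it is the difference between a rejection you can audit and a gap you discover at month-end.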
Every internal fund data tool I've seen follows the same arc: fast start, slow decay, eventual crisis when the original builder leaves.
This isn't a criticism of the engineers. It's a structural problem. Internal tools don't get maintained like products because they don't have product owners, roadmaps, or dedicated engineering time. They're side projects that became critical infrastructure by accident.
What to actually evaluate
When you're making this decision, here are the questions that actually matter:
- How many sources will you have in 18 months? Not today. Today is irrelevant. If you're going from 3 to 30, you need automation. If you're staying at 3, Excel is fine.
- Do you need an audit trail? If yes — and increasingly the answer is yes — you need something that logs every transformation, not just the final output.
- What happens when someone leaves? If the process can't survive a key person taking a two-week holiday, you have a single point of failure, not a system.
- How fast do you need to onboard new sources? If adding a new data provider takes two weeks of development time, your cost per source is too high.
The real cost calculation
Teams always undercount the cost of manual processes. They count the analyst's salary but not the opportunity cost of what that analyst could be doing instead. They count the hours spent on daily processing but not the hours spent on error recovery, re-runs, and manual reconciliation.
In my experience, the fully loaded cost of manual fund data operations — including error handling, audit prep, and key-person risk — is roughly 3x what teams estimate. That changes the build-vs-buy equation significantly.
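To make the undercounting concrete, here is a back-of-the-envelope model. Every figure is a hypothetical placeholder, not data from any real team — substitute your own numbers:

```python
# All figures are hypothetical placeholders for illustration.
WORKDAYS_PER_YEAR = 250
HOURLY_COST = 60           # fully loaded analyst cost per hour

visible_hours = 2          # daily processing everyone counts
hidden_hours = 4           # error recovery, re-runs, reconciliation, audit prep

visible_annual_cost = visible_hours * HOURLY_COST * WORKDAYS_PER_YEAR
true_annual_cost = (visible_hours + hidden_hours) * HOURLY_COST * WORKDAYS_PER_YEAR

print(f"Estimated: {visible_annual_cost:,}; actual: {true_annual_cost:,}; "
      f"multiplier: {true_annual_cost / visible_annual_cost:.1f}x")
```

With these placeholder inputs the hidden hours dominate, which is the point: the multiplier lives almost entirely in the work nobody puts on a timesheet.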
The right answer depends on your specific situation. But if you're processing more than a handful of sources and you need regulatory-grade audit trails, the spreadsheet era has an expiry date. The only question is whether you hit it gracefully or in a crisis.