Why We Built Provenance: The Cloud Compliance Tool We Needed Before We Knew We Needed It
There's a version of building software where you solve a clear, defined problem you've researched thoroughly. Then there's the version where you're deep in something else entirely, hit a wall repeatedly, and eventually decide to just build the thing yourself.
Provenance came from the second version.
At Ceramic Works, we've been building Recorde, a privacy-first web analytics platform for UK businesses. The premise was simple: GDPR and the Data (Use and Access) Act 2025 (DUAA) had created a compliance landscape where Google Analytics required significant configuration to use legally, and most alternatives were American companies storing data in the US. Recorde was built to fix that: UK data centres, no personal data collection, aggregate-only stats, and automatic compliance with UK privacy law out of the box.
Building it properly meant going deep on data law in a way we hadn't anticipated. It wasn't enough to just choose providers we'd heard of. We needed to know where data would actually live, what legal jurisdiction applied, and whether each provider in our stack had the certifications to back up their compliance claims.
That process was brutal.
Here's what we kept running into: every provider's trust page says roughly the same thing. GDPR compliant. EU data centres available. SOC 2 certified. Data processed in accordance with applicable law.
None of it tells you what you actually need to know.
Does "GDPR compliant" mean they've appointed a Data Protection Officer, or does it mean someone added a GDPR section to their privacy policy? Is that SOC 2 self-attested or independently audited? And perhaps most importantly — are they a US-incorporated company? Because if they are, the CLOUD Act means US authorities can compel them to hand over your data regardless of where the servers are physically located.
That last point catches a lot of people out. A provider can have data centres in Frankfurt, Dublin, and Amsterdam and still be subject to US jurisdiction if it's incorporated as a Delaware LLC. "EU-hosted" is not the same as "outside US legal reach."
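To make the distinction concrete, here is a minimal Python sketch of a jurisdiction check. The field names (`incorporated_in`, `parent_incorporated_in`, `regions`) and both example providers are hypothetical, not the actual Provenance schema, and the rule is a deliberate simplification of how exposure would be assessed in practice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Provider:
    name: str
    incorporated_in: str                           # country code of the operating company
    parent_incorporated_in: Optional[str] = None   # country code of the parent, if any
    regions: list[str] = field(default_factory=list)  # where data can be hosted


def cloud_act_exposed(provider: Provider) -> bool:
    """Simplified rule: exposure follows corporate jurisdiction, not server location."""
    return "US" in (provider.incorporated_in, provider.parent_incorporated_in)


# Hypothetical providers, for illustration only.
eu_hosted_us_company = Provider("ExampleCloud", "US", regions=["frankfurt", "dublin"])
eu_company = Provider("ExampleHost", "FR", regions=["paris"])

print(cloud_act_exposed(eu_hosted_us_company))  # True
print(cloud_act_exposed(eu_company))            # False
```

Note that `regions` never enters the check. That is the point: the hosting location and the legal jurisdiction are separate questions.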
For each provider we needed to evaluate, answering these questions meant visiting multiple pages, cross-referencing documentation, checking corporate registration details, and often making judgment calls on ambiguous or outdated information. For a single provider this takes maybe 30 minutes if you know what you're looking for. Across a full stack it becomes a serious time sink — and you're still not confident the information is current.
We got fed up with it and started keeping a spreadsheet. The spreadsheet became a structured dataset. The dataset needed an interface. And Provenance was born.
The first decision was to treat the data as the product, not the application. Before writing a single line of UI code we built an open-source dataset of cloud providers — JSON files covering compliance certifications, data residency regions, incorporation status, CLOUD Act exposure, and parent company relationships.
Every data point had to trace back to a primary source. Not a blog post, not a third-party comparison site — the provider's own official documentation, regulatory filings, or certification body records. That meant source URLs attached to every entry.
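As a rough illustration of that rule, the sketch below flags any certification claim that lacks a source URL. The entry and its field names (`certifications`, `value`, `source`) are assumptions made for this example and may not match the real dataset's layout.

```python
from urllib.parse import urlparse

# Hypothetical entry; the field names are assumptions, not the real Provenance schema.
entry = {
    "name": "ExampleMail",
    "certifications": {
        "soc2": {"value": True, "source": "https://example.com/security"},
        "hipaa": {"value": False, "source": ""},
    },
}


def unsourced_claims(provider: dict) -> list[str]:
    """Return the certification keys whose claim lacks an http(s) source URL."""
    missing = []
    for key, claim in provider.get("certifications", {}).items():
        parsed = urlparse(claim.get("source") or "")
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            missing.append(key)
    return missing


print(unsourced_claims(entry))  # ['hipaa']
```

A check like this only confirms that a URL is present and well formed. Whether the page behind it actually supports the claim still needs a human reader.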
We ran two full audit rounds across all 55 providers. The corrections in that second pass were sobering. Some highlights:
Stripe doesn't offer HIPAA compliance. They say so explicitly — they won't sign a Business Associate Agreement. But information circulating online routinely lists them as HIPAA compliant.
SendGrid is the same. Twilio (who own SendGrid) explicitly document that SendGrid is not a HIPAA eligible service. Again, widely misreported.
Mailgun has SOC 2, ISO 27001, and HIPAA. All three were incorrectly marked as false in our initial pass. Their security page documents all three clearly.
Scaleway doesn't have SOC 2. They hold ISO 27001, HDS (a French healthcare data standard), and SecNumCloud — but no SOC 2. The distinction matters for US-facing compliance requirements.
Postmark was listed as HIPAA compliant in multiple places. Their own security page doesn't mention it, because they don't offer it.
Twenty-plus corrections across 17 providers. These aren't obscure tools — they're widely used, and the compliance information circulating about them is often wrong.
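Tallying corrections like these is simple once each audit pass is stored as structured data. The following sketch compares two passes and lists every changed value. The Mailgun values mirror the correction described above, while the dictionary layout and key names are assumed for illustration.

```python
# Mailgun's values mirror the correction described above; the structure is assumed.
first_pass = {"mailgun": {"soc2": False, "iso27001": False, "hipaa": False}}
second_pass = {"mailgun": {"soc2": True, "iso27001": True, "hipaa": True}}


def corrections(before: dict, after: dict) -> list[tuple]:
    """List (provider, field, old, new) for every value that changed between passes."""
    changes = []
    for provider, fields in after.items():
        old_fields = before.get(provider, {})
        for name, new in fields.items():
            old = old_fields.get(name)
            if old != new:
                changes.append((provider, name, old, new))
    return changes


changed = corrections(first_pass, second_pass)
providers_affected = {provider for provider, *_ in changed}

print(len(changed), "corrections across", len(providers_affected), "provider(s)")
```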
Once the dataset was solid, the application was relatively straightforward to build. The goal was to make the data useful — fast to navigate, clear in what it's saying, and honest about the limits of what it can tell you.
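Provenance covers 55 cloud providers across 10 service categories: cloud IaaS, hosting, databases, payments, analytics, auth, CDN/edge, email, monitoring, and storage. For each one you can see its compliance certifications, data residency regions, incorporation status, CLOUD Act exposure, and parent company relationships, each traced back to a primary source.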
It also covers 11 data privacy laws across 42 countries, including GDPR, CCPA, PIPEDA, and the Australian Privacy Act — with official citations and regulator references for each.
The whole thing is open source. The dataset lives in a separate git repository so it can be contributed to independently, consumed by other tools, and version-controlled without being tied to the application layer.
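Because the dataset is plain JSON, another tool could consume it with a few lines of code. The sketch below assumes a local checkout with one JSON file per provider and hypothetical `name`, `category`, and `certifications` fields; the real repository layout and schema may differ.

```python
import json
from pathlib import Path


def load_providers(dataset_dir: str) -> list[dict]:
    """Read every *.json file in a directory into a list of provider records."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(dataset_dir).glob("*.json"))
    ]


def filter_providers(providers: list[dict], category: str, certification: str) -> list[str]:
    """Names of providers in a category that hold a given certification."""
    return [
        p["name"]
        for p in providers
        if p.get("category") == category
        and p.get("certifications", {}).get(certification)
    ]
```

With a checkout at a hypothetical path such as `./dataset/providers`, calling `filter_providers(load_providers("./dataset/providers"), "email", "hipaa")` would return the names of email providers recorded as holding HIPAA eligibility.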
A few things stood out from the process.
Compliance marketing is not compliance documentation. Trust pages are written to reassure, not inform. The gap between what a provider's marketing implies and what their actual certifications cover can be significant. The only way to know is to read the primary source.
Data quality is harder than it looks. Getting 55 providers right across a consistent schema, with sources, took two full audit passes, and the second still surfaced errors the first had missed. Maintaining it will require ongoing attention as certifications lapse, companies get acquired, and laws change.
The CLOUD Act is underappreciated as a risk factor. Most compliance discussions focus on certifications. Far fewer focus on the jurisdictional question — whether a provider is legally subject to compelled disclosure regardless of where the data sits. For anyone building something where data sovereignty matters, this is often the more important question.
Open sourcing the right layer changes the value proposition. By separating the dataset from the application, Provenance can serve as infrastructure rather than just a tool. Other developers can build on the data, contribute corrections, or consume it directly without needing the UI.
Provenance is free and open source at provmap.com. If you find something wrong or missing in the dataset, pull requests are welcome — the data is only as good as the people willing to keep it accurate.
And if you're building something in the UK that needs to take privacy seriously from day one, Recorde is what we built before this.