Ceramic Works
March 2026 - Ongoing
What started as a frustrating gap in available tooling became a fully-featured, open-source compliance intelligence platform covering 55 cloud providers across 42 countries.
While building Recorde — a privacy-first analytics platform for UK businesses, built around DUAA 2025 and GDPR compliance — the same question kept surfacing: where is this data actually going, and who has legal access to it? Every new provider evaluation meant manually trawling trust pages, privacy policies, and compliance documentation to piece together a picture that should have been obvious.
No single source existed that combined data residency, compliance certifications, jurisdiction risk, and CLOUD Act exposure in one place. So we built Provenance — the tool we needed while building everything else.
Building a GDPR and DUAA 2025-compliant product forces a level of due diligence that most developers never have to think about. It's not enough to know a provider has EU data centres. You need to know whether their parent company is US-incorporated, whether they're subject to the CLOUD Act, which certifications they actually hold (versus claim), and which data protection laws apply in each region they operate.
The manual process was unsustainable. Each provider audit meant visiting multiple pages, cross-referencing documentation, and making judgment calls on ambiguous or outdated information — with no audit trail and no easy way to compare across providers. For a single provider this takes time. For an entire stack, it becomes a significant compliance bottleneck.
The deeper problem was trust. Compliance pages are written by marketing teams. "GDPR compliant" can mean almost anything. "EU-hosted" doesn't account for US parent company jurisdiction. SOC 2 can be self-attested or independently audited. Without verified, sourced data, every compliance decision carries hidden risk.
The foundation had to be the data. Before writing a line of application code, we built an open-source dataset: one JSON file per provider, covering compliance certifications, data residency, incorporation status, CLOUD Act exposure, and parent company relationships. Every data point was researched against primary sources: official trust pages, regulatory filings, certification bodies, and authoritative legal texts.
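As a rough illustration of what one provider record could look like, here is a minimal TypeScript sketch. The field names, the `ExampleHost` provider, and the `hasJurisdictionMismatch` helper are all hypothetical and are not taken from the actual dataset schema. The helper flags the case described earlier, where a provider offers EU regions but still carries CLOUD Act exposure.

```typescript
// Hypothetical record shape. Field names are illustrative assumptions and
// may not match the real dataset files.
interface ProviderRecord {
  name: string;
  parentCompany: string | null;
  incorporatedIn: string; // country code of incorporation, e.g. "US"
  cloudActExposure: boolean;
  dataRegions: string[]; // country codes where data can be hosted
  certifications: { [name: string]: boolean };
  sources: string[]; // primary-source URLs backing the record
}

// Fictional provider, used only to exercise the types.
const exampleProvider: ProviderRecord = {
  name: "ExampleHost",
  parentCompany: "ExampleHost Inc.",
  incorporatedIn: "US",
  cloudActExposure: true,
  dataRegions: ["DE", "IE"],
  certifications: { soc2: true, iso27001: true, hipaa: false },
  sources: ["https://example.com/trust"],
};

// Returns true when the provider hosts data in at least one of the given
// regions but is still exposed to the CLOUD Act.
function hasJurisdictionMismatch(
  provider: ProviderRecord,
  regionCodes: string[]
): boolean {
  const hostsInRegion = provider.dataRegions.some(
    (code) => regionCodes.indexOf(code) !== -1
  );
  return hostsInRegion && provider.cloudActExposure;
}
```

Keeping exposure as an explicit field, rather than inferring it from the incorporation country alone, leaves room for cases where a parent company relationship changes the answer.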
We ran two full audit rounds across all 55 providers, correcting 20+ inaccuracies in what we'd initially recorded and adding verified source URLs to every entry. Notable corrections included providers we had wrongly recorded as HIPAA compliant (Stripe, SendGrid, Postmark), certifications we had missed (Mailgun's SOC 2, ISO 27001, and HIPAA were all incorrectly marked false), and GDPR status errors across multiple database providers.
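A "source URL on every entry" rule like this can also be checked mechanically. The sketch below is one way to do that, not a description of how the audit was actually run. The `CertificationClaim` and `AuditedProvider` shapes and the `findUnsourcedClaims` function are assumptions made for illustration.

```typescript
// Hypothetical shapes: each certification claim carries its own source URL.
interface CertificationClaim {
  held: boolean;
  sourceUrl?: string;
}

interface AuditedProvider {
  name: string;
  certifications: { [name: string]: CertificationClaim };
}

// Lists every claim (held or not) that lacks an http(s) source URL.
// A "false" claim needs a source too, since a wrong "false" is also an error.
function findUnsourcedClaims(providers: AuditedProvider[]): string[] {
  const unsourced: string[] = [];
  for (const provider of providers) {
    for (const certName of Object.keys(provider.certifications)) {
      const url = provider.certifications[certName].sourceUrl;
      if (url === undefined || !/^https?:\/\/\S+$/.test(url)) {
        unsourced.push(`${provider.name}: ${certName}`);
      }
    }
  }
  return unsourced;
}
```

A check like this only confirms that a source is present. Whether the source supports the claim still needs a human reading the primary documentation.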
The dataset was structured as a git submodule, separating the data from the application so it could be contributed to independently, version-controlled, and consumed by other tools.
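To give a sense of how another tool might consume the data once the submodule is checked out, here is a small Node.js sketch. The directory layout (one `.json` file per provider), the `name` field, and the `loadProviders` function are assumptions for illustration, not the project's actual loader.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical minimal record; the real files may carry different fields.
interface ProviderFile {
  name: string;
  [field: string]: unknown;
}

// Reads every *.json file in the given directory, in filename order.
// The directory would typically be wherever the dataset submodule is
// checked out; that location is left to the caller.
function loadProviders(datasetDir: string): ProviderFile[] {
  return fs
    .readdirSync(datasetDir)
    .filter((file) => path.extname(file) === ".json")
    .sort()
    .map((file) => {
      const raw = fs.readFileSync(path.join(datasetDir, file), "utf8");
      const parsed = JSON.parse(raw) as ProviderFile;
      // Fail loudly on malformed records instead of passing them along.
      if (typeof parsed.name !== "string") {
        throw new Error(`${file}: missing "name" field`);
      }
      return parsed;
    });
}
```

Because the loader depends only on plain files, any tool that can read JSON can use the dataset without the application.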
The application was built to make the dataset useful: fast, navigable, and clear enough for both developers and compliance teams to act on.
Provenance maps the full compliance picture for cloud providers: not just what they claim, but what they hold, where they operate, and what legal exposure they carry.
Core features:
The stack (TanStack Start, Drizzle ORM, PostgreSQL, Tailwind CSS) was chosen for end-to-end type safety and fast iteration. The dataset lives in a separate repository, making it independently usable by anyone who needs the raw data without the application layer.
Provenance launched as a free, open-source tool covering more ground than any comparable resource we found during development. To our knowledge, the dataset is the most thoroughly sourced public collection of cloud provider compliance data available: every entry links back to the primary documentation it was verified against.
The two-round audit process surfaced significant inaccuracies across widely used providers, reinforcing the original premise: compliance marketing and compliance reality frequently diverge, and the only way to close that gap is primary-source verification.
The project also validated the submodule architecture. By decoupling the dataset from the application, Provenance can serve as infrastructure for other compliance tooling — not just a standalone product.
Build what you actually need. Provenance exists because the gap was real and personally felt. Intimate knowledge of the problem — from building a compliance-sensitive product from scratch — led directly to a better solution than could have been designed in the abstract.
Data quality is the product. The application is a presentation layer. The value is in the accuracy of the underlying dataset. Two full audit rounds and source verification for every entry were not optional; they were the entire point.
Open source the right layer. Making the dataset a standalone, publicly accessible resource turns a personal tool into shared infrastructure. Other developers, compliance teams, and tools can use it without needing the application.
Compliance claims require verification. The audit revealed that many widely used providers have inaccurate or outdated compliance information in circulation, including providers listed as HIPAA-compliant that explicitly state they do not offer it. Trust pages are not ground truth.