
Provenance - Open-Source Cloud Compliance Intelligence

Client

Ceramic Works

Timeline

March 2026 - Ongoing

Services

Research, Data Curation, Web Development, DevOps

Technologies

TanStack Start, Drizzle ORM, Postgres, Tailwind, TypeScript, Netlify, Neon

What started as a frustrating gap in available tooling became a fully featured, open-source compliance intelligence platform covering 55 cloud providers across 42 countries.

Overview

While building Recorde, a privacy-first analytics platform for UK businesses built around GDPR and the Data (Use and Access) Act 2025 (DUAA 2025), the same question kept surfacing: where is this data actually going, and who has legal access to it? Every new provider evaluation meant manually trawling trust pages, privacy policies, and compliance documentation to piece together a picture that should have been obvious.

No single source existed that combined data residency, compliance certifications, jurisdiction risk, and CLOUD Act exposure in one place. So we built Provenance — the tool we needed while building everything else.

The Challenge

Building a GDPR and DUAA 2025-compliant product forces a level of due diligence that most developers never have to think about. It's not enough to know a provider has EU data centres. You need to know whether their parent company is US-incorporated, whether they're subject to the CLOUD Act, which certifications they actually hold (versus claim), and which data protection laws apply in each region they operate.

The manual process was unsustainable. Each provider audit meant visiting multiple pages, cross-referencing documentation, and making judgment calls on ambiguous or outdated information — with no audit trail and no easy way to compare across providers. For a single provider this takes time. For an entire stack, it becomes a significant compliance bottleneck.

The deeper problem was trust. Compliance pages are often written by marketing teams. "GDPR compliant" can mean almost anything. "EU-hosted" doesn't account for US parent company jurisdiction. "SOC 2" can mean a point-in-time Type I report, a Type II report covering an audit period, or merely a claim of alignment with no audit report behind it. Without verified, sourced data, every compliance decision carries hidden risk.

The Approach

The foundation had to be the data. Before writing a line of application code, we built an open-source dataset: a set of provider JSON files covering compliance certifications, data residency, incorporation status, CLOUD Act exposure, and parent company relationships. Every data point was researched against primary sources: official trust pages, regulatory filings, certification bodies, and authoritative legal texts.
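A provider record in a dataset like this could be typed and validated roughly as follows. This is a minimal sketch only: the field names (`incorporatedIn`, `parentCompanyId`, `dataCentreCountries`, `sourceUrl`) and the `parseProvider` helper are hypothetical and are not taken from the actual Provenance schema.

```typescript
// Hypothetical record shape; field names are illustrative assumptions.
interface CertificationStatus {
  held: boolean;
  sourceUrl: string; // primary source the status was verified against
}

interface ProviderRecord {
  id: string;
  name: string;
  category: string;
  incorporatedIn: string; // ISO 3166-1 alpha-2 country code, e.g. "US"
  parentCompanyId: string | null;
  dataCentreCountries: string[];
  certifications: Record<string, CertificationStatus>;
}

// Parse one provider JSON file, rejecting records that lack core fields.
// Only a few fields are checked here; a fuller validator would check them all.
function parseProvider(json: string): ProviderRecord {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("provider record must be a JSON object");
  }
  const record = raw as Record<string, unknown>;
  for (const key of ["id", "name", "category", "incorporatedIn"]) {
    if (typeof record[key] !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
  }
  if (!Array.isArray(record["dataCentreCountries"])) {
    throw new Error("dataCentreCountries must be an array");
  }
  return record as unknown as ProviderRecord;
}

// Usage with an invented example provider.
const example = parseProvider(
  JSON.stringify({
    id: "example-cloud",
    name: "Example Cloud",
    category: "hosting",
    incorporatedIn: "US",
    parentCompanyId: null,
    dataCentreCountries: ["DE", "IE"],
    certifications: {
      "ISO 27001": { held: true, sourceUrl: "https://example.com/trust" },
    },
  }),
);
```

Attaching a `sourceUrl` to each certification, rather than to the record as a whole, keeps every individual claim traceable to the document it was checked against.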

We ran two full audit rounds across all 55 providers, correcting 20+ inaccuracies in what we'd initially recorded and adding verified source URLs to every entry. Notable corrections included providers incorrectly recorded as HIPAA-compliant (Stripe, SendGrid, Postmark), missing certifications (Mailgun's SOC 2, ISO 27001, and HIPAA were all incorrectly marked false), and GDPR status errors across multiple database providers.
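Recording what changed between audit rounds amounts to a diff over certification flags. The sketch below is illustrative only: the `CertMap` shape and the `diffAuditRounds` function are hypothetical, and the sample values merely mirror the Mailgun and Stripe corrections described above rather than reproducing the real dataset.

```typescript
// Hypothetical flat shape: providerId -> certification name -> held?
type CertMap = Record<string, Record<string, boolean>>;

interface Correction {
  providerId: string;
  certification: string;
  before: boolean | undefined; // undefined = not recorded in that round
  after: boolean | undefined;
}

// List every certification flag that differs between two audit rounds.
function diffAuditRounds(before: CertMap, after: CertMap): Correction[] {
  const corrections: Correction[] = [];
  const providerIds = Array.from(
    new Set([...Object.keys(before), ...Object.keys(after)]),
  );
  for (const providerId of providerIds) {
    const b = before[providerId] ?? {};
    const a = after[providerId] ?? {};
    const certs = Array.from(new Set([...Object.keys(b), ...Object.keys(a)]));
    for (const certification of certs) {
      if (b[certification] !== a[certification]) {
        corrections.push({
          providerId,
          certification,
          before: b[certification],
          after: a[certification],
        });
      }
    }
  }
  return corrections;
}

// Illustrative rounds: first-pass values, then the corrected values.
const round1: CertMap = {
  mailgun: { "SOC 2": false, "ISO 27001": false },
  stripe: { HIPAA: true },
};
const round2: CertMap = {
  mailgun: { "SOC 2": true, "ISO 27001": true },
  stripe: { HIPAA: false },
};

const corrections = diffAuditRounds(round1, round2);
for (const c of corrections) {
  console.log(`${c.providerId} ${c.certification}: ${c.before} -> ${c.after}`);
}
```

Keeping each round as data, instead of editing values in place, gives the audit trail that the manual process lacked.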

The dataset was structured as a Git submodule, separating the data from the application so it could be contributed to independently, version-controlled, and consumed by other tools.

The application was built to make the dataset useful: fast, navigable, and clear enough for both developers and compliance teams to act on.

The Solution

Provenance maps the full compliance picture for cloud providers: not just what they claim, but what they hold, where they operate, and what legal exposure they carry.

Core features:

  • 55 providers across 10 service categories: cloud IaaS, hosting, databases, payments, analytics, auth, CDN/edge, email, monitoring, and storage
  • Compliance tracking across SOC 2, ISO 27001, and PCI DSS certifications, plus GDPR and HIPAA status, all verified against primary sources
  • Data residency mapping — data centre regions with country-level jurisdiction tagging
  • CLOUD Act exposure flags — identifying US-incorporated providers regardless of where their servers are located
  • Parent company relationships — surfacing when a nominally independent provider is owned by a larger entity subject to different jurisdictional rules
  • 11 privacy laws across 42 countries, mapped to the providers that fall under their scope
  • Intelligence alliance visibility — Five Eyes and Fourteen Eyes jurisdictional context
  • Fully open-source dataset — sourced, versioned, and open to community contribution
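The CLOUD Act and parent-company features interact: a provider whose data centres are all in the EU can still be exposed through a US-incorporated owner. A minimal sketch of that check follows, assuming a hypothetical `Provider` shape with an `incorporatedIn` country code and a `parentId` link. It treats US incorporation anywhere in the ownership chain as the exposure signal, which is a simplification of the underlying legal question, and it deliberately ignores data centre location.

```typescript
// Hypothetical minimal provider shape for this sketch.
interface Provider {
  id: string;
  incorporatedIn: string; // ISO 3166-1 alpha-2 code, e.g. "US"
  parentId: string | null;
}

// Walk up the ownership chain; flag exposure if the provider or any
// ancestor is US-incorporated. Server locations play no part here.
function hasCloudActExposure(
  id: string,
  providers: Map<string, Provider>,
): boolean {
  const visited = new Set<string>(); // guards against ownership cycles in bad data
  let current = providers.get(id);
  while (current !== undefined && !visited.has(current.id)) {
    if (current.incorporatedIn === "US") return true;
    visited.add(current.id);
    const parentId = current.parentId;
    if (parentId === null) return false;
    current = providers.get(parentId);
  }
  return false;
}

// Invented example: an EU-incorporated host owned by a US parent.
const providers = new Map<string, Provider>([
  ["eu-host", { id: "eu-host", incorporatedIn: "DE", parentId: "us-parent" }],
  ["us-parent", { id: "us-parent", incorporatedIn: "US", parentId: null }],
  ["independent", { id: "independent", incorporatedIn: "FR", parentId: null }],
]);

console.log(hasCloudActExposure("eu-host", providers));
console.log(hasCloudActExposure("independent", providers));
```

Resolving exposure through the ownership graph, rather than storing a hand-set flag on each provider, means a single correction to a parent company's record propagates to every subsidiary.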

The stack (TanStack Start, Drizzle ORM, PostgreSQL, and Tailwind CSS) was chosen for end-to-end type safety and fast iteration. The dataset lives in a separate repository, so anyone who needs the raw data can use it without the application layer.

The Results

Provenance launched as a free, open-source tool covering more ground than any comparable resource we found during development. To our knowledge, the dataset is the most thoroughly sourced public collection of cloud provider compliance data available: every entry links back to the primary documentation it was verified against.

The two-round audit process surfaced significant inaccuracies across widely used providers, reinforcing the original premise: compliance marketing and compliance reality frequently diverge, and the only way to close that gap is primary-source verification.

The project also validated the submodule architecture. By decoupling the dataset from the application, Provenance can serve as infrastructure for other compliance tooling — not just a standalone product.

Key Takeaways

Build what you actually need. Provenance exists because the gap was real and personally felt. Intimate knowledge of the problem — from building a compliance-sensitive product from scratch — led directly to a better solution than could have been designed in the abstract.

Data quality is the product. The application is a presentation layer; the value is in the accuracy of the underlying dataset. Two full audit rounds and source verification for every entry were not optional; they were the entire point.

Open source the right layer. Making the dataset a standalone, publicly accessible resource turns a personal tool into shared infrastructure. Other developers, compliance teams, and tools can use it without needing the application.

Compliance claims require verification. The audit revealed that many widely used providers have inaccurate or outdated compliance information in circulation, including providers incorrectly listed as HIPAA-compliant that explicitly state they do not offer it. Trust pages are not ground truth.