Building Law4Devs: Making 19 EU Regulations Queryable as Structured JSON
The source of truth for EU law is EUR-Lex. The format is PDF and HTML. The experience of finding a specific article, cross-referencing its obligations, and building a compliance system around it is worse than it should be. Law4Devs exists to fix that. Here is what we learned building it.
The Problem with Legal Text as a Data Source
EU regulations are published in the Official Journal of the European Union and mirrored on EUR-Lex. The legal text is authoritative, but the format is not designed for programmatic consumption. A regulation like the GDPR is a single document of 99 articles and 173 recitals, connected by cross-references that force the reader to navigate non-linearly to understand the full scope of any specific obligation.
For a compliance officer reading the regulation once, this friction is manageable. For a developer building a system that must implement the regulation's requirements, it is a genuine engineering problem. You cannot easily build a machine-readable compliance system on top of HTML documents with inconsistent cross-reference formatting and amendments published as separate acts.
The alternatives — legal summaries, compliance platforms, SaaS compliance tools — solve the readability problem by introducing abstraction. They describe what the law requires without preserving what the law says. For use cases where the verbatim legal text matters — building compliance APIs, training legal AI systems, regulatory documentation — summaries are not a substitute for source text.
The Architecture: Verbatim Text, Structured as Data
The core design principle of Law4Devs is zero abstraction on the legal text. Every article, recital, and annex is stored verbatim — the text that appears in the Official Journal of the EU, in English, French, and German. We do not summarise, paraphrase, or interpret. The API returns the actual legal text, structured so that developers can query it programmatically.
Each regulation is structured as a hierarchy of queryable objects: regulation → chapter → section → article → paragraph → subparagraph. Every object has a unique stable identifier, a version history tracking amendments, and metadata including the article number as referenced in the official text, the effective date, and cross-references to related provisions in the same and other regulations.
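A minimal sketch of that hierarchy as Python dataclasses follows. The class and field names, the kind strings, and the "32016R0679:art6"-style identifiers are assumptions for illustration, not the Law4Devs schema. Only "32016R0679", the GDPR's CELEX number, is real. The article id does not encode its chapter, so restructuring the tree would not change a provision's identifier.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Version:
    """One dated version of a provision's verbatim text (hypothetical shape)."""
    text: str
    effective_from: date


@dataclass
class Provision:
    """One node of the regulation -> chapter -> ... -> subparagraph tree (hypothetical)."""
    id: str       # stable identifier, never reused or renumbered
    kind: str     # "regulation", "chapter", "section", "article", "paragraph", ...
    number: str   # number as printed in the official text, e.g. "6" or "II"
    versions: list[Version] = field(default_factory=list)    # amendment history
    children: list[Provision] = field(default_factory=list)
    references: list[str] = field(default_factory=list)      # ids of linked provisions

    def walk(self):
        """Yield this node and every descendant in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


gdpr = Provision("32016R0679", "regulation", "2016/679", children=[
    Provision("32016R0679:ch2", "chapter", "II", children=[
        Provision("32016R0679:art6", "article", "6", children=[
            Provision("32016R0679:art6.1", "paragraph", "1"),
        ]),
    ]),
])
print([p.id for p in gdpr.walk()])
# ['32016R0679', '32016R0679:ch2', '32016R0679:art6', '32016R0679:art6.1']
```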
The cross-reference layer is what makes the structure useful. When GDPR Article 6 is linked to Article 9, Article 83, and Recital 51, those links are modelled as explicit relationships in the data schema, not as text strings. A query for GDPR Article 6(1)(a) returns the provision's text and all outbound cross-references as structured data, allowing a developer to traverse the full reference graph programmatically.
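Traversing the reference graph is a standard breadth-first search. The sketch below assumes a hypothetical in-memory table of outbound links keyed by made-up ids. The edges themselves follow real GDPR citations: Article 83(5)(a) refers to Articles 5, 6, 7 and 9, and Article 6(4)(c) refers to Articles 9 and 10. The depth limit keeps the result focused, and the visited set stops the walk from looping on circular references.

```python
from collections import deque

# Illustrative outbound links between GDPR provisions (not Law4Devs data).
# Article 83(5)(a) refers to Articles 5, 6, 7 and 9; Article 6(4)(c) refers to 9 and 10.
LINKS = {
    "gdpr:art83": ["gdpr:art5", "gdpr:art6", "gdpr:art7", "gdpr:art9"],
    "gdpr:art6": ["gdpr:art9", "gdpr:art10"],
}


def reference_closure(start, max_depth=2):
    """Breadth-first walk of outbound links; returns {provision id: hop count}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # reached the depth limit: keep the node but do not expand it
        for target in LINKS.get(current, []):
            if target not in seen:  # the visited set also guards against citation cycles
                seen[target] = seen[current] + 1
                queue.append(target)
    return seen


print(reference_closure("gdpr:art83"))
```

Starting from Article 83, this returns Articles 5, 6, 7 and 9 at one hop and Article 10 at two hops, reached through Article 6.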
EU Frameworks in Law4Devs (selection)
Plus 9 additional frameworks. 19 in total across data protection, cybersecurity, financial regulation, and AI governance.
The EUR-Lex Ingestion Pipeline
Every regulation begins as a EUR-Lex document. The ingestion pipeline processes the official HTML publications, normalises the structure across the varying formats used by different regulations and publication years, and maps the text into the Law4Devs schema. This is not a simple task: regulations are drafted by different directorates-general and published across years of changing formatting, so their HTML structure is inconsistent, and the official language versions of the same regulation can differ slightly in paragraph numbering.
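One way to get a consistent hierarchy out of inconsistent markup is to ignore the HTML structure and classify headings by their visible wording. The sketch below assumes the HTML has already been reduced to text lines. The patterns and the classify and outline helpers are hypothetical and cover only the common "CHAPTER II", "Section 1" and "Article 5" forms.

```python
from __future__ import annotations

import re

# Hypothetical heading patterns, applied after the HTML has been reduced to text lines.
# Markup differs across publication years, so matching the visible heading wording
# is one way to recover a consistent hierarchy.
HEADING_PATTERNS = [
    ("chapter", re.compile(r"^CHAPTER ([IVXLC]+)$", re.IGNORECASE)),
    ("section", re.compile(r"^Section (\d+)$", re.IGNORECASE)),
    ("article", re.compile(r"^Article (\d+[a-z]?)$", re.IGNORECASE)),
]


def classify(line: str) -> tuple[str, str] | None:
    """Return (level, number) for a heading line, or None for body text."""
    text = " ".join(line.split())  # collapse non-breaking spaces and repeated whitespace
    for level, pattern in HEADING_PATTERNS:
        match = pattern.match(text)
        if match:
            return level, match.group(1)
    return None


def outline(lines: list[str]) -> list[tuple[str | None, str | None, str]]:
    """Map each article heading to (chapter, section, article)."""
    chapter = section = None
    result = []
    for line in lines:
        heading = classify(line)
        if heading is None:
            continue  # body text
        level, number = heading
        if level == "chapter":
            chapter, section = number, None  # a new chapter closes any open section
        elif level == "section":
            section = number
        else:
            result.append((chapter, section, number))
    return result


print(outline(["CHAPTER II", "Article 5", "Principles relating to processing",
               "CHAPTER III", "Section 1", "Article 12"]))
```

For this GDPR-style input the result is [('II', None, '5'), ('III', '1', '12')]: Article 12 sits in Section 1 of Chapter III, while Article 5 belongs to a chapter without sections.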
The pipeline handles three categories of challenge: structural normalisation (mapping the varying HTML heading structures to a consistent article/paragraph hierarchy), cross-reference resolution (converting text-form references like "pursuant to Article 6" into structured links with stable identifiers), and amendment management (tracking consolidated versions versus original publications and maintaining the amendment history).
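Cross-reference resolution can be sketched with a regular expression over the article text. The pattern below is an assumption that covers only single references such as "Article 9(2)(a)" and the EU drafting form "point (a) of Article 6(1)", which has to be reordered into Article 6(1)(a). The dotted target format built on the GDPR's CELEX number is hypothetical.

```python
import re

# Matches "Article 6", "Article 6(1)", "Article 9(2)(a)" and the EU drafting form
# "point (a) of Article 6(1)". Plural lists ("Articles 5, 6 and 7") and references
# to other acts are out of scope for this sketch.
REF = re.compile(
    r"(?:\bpoint \(([a-z]+)\) of )?"  # optional leading "point (x) of "
    r"\bArticle (\d+)"                 # article number
    r"((?:\([0-9a-z]+\))*)"            # trailing "(1)", "(1)(a)", ...
)


def resolve_refs(text, celex):
    """Convert text-form references into links with (hypothetical) stable ids."""
    links = []
    for match in REF.finditer(text):
        point, article, suffix = match.groups()
        parts = re.findall(r"\(([0-9a-z]+)\)", suffix)  # "(2)(a)" -> ["2", "a"]
        if point:
            parts.append(point)  # "point (a) of Article 6(1)" means Article 6(1)(a)
        target = f"{celex}:art{article}" + "".join(f".{p}" for p in parts)
        links.append({"anchor": match.group(0), "target": target})
    return links


sample = "where point (a) of Article 6(1) or Article 9(2)(a) applies"
for link in resolve_refs(sample, "32016R0679"):
    print(link["anchor"], "->", link["target"])
```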
Amendment management is the largest ongoing operational challenge. EU regulations are frequently amended by later regulations and supplemented by delegated and implementing acts. The GDPR shows how much surrounds even a comparatively stable text: a corrigendum has corrected its wording, the proposed ePrivacy Regulation was intended to complement it, and adequacy decisions and EDPB guidelines shape how it is applied without amending it. The Law4Devs amendment model tracks each version with effective dates and surfaces the current consolidated text alongside the amendment history.
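Serving "the text in force on a given date" from a version history reduces to a binary search over effective dates. The history below is placeholder data and text_at is a hypothetical helper. Under these assumptions, the current consolidated text is simply text_at(history, date.today()).

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date

# Placeholder history for one provision: (effective_from, text) pairs sorted by date.
# Dates and texts are illustrative, not real amendment data.
HISTORY = [
    (date(2020, 1, 1), "version A"),
    (date(2021, 6, 1), "version B"),
    (date(2023, 3, 15), "version C"),
]


def text_at(history: list[tuple[date, str]], when: date) -> str | None:
    """Return the text in force on `when`, or None if no version had taken effect."""
    dates = [effective for effective, _ in history]
    index = bisect_right(dates, when)  # count of versions effective on or before `when`
    return history[index - 1][1] if index else None


print(text_at(HISTORY, date(2021, 6, 1)))    # version B: it takes effect that day
print(text_at(HISTORY, date(2019, 12, 31)))  # None: before the first version
```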
API Design for Legal Text
Legal text has properties that make API design non-trivial. Stability matters more than in typical data APIs — a developer building a compliance system on top of Law4Devs cannot afford to have article identifiers change between API versions. The reference schema uses a combination of the regulation's CELEX number (EUR-Lex's stable identifier system) and the article number as published in the official text. These identifiers are stable across API versions regardless of internal schema changes.
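A sketch of such an identifier, assuming a hypothetical CELEX/art/N wire format. It validates the shape of the CELEX number (sector 3 for legal acts, four-digit year, a one-letter document type such as R for a regulation, four-digit number) and derives the id from nothing but the CELEX number and the article number as printed. 32024R1689 is the CELEX number of the EU AI Act.

```python
import re
from dataclasses import dataclass

# Sector 3 (legal acts), four-digit year, one-letter document type, four-digit number:
# 32016R0679 is the GDPR. Consolidated-text CELEX numbers (sector 0, with a date
# suffix) and corrigenda are not handled in this sketch.
CELEX_ACT = re.compile(r"3\d{4}[A-Z]\d{4}")


@dataclass(frozen=True)
class ProvisionId:
    celex: str
    article: str  # article number exactly as printed, e.g. "6" or "5a"

    def __post_init__(self):
        if not CELEX_ACT.fullmatch(self.celex):
            raise ValueError(f"not a sector-3 CELEX number: {self.celex!r}")

    def __str__(self):
        # Hypothetical wire format built only from the two official inputs,
        # so internal schema changes cannot alter it.
        return f"{self.celex}/art/{self.article}"

    @classmethod
    def parse(cls, value):
        celex, kind, article = value.split("/")
        if kind != "art":
            raise ValueError(f"unsupported provision kind: {kind!r}")
        return cls(celex, article)


print(ProvisionId.parse("32024R1689/art/73"))  # 32024R1689/art/73
```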
Search across legal text presents different challenges than search across general data. A developer querying "data minimisation" needs to find GDPR Article 5(1)(c), Recital 39, and the related Articles 25 and 89 — as well as equivalent provisions in NIS2 and the EU AI Act. The Law4Devs search endpoint supports semantic search across the full corpus and returns results ranked by relevance with the regulation, article, and paragraph context.
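The ranking step can be illustrated with a lexical stand-in. The sketch below scores passages by inverse document frequency rather than by the semantic similarity the endpoint uses, so it is a swapped-in technique, not the Law4Devs method. Each hit keeps its regulation, article, and paragraph context, and the passages are abbreviated placeholder text rather than verbatim provisions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    regulation: str
    article: str
    paragraph: str
    text: str


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def search(query, corpus, limit=3):
    """Rank passages by the summed IDF weight of the query terms they contain."""
    doc_terms = [set(tokens(p.text)) for p in corpus]
    df = Counter(term for terms in doc_terms for term in terms)  # document frequency
    n = len(corpus)
    query_terms = set(tokens(query))
    scored = []
    for passage, terms in zip(corpus, doc_terms):
        # Rarer terms weigh more; the +1 keeps common-but-matching terms above zero.
        score = sum(math.log((n + 1) / (df[t] + 1)) + 1 for t in query_terms & terms)
        if score > 0:
            scored.append((round(score, 3), passage))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep corpus order
    return scored[:limit]


# Abbreviated placeholder passages, not the verbatim provisions.
CORPUS = [
    Passage("GDPR", "5", "1(c)", "limited to what is necessary (data minimisation)"),
    Passage("GDPR", "25", "1", "data-protection principles, such as data minimisation"),
    Passage("GDPR", "33", "1", "notify the personal data breach to the supervisory authority"),
]
for score, p in search("data minimisation", CORPUS):
    print(score, p.regulation, p.article, p.paragraph)
```

A semantic version would keep the same result shape and replace the scoring line with a similarity between query and passage embeddings.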
The multi-framework query capability is the feature that most differentiates Law4Devs from single-regulation tools. A query for "incident notification obligations" returns the relevant articles from GDPR (Articles 33-34), NIS2 (Article 23), DORA (Articles 19-23), and the EU AI Act (Article 73), normalised to a consistent response schema so the developer can compare obligations across frameworks without parsing different document structures.
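A consistent response schema turns side-by-side comparison into a simple grouping operation. The Hit shape below is a hypothetical schema, not the published one. The CELEX numbers are the real ones for the GDPR, NIS2, DORA, and the EU AI Act; the choice of paragraph 1 in each case is illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result in a framework-neutral shape (hypothetical schema)."""
    framework: str
    celex: str
    article: str
    paragraph: str | None
    text: str


def by_framework(hits: list[Hit]) -> dict[str, list[str]]:
    """Group hits per framework so obligations can be compared side by side."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for hit in hits:
        label = f"Art. {hit.article}" + (f"({hit.paragraph})" if hit.paragraph else "")
        grouped[hit.framework].append(label)
    return dict(grouped)


hits = [
    Hit("GDPR", "32016R0679", "33", "1", "(text omitted)"),
    Hit("GDPR", "32016R0679", "34", "1", "(text omitted)"),
    Hit("NIS2", "32022L2555", "23", "1", "(text omitted)"),
    Hit("DORA", "32022R2554", "19", "1", "(text omitted)"),
    Hit("EU AI Act", "32024R1689", "73", "1", "(text omitted)"),
]
print(by_framework(hits))
```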
SDKs in 6 Languages
The Law4Devs REST API is the foundation. The SDKs are the developer experience layer. We ship official client libraries in Python, TypeScript, Go, Java, PHP, and Ruby — the languages most commonly used in the European enterprise and startup stack. Each SDK wraps the REST API with idiomatic client code, typed response objects, and pagination handling.
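Pagination handling is the kind of boilerplate an SDK hides. Below is a minimal cursor-following generator, assuming a hypothetical page shape with items and next_cursor fields (the real API's pagination parameters are not described here) and a fake fetch function in place of an HTTP call. Because it is a generator, pages are requested only as the caller consumes items.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"items": [...], "next_cursor": "..." or None}.
# The actual Law4Devs pagination fields may differ.
FetchPage = Callable[[Optional[str]], dict]


def iterate_all(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield items from every page, following cursors until the API returns none."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            return


# Stand-in for an HTTP call, so the sketch runs without network access.
FAKE_PAGES = {
    None: {"items": [{"article": "1"}, {"article": "2"}], "next_cursor": "p2"},
    "p2": {"items": [{"article": "3"}], "next_cursor": None},
}

print([item["article"] for item in iterate_all(lambda cursor: FAKE_PAGES[cursor])])
```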
The TypeScript SDK is the most used, reflecting the prevalence of Next.js and Node.js in European SaaS development. The Python SDK is close behind — driven by data science and legal-tech teams that use Python as their primary scripting language for compliance analysis. The Go SDK is used primarily by backend teams building compliance microservices where performance characteristics matter.
SDK maintenance is the highest ongoing engineering cost after the EUR-Lex ingestion pipeline. Each SDK must be kept synchronised with the API schema, tested against live API responses, and documented with working code examples. We use a shared OpenAPI specification as the source of truth for SDK generation, which reduces the manual synchronisation burden but does not eliminate it — idiomatic SDK design requires hand-crafted code that cannot be fully generated from a schema.
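Part of that synchronisation can be automated with a contract test that compares live responses against the OpenAPI schema the SDKs are generated from. The sketch below hand-checks only required fields, undocumented fields, and basic types against a hypothetical ARTICLE_SCHEMA fragment. A production suite would more likely use a full JSON Schema validator such as the Python jsonschema package.

```python
# Fragment shaped like an OpenAPI 3 components/schemas entry; the field names
# are hypothetical, not the published Law4Devs specification.
ARTICLE_SCHEMA = {
    "type": "object",
    "required": ["id", "celex", "number", "text"],
    "properties": {
        "id": {"type": "string"},
        "celex": {"type": "string"},
        "number": {"type": "string"},
        "text": {"type": "string"},
        "references": {"type": "array"},
    },
}

# JSON Schema type names mapped to the Python types json.loads produces.
PY_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def contract_errors(schema, payload):
    """List the ways a live response disagrees with the schema the SDKs are built from."""
    errors = [f"missing required field {name!r}"
              for name in schema.get("required", []) if name not in payload]
    for name, value in payload.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"undocumented field {name!r}")  # API is ahead of the spec
        elif spec["type"] in PY_TYPES and not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"field {name!r} is not a {spec['type']}")
    return errors


live = {"id": "32016R0679/art/6", "celex": "32016R0679", "number": 6,
        "text": "...", "status": "in force"}
print(contract_errors(ARTICLE_SCHEMA, live))
```

The sample response fails twice: number arrives as an integer where the schema promises a string, and status is a field the specification does not document. Either finding means the typed response objects in the SDKs are out of date.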
European Infrastructure as a Requirement
Law4Devs is hosted entirely on European infrastructure. This is a requirement, not a preference. A compliance API that stores EU regulatory text, and processes clients' queries against it, on US infrastructure raises the same data sovereignty questions that the regulations themselves address. Clients in regulated industries who use Law4Devs as part of their compliance toolchain need to be able to confirm that the data in their processing chain remains in the EU.
All compute, storage, and CDN infrastructure runs on EU-region deployments with explicit EU data residency commitments. Data Processing Agreements under GDPR Article 28 are available for all clients, and sub-processor disclosure covers the full infrastructure chain. This is the minimum viable compliance posture for a platform that serves clients in finance, legal services, and the public sector.
The infrastructure choice also reflects a business position: European sovereign infrastructure is a differentiator in the European enterprise market in a way it is not in the US market. Clients that have data residency requirements baked into their contracts — which is increasingly common in public sector and financial services procurement — cannot use US-hosted infrastructure regardless of contractual protections. Building on European infrastructure from day one avoids the architectural migration cost that US-first platforms pay when they expand into regulated European markets.
Law4Devs is available at law4devs.eu. REST API, SDK documentation, and a free tier for development use are available directly from the platform.