How AI Systems Are Assessed Against 43 EU Obligations
AI compliance in the EU increasingly depends on whether you can show, at any time, how an AI system meets a defined set of legal obligations. That’s where an obligation tracking and compliance mapping engine becomes practical: it turns broad regulatory requirements into a structured, testable, evidence-backed assessment process.
This guide walks you through a workable method to assess AI systems against a catalog of 43 EU obligations using an obligation library, a mapping model, and an evidence workflow that can scale across teams and products.
Step 1: Define your “43 obligations” library as a structured catalog
Start by converting your obligation set into a machine- and human-readable library. Whether your obligations come from a regulation, internal policy, contractual controls, or a combination, the method is the same: standardize each obligation so it can be tracked, assigned, tested, and audited.
For each obligation, define the following fields:
- Obligation ID: stable identifier (e.g., EU-OB-01)
- Obligation statement: the exact requirement in plain language
- Applicability conditions: when it applies (system type, risk tier, user group, geography, lifecycle stage)
- Control objective: what “good” looks like (measurable outcome)
- Required evidence types: documents, logs, tests, approvals
- Owner role: who is responsible (Product, ML, Security, Legal, Compliance)
- Verification method: how it will be checked (review, automated test, monitoring)
- Frequency: one-time, per release, quarterly, continuous
- Severity / priority: based on legal impact and operational risk
Actionable tip: Avoid long narrative obligations. Rewrite each into a testable format: “We must do X for systems meeting Y, evidenced by Z.” If it can’t be evidenced, it can’t be closed.
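The fields above can be captured in a small typed record. The following is a minimal sketch, assuming Python dataclasses as the storage format; the class name `Obligation`, the `Frequency` enum, the `is_closable` helper, and all field values are illustrative assumptions, not part of any regulation or existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Frequency(Enum):
    ONE_TIME = "one-time"
    PER_RELEASE = "per release"
    QUARTERLY = "quarterly"
    CONTINUOUS = "continuous"


@dataclass(frozen=True)
class Obligation:
    obligation_id: str                # stable identifier, e.g. "EU-OB-01"
    statement: str                    # plain-language requirement
    applicability: str                # when it applies
    control_objective: str            # measurable outcome
    evidence_types: tuple[str, ...]   # documents, logs, tests, approvals
    owner_role: str
    verification_method: str
    frequency: Frequency
    severity: str

    def is_closable(self) -> bool:
        # An obligation with no evidence types can never be closed.
        return len(self.evidence_types) > 0


# Placeholder entry: the field values are illustrative, not real legal text.
example = Obligation(
    obligation_id="EU-OB-01",
    statement="We must do X for systems meeting Y, evidenced by Z.",
    applicability="customer-facing systems in production",
    control_objective="X is in place for 100% of in-scope systems",
    evidence_types=("policy", "test report"),
    owner_role="Compliance",
    verification_method="review",
    frequency=Frequency.PER_RELEASE,
    severity="high",
)
print(example.obligation_id, example.is_closable())
```

Making the record immutable (`frozen=True`) is one possible design choice: changes to an obligation then require a new library version rather than a silent edit.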
Step 2: Build an applicability decision tree to reduce noise
A common failure mode is applying all obligations to every system. Your engine should first run an applicability filter so teams focus only on what matters.
Create an applicability questionnaire that feeds a rules engine:
- System purpose (recommendation, classification, generation, decision support)
- Domain (employment, finance, health, education, public sector, etc.)
- User context (internal-only vs customer-facing)
- Impact profile (safety, fundamental rights, financial)
- Autonomy level (human-in-the-loop, human-on-the-loop, fully automated)
- Data types (personal data, sensitive categories, children’s data)
- Deployment status (prototype, pilot, production)
The output should be:
- Applicable obligations list
- Not applicable obligations list with rationale
- Conditional obligations that depend on a future design choice (e.g., whether automated decisioning will be enabled)
Actionable tip: Store applicability outcomes as evidence. “Not applicable” needs the same discipline as “compliant,” or it will be challenged during review.
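One way to turn the questionnaire into a rules engine is sketched below. It assumes each rule names the questionnaire answers it needs and a predicate over them; an unanswered question makes the obligation conditional rather than applicable or not applicable. The `Rule` class, the `evaluate` function, the answer keys, and the obligation IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Answers = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    obligation_id: str
    required_keys: tuple[str, ...]       # questionnaire answers the rule needs
    applies: Callable[[Answers], bool]   # predicate over the answers
    condition: str                       # human-readable rationale


def evaluate(rules: list[Rule], answers: Answers) -> dict[str, list[tuple[str, str]]]:
    result = {"applicable": [], "not_applicable": [], "conditional": []}
    for rule in rules:
        missing = [k for k in rule.required_keys if answers.get(k) is None]
        if missing:
            # A design choice is still open, so the outcome is conditional.
            result["conditional"].append(
                (rule.obligation_id, "pending: " + ", ".join(missing))
            )
        elif rule.applies(answers):
            result["applicable"].append((rule.obligation_id, rule.condition))
        else:
            result["not_applicable"].append(
                (rule.obligation_id, "condition not met: " + rule.condition)
            )
    return result


# Hypothetical rules and answers, for illustration only.
rules = [
    Rule("EU-OB-03", ("data_types",),
         lambda a: "personal data" in a["data_types"],
         "system processes personal data"),
    Rule("EU-OB-07", ("user_context",),
         lambda a: a["user_context"] == "customer-facing",
         "system is customer-facing"),
    Rule("EU-OB-12", ("automated_decisioning",),
         lambda a: a["automated_decisioning"] is True,
         "automated decisioning is enabled"),
]
answers = {
    "data_types": ["personal data"],
    "user_context": "internal-only",
    "automated_decisioning": None,   # not decided yet
}
outcome = evaluate(rules, answers)
for bucket, items in outcome.items():
    print(bucket, items)
```

Because every entry carries its rationale, the returned dictionary can be stored as the applicability evidence itself.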
Step 3: Decompose each obligation into controls, tests, and evidence
An obligation rarely maps to a single artifact. A robust engine breaks it into:
- Controls: what you implement (policies, technical measures, process steps)
- Tests/Checks: how you verify it works (manual review, automated checks, monitoring)
- Evidence: what you retain (documents, logs, screenshots, approvals, metrics)
Example decomposition structure:
- Obligation EU-OB-12 (statement)
  - Control C-12.1: access restriction for training data
  - Control C-12.2: data retention rules applied
  - Check T-12.1: quarterly access review completed
  - Check T-12.2: retention job success logs
  - Evidence E-12.x: approvals, logs, reports
Actionable tip: For each control, define a pass/fail criterion. Avoid “reviewed” as a status unless “reviewed” is tied to a decision and recorded outcome.
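The pass/fail rule can be made explicit in code. This is a minimal sketch, assuming that each check belongs to one control and that an obligation counts as verified only when every check has passed and has evidence attached. The pairing of T-12.1 with C-12.1 and T-12.2 with C-12.2, the criterion wording, and the evidence IDs are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Check:
    check_id: str
    control_id: str
    criterion: str                   # explicit pass/fail criterion
    passed: Optional[bool] = None    # None means "not yet run"
    evidence_ids: list[str] = field(default_factory=list)


def obligation_verified(checks: list[Check]) -> bool:
    # Verified only if there is at least one check, every check passed,
    # and every check has recorded evidence behind it.
    return bool(checks) and all(
        c.passed is True and len(c.evidence_ids) > 0 for c in checks
    )


checks_ob12 = [
    Check("T-12.1", "C-12.1",
          "access review completed this quarter with no open findings",
          passed=True, evidence_ids=["E-12.1"]),
    Check("T-12.2", "C-12.2",
          "retention job succeeded on every scheduled run",
          passed=None),
]
before = obligation_verified(checks_ob12)   # second check has not run yet

checks_ob12[1].passed = True
checks_ob12[1].evidence_ids.append("E-12.2")
after = obligation_verified(checks_ob12)
print(before, after)
```

Keeping `passed` as a three-valued field separates "not yet run" from "failed", so a check that was merely looked at never counts as a pass.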
Step 4: Create the compliance mapping model (system → components → obligations)
The compliance mapping engine is essentially a graph whose nodes are the AI system and the parts it is built from:
- AI system (product)
- Use cases (what decisions it supports)
- Model(s) (versions, architecture, training runs)
- Data pipelines (sources, preprocessing, labeling)
- Runtime services (APIs, UI, logging, monitoring)
- Human processes (approvals, escalation, incident handling)
Map obligations to the level where they are actually controlled:
- Some obligations map to the system (user transparency)
- Some to the model (performance evaluation, drift monitoring)
- Some to the data pipeline (data quality, provenance)
- Some to operations (incident response, access governance)
- Some to vendor dependencies (third-party model provider obligations)
Actionable tip: Do not force all obligations to map only at system level. You’ll lose traceability and create duplicate evidence requests.
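A plain parent-pointer graph is enough to illustrate the mapping. The sketch below assumes each component points to the node it belongs to and that obligations are attached where they are controlled; a system-level view is then derived by walking up the graph. Node names, obligation IDs, and the `obligations_for` helper are hypothetical.

```python
# Child -> parent edges of the mapping graph (all names are hypothetical).
PARENT = {
    "model:ranker-v3": "system:recsys",
    "pipeline:click-logs": "model:ranker-v3",
    "ops:incident-response": "system:recsys",
    "vendor:embedding-api": "system:recsys",
}

# Obligations attached at the level where they are actually controlled.
MAPPED = {
    "system:recsys": ["EU-OB-01"],
    "model:ranker-v3": ["EU-OB-08"],
    "pipeline:click-logs": ["EU-OB-12"],
    "ops:incident-response": ["EU-OB-20"],
    "vendor:embedding-api": ["EU-OB-31"],
    "model:unrelated": ["EU-OB-08"],   # belongs to a different system
}


def belongs_to(node, system):
    # Walk up the parent pointers until the system is found or the chain ends.
    while node is not None:
        if node == system:
            return True
        node = PARENT.get(node)
    return False


def obligations_for(system):
    # Return {obligation_id: [nodes where it is controlled]} for one system.
    result = {}
    for node, obligation_ids in MAPPED.items():
        if belongs_to(node, system):
            for ob in obligation_ids:
                result.setdefault(ob, []).append(node)
    return result


view = obligations_for("system:recsys")
for ob in sorted(view):
    print(ob, view[ob])
```

The system-level report is computed rather than stored, so evidence is requested once, from the node that controls the obligation.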
Step 5: Set up an evidence workflow that’s audit-ready
Compliance mapping fails when evidence is scattered. Your engine should standardize evidence capture with these attributes:
- Evidence ID
- Linked obligation(s) and control(s)
- Artifact type (policy, test report, monitoring dashboard export, model card, DPIA-like assessment, change request)
- Owner
- Creation date and validity period
- System/model version it applies to
- Integrity (who approved it, stored as an immutable record where possible)
Establish evidence tiers:
- Tier 1 (must-have): required to ship or operate
- Tier 2 (supporting): strengthens defensibility
- Tier 3 (nice-to-have): improves maturity but is not blocking
Actionable tip: Make evidence reusable across obligations. A single risk assessment can support multiple requirements if it’s structured and cross-referenced.
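The evidence attributes and the reuse idea can be sketched as follows. The example assumes a validity period expressed in days and a single version string per artifact; the `Evidence` class, the `missing_tier1` helper, and all IDs and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Evidence:
    evidence_id: str
    artifact_type: str
    owner: str
    created: date
    valid_days: int                  # validity period
    applies_to_version: str          # system/model version it applies to
    approved_by: str
    tier: int                        # 1 = must-have, 2 = supporting, 3 = nice-to-have
    obligation_ids: tuple[str, ...]  # one artifact can support several obligations

    def is_valid(self, on: date, version: str) -> bool:
        expires = self.created + timedelta(days=self.valid_days)
        return self.applies_to_version == version and self.created <= on <= expires


def missing_tier1(required, evidence, on, version):
    # Obligations with no valid Tier 1 evidence for this version and date.
    covered = {
        ob
        for e in evidence
        if e.tier == 1 and e.is_valid(on, version)
        for ob in e.obligation_ids
    }
    return sorted(set(required) - covered)


# Hypothetical artifact: one risk assessment reused for two obligations.
risk_assessment = Evidence(
    evidence_id="E-001", artifact_type="risk assessment", owner="Compliance",
    created=date(2024, 1, 10), valid_days=365, applies_to_version="2.1",
    approved_by="Legal", tier=1, obligation_ids=("EU-OB-05", "EU-OB-09"),
)
required = ["EU-OB-05", "EU-OB-09", "EU-OB-12"]
gaps_mid_2024 = missing_tier1(required, [risk_assessment], date(2024, 6, 1), "2.1")
gaps_after_expiry = missing_tier1(required, [risk_assessment], date(2025, 3, 1), "2.1")
print(gaps_mid_2024, gaps_after_expiry)
```

Once the artifact expires, every obligation it supported reappears as a gap, which is what makes the validity period worth recording.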
Step 6: Operationalize tracking with clear statuses and gates
Define a consistent set of statuses so dashboards are meaningful:
- Not applicable (with rationale)
- Planned (work item created)
- In progress
- Implemented
- Verified (check passed)
- Exception granted (time-bound, approved)
- Non-compliant (known gap with remediation plan)
Tie these statuses to lifecycle gates:
- Design gate: applicability + initial risk assessment complete
- Pre-build gate: data and model documentation initiated
- Pre-release gate: required controls implemented + verified
- Post-release gate: monitoring, incident processes active
Actionable tip: Treat “exception granted” as a first-class state with expiry. Exceptions without expiration become permanent gaps.
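A gate check over these statuses might look like the sketch below. It assumes the pre-release gate passes only items that are verified, not applicable, or covered by an exception that has not yet expired; an exception with no expiry date is treated as a blocker. The `Tracked` class, the `pre_release_blockers` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    NOT_APPLICABLE = "not applicable"
    PLANNED = "planned"
    IN_PROGRESS = "in progress"
    IMPLEMENTED = "implemented"
    VERIFIED = "verified"
    EXCEPTION_GRANTED = "exception granted"
    NON_COMPLIANT = "non-compliant"


@dataclass
class Tracked:
    obligation_id: str
    status: Status
    exception_expires: Optional[date] = None


def pre_release_blockers(items: list[Tracked], today: date) -> list[str]:
    blockers = []
    for item in items:
        if item.status in (Status.VERIFIED, Status.NOT_APPLICABLE):
            continue
        if (
            item.status is Status.EXCEPTION_GRANTED
            and item.exception_expires is not None
            and item.exception_expires >= today
        ):
            continue   # exception is still within its approved window
        blockers.append(item.obligation_id)
    return blockers


items = [
    Tracked("EU-OB-01", Status.VERIFIED),
    Tracked("EU-OB-02", Status.IMPLEMENTED),              # implemented, not yet verified
    Tracked("EU-OB-03", Status.EXCEPTION_GRANTED, date(2024, 9, 30)),
    Tracked("EU-OB-04", Status.EXCEPTION_GRANTED, date(2024, 3, 31)),
    Tracked("EU-OB-05", Status.EXCEPTION_GRANTED),        # no expiry recorded
]
blockers = pre_release_blockers(items, today=date(2024, 6, 1))
print(blockers)
```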
Step 7: Add automated checks where they reduce burden (and document the rest)
Not everything can be automated, but a practical engine uses automation to prevent regressions.
Automatable checks often include:
- Required documentation present (model cards, change logs)
- Dataset lineage fields filled (source, license/permission flags, retention tags)
- Logging enabled and validated (coverage checks)
- Access control checks (group membership, least-privilege rules)
- Monitoring alerts configured (drift, anomalies, safety filters)
- Release checks (no deployment without verified obligations)
For non-automatable obligations (e.g., human oversight design), standardize:
- Templates for review
- Approval workflows
- Recorded decisions and sign-offs
Actionable tip: Every automated check should produce evidence output (a report artifact) linked to obligations and versions.
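As an illustration of a check that emits its own evidence, the sketch below verifies that dataset lineage fields are filled and returns a report linked to obligations and a version. The required field names, the report layout, and the use of a SHA-256 digest as a tamper-evidence aid are assumptions, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical lineage fields, based on the examples listed above.
REQUIRED_LINEAGE_FIELDS = ("source", "license_flag", "retention_tag")


def check_lineage(datasets, obligation_ids, version):
    # Collect, per dataset, the required fields that are absent or empty.
    failures = {}
    for name, meta in datasets.items():
        missing = [f for f in REQUIRED_LINEAGE_FIELDS if not meta.get(f)]
        if missing:
            failures[name] = missing

    report = {
        "check": "dataset-lineage-fields",
        "obligation_ids": list(obligation_ids),
        "version": version,
        "passed": not failures,
        "failures": failures,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    # Digest over the report body so later edits to the artifact are detectable.
    payload = json.dumps(report, sort_keys=True)
    report["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return report


datasets = {
    "click-logs": {"source": "web events", "license_flag": "internal", "retention_tag": ""},
    "catalog": {"source": "product db", "license_flag": "internal", "retention_tag": "365d"},
}
report = check_lineage(datasets, ["EU-OB-12"], version="2.1")
print(json.dumps(report, indent=2))
```

The same function can run in a release pipeline; the returned dictionary is the evidence artifact to store, whether the check passed or failed.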
Step 8: Handle shared services and vendors with dependency mapping
If you rely on third-party models, labeling vendors, or managed platforms, your engine must map obligations to dependencies and contracts.
Create a dependency record per vendor/component:
- What the vendor provides (model, infrastructure, data)
- What obligations are inherited, shared, or retained
- Evidence expected from the vendor (security reports, change notices, incident SLAs)
- Your internal compensating controls if vendor evidence is limited
Actionable tip: Mark each obligation with a responsibility model (RACI: Responsible, Accountable, Consulted, Informed) that covers both your teams and the vendor. Many compliance gaps come from unclear boundaries.
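A dependency record can be sketched as a small structure that flags obligations with neither vendor evidence nor a compensating control. The reading of the three categories used here (inherited: the vendor carries it; shared: both parties; retained: stays fully with you) is an assumption, as are the class names, vendor name, and IDs.

```python
from dataclasses import dataclass, field
from enum import Enum


class Responsibility(Enum):
    INHERITED = "inherited"   # assumed meaning: the vendor carries the obligation
    SHARED = "shared"         # assumed meaning: both parties hold part of it
    RETAINED = "retained"     # assumed meaning: stays fully with you


@dataclass
class Dependency:
    vendor: str
    provides: str
    obligations: dict[str, Responsibility]
    vendor_evidence: dict[str, list[str]] = field(default_factory=dict)
    compensating_controls: dict[str, list[str]] = field(default_factory=dict)


def uncovered(dep: Dependency) -> list[str]:
    # Inherited or shared obligations with no vendor evidence and no
    # internal compensating control are boundary gaps.
    gaps = []
    for ob, resp in dep.obligations.items():
        if resp is Responsibility.RETAINED:
            continue   # tracked through internal controls instead
        if not dep.vendor_evidence.get(ob) and not dep.compensating_controls.get(ob):
            gaps.append(ob)
    return sorted(gaps)


# Hypothetical vendor record.
dep = Dependency(
    vendor="embedding-api",
    provides="model",
    obligations={
        "EU-OB-20": Responsibility.SHARED,
        "EU-OB-31": Responsibility.INHERITED,
        "EU-OB-33": Responsibility.INHERITED,
        "EU-OB-08": Responsibility.RETAINED,
    },
    vendor_evidence={"EU-OB-31": ["security report"]},
    compensating_controls={"EU-OB-20": ["internal incident runbook"]},
)
gaps = uncovered(dep)
print(gaps)
```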
Step 9: Build a reporting layer that answers the questions regulators and executives ask
A good compliance mapping engine produces reports that answer:
- Which obligations apply to this system and why?
- What is the current compliance status per obligation?
- What evidence proves compliance, and for which version?
- What changed since last release (models, data, features)?
- What exceptions exist, who approved them, and when do they expire?
- What incidents occurred, and how were they handled?
Design dashboards for different audiences:
- Engineering view: open tasks, failing checks, required evidence
- Compliance/legal view: obligation status, exceptions, audit pack readiness
- Executive view: risk posture, trend over time, high-severity gaps
Actionable tip: Include a “time travel” capability: reproduce the compliance state as-of a specific release or date.
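One simple way to support an as-of view is an append-only log of status changes that is replayed up to the requested date. This is a sketch under that assumption; the event tuple layout, the `state_as_of` function, and the sample events are hypothetical.

```python
from datetime import date

# Append-only log: (effective date, obligation ID, new status).
EVENTS = [
    (date(2024, 1, 5), "EU-OB-01", "planned"),
    (date(2024, 2, 1), "EU-OB-01", "implemented"),
    (date(2024, 2, 20), "EU-OB-12", "in progress"),
    (date(2024, 3, 15), "EU-OB-01", "verified"),
    (date(2024, 4, 2), "EU-OB-12", "verified"),
]


def state_as_of(events, as_of):
    # Replay events in date order and keep the latest status per obligation.
    state = {}
    for when, obligation_id, status in sorted(events, key=lambda e: e[0]):
        if when <= as_of:
            state[obligation_id] = status
    return state


at_march_release = state_as_of(EVENTS, date(2024, 3, 1))
today_view = state_as_of(EVENTS, date(2024, 6, 1))
print(at_march_release)
print(today_view)
```

Because events are never edited or deleted, the state for any past release can be reproduced exactly from the log.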
Step 10: Run continuous improvement with gap reviews and control refinement
Once the engine is live, improve it systematically:
- Review recurring failed obligations monthly and identify root causes
- Simplify obligations that produce ambiguous results
- Merge duplicate evidence requests
- Add automation for high-frequency manual checks
- Update applicability logic when products evolve
Set a cadence for:
- Control testing (periodic verification)
- Evidence refresh (expired artifacts)
- Obligation library updates (new interpretations, product scope changes)
Actionable tip: Track “time-to-verified” per obligation as an operational metric (not as a performance score for individuals). It helps prioritize automation and template improvements.
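The metric can be computed per obligation across systems, without reference to individuals. The sketch below assumes each record holds the date work started and the date the obligation reached verified; the use of the median, the record layout, and the sample data are illustrative choices.

```python
from datetime import date
from statistics import median

# (obligation ID, date work started, date verified or None if still open)
RECORDS = [
    ("EU-OB-05", date(2024, 1, 1), date(2024, 1, 11)),
    ("EU-OB-05", date(2024, 3, 1), date(2024, 3, 31)),
    ("EU-OB-12", date(2024, 1, 1), date(2024, 1, 6)),
    ("EU-OB-12", date(2024, 2, 1), None),   # not verified yet, excluded
]


def time_to_verified(records):
    # Median number of days from start to verified, grouped by obligation.
    days_by_obligation = {}
    for obligation_id, started, verified in records:
        if verified is None:
            continue
        days_by_obligation.setdefault(obligation_id, []).append((verified - started).days)
    return {ob: median(days) for ob, days in days_by_obligation.items()}


metric = time_to_verified(RECORDS)
print(metric)
```

Obligations with a high median are candidates for a template or an automated check.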
An obligation tracking and compliance mapping engine works when it is version-aware, evidence-driven, and built around applicability. With a structured obligation library, a clear mapping model, and disciplined evidence workflows, professionals can assess AI systems against a defined set of 43 EU obligations in a way that is repeatable, scalable, and defensible under scrutiny.