What "rights-aware ingestion" actually means
Executive Briefing 03 · 3 pages · Updated March 2026
Definition
"Rights-aware ingestion" means that every piece of content entering your AI training pipeline has its rights status documented, verified, and respected before it is ingested. It is an engineering requirement with six measurable audit areas, not a policy statement.
The six audit areas
- Source identification: Every training item has a documented source with rights-holder attribution.
- Licence verification: The legal basis for ingestion is documented — owned, licensed, public domain, or statutory exception.
- TDM opt-out checking: Machine-readable text and data mining (TDM) opt-out signals are checked before ingestion. Content carrying an opt-out is excluded unless separately licensed.
- CDR registration: Ingested content is cross-referenced with the CIP Rights Registry for active Core Data Records (CDRs).
- Consent expiry monitoring: Time-limited consents are tracked. Content is removed from active training when consent expires.
- Audit trail: All ingestion decisions are logged with timestamps, decision rationale, and responsible person.
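The six areas above can be read as a single gate that each item passes through before ingestion. The following is a minimal sketch of that idea, not a CIP-specified implementation: the class names, field names, and rationale strings are all hypothetical, and the registry cross-reference is modelled as a flag set by the caller because this briefing does not describe a registry interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# The four legal bases listed in the briefing; the spellings are illustrative.
LEGAL_BASES = {"owned", "licensed", "public_domain", "statutory_exception"}


@dataclass
class CandidateItem:
    """Hypothetical rights metadata attached to one training item."""
    item_id: str
    source: Optional[str] = None
    rights_holder: Optional[str] = None
    legal_basis: Optional[str] = None
    tdm_opt_out: bool = False            # machine-readable opt-out was found
    separately_licensed: bool = False    # a licence overrides the opt-out
    registry_checked: bool = False       # CDR cross-reference was performed
    consent_expires: Optional[datetime] = None  # None means no time limit


@dataclass
class IngestionDecision:
    """One audit-trail entry: what was decided, why, by whom, and when."""
    item_id: str
    allowed: bool
    rationale: str
    responsible_person: str
    timestamp: datetime


def first_failure(item: CandidateItem, now: datetime) -> Optional[str]:
    """Return the first failed check as a rationale string, or None."""
    if not item.source or not item.rights_holder:
        return "missing source or rights-holder attribution"
    if item.legal_basis not in LEGAL_BASES:
        return "no documented legal basis"
    if item.tdm_opt_out and not item.separately_licensed:
        return "TDM opt-out present and no separate licence"
    if not item.registry_checked:
        return "registry cross-reference not performed"
    # Assumes timezone-aware datetimes on both sides of the comparison.
    if item.consent_expires is not None and item.consent_expires <= now:
        return "consent expired"
    return None


def decide(item: CandidateItem, responsible_person: str,
           now: Optional[datetime] = None) -> IngestionDecision:
    """Run the checks and log the outcome, whether it allows or refuses."""
    now = now or datetime.now(timezone.utc)
    failure = first_failure(item, now)
    return IngestionDecision(
        item_id=item.item_id,
        allowed=failure is None,
        rationale=failure or "all checks passed",
        responsible_person=responsible_person,
        timestamp=now,
    )
```

In this sketch every call to `decide` produces a record, including refusals, so the audit trail covers all ingestion decisions and not only the successful ones. Re-running `decide` periodically on items with a `consent_expires` value is one way to notice expiry after initial ingestion.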
The 95% threshold
CIP Platform Certification Level 2 requires that at least 95% of your training corpus has documented rights coverage: a CDR, a valid licence, or a verified public-domain determination. The remainder (at most 5%) must have an active remediation plan with named deadlines.
This is not aspirational. Platforms that cannot demonstrate 95% coverage cannot certify at Level 2. The threshold reflects the operational reality that legacy content may take time to audit, but the vast majority of a responsible operator's corpus should be documented.
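As an illustration of how the threshold could be checked, the sketch below computes coverage over a corpus and flags uncovered items that have no remediation deadline. It assumes coverage is counted per item; this briefing does not say whether the 95% is measured by item, by volume, or by some other unit. All names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

LEVEL_2_MIN_PERCENT = 95  # threshold stated in the briefing


@dataclass
class CorpusItem:
    """Hypothetical coverage record for one item in the training corpus."""
    item_id: str
    has_cdr: bool = False
    has_valid_licence: bool = False
    verified_public_domain: bool = False
    remediation_deadline: Optional[date] = None  # only used if not covered

    @property
    def covered(self) -> bool:
        # Any one of the three forms of documentation counts as coverage.
        return self.has_cdr or self.has_valid_licence or self.verified_public_domain


def level_2_check(corpus: List[CorpusItem]) -> Dict[str, object]:
    """Report per-item coverage and whether both Level 2 conditions hold."""
    if not corpus:
        raise ValueError("corpus is empty")
    total = len(corpus)
    covered = sum(1 for item in corpus if item.covered)
    # Uncovered items must each carry a remediation deadline.
    unplanned = [
        item.item_id
        for item in corpus
        if not item.covered and item.remediation_deadline is None
    ]
    # Integer comparison avoids floating-point edge cases at exactly 95%.
    meets_threshold = covered * 100 >= LEVEL_2_MIN_PERCENT * total
    return {
        "coverage_percent": 100 * covered / total,
        "unplanned_item_ids": unplanned,
        "meets_level_2": meets_threshold and not unplanned,
    }
```

Under these assumptions, a corpus of 100 items with 95 covered and 5 uncovered items that each have a deadline passes the check. The same corpus fails if any one of the 5 lacks a deadline, or if only 94 items are covered.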