Overview
Open benchmark seed built from documented legal AI failure incidents in the YenkLabs failure database. It measures whether each cited authority can be classified by what broke, not merely whether its link returns HTTP 200.
Repo: github.com/yenklabs/dali · Dataset: huggingface.co/datasets/yenklabs/dali-citation-benchmark
Corpus Summary
| Metric | Count |
|---|---|
| Documented cases | 5 |
| Cited authorities evaluated | 14 |
| Total authority fabrication | 6 |
| Unsupported propositions | 5 |
| Jurisdictional errors | 2 |
| Missing source trails | 5 |
| Unverifiable outputs | 5 |
| Metadata conflicts | 0 |
| Superseded authorities | 0 |
v0.1 seed corpus. Expanding toward 50 documented cases in v0.2.
Failure Type Distribution
| Failure type | Cases | Share |
|---|---|---|
| Total Authority Fabrication | 2 | 40% |
| Existing Citation, Unsupported Proposition | 2 | 40% |
| Jurisdictional Hallucination | 1 | 20% |
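Each Share value is that failure type's case count divided by the 5 documented cases. Below is a quick Python check of the arithmetic, with the counts copied from the table. The `shares` helper is only an illustration, not part of Dali.

```python
from typing import Dict

# Case counts per failure type, copied from the v0.1 distribution table.
CASES_BY_FAILURE_TYPE: Dict[str, int] = {
    "Total Authority Fabrication": 2,
    "Existing Citation, Unsupported Proposition": 2,
    "Jurisdictional Hallucination": 1,
}


def shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Return each failure type's share of all cases as a rounded percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in shares(CASES_BY_FAILURE_TYPE).items():
        print(f"{name}: {pct}%")
```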
Cases in Corpus
| ID | Case | Outcome class | Severity |
|---|---|---|---|
| 001 | Mata v. Avianca | Authority Not Found | Critical |
| 002 | Kistler v. LegalTech | Authority Not Found | Critical |
| 003 | Park / Michael Cohen | Proposition Unsupported | High |
| 004 | Real Case, False Proposition | Proposition Unsupported | Critical |
| 005 | Anonymized Sandbox RAG | Jurisdictional Mismatch | High |
Verification Outcomes (Dali Taxonomy)
Every citation in the corpus maps to at least one of five outcomes (labels can co-occur, so the v0.1 counts sum to more than the 14 authorities evaluated):
- Verified — authority exists, proposition supported, evidence bundle complete
- Authority Not Found — citation does not resolve in canonical registries
- Proposition Unsupported — authority exists, attributed holding fails entailment check
- Source Trail Missing — no preserved primary source or retrieval snapshot
- Unverifiable — insufficient evidence preserved to classify
v0.1: 0 verified · 6 authority not found · 5 proposition unsupported · 5 source trail missing · 5 unverifiable
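The five outcomes can be sketched as a small Python classifier. Because the v0.1 counts overlap, this sketch returns a set of labels rather than a single one. The `CitationEvidence` fields, the precedence between labels, and the `classify` function are hypothetical illustrations, not Dali's actual schema or decision logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Outcome(Enum):
    """The five Dali verification outcomes, as named in this README."""
    VERIFIED = "Verified"
    AUTHORITY_NOT_FOUND = "Authority Not Found"
    PROPOSITION_UNSUPPORTED = "Proposition Unsupported"
    SOURCE_TRAIL_MISSING = "Source Trail Missing"
    UNVERIFIABLE = "Unverifiable"


@dataclass
class CitationEvidence:
    """Hypothetical per-citation evidence flags (illustrative, not Dali's schema)."""
    resolves_in_registry: Optional[bool]  # None = existence could not be checked
    proposition_entailed: Optional[bool]  # None = entailment not checked
    source_trail_preserved: bool          # primary source or retrieval snapshot kept


def classify(ev: CitationEvidence) -> Set[Outcome]:
    """Return every outcome that applies; more than one label may fit a citation."""
    outcomes: Set[Outcome] = set()
    if not ev.source_trail_preserved:
        outcomes.add(Outcome.SOURCE_TRAIL_MISSING)
    if ev.resolves_in_registry is False:
        outcomes.add(Outcome.AUTHORITY_NOT_FOUND)
    elif ev.resolves_in_registry and ev.proposition_entailed is False:
        outcomes.add(Outcome.PROPOSITION_UNSUPPORTED)
    # Too little evidence: existence unknown, or authority exists but support never checked.
    if ev.resolves_in_registry is None or (
        ev.resolves_in_registry and ev.proposition_entailed is None
    ):
        outcomes.add(Outcome.UNVERIFIABLE)
    # Verified only when the authority exists, is supported, and its trail is complete.
    if not outcomes:
        outcomes.add(Outcome.VERIFIED)
    return outcomes


if __name__ == "__main__":
    print(classify(CitationEvidence(False, None, False)))
```

Under these assumptions, a fabricated citation with no preserved snapshot receives both Authority Not Found and Source Trail Missing, which is one way the per-outcome counts can exceed the number of authorities.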
Methodology
- Document each incident from court records, sanctions filings, or controlled sandbox runs
- Extract the raw LLM output artifact
- Run the Dali existence pass against canonical reporter indices
- Run the proposition support pass wherever the authority exists
- Classify the evidence preservation state (primary source, retrieval snapshot) at generation time
- Publish a reproducible markdown record with sealed verification metadata
Full methodology: Dali METHODOLOGY.md
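The methodology steps can be approximated in Python as below, running from an extracted citation through the existence pass and the proposition pass to a published record. Everything here is a hypothetical illustration, not Dali's implementation. The assumptions are the `Citation` fields, a plain dictionary standing in for canonical reporter indices, a pluggable `entails` callable in place of a real entailment check, and reading "sealed" as a SHA-256 digest over the record's canonical JSON.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional


@dataclass
class Citation:
    """One authority extracted from a raw LLM output artifact (hypothetical schema)."""
    cite: str                         # citation string exactly as the model emitted it
    proposition: str                  # holding the model attributed to the authority
    snapshot_preserved: bool = False  # was a retrieval snapshot kept at generation time?


def verify(
    citation: Citation,
    reporter_index: Dict[str, str],       # stand-in for a canonical reporter index: cite -> text
    entails: Callable[[str, str], bool],  # stand-in for a real entailment check
) -> dict:
    """Existence pass first; the proposition pass runs only if the authority resolves."""
    text = reporter_index.get(citation.cite)
    exists = text is not None
    supported: Optional[bool] = entails(text, citation.proposition) if exists else None
    return {**asdict(citation), "exists": exists, "supported": supported}


def seal_record(result: dict) -> str:
    """Render a markdown record and append a SHA-256 digest of its canonical JSON form."""
    payload = json.dumps(result, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    lines = [f"- **{key}**: {value}" for key, value in sorted(result.items())]
    lines.append(f"- **sha256**: {digest}")
    return "\n".join(lines)


if __name__ == "__main__":
    index = {"Example v. Sample, 1 Test 1": "The court held that notice was required."}
    # Placeholder check only: substring matching is NOT a real entailment test.
    naive = lambda text, prop: prop.lower() in text.lower()
    fake = Citation("Nowhere v. Nobody, 9 Test 9", "notice was required")
    print(seal_record(verify(fake, index, naive)))
```

Keeping the entailment check as a parameter mirrors the step order above. The proposition pass is skipped, and `supported` stays `None`, whenever the existence pass fails. Sorting keys before hashing keeps the digest stable across reruns of the same record.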
Citation
YenkLabs Legal AI Failure Database v0.1 (2026)
https://yenklabs.com/failures
Corpus: 5 documented incidents, 14 authorities evaluated