// Artifact · v0.1

Dali Citation Benchmark v0.1

2026-06-08

Overview

An open seed benchmark built from documented legal AI failure incidents in the YenkLabs failure database. It measures whether each cited authority can be classified by what broke, rather than whether its link returns HTTP 200.

Repo: github.com/yenklabs/dali · Dataset: huggingface.co/datasets/yenklabs/dali-citation-benchmark


Corpus Summary

| Metric | Count |
|---|---|
| Documented cases | 5 |
| Cited authorities evaluated | 14 |
| Total authority fabrication | 6 |
| Unsupported propositions | 5 |
| Jurisdictional errors | 2 |
| Missing source trails | 5 |
| Unverifiable outputs | 5 |
| Metadata conflicts | 0 |
| Superseded authorities | 0 |

v0.1 seed corpus. Expanding toward 50 documented cases in v0.2.


Failure Type Distribution

| Failure type | Cases | Share |
|---|---|---|
| Total Authority Fabrication | 2 | 40% |
| Existing Citation, Unsupported Proposition | 2 | 40% |
| Jurisdictional Hallucination | 1 | 20% |

Cases in Corpus

| ID | Case | Outcome class | Severity |
|---|---|---|---|
| 001 | Mata v. Avianca | Authority Not Found | Critical |
| 002 | Kistler v. LegalTech | Authority Not Found | Critical |
| 003 | Park / Michael Cohen | Proposition Unsupported | High |
| 004 | Real Case, False Proposition | Proposition Unsupported | Critical |
| 005 | Anonymized Sandbox RAG | Jurisdictional Mismatch | High |
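
The per-case shares can be recomputed from the case table as a quick consistency check. The sketch below is illustrative only: `CASE_OUTCOMES` and `outcome_shares` are hypothetical names, and the dictionary is transcribed by hand from the table rather than loaded from the published dataset.

```python
from collections import Counter

# Outcome class per case ID, transcribed by hand from the case table.
CASE_OUTCOMES = {
    "001": "Authority Not Found",
    "002": "Authority Not Found",
    "003": "Proposition Unsupported",
    "004": "Proposition Unsupported",
    "005": "Jurisdictional Mismatch",
}


def outcome_shares(case_outcomes):
    """Return {outcome_class: (case_count, percent_of_all_cases)}."""
    counts = Counter(case_outcomes.values())
    total = len(case_outcomes)
    return {name: (n, round(100 * n / total)) for name, n in counts.items()}


shares = outcome_shares(CASE_OUTCOMES)
for name, (n, pct) in shares.items():
    print(f"{name}: {n} cases, {pct}%")
```

With five cases the result is a 2 / 2 / 1 split, which matches the 40% / 40% / 20% shares reported under Failure Type Distribution.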

Verification Outcomes (Dali Taxonomy)

Every citation in the corpus maps to one of five outcomes:

  1. Verified — authority exists, proposition supported, evidence bundle complete
  2. Authority Not Found — citation does not resolve in canonical registries
  3. Proposition Unsupported — authority exists, attributed holding fails entailment check
  4. Source Trail Missing — no preserved primary source or retrieval snapshot
  5. Unverifiable — insufficient evidence preserved to classify

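v0.1 counts: 0 verified · 6 authority not found · 5 proposition unsupported · 5 source trail missing · 5 unverifiable. These total 21 against 14 evaluated authorities, which suggests that a single authority can carry more than one outcome.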
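
One way to encode the five outcomes is as an enumeration with a small decision function. This is a minimal sketch, not Dali's implementation: the `CitationEvidence` fields, the use of `None` for "check could not be run", and the precedence order in `classify` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    VERIFIED = "Verified"
    AUTHORITY_NOT_FOUND = "Authority Not Found"
    PROPOSITION_UNSUPPORTED = "Proposition Unsupported"
    SOURCE_TRAIL_MISSING = "Source Trail Missing"
    UNVERIFIABLE = "Unverifiable"


@dataclass
class CitationEvidence:
    # Hypothetical fields. None means the check could not be run.
    resolves_in_registry: Optional[bool]
    proposition_entailed: Optional[bool]
    source_trail_preserved: bool


def classify(ev: CitationEvidence) -> Outcome:
    """Return one primary outcome (assumed precedence, not Dali's)."""
    if ev.resolves_in_registry is None:
        return Outcome.UNVERIFIABLE
    if not ev.resolves_in_registry:
        return Outcome.AUTHORITY_NOT_FOUND
    if ev.proposition_entailed is None:
        return Outcome.UNVERIFIABLE
    if not ev.proposition_entailed:
        return Outcome.PROPOSITION_UNSUPPORTED
    if not ev.source_trail_preserved:
        return Outcome.SOURCE_TRAIL_MISSING
    return Outcome.VERIFIED


# A citation that resolves nowhere is classified before any support check.
fabricated = CitationEvidence(False, None, False)
print(classify(fabricated).value)
```

The sketch returns a single primary label per citation. A variant that returns a set of outcomes would be needed if one authority is allowed to carry several labels at once.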


Methodology

  1. Document incident from court records, sanctions filings, or controlled sandbox runs
  2. Extract raw LLM output artifact
  3. Run Dali existence pass against canonical reporter indices
  4. Run proposition support pass where authority exists
  5. Classify evidence preservation state at generation time
  6. Publish reproducible markdown record with sealed verification metadata
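
Steps 3 through 6 can be sketched as a single function that runs the passes in order and seals the result. Everything here is an assumption for illustration: `run_passes`, `verify_seal`, the `exists` and `supports` callables, and the record fields are hypothetical, and the SHA-256 hash is only one possible reading of "sealed verification metadata". The actual procedure is defined in Dali's METHODOLOGY.md.

```python
import hashlib
import json


def _seal(body):
    """Hash a canonical JSON form of the record body (assumed seal format)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_passes(citation, proposition, exists, supports, snapshot):
    """Run the existence, support, and preservation checks in order."""
    record = {"citation": citation, "proposition": proposition}
    # Step 3: existence pass against whatever index `exists` wraps.
    record["authority_exists"] = bool(exists(citation))
    # Step 4: support pass runs only when the authority exists.
    record["proposition_supported"] = (
        bool(supports(citation, proposition))
        if record["authority_exists"]
        else None
    )
    # Step 5: was a primary source or retrieval snapshot preserved?
    record["source_trail_preserved"] = snapshot is not None
    # Step 6: seal the record so later edits are detectable.
    record["seal_sha256"] = _seal(record)
    return record


def verify_seal(record):
    """Recompute the hash over every field except the seal itself."""
    body = {k: v for k, v in record.items() if k != "seal_sha256"}
    return _seal(body) == record.get("seal_sha256")


# Toy stand-ins: an in-memory index and a stubbed support check.
toy_index = {"Example v. Placeholder (hypothetical)"}
record = run_passes(
    citation="Nonexistent v. Placeholder (hypothetical)",
    proposition="a holding attributed by the model",
    exists=lambda c: c in toy_index,
    supports=lambda c, p: False,
    snapshot=None,
)
print(record["authority_exists"], record["proposition_supported"])
print(verify_seal(record))
```

Passing the checks in as callables keeps the sketch independent of any particular reporter index or entailment model. Sorting keys before hashing makes the seal reproducible across runs.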

Full methodology: Dali METHODOLOGY.md


Citation

YenkLabs Legal AI Failure Database v0.1 (2026)
https://yenklabs.com/failures
Corpus: 5 documented incidents, 14 authorities evaluated
Artifact from the Dali open evidence corpus.