Summary
In March 2023, counsel filed a brief in Mata v. Avianca, Inc. (S.D.N.Y.) citing six judicial opinions that did not exist. The citations were produced by ChatGPT. The court ordered counsel to show cause in May 2023 and issued sanctions in June 2023.
The case is remembered as a hallucination story. That framing is accurate but incomplete. The structural failure was evidence preservation: the workflow produced a court filing with no sealed record of what the model generated, what was verified, or what canonical registries returned at filing time.
When the court asked how these citations entered the record, the answer was effectively: we used ChatGPT and filed the output. There was nothing to reconstruct.
Dali classification: Authority Not Found · Missing Source Trail · Unverifiable Output
Failure record: 001-mata-v-avianca · Dali corpus: github.com/yenklabs/dali
What Actually Happened
Mata v. Avianca was a personal injury action in the Southern District of New York. Counsel submitted an affirmation opposing dismissal. The affirmation cited multiple federal cases to support arguments about statute of limitations under the Montreal Convention.
At least six cited opinions were fabricated. They looked real, with plausible party names, well-formed reporter citations, and coherent holdings, but they did not exist in any canonical registry.
The court found the conduct sanctionable. The incident became the reference case for "AI hallucinated my citations."
That summary is correct. It is also the wrong lesson if you stop there.
The Hallucination Framing vs. The Evidence Framing
| Hallucination framing | Evidence framing |
|---|---|
| The model lied | The workflow had no evidence layer |
| Blame the AI | Blame the missing audit trail |
| Fix: better model / prompt | Fix: preserve source + verification state at generation time |
| Failure detected at sanctions | Failure should have been detectable pre-filing |
Hallucination describes the output. Evidence preservation describes the system.
A citation checker asking "does this citation resolve in a canonical reporter?" would have caught Mata-class fabrications. The deeper question is whether anyone could reconstruct, six weeks later, how each citation entered the filing.
The answer was no.
The Artifact
Three representative citations from the filed brief (full artifact in the failure record):
Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019)
Martinez v. Delta Air Lines, Inc., 932 F. Supp. 2d 758 (S.D.N.Y. 2013)
Zicherman v. Korean Air Lines Co., Ltd., 516 U.S. 217 (1996)
Existence check results
Varghese, 925 F.3d 1339: That reporter page falls inside an unrelated D.C. Circuit opinion (J.D. v. Azar, 925 F.3d 1291 (D.C. Cir. 2019)), and there is no matching 11th Circuit disposition. Authority not found.
Martinez, 932 F. Supp. 2d 758: No such case at this citation in Federal Supplement 2d. Authority not found.
Zicherman, 516 U.S. 217: This case exists. It addresses wrongful death damages under the Warsaw Convention, not the Montreal Convention proposition the brief attached to it. Even "real" citations in the bundle carried semantic risk, a failure mode that existence-only tools miss entirely.
The pattern: plausible Bluebook surface structure, zero canonical backing for the fabricated entries, no preserved retrieval or verification state to explain how any citation entered the document.
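The existence pass described above can be sketched in a few lines. This is a minimal illustration, not Dali's implementation: `parse_citation`, `existence_check`, and `TOY_REGISTRY` are hypothetical names, the regular expression covers only simple "volume reporter page" citations, and the in-memory dictionary stands in for a real reporter index.

```python
import re
from typing import Dict, Optional, Tuple

CiteKey = Tuple[int, str, int]  # (volume, reporter, first page)

# Matches "<volume> <reporter> <page>", e.g. "925 F.3d 1339" or "932 F. Supp. 2d 758".
CITE_RE = re.compile(
    r"(?P<volume>\d+)\s+(?P<reporter>[A-Z][A-Za-z0-9. ]*?)\s+(?P<page>\d+)\b"
)


def parse_citation(text: str) -> Optional[CiteKey]:
    """Extract (volume, reporter, page) from a Bluebook-style citation string."""
    m = CITE_RE.search(text)
    if m is None:
        return None
    return int(m["volume"]), m["reporter"], int(m["page"])


def existence_check(text: str, registry: Dict[CiteKey, str]) -> str:
    """Return 'exists' only if the reporter location resolves to the cited case."""
    key = parse_citation(text)
    if key is None or key not in registry:
        return "authority_not_found"
    lead_party = registry[key].split(" v. ")[0].lower()
    # A location that resolves to a different case is still a failed lookup.
    return "exists" if lead_party in text.lower() else "authority_not_found"


# Toy stand-in for a canonical registry (assumption: a real check queries a reporter index).
TOY_REGISTRY = {(516, "U.S.", 217): "Zicherman v. Korean Air Lines Co."}

citations = [
    "Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019)",
    "Martinez v. Delta Air Lines, Inc., 932 F. Supp. 2d 758 (S.D.N.Y. 2013)",
    "Zicherman v. Korean Air Lines Co., Ltd., 516 U.S. 217 (1996)",
]
for cite in citations:
    print(existence_check(cite, TOY_REGISTRY), "<-", cite)
```

The function deliberately returns `exists` rather than `verified`: passing this check says nothing about whether the opinion supports the proposition attributed to it, which is the Zicherman problem.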
What Was Not Preserved
When sanctions followed, the record could not supply:
- Raw model output — the exact ChatGPT response at generation time
- Prompt and system state — instructions, model version, temperature
- Verification actions — which citations were checked, by whom, with what tools
- Registry snapshots — what Westlaw, Lexis, or court indices returned on the filing date
- Human edits — whether counsel modified AI output before filing
Without these, forensic reconstruction is impossible. You cannot distinguish "model hallucinated" from "human pasted wrong text" from "database was stale" from "retrieval returned garbage."
The incident becomes a story about AI reliability instead of a reproducible evidence failure.
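One way to make that gap concrete is to treat the five items above as fields of a record and report which ones were never captured. The sketch below is illustrative only: `PreservedState` and `missing_artifacts` are hypothetical names, and the field types are assumptions.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PreservedState:
    """One slot per item in the list above; None means 'never captured'."""
    raw_model_output: Optional[str] = None
    prompt_and_system_state: Optional[dict] = None
    verification_actions: Optional[list] = None
    registry_snapshots: Optional[dict] = None
    human_edits: Optional[list] = None


def missing_artifacts(state: PreservedState) -> List[str]:
    """Names of the artifacts a later forensic review could not consult."""
    return [f.name for f in fields(state) if getattr(state, f.name) is None]


# The Mata workflow as this article describes it: nothing sealed at generation time.
mata = PreservedState()
print(missing_artifacts(mata))
```

Keeping `None` distinct from an empty value matters here: `human_edits=[]` records that no edits were made, while `human_edits=None` records that nobody can say.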
What Evidence Infrastructure Would Have Changed
A system preserving source materials and verification outcomes at generation time would have produced:
[source_blob_sha256] hash of raw model output
[runtime_fingerprint] model + prompt + config at generation
[existence_checks] per-citation canonical lookup results
[verification_outcome] authority_not_found | proposition_unsupported | verified
[assertion_merkle_root] sealed bundle at filing time
Pre-filing, the existence pass alone would have flagged 925 F.3d 1339 and 932 F. Supp. 2d 758 before they reached a court record.
Post-filing, the sealed bundle would have answered every forensic question the sanctions proceeding raised — without relying on memory or reconstructed logs.
The hallucination was the symptom. Missing evidence infrastructure was the cause.
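A rough sketch of how the fields listed in this section could be sealed, using only the Python standard library. The field names come from the list above; everything else is an assumption made for illustration, including the helper names (`seal`, `merkle_root`, `canonical`), the canonical-JSON encoding, and the simple duplicate-last-node Merkle construction. It is not presented as Dali's actual scheme.

```python
import hashlib
import json
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Stable JSON encoding so the same content always hashes the same way."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def merkle_root(leaves: List[str]) -> str:
    """Pairwise SHA-256 over hex leaf hashes; an odd node is paired with itself."""
    if not leaves:
        raise ValueError("cannot seal an empty bundle")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("utf-8"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


def seal(raw_output: str, runtime: Dict, checks: List[Dict]) -> Dict:
    """Build the bundle at generation time, before any human edits the text."""
    bundle = {
        "source_blob_sha256": sha256_hex(raw_output.encode("utf-8")),
        "runtime_fingerprint": sha256_hex(canonical(runtime)),
        "existence_checks": checks,
    }
    leaves = [bundle["source_blob_sha256"], bundle["runtime_fingerprint"]]
    leaves += [sha256_hex(canonical(check)) for check in checks]
    bundle["assertion_merkle_root"] = merkle_root(leaves)
    return bundle


# Placeholder inputs: the real prompt, model version, and output were not preserved.
runtime = {"model": "placeholder-model-id", "prompt": "placeholder prompt", "temperature": None}
checks = [
    {"citation": "925 F.3d 1339", "verification_outcome": "authority_not_found"},
    {"citation": "932 F. Supp. 2d 758", "verification_outcome": "authority_not_found"},
]
print(seal("placeholder model output", runtime, checks)["assertion_merkle_root"])
```

Because every leaf feeds the root, changing the raw output, the runtime record, or any single check result after the fact produces a different `assertion_merkle_root`, which is what lets a later reviewer trust the bundle without trusting anyone's memory.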
Why This Matters for Dali
Dali anchors its Tier 1 corpus on documented cases like Mata — not as legal history trivia, but as ground truth for what evidence preservation must prevent.
The case teaches three engineering requirements:
- Split existence from support — Zicherman exists; the proposition attributed to it may still fail
- Seal state at generation time — not reconstructed after sanctions
- Classify failures by what broke — fabrication, unsupported proposition, missing trail, unverifiable — not a single "hallucination" bucket
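These three requirements can be expressed as a small classifier. The sketch below is an assumption-laden illustration: the `Failure` enum and `classify` function are hypothetical, the label names are adapted from the classifications used in this article, and treating "support could not be assessed" as `UNVERIFIABLE_OUTPUT` is one possible reading of that label.

```python
from enum import Enum
from typing import Optional, Set


class Failure(Enum):
    AUTHORITY_NOT_FOUND = "authority_not_found"          # fabricated or unresolvable citation
    PROPOSITION_UNSUPPORTED = "proposition_unsupported"  # real case, wrong proposition
    UNVERIFIABLE_OUTPUT = "unverifiable_output"          # support could not be assessed
    MISSING_SOURCE_TRAIL = "missing_source_trail"        # no sealed state from generation time


def classify(exists: bool, supports: Optional[bool], trail_sealed: bool) -> Set[Failure]:
    """Return every failure that applies; an empty set means the citation is verified."""
    failures: Set[Failure] = set()
    if not exists:
        failures.add(Failure.AUTHORITY_NOT_FOUND)
    elif supports is False:
        failures.add(Failure.PROPOSITION_UNSUPPORTED)
    elif supports is None:
        failures.add(Failure.UNVERIFIABLE_OUTPUT)
    if not trail_sealed:
        # Independent of the other checks: even a valid citation needs a trail.
        failures.add(Failure.MISSING_SOURCE_TRAIL)
    return failures


# Varghese: no such authority, and nothing sealed at generation time.
print(sorted(f.value for f in classify(exists=False, supports=None, trail_sealed=False)))
# Zicherman, following this article's reading: real case, proposition not supported.
print(sorted(f.value for f in classify(exists=True, supports=False, trail_sealed=False)))
```

Returning a set instead of a single label keeps the existence, support, and trail questions separate, so one citation can fail on more than one axis.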
This investigation is the first in a series mapping real legal AI failures to reproducible evidence artifacts. The failure database holds the raw records. Dali holds the verification model. YenkLabs documents the reconstruction.
Distribution
This investigation is designed as one piece of work across four channels:
- Article: yenklabs.com/notes/mata-v-avianca-evidence-preservation
- Failure record: yenklabs.com/failures/001-mata-v-avianca
- Dali artifact: Tier 1 corpus entry in github.com/yenklabs/dali
- Short video: walkthrough of existence check on 925 F.3d 1339 (forthcoming)
If you are building verification infrastructure for high-stakes AI workflows, Mata is the case to pressure-test against. Not because the model lied — because the evidence disappeared.