
// Lab Note

Mata v. Avianca Was Not Just a Hallucination Problem, It Was an Evidence Preservation Problem

Citation-Failures Evidence-Preservation Case-Study Investigation

Jun 2026

Summary

In March 2023, counsel filed a brief in Mata v. Avianca, Inc. (S.D.N.Y.) citing six judicial opinions that did not exist. The citations were produced by ChatGPT. The court issued sanctions in June 2023.

The case is remembered as a hallucination story. That framing is accurate but incomplete. The structural failure was evidence preservation: the workflow produced a court filing with no sealed record of what the model generated, what was verified, or what canonical registries returned at filing time.

When the court asked how these citations entered the record, the answer was effectively: we used ChatGPT and filed the output. There was no preserved record to reconstruct from.

Dali classification: Authority Not Found · Missing Source Trail · Unverifiable Output

Failure record: 001-mata-v-avianca · Dali corpus: github.com/yenklabs/dali


What Actually Happened

Mata v. Avianca was a personal injury action in the Southern District of New York. Counsel submitted an affirmation opposing dismissal. The affirmation cited multiple federal cases to support arguments about statute of limitations under the Montreal Convention.

At least six cited opinions were fabricated. They looked real, with plausible party names, plausible reporter citations, and coherent holdings, but did not exist in any canonical registry.

The court found the conduct sanctionable. The incident became the reference case for "AI hallucinated my citations."

That summary is correct. It is also the wrong lesson if you stop there.


The Hallucination Framing vs. The Evidence Framing

Hallucination framing          | Evidence framing
The model lied                 | The workflow had no evidence layer
Blame the AI                   | Blame the missing audit trail
Fix: better model / prompt     | Fix: preserve source + verification state at generation time
Failure detected at sanctions  | Failure should have been detectable pre-filing

Hallucination describes the output. Evidence preservation describes the system.

A citation checker asking "does this link resolve?" would have caught Mata-class fabrications. But the deeper question is: could anyone reconstruct what happened six weeks later?

The answer was no.


The Artifact

Three representative citations from the filed brief (full artifact in the failure record):

Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019)
Martinez v. Delta Air Lines, Inc., 932 F. Supp. 2d 758 (S.D.N.Y. 2013)
Zicherman v. Korean Air Lines Co., Ltd., 516 U.S. 217 (1996)

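Existence check results (as stated elsewhere in this note)

Reporter citation     | Canonical lookup
925 F.3d 1339         | not found
932 F. Supp. 2d 758   | not found
516 U.S. 217          | found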

The pattern: plausible Bluebook surface structure, zero canonical backing for the fabricated entries, no preserved retrieval or verification state to explain how any citation entered the document.
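An existence pass over the three citations above can be sketched in a few lines. This is a minimal illustration, not Dali's implementation: `KNOWN_REPORTER_CITES` is a hypothetical hard-coded stand-in for a canonical registry, and the regular expression only handles the simple "volume reporter page (court year)" shape seen in these examples.

```python
import re
from typing import Dict, Iterable, Optional, Set

# ASSUMPTION: a hard-coded stand-in for a canonical registry. A real check
# would query an authoritative source instead of a local set.
KNOWN_REPORTER_CITES: Set[str] = {"516 U.S. 217"}

# Matches ", <volume> <reporter> <page> (" as in ", 925 F.3d 1339 (11th Cir. 2019)".
CITE_RE = re.compile(r",\s(\d+)\s+(.+?)\s+(\d+)\s+\(")


def reporter_cite(citation: str) -> Optional[str]:
    """Extract 'volume reporter page' from a full citation, or None if unparsable."""
    match = CITE_RE.search(citation)
    if match is None:
        return None
    volume, reporter, page = match.groups()
    return f"{volume} {reporter} {page}"


def existence_pass(
    citations: Iterable[str], registry: Set[str] = KNOWN_REPORTER_CITES
) -> Dict[str, str]:
    """Label each citation as found, authority_not_found, or unparsed."""
    results: Dict[str, str] = {}
    for citation in citations:
        cite = reporter_cite(citation)
        if cite is None:
            results[citation] = "unparsed"
        elif cite in registry:
            results[citation] = "found"
        else:
            results[citation] = "authority_not_found"
    return results


CITATIONS = [
    "Varghese v. China Southern Airlines Co., Ltd., 925 F.3d 1339 (11th Cir. 2019)",
    "Martinez v. Delta Air Lines, Inc., 932 F. Supp. 2d 758 (S.D.N.Y. 2013)",
    "Zicherman v. Korean Air Lines Co., Ltd., 516 U.S. 217 (1996)",
]

for text, status in existence_pass(CITATIONS).items():
    print(f"{status:22} {text}")
```

The check only answers whether an authority exists. It says nothing about whether the cited proposition is supported, and it preserves nothing by itself unless its results are stored alongside the filing.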


What Was Not Preserved

When sanctions followed, the record could not supply:

  1. Raw model output — the exact ChatGPT response at generation time
  2. Prompt and system state — instructions, model version, temperature
  3. Verification actions — which citations were checked, by whom, with what tools
  4. Registry snapshots — what Westlaw, Lexis, or court indices returned on the filing date
  5. Human edits — whether counsel modified AI output before filing

Without these, forensic reconstruction is impossible. You cannot distinguish "model hallucinated" from "human pasted wrong text" from "database was stale" from "retrieval returned garbage."

The incident becomes a story about AI reliability instead of a reproducible evidence failure.
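The five missing items listed above map naturally onto a single record captured at generation time. The sketch below is one possible shape, with every class and field name chosen for illustration rather than taken from Dali. Human edits are derived by diffing the raw model output against the filed text with the standard library's `difflib`.

```python
import difflib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class VerificationAction:
    """One check performed on one citation (hypothetical schema)."""
    citation: str
    checked_by: str
    tool: str
    result: str


@dataclass
class GenerationRecord:
    """Everything needed to reconstruct how text entered a filing (illustrative)."""
    raw_output: str                       # 1. exact model response
    prompt: str                           # 2. prompt and system state
    model_version: str
    temperature: float
    verification_actions: List[VerificationAction] = field(default_factory=list)  # 3
    registry_snapshots: Dict[str, str] = field(default_factory=dict)              # 4
    human_edits: List[str] = field(default_factory=list)                          # 5

    def record_edits(self, filed_text: str) -> None:
        """Store a unified diff between the raw model output and the filed text."""
        diff = difflib.unified_diff(
            self.raw_output.splitlines(),
            filed_text.splitlines(),
            fromfile="model_output",
            tofile="filed_text",
            lineterm="",
        )
        self.human_edits = list(diff)

    def to_json(self) -> str:
        """Serialize with sorted keys so the same record always yields the same bytes."""
        return json.dumps(asdict(self), sort_keys=True)
```

With a record like this, "model hallucinated" and "human pasted wrong text" become distinguishable: an empty `human_edits` list means the filed text matches the raw output line for line.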


What Evidence Infrastructure Would Have Changed

A system preserving source materials and verification outcomes at generation time would have produced:

[source_blob_sha256]     hash of raw model output
[runtime_fingerprint]    model + prompt + config at generation
[existence_checks]       per-citation canonical lookup results
[verification_outcome]   authority_not_found | proposition_unsupported | verified
[assertion_merkle_root]  sealed bundle at filing time

Pre-filing, the existence pass alone would have flagged 925 F.3d 1339 and 932 F. Supp. 2d 758 before they reached a court record.

Post-filing, the sealed bundle would have answered every forensic question the sanctions proceeding raised — without relying on memory or reconstructed logs.

The hallucination was the symptom. Missing evidence infrastructure was the cause.
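The sealed bundle described in this section can be illustrated with standard hashing. The sketch below assumes SHA-256 throughout, canonical JSON (sorted keys) as the serialization, and a simple binary Merkle tree that duplicates the last leaf on odd levels. These are illustrative choices, and the function names are hypothetical rather than Dali's actual scheme.

```python
import hashlib
import json
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Deterministic JSON encoding so equal content always hashes equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def merkle_root(leaves: List[str]) -> str:
    """Fold hex-encoded leaf hashes pairwise into a single root hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last leaf (an assumption)
            level.append(level[-1])
        level = [
            sha256_hex(bytes.fromhex(left) + bytes.fromhex(right))
            for left, right in zip(level[0::2], level[1::2])
        ]
    return level[0]


def seal_bundle(
    raw_output: str,
    runtime: Dict[str, object],
    existence_checks: Dict[str, str],
    verification_outcome: Dict[str, str],
) -> Dict[str, object]:
    """Build the bundle fields and seal them under one Merkle root."""
    bundle: Dict[str, object] = {
        "source_blob_sha256": sha256_hex(raw_output.encode("utf-8")),
        "runtime_fingerprint": sha256_hex(canonical(runtime)),
        "existence_checks": existence_checks,
        "verification_outcome": verification_outcome,
    }
    # One leaf per field, in sorted key order, so the root is reproducible.
    leaves = [sha256_hex(canonical({key: bundle[key]})) for key in sorted(bundle)]
    bundle["assertion_merkle_root"] = merkle_root(leaves)
    return bundle
```

Any later change to the raw output, the runtime configuration, or a single lookup result produces a different root, which is what makes the bundle useful as evidence rather than as a log that could be rewritten after the fact.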


Why This Matters for Dali

Dali anchors its Tier 1 corpus on documented cases like Mata — not as legal history trivia, but as ground truth for what evidence preservation must prevent.

The case teaches three engineering requirements:

  1. Split existence from support: Zicherman exists; the proposition attributed to it may still fail
  2. Seal state at generation time, not reconstructed after sanctions
  3. Classify failures by what broke (fabrication, unsupported proposition, missing trail, unverifiable), not as a single "hallucination" bucket
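The first and third requirements can be sketched together as a small classifier. The label names follow the classification strings used in this note. The decision order and the three input flags (`exists`, `supports`, `has_trail`) are assumptions made for illustration, with `None` meaning the check could not be performed.

```python
from enum import Enum
from typing import List, Optional


class Failure(Enum):
    AUTHORITY_NOT_FOUND = "authority_not_found"
    PROPOSITION_UNSUPPORTED = "proposition_unsupported"
    MISSING_SOURCE_TRAIL = "missing_source_trail"
    UNVERIFIABLE_OUTPUT = "unverifiable_output"
    VERIFIED = "verified"


def classify(
    exists: Optional[bool], supports: Optional[bool], has_trail: bool
) -> List[Failure]:
    """Return every label that applies to one citation (hypothetical rules)."""
    labels: List[Failure] = []
    if exists is False:
        # Existence failed: support is never evaluated for a nonexistent authority.
        labels.append(Failure.AUTHORITY_NOT_FOUND)
    elif exists is None:
        labels.append(Failure.UNVERIFIABLE_OUTPUT)
    elif supports is False:
        # The authority exists, but the attributed proposition does not hold.
        labels.append(Failure.PROPOSITION_UNSUPPORTED)
    elif supports is None:
        labels.append(Failure.UNVERIFIABLE_OUTPUT)
    if not has_trail:
        # Independent of the outcome: no preserved record of how it got here.
        labels.append(Failure.MISSING_SOURCE_TRAIL)
    return labels or [Failure.VERIFIED]


# A fabricated citation with no preserved trail carries two labels, not one.
print([f.value for f in classify(exists=False, supports=None, has_trail=False)])
```

Returning a list rather than a single label keeps a real authority with an unsupported proposition distinct from a fabricated one, and lets a missing trail be recorded even when the citation itself checks out.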

This investigation is the first in a series mapping real legal AI failures to reproducible evidence artifacts. The failure database holds the raw records. Dali holds the verification model. YenkLabs documents the reconstruction.


Distribution

This investigation is designed as one piece of work distributed across four channels.

If you are building verification infrastructure for high-stakes AI workflows, Mata is the case to pressure-test against. Not because the model lied — because the evidence disappeared.

Part of the Dali R&D thread — semantic proposition validation and immutable chain-of-evidence preservation.