Why Most AI Failures Cannot Be Reconstructed After the Fact

When a legal AI workflow produces a bad output, the incident is usually discovered days or weeks later — during opposing counsel review, a compliance audit, or a sanctions motion.

By then, the record of the original interaction is usually gone.

What Gets Lost

Most production pipelines do not preserve:

The exact prompt sequence and system instructions active at generation time
The model version, temperature, and retrieval corpus snapshot used
The raw source documents the model actually saw (not what the user thought it saw)
Intermediate retrieval results before the generation layer assembled the final text
The verification state at the moment the output was accepted or filed

What remains is the output text itself — a polished artifact with no attached provenance.
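As a rough illustration, the items listed above could be captured in a single immutable record at generation time. This is a minimal sketch, not a description of any particular product: the class name GenerationRecord, the function capture_record, and every field name and sample value are hypothetical, chosen only to mirror the list.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class GenerationRecord:
    # Hypothetical field names, one group per item in the list above.
    system_instructions: str
    prompt_sequence: tuple
    model_version: str
    temperature: float
    corpus_snapshot_id: str
    source_document_hashes: tuple   # hashes of the bytes the model actually saw
    retrieval_result_hashes: tuple  # hashes of chunks before generation
    verification_state: str         # state when the output was accepted or filed
    output_hash: str


def capture_record(system_instructions, prompts, model_version, temperature,
                   corpus_snapshot_id, source_documents, retrieved_chunks,
                   verification_state, output_text):
    """Build the record while the inputs are still in memory."""
    return GenerationRecord(
        system_instructions=system_instructions,
        prompt_sequence=tuple(prompts),
        model_version=model_version,
        temperature=temperature,
        corpus_snapshot_id=corpus_snapshot_id,
        source_document_hashes=tuple(sha256_hex(d) for d in source_documents),
        retrieval_result_hashes=tuple(
            sha256_hex(c.encode("utf-8")) for c in retrieved_chunks
        ),
        verification_state=verification_state,
        output_hash=sha256_hex(output_text.encode("utf-8")),
    )


# Illustrative values only.
record = capture_record(
    system_instructions="Cite only from the provided documents.",
    prompts=["Summarize the attached contract."],
    model_version="example-model-2024-01",
    temperature=0.2,
    corpus_snapshot_id="snapshot-0001",
    source_documents=[b"raw bytes of the contract as ingested"],
    retrieved_chunks=["Clause 4 limits liability to direct damages."],
    verification_state="unverified",
    output_text="The contract limits liability to direct damages.",
)
```

Storing hashes rather than full text keeps the record small, but a hash can only show that something changed. To see what changed, the underlying bytes would have to be archived as well.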

Why This Makes Failure Analysis Impossible

Without the original evidence chain, you cannot answer basic forensic questions:

Did the model hallucinate the citation, or did retrieval return a corrupted chunk?
Was the source document updated after generation?
Did a model upgrade change behavior between draft and filing?
Did the user edit the output after generation, introducing the error?

Teams default to "the AI hallucinated" because it is the only explanation left when no evidence trail exists.
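When the generation-time evidence has been preserved, each of the forensic questions above becomes a comparison rather than a guess. The sketch below assumes a hypothetical record dict with the keys shown; the function name diagnose and the sample values are illustrative, and the findings are indicators to investigate, not verdicts.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diagnose(record, citation, current_source, current_model_version, filed_text):
    """Compare what was recorded at generation time with what is observed now."""
    findings = []
    # Q1: was the citation in anything retrieval actually returned?
    if not any(citation in chunk for chunk in record["retrieved_chunks"]):
        findings.append("citation not present in retrieved chunks")
    # Q2: has the source document changed since generation?
    if sha256_hex(current_source) != record["source_hash"]:
        findings.append("source document changed after generation")
    # Q3: is the model the same one that produced the draft?
    if current_model_version != record["model_version"]:
        findings.append("model version differs from the one recorded")
    # Q4: was the output edited between generation and filing?
    if sha256_hex(filed_text) != record["output_hash"]:
        findings.append("filed text differs from generated output")
    return findings


# Illustrative values only; the citation is deliberately fictional.
source = "See Example v. Sample (hypothetical) for the standard."
generated = "The standard is set out in Example v. Sample (hypothetical)."
record = {
    "retrieved_chunks": [source],
    "source_hash": sha256_hex(source),
    "model_version": "example-model-2024-01",
    "output_hash": sha256_hex(generated),
}

findings = diagnose(
    record,
    citation="Example v. Sample (hypothetical)",
    current_source=source,
    current_model_version="example-model-2024-01",
    filed_text=generated + " It has never been questioned.",
)
print(findings)
```

In this example the only finding is that the filed text differs from the generated output, which points at a post-generation edit rather than at the model or the retrieval layer.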

The Engineering Requirement

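Reconstruction is not a logging problem. Structured application logs capture events; they do not capture evidence state. A log line can show that retrieval ran and returned five chunks, but not the bytes of those chunks or the version of the document they came from.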

Dali treats every verification run as a sealed evidence bundle: source material hashes, runtime fingerprints, and verification outcomes bound together at generation time — not reconstructed from memory after a failure is discovered.
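To make the idea of binding these elements together concrete, here is a minimal sketch of one way a bundle could be sealed. It is an assumption for illustration and not Dali's actual implementation: the function names seal_bundle and verify_seal, the payload keys, and the sample values are all hypothetical.

```python
import hashlib
import json


def _canonical(payload: dict) -> bytes:
    """Serialize deterministically so the same payload always hashes the same."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_bundle(source_hashes, runtime_fingerprint, verification_outcomes) -> dict:
    """Bind the three elements into one payload and hash them together."""
    payload = {
        "source_hashes": sorted(source_hashes),
        "runtime_fingerprint": runtime_fingerprint,
        "verification_outcomes": verification_outcomes,
    }
    return {
        "payload": payload,
        "seal": hashlib.sha256(_canonical(payload)).hexdigest(),
    }


def verify_seal(bundle: dict) -> bool:
    """Recompute the digest; any change to the payload breaks the match."""
    return hashlib.sha256(_canonical(bundle["payload"])).hexdigest() == bundle["seal"]


# Illustrative values only.
bundle = seal_bundle(
    source_hashes=[hashlib.sha256(b"contract bytes").hexdigest()],
    runtime_fingerprint={"model_version": "example-model-2024-01", "temperature": 0.2},
    verification_outcomes={"citation_check": "pass"},
)

# Changing any element after the fact invalidates the seal.
tampered = {
    "payload": dict(bundle["payload"], verification_outcomes={"citation_check": "fail"}),
    "seal": bundle["seal"],
}
print(verify_seal(bundle), verify_seal(tampered))
```

A bare hash like this only detects changes made by someone who does not also rewrite the seal. A production design would more likely sign the digest or write it to append-only storage, so that the seal cannot be silently recomputed.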

If you cannot replay the exact conditions under which an output was produced, you cannot defend it, improve it, or learn from it.