When a legal AI workflow produces a bad output, the incident is usually discovered days or weeks later — during opposing counsel review, a compliance audit, or a sanctions motion.
By then, the original interaction is gone.
What Gets Lost
Most production pipelines do not preserve:
- The exact prompt sequence and system instructions active at generation time
- The model version, temperature, and retrieval corpus snapshot used
- The raw source documents the model actually saw (not what the user thought it saw)
- Intermediate retrieval results before the generation layer assembled the final text
- The verification state at the moment the output was accepted or filed
What remains is the output text itself — a polished artifact with no attached provenance.
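One way to picture the missing provenance is as a record captured at generation time. The sketch below is illustrative only: `GenerationRecord`, `capture_record`, and every field name are hypothetical, chosen to mirror the list above, and do not describe any particular product's schema.

```python
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class GenerationRecord:
    # Hypothetical schema: one field per item that pipelines typically drop.
    system_prompt: str
    prompt_sequence: tuple[str, ...]
    model_version: str
    temperature: float
    corpus_snapshot_id: str
    source_hashes: dict[str, str]      # document id -> hash of the bytes the model saw
    retrieved_chunks: tuple[str, ...]  # retrieval results before generation
    output_hash: str
    verification_state: str            # e.g. "unverified", "verified", "filed"
    captured_at: str                   # UTC timestamp, ISO 8601


def capture_record(system_prompt, prompts, model_version, temperature,
                   corpus_snapshot_id, sources, chunks, output_text,
                   verification_state="unverified"):
    """Build a record from the values in hand at generation time."""
    return GenerationRecord(
        system_prompt=system_prompt,
        prompt_sequence=tuple(prompts),
        model_version=model_version,
        temperature=temperature,
        corpus_snapshot_id=corpus_snapshot_id,
        source_hashes={doc_id: sha256_hex(raw) for doc_id, raw in sources.items()},
        retrieved_chunks=tuple(chunks),
        output_hash=sha256_hex(output_text.encode("utf-8")),
        verification_state=verification_state,
        captured_at=datetime.now(timezone.utc).isoformat(),
    )


# Placeholder values for illustration only.
record = capture_record(
    system_prompt="You are a legal research assistant.",
    prompts=["Summarize the holding."],
    model_version="example-model-v1",
    temperature=0.0,
    corpus_snapshot_id="snapshot-001",
    sources={"doc-1": b"original text"},
    chunks=["original text"],
    output_text="Summary.",
)
record_as_dict = asdict(record)  # plain dict, ready to serialize and store
```

Hashing the source bytes rather than storing a file path is the point of the design: a path tells you where a document lived, while a hash tells you which version the model actually saw.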
Why This Makes Failure Analysis Impossible
Without the original evidence chain, you cannot answer basic forensic questions:
- Did the model hallucinate the citation, or did retrieval return a corrupted chunk?
- Was the source document updated after generation?
- Did a model upgrade change behavior between draft and filing?
- Did the user edit the output after generation, introducing the error?
Teams fall back on "the AI hallucinated" because it is the only explanation left once the evidence trail is gone.
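If a record of the generation conditions had been kept, each of the questions above reduces to a comparison. The following is a minimal sketch under that assumption: the `triage` function and the keys of the `recorded` dictionary are hypothetical, and the citation check is a naive substring match rather than real citation parsing.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def triage(recorded, current_sources, current_model_version, filed_text, cited_text):
    """Compare the recorded generation state with the present state.

    Returns a list of human-readable findings; an empty list means
    none of the four checks found a discrepancy.
    """
    findings = []

    # Hallucination vs. retrieval: did the cited text appear in any retrieved chunk?
    if not any(cited_text in chunk for chunk in recorded["retrieved_chunks"]):
        findings.append("citation not found in retrieved chunks")

    # Was a source document updated (or removed) after generation?
    for doc_id, old_hash in recorded["source_hashes"].items():
        current = current_sources.get(doc_id)
        if current is None:
            findings.append(f"source missing: {doc_id}")
        elif sha256_hex(current) != old_hash:
            findings.append(f"source changed since generation: {doc_id}")

    # Did a model upgrade happen between draft and filing?
    if current_model_version != recorded["model_version"]:
        findings.append("model version changed since generation")

    # Did someone edit the output after generation?
    if sha256_hex(filed_text.encode("utf-8")) != recorded["output_hash"]:
        findings.append("filed text differs from generated output")

    return findings


# Placeholder data for illustration; the case name is not a real citation.
recorded = {
    "source_hashes": {"doc-1": sha256_hex(b"version 1")},
    "model_version": "example-model-v1",
    "output_hash": sha256_hex(b"draft text"),
    "retrieved_chunks": ["See Example v. Placeholder for the standard."],
}
clean = triage(recorded, {"doc-1": b"version 1"}, "example-model-v1",
               "draft text", "Example v. Placeholder")
dirty = triage(recorded, {"doc-1": b"version 2"}, "example-model-v2",
               "edited text", "Other v. Case")
```

Without the recorded state, none of these comparisons has a left-hand side, which is why the investigation collapses into a single unfalsifiable explanation.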
The Engineering Requirement
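Reconstruction is not a logging problem. Structured application logs record that an event happened; they do not preserve evidence state, meaning the exact inputs, configuration, and outputs that existed when it happened.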
Dali treats every verification run as a sealed evidence bundle: source material hashes, runtime fingerprints, and verification outcomes bound together at generation time — not reconstructed from memory after a failure is discovered.
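To make "bound together" concrete, here is a generic sketch of sealing a bundle with a content hash. It is not Dali's implementation, which the text does not describe. The `seal` and `verify_seal` functions and the bundle keys are assumptions for illustration.

```python
import hashlib
import hmac
import json


def _canonical_bytes(bundle: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal bundles hash equally."""
    return json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(bundle: dict) -> dict:
    """Wrap a bundle together with a SHA-256 digest of its canonical form."""
    digest = hashlib.sha256(_canonical_bytes(bundle)).hexdigest()
    return {"bundle": bundle, "seal": digest}


def verify_seal(sealed: dict) -> bool:
    """Recompute the digest and compare it with the stored seal."""
    expected = hashlib.sha256(_canonical_bytes(sealed["bundle"])).hexdigest()
    return hmac.compare_digest(expected, sealed["seal"])


# Placeholder bundle: hypothetical keys echoing the three elements named above.
sealed = seal({
    "source_hashes": {"doc-1": hashlib.sha256(b"original text").hexdigest()},
    "runtime": {"model_version": "example-model-v1", "temperature": 0.0},
    "verification": {"outcome": "passed"},
})

# Changing any field after sealing invalidates the seal.
tampered = json.loads(json.dumps(sealed))
tampered["bundle"]["verification"]["outcome"] = "failed"

intact_ok = verify_seal(sealed)
tampered_ok = verify_seal(tampered)
```

A bare hash only detects accidental or careless modification, because anyone who can rewrite the bundle can also recompute the digest. A system meant to withstand adversarial scrutiny would likely add a keyed signature or an external timestamp, which this sketch omits.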
If you cannot replay the exact conditions under which an output was produced, you cannot defend it, improve it, or learn from it.