
How not to fail an AI deployment: testing non-deterministic agents


12:25 - 12:55, 20th of May (Wednesday) 2026 / AI & Architecture

Before AI there were unit tests, integration tests, e2e tests, regression tests, test evidence, etc.
Now people build agent software and run it without any testing or validation strategy.
The agents more or less work and produce some output.
Because agents are non-deterministic, they can fail even when nothing has changed.

Many AI production deployments fail because of this.

In this presentation I will show what to do to succeed:
1. Why LLMs are not deterministic even at temperature 0
2. Traditional testing vs. gen AI agent testing
3. A completely new field of skills and expertise
4. Concrete examples of deterministic benchmarks
5. Handling difficult cases
6. Defeating the purpose? Using LLM-based benchmarks, with an example
7. Reusing intuitions from multithreaded application testing and the finance world
8. LLM biases to handle
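To make the idea of a deterministic benchmark for a non-deterministic agent concrete, here is a minimal sketch under stated assumptions, not the method presented in the talk. It runs the same input many times (much like stress-testing a multithreaded program to expose flaky behaviour) and scores every output with a deterministic checker. It then gates deployment on the Wilson score lower bound of the pass rate instead of on a single run. The Wilson interval is our choice of statistic. The names `run_benchmark`, `flaky_agent`, `exact_answer` and `DEPLOY_THRESHOLD` are all hypothetical, and `flaky_agent` is a seeded stand-in for a real agent call.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchmarkResult:
    runs: int
    passes: int

    @property
    def pass_rate(self) -> float:
        return self.passes / self.runs

    def wilson_lower_bound(self, z: float = 1.96) -> float:
        """Lower end of the Wilson score interval for the true pass rate
        (z = 1.96 gives roughly 95% confidence)."""
        n, p = self.runs, self.pass_rate
        denom = 1 + z * z / n
        centre = p + z * z / (2 * n)
        margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (centre - margin) / denom


def run_benchmark(agent: Callable[[str], str], prompt: str,
                  check: Callable[[str], bool], runs: int = 50) -> BenchmarkResult:
    """Call the same (non-deterministic) agent repeatedly on one input
    and score every output with a deterministic checker."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    passes = sum(1 for _ in range(runs) if check(agent(prompt)))
    return BenchmarkResult(runs=runs, passes=passes)


# Hypothetical stand-in for a real agent: right about 90% of the time.
# Seeded only so that this demo is reproducible; a real agent is not.
_rng = random.Random(42)


def flaky_agent(prompt: str) -> str:
    return "42" if _rng.random() < 0.9 else "forty-two"


# Deterministic checker: exact match on the expected answer.
def exact_answer(output: str) -> bool:
    return output.strip() == "42"


DEPLOY_THRESHOLD = 0.8  # hypothetical quality bar

result = run_benchmark(flaky_agent, "What is 6 * 7?", exact_answer, runs=200)
lower = result.wilson_lower_bound()
print(f"pass rate {result.pass_rate:.2f}, 95% lower bound {lower:.2f}")
print("deploy" if lower >= DEPLOY_THRESHOLD else "block")
```

Gating on the lower bound rather than the raw pass rate means a small number of runs cannot sneak a flaky agent past the threshold. For example, 10 passes out of 10 gives a lower bound of only about 0.72.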

LEVEL:
Basic Advanced Expert
TRACK:
AI Architecture & Software Future of Work