How not to fail at AI deployment. Testing non-deterministic agents.

12:25 - 12:55, 20th of May (Wednesday) 2026 / AI & Architecture

Before AI there were unit tests, integration tests, e2e tests, regression tests, test evidence, etc.
Now people build agent software and run it without any testing or validation strategy.
The agents work somehow and produce something.
Since agents are non-deterministic, they may fail even when nothing has changed.

Many AI production deployments fail because of this.

In this presentation I will show what to do in order to succeed:
1. Why LLMs are not deterministic even when temperature is 0
2. Traditional testing vs. gen AI agent testing
3. A completely new field of skills and expertise
4. Concrete examples of deterministic benchmarks
5. Handling difficult cases
6. Defeating the purpose? Using LLM-based benchmarks, with an example
7. Reusing intuitions from multi-threaded application testing and the finance world
8. LLM biases to handle
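
As a rough illustration of what a deterministic benchmark for a non-deterministic agent might look like, here is a minimal sketch in Python. It is not taken from the talk: `pass_rate`, `make_fake_agent`, and `is_correct` are hypothetical names, and the fake agent only simulates occasional wrong answers so the example runs without any LLM. The idea is to run the same prompt many times, score each output with a deterministic check, and compare the pass rate to a threshold instead of asserting on a single run.

```python
import random
from typing import Callable


def pass_rate(agent: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 20) -> float:
    """Run the agent `runs` times and return the fraction of outputs that pass `check`."""
    passed = sum(1 for _ in range(runs) if check(agent(prompt)))
    return passed / runs


def make_fake_agent(seed: int, error_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for a real agent: answers "4", but "5" with probability `error_rate`."""
    rng = random.Random(seed)

    def agent(prompt: str) -> str:
        return "5" if rng.random() < error_rate else "4"

    return agent


def is_correct(output: str) -> bool:
    """Deterministic check: the same output always gets the same verdict."""
    return output.strip() == "4"


if __name__ == "__main__":
    agent = make_fake_agent(seed=0, error_rate=0.1)
    rate = pass_rate(agent, "What is 2 + 2?", is_correct, runs=50)
    threshold = 0.7  # assumed acceptance level, chosen only for illustration
    verdict = "PASS" if rate >= threshold else "FAIL"
    print(f"pass rate: {rate:.2f} ({verdict})")
```

With a real agent, `make_fake_agent` would be replaced by a call to the deployed system, and the threshold and number of runs would depend on how much flakiness the use case can tolerate.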

LEVEL:

Basic Advanced Expert

TRACK:

AI Architecture & Software Future of Work

Adam Witkowski

Capgemini