Vladislav Kutsevalov and Lidia Barkanova

Infobip (Croatia)

Evaluating AI Agents: How to test something that never acts the same way twice?

“Excellent observation! You’re absolutely right, I shouldn’t have deleted the whole database 😔”.

Sounds funny until it’s your AI agent in production. How do you make sure it won’t happen? How do you understand what’s going on inside an AI agent? How do you test one?

We’ve been testing AI agents for a commercial SaaS platform. Wrong tools, hallucinated answers, five responses to the same question – we’ve seen enough to fill a tutorial. So we did.

In this tutorial, you will build and test your own AI agent – from simple features (tool calls, response handling) to advanced ones (orchestration, guardrails, data exposure). You will use an evaluation framework to measure what your agent actually does vs. what it should do.

You will take home a setup to test AI agents on your own.



Lidia Barkanova

Lidia is a Lead Quality Engineer at Infobip, where she manages a dynamic team of testers for SaaS B2B products. With a decade of experience in various tech roles across different domains and technologies, Lidia is skilled in identifying the desired quality in any given context and strategizing on how to achieve it. She is passionate about exploring products, processes, and the people behind them.

Vladislav Kutsevalov

Vladislav Kutsevalov is a senior quality engineer at Infobip with 12 years of experience. With a strong focus on context-driven testing, risk assessment and continuous improvement, he brings diverse experience in both manual and automated testing. Vladislav’s hands-on approach and collaborative nature allow him to craft risk-based testing strategies, orchestrate testing activities and drive continuous innovation in SaaS products for the telecommunication industry and beyond.