Paul Hayes

Independent Consultant (Switzerland)

Correct, Fast, Harmful. Works as Designed, Not as Desired. Validating Automated Resilience Decisions in Embedded and Complex Systems

Modern embedded and complex systems are increasingly governed by automated decision systems operating at machine speed. Restart policies, autoscaling rules, circuit breakers, retry logic, failover mechanisms, watchdog timers, and AI-assisted remediation now make operational decisions that directly determine how incidents evolve.

Yet organisations rarely validate whether these automated interventions improve or degrade system outcomes. Traditional testing verifies that automated controls trigger and execute correctly. It does not verify whether their intervention produces a better system outcome. A restart policy may fire exactly as designed and still amplify an outage through cascading restarts or dependency overload.

This paper introduces a practical method for validating automated operational decisions. Using controlled counterfactual comparison, teams reproduce the same stress scenario with automation enabled and disabled, then measure the outcome difference.
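The comparison described above can be illustrated with a minimal sketch. It is a toy model, not the method from the paper: the traffic volume, error rate, restart threshold, restart cost, and all names (`run_scenario`, `Outcome`, `compare`) are hypothetical values chosen for illustration. The same deterministic stress scenario is replayed twice, once with a restart policy enabled and once without, and the outcomes are compared.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
REQUESTS_PER_TICK = 100     # steady incoming load
DEGRADED_START = 10         # dependency degrades at this tick...
DEGRADED_END = 30           # ...and recovers at this tick (exclusive)
DEGRADED_ERRORS = 40        # failed requests per tick while degraded
RESTART_THRESHOLD = 0.25    # restart policy fires above this error rate
RESTART_TICKS = 3           # ticks of total unavailability per restart


@dataclass(frozen=True)
class Outcome:
    failed_requests: int
    restarts: int


def run_scenario(automation_enabled: bool, ticks: int = 60) -> Outcome:
    """Replay the same stress scenario with the restart policy on or off."""
    failed = 0
    restarts = 0
    down_for = 0  # remaining ticks of restart downtime
    for t in range(ticks):
        if down_for > 0:
            # Service is restarting: every request in this tick fails.
            failed += REQUESTS_PER_TICK
            down_for -= 1
            continue
        degraded = DEGRADED_START <= t < DEGRADED_END
        errors = DEGRADED_ERRORS if degraded else 0
        failed += errors
        # The policy works exactly as designed: it fires on a high error rate.
        if automation_enabled and errors / REQUESTS_PER_TICK > RESTART_THRESHOLD:
            restarts += 1
            down_for = RESTART_TICKS
    return Outcome(failed, restarts)


def compare() -> int:
    """Return extra failed requests attributable to the automation."""
    with_automation = run_scenario(automation_enabled=True)
    without_automation = run_scenario(automation_enabled=False)
    return with_automation.failed_requests - without_automation.failed_requests


if __name__ == "__main__":
    print("with automation:   ", run_scenario(True))
    print("without automation:", run_scenario(False))
    print("outcome difference:", compare())
```

In this toy configuration the restart policy triggers correctly on every degraded health check, yet the run with automation records 1700 failed requests against 800 without it, because each restart adds downtime while the underlying dependency is still degraded. A real comparison would need a reproducible stress scenario and outcome measures appropriate to the system under test.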

The technique is not a universal framework but a practical investigative method applicable across embedded systems engineering, reliability engineering, resilience engineering, and distributed systems operations. It answers one question most teams never ask: did the automation actually help?


AI systems are taking on increasing operational responsibility. They restart services, reroute traffic, shut down processes, and make decisions that determine how incidents evolve — at machine speed, without human oversight. When they work, recovery is faster. When they are wrong, they can amplify failures before anyone has time to intervene.

Most organisations have never tested which one is true.

Paul Hayes has spent 18 years testing critical systems across Swiss banking and financial infrastructure. His paper “Correct, Fast, Harmful” introduces counterfactual comparison to reliability engineering — a structured method for generating real evidence about automated decisions, before a production incident generates it for you.

Paul is a board member of the Association for Software Testing and a regular conference speaker.