Asimov's Zeroth Law of Robotics: testing and observing AI (ExpoQA 2026)

Grafana

Jun 5, 2026

Asimov's Three Laws of Robotics are missing one, and when it comes to testing and observing AI, Nicole van der Hoeven argues that the missing rule changes everything. Before a robot can avoid harm, obey orders, or protect itself, there has to be a Zeroth Law: a robot must be observable. If you can't see what a system is doing, you have no way of knowing whether it's following any rule at all.

In this talk, Nicole — Senior Developer Advocate at Grafana Labs and a tester at heart — makes the case that the AI industry is busy reinventing a discipline that already belongs to testers. "Evals" are tests. "Graders" are test oracles. "LLM-as-judge" is an automated test oracle. "Offline evals" are pre-prod testing; "online evals" are testing in production. The terminology is new; the methodology is decades old. Testing and observability are the same act, separated only by when you do it — and both depend on the same precondition: the system has to be observable.

Along the way: Speedy the mining robot stuck in a loop on Mercury, HAL 9000 refusing to open the pod bay doors, a working cheat sheet for translating AI jargon back into testing terms, and a live look at how she tests an AI Dungeon Master (with Data from Star Trek as the player) using k6, OpenTelemetry, and Grafana. Plus what AI specifically fails at — hallucination, toxicity, bias, and drift — and why testing the outcome alone is never enough when the reasoning can be wrong.

If you're a tester wondering whether you're being left behind by AI: you're not. You've been doing this all along.

This was a talk that Nicole gave at ExpoQA 2026 in Madrid.

Timestamps:

0:00 Isaac Asimov & the Three Laws of Robotics

6:03 The Zeroth Law — a robot must be observable

7:25 What makes AI different to test (non-determinism, subjectivity, context, cost, state)

13:35 The cheat sheet: AI terms ↔ testing terms

19:24 Five approaches to testing AI

21:27 Benchmarking & human evaluation (LMArena)

22:53 The demo: a D&D chatbot instrumented with k6, OpenTelemetry & Grafana

25:22 Writing an eval: code-based testing

27:46 LLM-as-judge: model-based evaluation