Asimov's Zeroth Law of Robotics: testing and observing AI (ExpoQA 2026)

Jun 5, 2026

Asimov's Three Laws of Robotics are missing one. When it comes to testing and observing AI, Nicole van der Hoeven argues that the missing rule changes everything: before a robot can avoid harm, obey orders, or protect itself, it has to satisfy a Zeroth Law, which is that a robot must be observable. If you can't see what a system is doing, you have no way of knowing whether it's following any rule at all.

In this talk, Nicole — Senior Developer Advocate at Grafana Labs and a tester at heart — makes the case that the AI industry is busy reinventing a discipline that already belongs to testers. "Evals" are tests. "Graders" are test oracles. "LLM-as-judge" is an automated test oracle. "Offline evals" are pre-prod testing; "online evals" are testing in production. The terminology is new; the methodology is decades old. Testing and observability are the same act, separated only by when you do it — and both depend on the same precondition: the system has to be observable.
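
As a rough illustration of that mapping (not code from the talk or from Nicole's demo repo), a code-based eval can be written the same way as an ordinary automated test: a case pairs an input with a grader, and the grader plays the role of the test oracle. Every name below (`EvalCase`, `fake_dungeon_master`, `mentions_d20`, `run_evals`) is hypothetical, and the stand-in function returns a canned reply where a real eval would call the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One eval, which is one test case: an input plus an oracle."""
    prompt: str
    grader: Callable[[str], bool]  # the "grader" acts as the test oracle


def fake_dungeon_master(prompt: str) -> str:
    """Hypothetical stand-in for the system under test (assumption:
    a real eval would send the prompt to the model instead)."""
    return "You enter the cave. Roll a d20 for perception."


def mentions_d20(reply: str) -> bool:
    """Code-based grader: passes if the reply asks for a d20 roll."""
    return "d20" in reply.lower()


def run_evals(system: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Run every case against the system and return the pass rate."""
    results = [case.grader(system(case.prompt)) for case in cases]
    return sum(results) / len(results)


cases = [
    EvalCase("I look around the cave.", mentions_d20),
    EvalCase("I attack the goblin.", lambda reply: len(reply) < 500),
]

pass_rate = run_evals(fake_dungeon_master, cases)
print(f"pass rate: {pass_rate:.0%}")
```

Swapping `mentions_d20` for a function that asks a second model to score the reply would turn the same harness into an LLM-as-judge setup; the surrounding structure stays a plain test runner.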

Along the way: Speedy the mining robot stuck in a loop on Mercury, HAL 9000 refusing to open the pod bay doors, a working cheat sheet for translating AI jargon back into testing terms, and a live look at how she tests an AI Dungeon Master (with Data from Star Trek as the player) using k6, OpenTelemetry, and Grafana. Plus what AI specifically fails at — hallucination, toxicity, bias, and drift — and why testing the outcome alone is never enough when the reasoning can be wrong.

If you're a tester wondering whether you're being left behind by AI: you're not. You've been doing this all along.

Nicole gave this talk at ExpoQA 2026 in Madrid.

Timestamps:

0:00 Isaac Asimov & the Three Laws of Robotics

6:03 The Zeroth Law — a robot must be observable

7:25 What makes AI different to test (non-determinism, subjectivity, context, cost, state)

13:35 The cheat sheet: AI terms ↔ testing terms

19:24 Five approaches to testing AI

21:27 Benchmarking & human evaluation (LMArena)

22:53 The demo: a D&D chatbot instrumented with k6, OpenTelemetry & Grafana

25:22 Writing an eval: code-based testing

27:46 LLM-as-judge: model-based evaluation

30:37 Comparing results & observability in production

34:22 What testers can bring that AI engineering doesn't (yet)

37:00 Back to Speedy: the Zeroth Law resolved

41:33 Testing's role in building AI

Links/resources:
Nicole's Asimov repo with the demo app she used: https://nicole.to/asimov
More about Nicole: https://nicolevanderhoeven.com
Learn about AI Observability: https://grafana.com/docs/grafana-cloud/monitor-applications/ai-observability/
Get started with the Grafana Cloud forever-free tier: https://grafana.com/g/cloud
Have a question? Ask Grot, your AI helper: https://grafana.com/grot/
Reach out in our community forums: https://gra.fan/communityyf

Thanks for watching!

👍 Was this video helpful? Like and subscribe to our channel for more videos.

Connect with Grafana Labs:
X: https://www.twitter.com/grafana
LinkedIn: https://www.linkedin.com/company/grafana-labs/
Facebook: https://www.facebook.com/grafana

#Grafana #Observability