Observability for LLM Apps and Agents: OpenLIT SDK + VictoriaMetrics observability stack

Many “LLM observability with OpenTelemetry” tutorials stop at a single chat.completions span. That works for a demo, but it leaves gaps once an agent fans out into 30 tool calls, two vector-DB queries, and three handoffs, and you need to attribute a 90-second tail latency. This post wires the OpenLIT SDK (50+ instrumentations, OTel GenAI semantic conventions, one line of code) into the full VictoriaMetrics observability stack and shows query examples that turn agent telemetry into decisions.
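To make the attribution problem concrete, here is a minimal sketch in plain Python. It assumes the spans have already been exported and reduced to (operation, duration) pairs. The span data is hypothetical, and the operation names are assumed to follow the `gen_ai.operation.name` values in the OTel GenAI semantic conventions. It does not use the OpenLIT or VictoriaMetrics APIs, and it simply sums durations, so overlapping (parallel) spans are counted in full.

```python
from collections import defaultdict

# Hypothetical spans from one slow agent run, as
# (gen_ai.operation.name, duration in seconds). Illustrative data only.
SPANS = [
    ("invoke_agent", 2.0),
    ("chat", 40.0),
    ("execute_tool", 30.0),
    ("execute_tool", 15.0),
    ("embeddings", 3.0),
]


def latency_by_operation(spans):
    """Sum span durations per operation and return them largest first."""
    totals = defaultdict(float)
    for operation, seconds in spans:
        totals[operation] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for operation, seconds in latency_by_operation(SPANS):
        print(f"{operation:14s} {seconds:6.1f}s")
```

With a single chat span, all of this time would be reported as one number. Grouping by operation is the simplest view that shows whether the model calls or the tool calls dominate the run.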

To learn and improve, we cannot be afraid to fail

“Deployment stress doesn’t just come from high-profile public outages. It often starts much earlier, when a fear of failure seeps into team culture.”
Rob Richardson, Software Craftsman

Rob certainly knows the stress and embarrassment of public deployment failures. “But overall,” he reflects, “I’ve had more stress in my career from internal failures.”