Instrument zero-code observability for LLMs and agents on Kubernetes

Building AI services with large language models and agentic frameworks often means running complex microservices on Kubernetes. Observability is vital, but instrumenting every pod in a distributed system can quickly become a maintenance nightmare. OpenLIT Operator solves this problem by automatically injecting OpenTelemetry instrumentation into your AI workloads—no code changes or image rebuilds required.

Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

At VictoriaMetrics, we built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under a realistic, production-like load, we developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness. We’ve made all benchmark configurations and source code public, so you can reproduce and verify the results independently.

Claude Code + Lightrun MCP: Your AI Agent Now Has Live Runtime Vision

Claude Code, Anthropic’s coding agent, now integrates with Lightrun through MCP. AI code assistants have been flying blind. Google’s 2025 DORA report found that this is causing an almost 10% increase in code instability. Even with up to 1M tokens of context available in Claude, this powerful agent cannot see how the code it writes actually behaves inside a live system under real traffic, with real dependencies, and under a load of 10,000 requests per second.

Claude Code is running bash commands on your infrastructure. Here's how to watch it.

I’ve been staring at Claude Code telemetry for the past few weeks, and I keep noticing the same thing: most teams drop it into their environment, say “it’s amazing,” and have absolutely no idea what it’s actually doing at the system level. That’s fine for a personal dev tool. It’s not fine when you’ve rolled it out to 50 engineers.

How to Perform a Network Health Check: Step-by-Step Guide

Your apps are slow. Users are complaining. You're staring at a dashboard trying to figure out what broke and when. Sound familiar? This is the reality of reactive network monitoring. By the time someone opens a ticket, the issue has already been affecting performance for minutes, sometimes hours. A network health check flips that script. Instead of chasing problems after the fact, you're catching them before users ever notice.

You're probably overdue for a Sentry SDK upgrade

Session Replay. Structured logs. AI monitoring. Automatic OpenTelemetry tracing. Feature flag tracking. If you haven't seen these in your Sentry dashboard, your SDK version is probably the reason. Whether you're on @sentry/react, @sentry/nextjs, @sentry/vue, @sentry/angular, @sentry/sveltekit, or any other @sentry/* package, they all version together. When we say v10, we mean all of them.