Monthly Archive

Get Lightrun AI Skills: Expert Workflows for AI Agents

May 18, 2026 By Gidi Freud In Lightrun

Today we’re launching Lightrun AI Skills: structured, repeatable investigation workflows built for AI coding agents. With Lightrun MCP, agents like Claude Code, Codex, and Cursor can already instrument live production services and reason over live runtime evidence without a redeployment. But AI agents remain non-deterministic by design, using the same tool differently from one session to the next.

Why Alert Fatigue Solutions Still Miss the Root Cause

May 11, 2026 By Lightrun Team In Lightrun

Alert fatigue solutions have never been better, but on-call engineers are still burning out. Threshold tuning, AI triage, and alert correlation reduce the noise, but every alert that clears filtering lands with the same incomplete telemetry and triggers the same manual investigation cycle. This post explains why the evidence gap survives every fix, and how runtime context changes that.

Why Blast Radius Analysis Does Not End When Alerts Fire

May 7, 2026 By Lightrun Team In Lightrun

Modern distributed systems fail in ways that can bypass even well-designed isolation patterns. When a failure is actively propagating across services at four in the morning, the question shifts from “how do we limit the blast radius” to “how do we confirm what it actually is.” Monitoring shows which services are in the impact zone, but it cannot show what code path caused the failure to spread, or whether it has stopped.

How to Prevent AI Agents From Deleting Production Data

May 6, 2026 By Lightrun Team In Lightrun

There’s a new question teams are asking: how can we prevent AI agents from deleting production? When Cursor deleted PocketOS’s entire production database in nine seconds, the agent wasn’t malfunctioning. It had full technical capability, but it was inferring operational authority from static code rather than live environment state. That gap between capability and context is the root cause. This article breaks down exactly how that happens, and what runtime visibility does to stop it.

Why Does MTTD Stay High Despite Observability Tools Running?

May 4, 2026 By Lightrun Team In Lightrun

Monitoring coverage, anomaly detection, and SLO-based alerting have significantly narrowed detection windows for most failure types, but MTTD remains stubbornly high for one specific class: silent failures. This post covers why type mismatches, swallowed exceptions, and values that pass validation occur without triggering errors, and what changes when your monitoring stack can generate those signals without waiting for a failure to surface them.
