The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Connecting Agents for Real-Time Root Cause Analysis with Checkly's Rocky AI

Rocky, Checkly's AI agent, monitors production sites and provides an analysis for every failing check. Previously, a coding agent couldn't access this analysis, leaving incidents and agents disconnected. Now, you can access all the analyses via the Checkly CLI (or API) and tell your coding agent, "Hey, I got a Checkly alert. Please investigate!" With Rocky's structured analysis delivered inline, the coding agent can start with a strong hypothesis, fix issues, and propose a PR in one session.

From Vibes to Signals: Observing Your AI Coding Workflow

Agentic coding tools like Claude Code and Codex have taken centre stage and inserted themselves into the critical path of software development. This shift has happened fast, and for most teams, the visibility hasn’t caught up. Until now, we’ve been evaluating our vibe coding the same way – on vibes. You might say “this feels faster” or “that seems like a better approach”. That’s not going to scale.

Sentry + Stripe Projects: From Zero to Error Monitoring in Two Commands

No signup form. No dashboard. No copy-pasting DSNs. Sentry is now a provider on Stripe Projects, which means you can provision a fully configured Sentry project — error monitoring, tracing, and session replay — straight from the CLI in two commands. In this demo, we walk through the full workflow: initializing a project, provisioning Sentry, upgrading and downgrading plans, using magic login to jump straight into your dashboard, and letting a coding agent (Claude Code) handle it all for you.

New in the Honeycomb Academy: Learn to Use the Honeycomb MCP

Two things happen when engineers first connect the Honeycomb MCP to their AI assistant. The first is the blank page problem. The Honeycomb UI gives you something to react to: a heatmap, a query builder, a trace to click into. An AI assistant gives you a cursor and nothing else. When you don't know where to start, that's a hard place to be. The second shows up right after you get past the first one. You ask a question, you get a confident-sounding answer, and you're not sure whether to trust it.

Two AI agents, one incident: Rocky AI comes to the terminal

A Playwright Check fails at 2 am. The login flow is broken. Until today, that alert required a human to get up, open the Checkly dashboard, copy Rocky AI's root cause analysis (RCA), and then tell an agent to get to work. There were two AI agents, one incident, and no way for them to talk to each other. The extended checkly checks and new checkly rca CLI commands close that gap. Your coding agent can now pull Rocky AI's analysis into its ongoing work, read the diagnosis, and go fix the code.

How to run a proof of concept that de-risks your monitoring decision

Part 3: key insights from a fireside chat with Chris Yates. Most database monitoring proofs of concept (POCs) answer the wrong questions. Here's how to structure a proof of concept that genuinely de-risks your vendor decision, along with the questions to ask during the process. A POC is often treated as the final hurdle in vendor evaluation, but too often it becomes theatre: a guided tour of the flashiest features, run by one person, under unrealistic conditions.

End-to-End Trace Propagation Across SQS and Lambda with OpenTelemetry

SQS doesn't propagate trace context automatically. You instrument both sides, deploy, and get two disconnected traces. This post shows how to wire them into one waterfall, and covers the ESM format gotcha that silently breaks it every time. Prathamesh works as an evangelist at Last9, runs SRE stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of all terms related to observability.
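
To make the idea of carrying trace context across SQS concrete, here is a minimal standard-library sketch. It is not the code from the post. It assumes the context travels as a W3C `traceparent` string in an SQS message attribute, and the helper names (`new_traceparent`, `inject`, `extract`) are hypothetical. A real setup would more likely use the OpenTelemetry propagator API than hand-rolled parsing.

```python
import re
import secrets

# W3C traceparent, version 00: version-traceid-parentid-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Build a fresh traceparent value (hypothetical helper for this sketch)."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def inject(traceparent, message_attributes=None):
    """Producer side: add the trace context to SQS MessageAttributes.

    The SendMessage API expects PascalCase keys (DataType, StringValue).
    """
    attrs = dict(message_attributes or {})
    attrs["traceparent"] = {"DataType": "String", "StringValue": traceparent}
    return attrs


def extract(record):
    """Consumer side: read the trace context from one Lambda SQS event record.

    Lambda delivers attributes under camelCase keys
    (messageAttributes, stringValue), unlike the send API.
    Returns None when the attribute is missing or malformed.
    """
    attr = record.get("messageAttributes", {}).get("traceparent")
    if not attr:
        return None
    match = TRACEPARENT_RE.match(attr.get("stringValue", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }
```

On the producer, the result of `inject(...)` would be passed as the `MessageAttributes` argument when sending the message. On the consumer, `extract(record)` would run for each entry in the Lambda event's `Records` list, and the returned IDs would become the parent of the consumer span so both sides land in one trace.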