
KWhy? MSP Webinar

Most MSPs are sitting on a goldmine of data across their tools. The problem isn’t access; it’s knowing what *actually* matters and how to use it to drive better outcomes. Join Amanda Doucette-Lachapelle and Kyle Christensen (Empath) as they walk through how to use KPIs to make smarter, more confident decisions, with real examples you can apply right away.

What's New in the Updated OnPage Enterprise Management Console

Take a quick walkthrough of what’s new in the updated OnPage Enterprise Management Console. In this video, we highlight the latest updates designed to give admins more visibility, flexibility and self-service control across critical communication workflows. The updated Enterprise Management Console helps teams manage on-call schedules, critical alerts, escalation workflows and Dedicated Lines more efficiently from one centralized place.

Creating Schedule Overrides in OnPage

Learn how override schedules work in OnPage and how admins can quickly manage temporary on-call coverage changes without rebuilding the entire schedule. With OnPage overrides, teams can adjust coverage for vacations, sick days, shift swaps, after-hours changes or last-minute availability issues. During the override window, alerts are automatically routed to the covering responder. Once the override ends, the schedule returns to the regular on-call rotation.
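The override behavior described above can be modeled in a few lines. This is a minimal sketch, not OnPage's API: the `Override` class and `resolve_on_call` function are hypothetical names, and the regular rotation is reduced to a single responder passed in by the caller. The key idea is that an override sits on top of the base schedule instead of replacing it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Override:
    """A temporary coverage change (hypothetical model, not OnPage's API)."""
    start: datetime
    end: datetime
    covering_responder: str


def resolve_on_call(at: datetime, rotation_responder: str, overrides: list[Override]) -> str:
    """Return who should receive an alert at time `at`.

    An active override wins; outside every override window the regular
    rotation applies, so the base schedule never has to be rebuilt.
    """
    for override in overrides:
        # Half-open window: the override covers [start, end).
        if override.start <= at < override.end:
            return override.covering_responder
    return rotation_responder


vacation = Override(datetime(2024, 7, 1, 9), datetime(2024, 7, 8, 9), "bob")
print(resolve_on_call(datetime(2024, 7, 3, 14), "alice", [vacation]))  # bob
print(resolve_on_call(datetime(2024, 7, 9, 14), "alice", [vacation]))  # alice
```

If two overrides overlap, this sketch simply lets the first matching one win; a real scheduler would need an explicit precedence rule.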

The Data Plane Reality: OTel Scales, While Topology UX Lags

OpenTelemetry won the architectural standards battle. At scale, though, telemetry breaks more like plumbing than code. It breaks quietly, across a graph, with a blast radius you don’t understand until it’s expensive. Over 65% of organizations now run more than 10 collectors in production, and hybrid deployments across Kubernetes and VMs are accelerating. Telemetry standardization is no longer a project milestone. It is a baseline expectation.

Service Level Agreement (SLA) Templates: Examples, Metrics, and Best Practices

How quickly should your team resolve a critical ticket, and what are the consequences when it misses the target? That is exactly where Service Level Agreements (SLAs) come into play. An SLA turns service expectations into measurable commitments by defining clear response and resolution targets. Rather than starting from scratch, an SLA template provides a structured foundation for establishing those commitments and tracking performance against agreed standards. Why does that matter?
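To make "measurable commitments" concrete, an SLA template can be treated as a table of response and resolution targets per priority, with each ticket checked against it. The sketch below is illustrative only: the `SLA_TEMPLATE` values and the `check_sla` helper are hypothetical placeholders, not figures from any standard, and the check uses wall-clock time without business hours or paused clocks.

```python
from datetime import datetime, timedelta

# Hypothetical SLA template: priority -> (first-response target, resolution target).
# Real values come from the agreement you negotiate; these are placeholders.
SLA_TEMPLATE = {
    "critical": (timedelta(minutes=15), timedelta(hours=4)),
    "high": (timedelta(hours=1), timedelta(hours=8)),
    "normal": (timedelta(hours=4), timedelta(days=2)),
}


def check_sla(priority, opened, first_response, resolved):
    """Report whether a ticket met its response and resolution targets.

    Uses plain elapsed time; business-hour calendars are out of scope here.
    """
    response_target, resolution_target = SLA_TEMPLATE[priority]
    return {
        "response_met": first_response - opened <= response_target,
        "resolution_met": resolved - opened <= resolution_target,
    }


opened = datetime(2024, 5, 6, 9, 0)
result = check_sla("critical", opened,
                   first_response=datetime(2024, 5, 6, 9, 10),
                   resolved=datetime(2024, 5, 6, 14, 30))
print(result)  # {'response_met': True, 'resolution_met': False}
```

Here the critical ticket got a response in 10 minutes (within 15) but took 5.5 hours to resolve (over the 4-hour target), which is exactly the kind of miss an SLA report should surface.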

Agent Timeline Is Now Generally Available

A few weeks ago I wrote about a customer’s refund request that stopped halfway through at 11:47 p.m. on a Tuesday night. That post walked through the 40 minutes it took to work out what happened when an agentic application had a problem: a tool retried against a rate-limited payments API, the error responses filled up the context window, and the agent gave up. The whole reason we built Agent Timeline was to turn that 40 minutes into five. To reduce MTTR. To solve the problem and get back to sleep.

The Second Edition of Observability Engineering Is Here

IT’S HERE it’s here it’s here it’s here!!!! The second edition of Observability Engineering is available for download, and since Honeycomb is the sponsor, you can now download it from our website (the dead tree version will take another month). This is a strange time to be writing a book.

Cooldown policies - Block malicious packages at the index

Every dependency pull is a trust decision. Public registries don't vet what they serve. Cooldown policies give you a gate at the moment that matters most: when a package first enters your environment. Dan McKinney (Solutions Engineering Manager) walks through how Cloudsmith's cooldown policies work and how to configure one in under five minutes.
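As a rough illustration of the gating idea, the sketch below assumes a cooldown means "do not serve a version until it has been public for N days", giving time for a malicious release to be spotted and pulled. This is an assumption about the general technique, not a description of Cloudsmith's implementation; `COOLDOWN`, `passes_cooldown` and `newest_allowed` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cooldown window; an assumption for illustration, not a Cloudsmith default.
COOLDOWN = timedelta(days=7)


def passes_cooldown(published_at, now, cooldown=COOLDOWN):
    """True once a version has been public for at least `cooldown`."""
    return now - published_at >= cooldown


def newest_allowed(versions, now, cooldown=COOLDOWN):
    """Pick the most recently published version that has cleared the cooldown.

    `versions` maps a version string to its publish time. Versions still
    inside the window are skipped rather than served.
    """
    eligible = [v for v, published in versions.items() if passes_cooldown(published, now, cooldown)]
    if not eligible:
        return None
    return max(eligible, key=lambda v: versions[v])


now = datetime(2024, 9, 10, tzinfo=timezone.utc)
versions = {
    "1.4.1": datetime(2024, 8, 20, tzinfo=timezone.utc),
    "1.4.2": datetime(2024, 9, 9, tzinfo=timezone.utc),  # published yesterday
}
print(newest_allowed(versions, now))  # 1.4.1
```

The trade-off is deliberate: builds lag the newest release by the cooldown window in exchange for not being the first to install a compromised version.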

Troubleshooting ActiveMQ Producer Flow Control Blocks

The alert comes in at 2 AM: your order processing service is unresponsive. The application hasn't crashed: threads are running and the JVM is healthy, but no messages are being sent. Your operations team traces it to a blocked send() call on an ActiveMQ connection. Hours later, after restarting the application, someone finds this line in the broker log from 11 PM the previous day.
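Why would a healthy process hang inside send()? A simplified model helps, assuming the broker stops accepting messages once a memory limit is reached and holds producers until consumers free space. The sketch below is a toy stand-in, not ActiveMQ code: a bounded `queue.Queue` plays the broker's memory, and the hypothetical `send` helper blocks when it is full, just as a flow-controlled producer does.

```python
import queue
import threading

# Toy model of producer flow control (not ActiveMQ code): the broker's memory
# limit is a bounded queue, and send() blocks while the queue is full.
broker_memory = queue.Queue(maxsize=3)


def send(message, timeout=None):
    """Enqueue a message; timeout=None blocks indefinitely, like the hung send()."""
    broker_memory.put(message, timeout=timeout)


# No consumer is running, so the "memory limit" fills up.
for i in range(3):
    send(f"order-{i}")

# With a timeout the stall becomes visible instead of silent.
try:
    send("order-3", timeout=0.1)
except queue.Full:
    print("send() would block: broker memory limit reached")

# A slow consumer draining one message frees space and unblocks the producer.
threading.Timer(0.1, broker_memory.get).start()
send("order-3", timeout=1.0)
print("sent after consumer freed space")
```

The model shows why nothing looks crashed: the producer thread is alive but parked, waiting on capacity that only a consumer can release.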