The Invisible IT Department: How to Deliver Friction-Free Experiences with Agentic AI

Nearly every enterprise has bought AI, but many are still waiting for their investment to pay off. Ivanti’s 2026 AI Maturity Report found that only 2% of organizations say they currently have no AI use at all. As most organizations move beyond the AI experimentation stage, the real competitive differentiator is whether that AI delivers continuous business value at scale.

So you need to add microcontrollers to your fleet: now what?

Your Ubuntu Core fleet is running beautifully. OTA updates roll out in minutes. Every device is strictly confined, cryptographically attested, and carrying a 10- to 15-year long-term support (LTS) commitment. The operations team sleeps soundly. Then the product roadmap meeting happens. The industrial floor needs vibration sensors on every motor. The smart building needs temperature nodes in every room. The cold chain system requires dozens of low-power Bluetooth tags. And someone just said the words.

The Data Plane Reality: OTel Scales, While Topology UX Lags

OpenTelemetry won the architectural standards battle. At scale, though, telemetry breaks more like plumbing than code. It breaks quietly, across a graph, with a blast radius you don’t understand until it’s expensive. With over 65% of organizations now running more than 10 collectors in production, hybrid deployments across Kubernetes and VMs are accelerating fast. Telemetry standardization is no longer a project milestone. It is a baseline expectation.

Service Level Agreement (SLA) Templates: Examples, Metrics, and Best Practices

How quickly should your team resolve a critical ticket, and what are the consequences when it misses the target? That is exactly where Service Level Agreements (SLAs) come into play. An SLA turns service expectations into measurable commitments by defining clear response and resolution targets. Rather than starting from scratch, an SLA template provides a structured foundation for establishing those commitments and tracking performance against agreed standards. Why does that matter?
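To make "measurable commitments" concrete, here is a minimal Python sketch of an SLA template and a check against it. The priority names, the target durations, and the `check_sla` helper are all hypothetical and chosen only for illustration; a real template would use whatever targets your agreement defines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SlaTarget:
    """Response and resolution targets for one ticket priority."""
    response: timedelta
    resolution: timedelta


# Hypothetical template: priorities and durations are illustrative assumptions.
SLA_TEMPLATE = {
    "critical": SlaTarget(response=timedelta(minutes=15), resolution=timedelta(hours=4)),
    "high": SlaTarget(response=timedelta(hours=1), resolution=timedelta(hours=8)),
    "low": SlaTarget(response=timedelta(hours=8), resolution=timedelta(days=3)),
}


def check_sla(priority: str, opened_at: datetime,
              first_response_at: datetime, resolved_at: datetime) -> dict:
    """Compare a ticket's actual timings against the template targets."""
    target = SLA_TEMPLATE[priority]
    return {
        "response_met": first_response_at - opened_at <= target.response,
        "resolution_met": resolved_at - opened_at <= target.resolution,
    }


# Example: a critical ticket answered in 10 minutes but resolved in 5.5 hours.
result = check_sla(
    "critical",
    opened_at=datetime(2026, 1, 5, 9, 0),
    first_response_at=datetime(2026, 1, 5, 9, 10),
    resolved_at=datetime(2026, 1, 5, 14, 30),
)
```

In this example the response target is met and the resolution target is missed, which is the kind of gap an SLA report is meant to surface.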

Agent Timeline Is Now Generally Available

A few weeks ago I wrote about a customer’s refund request that stopped halfway through at 11:47 p.m. on a Tuesday night. That post walked through the 40 minutes it took to work out what happened when an agentic application had a problem: a tool retried against a rate-limited payments API, the error responses filled up the context window, and the agent gave up. The whole reason we built Agent Timeline was to turn that 40 minutes into five. To reduce MTTR. To solve the problem and get back to sleep.

The Second Edition of Observability Engineering Is Here

IT’S HERE it’s here it’s here it’s here!!!! The second edition of Observability Engineering is available for download, and since Honeycomb is the sponsor, you can now download it from our website (the dead tree version will take another month). This is a strange time to be writing a book.

Troubleshooting ActiveMQ Producer Flow Control Blocks

The alert comes in at 2 AM: your order processing service is unresponsive. The application has not crashed. Threads are running and the JVM is healthy, but no messages are being sent. Your operations team traces it to a blocked send() call on an ActiveMQ connection. Hours later, after restarting the application, someone finds this line in the broker log from 11 PM the previous day.

Cloud Storage vs Local Storage: Everything You Need to Know

In 2026, the world is expected to generate roughly 450 to 500+ million terabytes of data per day. All this data needs to be stored somewhere, but is cloud storage or local storage the better way to manage it? This article walks through both storage models so you can determine which best suits your personal, business, or enterprise use case.

5 Alternatives to Prometheus in 2026

Prometheus is a battle-tested, flexible and, most importantly, free tool that has long been the go-to open-source monitoring solution. Much of its popularity came down to its simplicity. A few years have gone by, though, and the APM space has gotten pretty crowded. Developers are now starting to move away from the complexity of self-hosting, and OpenTelemetry stands out as one of the CNCF’s fastest-expanding projects. In fact, it’s now among the most adopted telemetry frameworks out there.