
The latest news and information on AIOps, alerting in complex systems, and related technologies.

Identify recurring issues and reveal their root cause with BigPanda IT Problem Management

For many enterprises, incident response feels like déjà vu. The same issues keep happening over and over, eating up time, draining resources, and wearing down your teams. In fact, 20-40% of IT incidents are typically recurring issues, created by unresolved underlying problems. Teams prioritize speed over permanence, patching symptoms instead of addressing the root cause. They often lack the right context, documentation, or shared knowledge to permanently fix issues.

OpenTelemetry + ignio: The Foundation for Intelligent, Unified Observability

In the previous post, What is OpenTelemetry?, we went over the What, Why, and the How of OpenTelemetry. We also went over the telemetry data lifecycle (data generation → collection → storage → usage) and how telemetry data (MELT) could be put to use to troubleshoot a representative web application scenario.

BigPanda & Jira Service Management: Enterprise-wide visibility meets team-level autonomy

Business teams today move fast. Developers, site reliability engineers (SREs), and product owners expect to manage incidents, changes, and requests in a way that fits naturally into how they already work with tools like Jira and Confluence. Customers expect a seamless service experience powered by automation and AI. The result is a wave of teams adopting tools like Jira Service Management to get everything they need in one place without slowing down.

Agentic AIOps in Action: LogicMonitor, IBM, and Red Hat Deliver Self-Healing IT

Your most skilled engineers shouldn’t be spending nights and weekends piecing together root causes of outages. Yet many organizations still rely on manual incident response across sprawling hybrid and multi-cloud environments. The result: slower resolution times, frustrated customers, and lost revenue that can reach $1 million per hour, according to IDC. At LogicMonitor, we believe the answer isn’t just better monitoring. It is systems that can heal themselves.