BigPanda

Domain-agnostic and here to stay: Gartner outlines the current state and future of AIOps

Coined by Gartner in 2016, the term ‘AIOps’ refers to combining big data, AI, and machine learning to automate and improve IT operations processes. Back then, this very broad definition led to some confusion, as different IT vendors characterized AIOps differently depending on what they actually offered.

The true cost of IT Ops, the added value of AIOps

Today’s IT landscape is complex, hybrid, and fast-moving, and the adoption of multi-cloud infrastructure, applications, and new digital transformation initiatives is accelerating. IT operations teams, playing a vital role in enabling the delivery of uninterrupted services and creating business value for enterprises, are finding they need to constantly grow their resources to manage all the moving pieces in their IT stack. This can get expensive … but how much are they spending?

Incident triage: a key element in your MTTR

One of the key performance indicators for IT Ops is MTTR (Mean-Time-To-Resolution). MTTR essentially measures the length of your incident management lifecycle: from detection; through assignment, triage and investigation; to remediation and resolution. IT Ops teams strive to shorten their incident management lifecycle and lower their MTTR, to meet their SLAs and maintain healthy infrastructures and services. But that’s often easier said than done.
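As a rough illustration of the measurement described above, MTTR can be computed as the average time between detection and resolution across resolved incidents. The sketch below is a minimal example under stated assumptions: the `mean_time_to_resolution` helper, the tuple layout, and the sample timestamps are all hypothetical and are not taken from any BigPanda product or API.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average (resolved - detected) over incidents that have been resolved.

    `incidents` is a hypothetical list of (detected_at, resolved_at) tuples;
    resolved_at is None for incidents that are still open.
    """
    durations = [
        resolved - detected
        for detected, resolved in incidents
        if resolved is not None  # open incidents have no resolution time yet
    ]
    if not durations:
        raise ValueError("no resolved incidents to average")
    # Sum the timedeltas, then divide by the number of resolved incidents.
    return sum(durations, timedelta()) / len(durations)

# Illustrative sample data (made up for this example).
incidents = [
    (datetime(2021, 1, 4, 9, 0), datetime(2021, 1, 4, 10, 0)),    # 60 minutes
    (datetime(2021, 1, 5, 14, 0), datetime(2021, 1, 5, 14, 30)),  # 30 minutes
    (datetime(2021, 1, 6, 8, 0), None),                           # still open
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 0:45:00
```

Open incidents are left out of the average here, which is one possible convention; teams that count them up to the current time will get a different figure.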

Phoenix Project: Sometimes you have to look back to look forward

It has been eight years since The Phoenix Project was published, and a lot has changed since then! I started to think about what we’ve learned in that time. It starts with the theory of constraints, and I still see the same pattern all the time: organizations take actions that are merely temporary, putting out fires without addressing the underlying causes of those fires.

Say goodbye to guessing: Introducing Automatic Incident Triage by BigPanda

Low MTTR is the much-desired nirvana state in IT Operations. One of the most painful parts of the incident management lifecycle, and one that stands in the way of this nirvana, is triage: the time it takes first incident responders to determine the next action when facing a barrage of IT incidents. Why?

How to speed up incidents with a lot of cooks in the kitchen

In one of our recent webinars, we discussed a substantial challenge that IT Ops teams face in today’s complex IT environments: defining and clearly communicating incident and operational roles and processes in order to create a well-coordinated incident management lifecycle. This lifecycle is essential for restoring service as quickly as possible when disruptions occur. The following are the highlights of that discussion, which were also recently published in an APMdigest article.