Setting and tracking key performance indicators based on the right data can help incident management teams reduce the impact of incidents and strengthen the business. But what exactly is the right data? That can be a deceptively tricky question. Incidents are complex and no two are exactly alike, so your KPIs must reflect that complexity.
Managing IT infrastructure today can feel like a game of Tetris. Operations staff are constantly fitting in new pieces, trying to decide quickly how best to position each one while the clock ticks down to the next drop. And as in Tetris, decisions made early on shape what comes later, and vice versa.
When an outage hits your service, everybody starts talking. Your engineers are discussing what caused the problem and how to fix it; your management is asking when it'll be fixed; and your customers are telling the world that they're not happy. But there's an even more important conversation you should be having: communicating with your users about the issue.
Most of us are familiar with the traditional farms that have existed since humans learned to sow and harvest crops. These farms have fed us for centuries, and for a long time, because refrigeration and similar technology didn't exist, people lived near their food sources. Industrialization, however, led to the centralization of farming: farms grew larger and moved farther from consumers, and distributors came to depend on preservatives or refrigeration to extend shelf life.
Having spoken with many companies, I’ve learned that while they all monitor their application performance, infrastructure, product usage, conversion rates and a variety of other user experience parameters, very few monitor the actual transactions from their payment provider.
IT operations management vendors are adding AI capabilities to their wares, but central AIOps platforms deliver the most value by coordinating all those domain-specific tools.
Much like the pagers of yore, PagerDuty immediately notifies the right person when something goes wrong. That means that no matter when there’s an issue in your application, the right people on your team will hear about it. But as much as we love PagerDuty, we’re not using valuable company time and resources just to tell you about it. We are, however, using valuable company time and resources to tell you all about our new integration with PagerDuty.