
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The Art of Alert Management

With the ever-growing landscape of digital technology and the Internet of Things (IoT), businesses are becoming increasingly reliant on complex systems to monitor and manage their operations. This dependency has resulted in an explosion of alerts and notifications, overwhelming IT teams and hurting overall productivity. It’s never been more critical to have an effective alert management strategy in place to keep your organization running smoothly.

Announcing Catalog - the connected map of everything in your organization

One of the most painful parts of incident response is contextualizing the problem and understanding how and where it fits within your organization. If responders can’t answer basic questions, you waste valuable time talking to the wrong people or solving the wrong problems, ultimately extending the impact and hurting your response. It’s a common issue that, until now, didn’t have a clear solution or workaround.

From Expense to Excellence: Transforming ITOps in 2023 through Strategic IT Cost Optimization

Most organizations view their technology and network operations centers, and the budgets behind them, simply as the cost of running their internal and external IT services. However, through IT cost optimization, you can improve how your Ops center team responds to service issues and save valuable resources too. So what, specifically, is IT cost optimization?

Upgraded role-based access control brings more visibility - and control - to incident management at your organization

We’ve long believed that incidents (and technical team cultures) improve when more people are empowered to declare, follow, and contribute to their resolution. But not everyone in an organization needs to be able to manage the processes, rules, and settings a company enforces for their incident programs.

Welcome To xMatters - Ep3 - Sending Messages

There’s nothing better than a smoothly run operation, but life is full of surprises. When things don’t go to plan and help is urgently needed, no time can be wasted. Getting a message to a resolver on time is just as important as having a resolver to call in the first place! Letting people know that help is on the way is also important for keeping the situation calm until they arrive.

How our product team use Catalog

We recently introduced Catalog: the connected map of everything in your organization. In the process of building Catalog as a feature, we’ve also been building out the content of our own catalog. We’d flipped on the feature flag to give ourselves early access, and as we went along, we used it to test out the various features that Catalog powers.

Services are not special: Why Catalog is not just another service catalog

As you may have already seen, we’ve recently released a Catalog feature at incident.io. While designing and building it, we took an approach that’s a tangible departure from a traditional service catalog. Here’s how we’re different, and why.

Azure Incident Management with Escalation Policy

These days, businesses rely heavily on cloud services like Microsoft Azure to power their operations. While Azure provides robust infrastructure and services, occasional issues and incidents can still occur. Serverless360 adds capabilities for monitoring and managing Azure incidents, but to ensure seamless operations and timely resolution of problems, it is crucial to have a well-defined escalation policy in place for Azure incident management.

The Unplanned Show, Episode 3: LLMs and Incident Response

A software engineer, a data scientist, and a product manager walk into a generative AI project… Using technology that didn’t exist a year ago, they identify a customer pain point they might be able to solve, build on teammates’ experience with building AI features, and test how to feed inputs and constrain outputs into something useful. Hear the full conversation here.

incident.io Catalog hands-on lab

The incident.io Catalog is a connected, navigable map of "things" that exist in your organization. We can use it to describe an organization as a connected graph, and use that graph to drive powerful workflow automations during incidents. In this hands-on training session, we'll work through an example of building a catalog for a mock organization. We'll then use the catalog to solve some real business problems, including automated incident data attribution, and walk through realistic workflows that show how it works and what it enables in the context of incident management.
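The following is a minimal Python sketch of the "connected graph" idea. A catalog is modelled as typed entries, and an entry's attributes can point at other entries. An automation then follows those links from an affected feature to the owning team's channel. Everything in it is hypothetical and for illustration only, including the `Entry` type, the `follow` and `route_incident` helpers, the attribute names and the `#incidents` fallback. None of it is incident.io's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical catalog entry: a typed 'thing' with attributes.
    Attribute values are either plain values or references to other entries."""
    type_name: str
    name: str
    attributes: dict = field(default_factory=dict)


def follow(entry: Entry, *path: str):
    """Walk attribute references from `entry` along `path`.
    Returns the final value, or None if any link is missing."""
    current = entry
    for attr in path:
        if not isinstance(current, Entry):
            return None  # tried to traverse through a plain value
        current = current.attributes.get(attr)
        if current is None:
            return None  # missing link in the graph
    return current


# A tiny mock organization: Feature -> Service -> Team
payments_team = Entry("Team", "Payments", {"slack_channel": "#team-payments"})
billing_api = Entry("Service", "billing-api", {"owner": payments_team, "tier": "1"})
checkout = Entry("Feature", "Checkout", {"depends_on": billing_api})


def route_incident(affected_feature: Entry) -> str:
    """Hypothetical automation: notify the channel of the team that owns
    the service behind a feature, falling back to a shared channel."""
    channel = follow(affected_feature, "depends_on", "owner", "slack_channel")
    return channel if channel is not None else "#incidents"


print(route_incident(checkout))  # #team-payments
```

The graph lookup replaces a hard-coded "feature X pages team Y" rule. When ownership changes, only the `owner` link on the service needs updating, and every automation that walks the graph picks up the change.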