
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Upgraded role-based access control brings more visibility - and control - to incident management at your organization

We’ve long believed that incidents (and technical team cultures) improve when more people are empowered to declare, follow, and contribute to their resolution. But not everyone in an organization needs to be able to manage the processes, rules, and settings a company enforces for its incident programs.

Welcome To xMatters - Ep3 - Sending Messages

There’s nothing better than a smoothly run operation, but life is full of surprises. When things don’t go to plan and help is urgently needed, no time can be wasted. Getting a message to a resolver on time is just as important as having a resolver to call in the first place! And letting people know that help is on the way is especially important to keep the situation calm until they arrive.

How our product team use Catalog

We recently introduced Catalog: the connected map of everything in your organization. In the process of building Catalog as a feature, we’ve also been building out the content of our own catalog. We'd flipped on the feature flag to give ourselves early access, and as we went along, we used this to test out the various features that Catalog powers.

Services are not special: Why Catalog is not just another service catalog

As you may have already seen, we’ve recently released a Catalog feature at incident.io. While designing and building it, we took an approach that’s a tangible departure from a traditional service catalog. Here’s how we’re different, and why.

Azure Incident Management with Escalation Policy

These days, businesses rely heavily on cloud services like Microsoft Azure to power their operations. While Azure provides robust infrastructure and services, occasional issues and incidents can still occur. Serverless360 provides enhanced capabilities to monitor and manage Azure incidents in a system. But to ensure seamless operations and timely resolution of problems, it is crucial to have a well-defined escalation policy in place for Azure Incident Management.

The Unplanned Show, Episode 3: LLMs and Incident Response

A software engineer, a data scientist, and a product manager walk into a generative AI project… Using technology that didn’t exist a year ago, they identify a customer pain point they might be able to solve, build on teammates’ experience with building AI features, and test how to feed inputs and constrain outputs into something useful. Hear the full conversation here.

incident.io Catalog hands on lab

The incident.io Catalog is a connected, navigable map of "things" that exist in your organization. We can use it to describe an organization as a connected graph, and use that graph to drive powerful workflow automations during incidents. In this hands-on training session, we'll work through an example of building a catalog for a mock organization. We'll then use the catalog to solve some real business problems, including automated incident data attribution, and some realistic workflows that outline how it works and what it enables in the context of incident management.

How AIOps Revolutionizes Observability for TechOps Teams

Managing over 1,000 services and applications is daunting for any organization’s IT and Tech operations team. With a diverse mix of on-premises legacy systems and modern cloud stacks, the sheer volume of activity can overwhelm even the most skilled ITOps teams. The task is made more difficult by the fact that observability is fragmented. On average, organizations depend on 21 systems that produce metrics, logs, traces, and alerts for various services.

Sponsored Post

Squadcast's Improved Mobile App for Better Incident Response

The 2020 pandemic has definitely changed the way teams operate across the globe. Many of you may have already experienced moving from 100% office work to 100% remote work, and now that almost three years have passed since the pandemic started, many of us have adopted hybrid models. We at Squadcast value efficient communication, reaching the right people during a crisis, and the freedom to resolve critical incidents from anywhere, anytime. With that in mind, we have made major improvements to our mobile app to help you take part in incident response from anywhere in the world, at any time.

Cyberattack Prevention with AI

Cyberattack prevention involves proactive steps organizations take to protect their digital assets, networks, and systems from potential cyber threats. Preventive measures, such as a combination of best practices, policies, and technologies, are employed to identify and mitigate security breaches before they can cause significant damage.