Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to build workflows in OneUptime and integrate it with anything

OneUptime is a complete open-source observability platform. It lets you create workflows and integrate with over 5000 different services and products without writing any code, so OneUptime can connect with the rest of your software stack. A workflow defines the sequence of operations that runs when a trigger fires or a condition is met. Workflows can automate processes such as incident management, alerting the right people at the right time, and more.
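To make the trigger, condition and operations idea concrete, here is a minimal sketch in plain Python. It is not OneUptime's workflow engine or API: the `Workflow` class, the `incident.created` event type, and the `page_on_call` and `post_to_chat` actions are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical event payload, e.g. {"type": "incident.created", "severity": "critical", "title": "..."}
Event = dict


@dataclass
class Workflow:
    trigger: str                              # event type that starts the workflow
    condition: Callable[[Event], bool]        # extra filter evaluated on the event
    actions: list[Callable[[Event], str]] = field(default_factory=list)

    def handle(self, event: Event) -> list[str]:
        """Run every action in order if the event matches; return each action's result."""
        if event.get("type") != self.trigger or not self.condition(event):
            return []
        return [action(event) for action in self.actions]


# Hypothetical actions; a real integration would call an external service here.
def page_on_call(event: Event) -> str:
    return f"paged on-call for {event['title']}"


def post_to_chat(event: Event) -> str:
    return f"posted '{event['title']}' to #incidents"


critical_incidents = Workflow(
    trigger="incident.created",
    condition=lambda e: e.get("severity") == "critical",
    actions=[page_on_call, post_to_chat],
)

results = critical_incidents.handle(
    {"type": "incident.created", "severity": "critical", "title": "API latency spike"}
)
```

Here `results` holds one entry per action that ran; an event with a different type, or a non-critical severity, returns an empty list and triggers nothing.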

When More Incident Commanders are Better

This post has been lightly revised and reposted, with the author's permission, from the original article on Medium. Leading major incident responses can be extremely stressful. You have to quickly gather an ad hoc team, figure out what went wrong, identify a fix, and make sure the fix doesn't make things worse, all while senior leadership breathes down your neck. Are we having fun yet? Many people think having a dedicated incident commander role will solve the problem.

Captain's Log: Diving into our scheduling design

On-call scheduling is tricky. Like, really tricky. When we decided to build a modern alerting system earlier this year, scheduling was one of the scariest parts. We knew we couldn't cut any corners on Day One of our release: scheduling had to be a fully loaded feature before anyone could realistically use our product (and replace an incumbent). That meant including windowed restrictions, coverage requests, and rotations ranging from simple to complex.
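As a rough illustration of a rotation combined with a windowed restriction, the sketch below answers "who is on call at this moment?". It is not the scheduling design described in the post: the `on_call_at` function, fixed-length shifts, and a single same-day restriction window (no overnight windows, time zones, or coverage requests) are simplifying assumptions.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def on_call_at(
    when: datetime,
    participants: list[str],
    rotation_start: datetime,
    shift_length: timedelta,
    window: Optional[tuple[time, time]] = None,
) -> Optional[str]:
    """Return who is on call at `when`, or None if nobody from this rotation is scheduled.

    Assumptions (hypothetical model): naive datetimes in one time zone,
    fixed-length shifts, and an optional restriction window that starts and
    ends on the same day.
    """
    if not participants or when < rotation_start:
        return None
    # Windowed restriction: outside the daily window, this rotation has nobody on call.
    if window is not None:
        window_start, window_end = window
        if not (window_start <= when.time() < window_end):
            return None
    # The number of whole shifts elapsed since the rotation began decides whose turn it is.
    shifts_elapsed = (when - rotation_start) // shift_length
    return participants[shifts_elapsed % len(participants)]


team = ["alice", "bob", "carol"]
start = datetime(2024, 1, 1, 0, 0)
weekly = timedelta(days=7)
business_hours = (time(9, 0), time(17, 0))

print(on_call_at(datetime(2024, 1, 9, 10, 0), team, start, weekly, business_hours))
```

With a weekly rotation restricted to 09:00 to 17:00, a query eight days after the start lands in the second shift and returns `bob`, while any evening query returns `None`. Overnight windows (for example 22:00 to 06:00) would need the time comparison split in two, which this sketch skips.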

On-Call Management Models

In today's fast-paced digital landscape, incident management is crucial for maintaining operational excellence, and on-call management models play a critical role in it. On-call management organizes teams so that incidents get a prompt response and resolution: it streamlines incident resolution, ensures 24/7 availability, and keeps on-call rotations fair and transparent.

The Unplanned Show, Ep. 22: CSOps at PagerDuty with Arturo Suarez Martin

Even with the best monitoring in the world, some customer-impacting issues still go undetected and are ultimately reported by customers. In this episode, we'll hear from PagerDuty's Senior Director of Global Support, Arturo Suarez Martin, about the journey that PagerDuty has been on to tighten feedback loops between Customer Support and Engineering and mitigate the risk of poor customer experiences.

Ping Command: A Comprehensive Guide to Network Connectivity Tests

The ping network test, a core utility since the 1980s, plays a crucial role in confirming connectivity between IP-networked devices. In this guide, we'll cover what the ping command is, how to run a ping network test, common IP addresses to ping, how to interpret the results, and how to troubleshoot errors.
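Interpreting results mostly comes down to the summary lines ping prints when it finishes. The sketch below assumes output shaped like Linux iputils ping (`N packets transmitted, M received, X% packet loss` followed by `rtt min/avg/max/mdev = ... ms`); it should also tolerate the BSD/macOS wording `M packets received` and `round-trip ... = ... ms`. The `summarize_ping` helper is hypothetical, and the sample string is illustrative rather than a real capture.

```python
import re


def summarize_ping(output: str) -> dict:
    """Extract packet counts, loss percentage and min/avg/max RTT (ms) from ping's summary."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", output)
    if counts is None:
        raise ValueError("no ping statistics line found")
    sent, received = int(counts.group(1)), int(counts.group(2))
    summary = {
        "sent": sent,
        "received": received,
        # Recompute loss from the raw counts rather than trusting the rounded percentage.
        "loss_pct": 100.0 * (sent - received) / sent if sent else 0.0,
        "rtt_min_ms": None,
        "rtt_avg_ms": None,
        "rtt_max_ms": None,
    }
    # The RTT line is absent when no replies came back (100% packet loss).
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", output)
    if rtt is not None:
        summary["rtt_min_ms"] = float(rtt.group(1))
        summary["rtt_avg_ms"] = float(rtt.group(2))
        summary["rtt_max_ms"] = float(rtt.group(3))
    return summary


# Illustrative sample in the iputils summary format (not a real capture).
sample = """--- example.com ping statistics ---
4 packets transmitted, 3 received, 25% packet loss, time 3004ms
rtt min/avg/max/mdev = 10.100/12.300/15.000/1.900 ms"""

stats = summarize_ping(sample)
```

Any loss above zero, or a large gap between minimum and maximum RTT, is the usual starting point for troubleshooting. To feed it live data, capture the output of something like `ping -c 4 example.com` on Linux or macOS; Windows `ping` uses `-n` for the count and prints a differently worded summary that this sketch does not parse.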