The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Grafana OnCall is now generally available on Grafana Cloud, with a generous free tier

Today we’re announcing the general availability of Grafana OnCall on Grafana Cloud for all paid and free plans. A big part of delivering great software is ensuring the right people get the right information when the inevitable incidents occur. We want to help you do that with Grafana OnCall, an easy-to-use, developer-first on-call management tool that’s built on top of the Grafana stack you know and love.

Announcing Grafana Incident, smart incident management for your teams

A huge challenge when dealing with incidents is the coordination and communication needed to put things right. What’s happened so far? Who has tried what query? Did we remember to keep stakeholders informed? What is the severity of the incident? Does this affect customers? Figuring this out requires a lot of back and forth as new team members join the incident.

Top tips to make Round Robin Scheduling successful for your team

You may have heard of Round Robin Scheduling before and thought to yourself, is this right for my team? Understanding how Round Robin Scheduling can be used and what teams it works best for is important when considering this method of on-call. Additionally, it comes with some pitfalls you’ll want to avoid, as well as best practices to adopt. In this blog post, we’ll share everything you need to know about Round Robin Scheduling within PagerDuty and how to get started.

No capes: the perils of being a hero-engineer

When I first started out as an engineer I really leant in to the idea of what’s often called “being a hero”; I would get to the office a bit early to make sure I could fix anything that had gone wrong overnight. I loved the camaraderie of someone outside engineering bringing their laptop over with a critical process broken for me to fix (even if I’d been the one to break it!). Being a hero feels really good for a while, but over time, it loses its shine.

Getting Started with Playbooks

It’s 2022: You’re good at your job, you’re maintaining modern systems, now you want to level up your team based on a solid foundation of their collective expertise. You want to standardize and centralize process documentation and make execution as easy and effective as possible so that everything runs smoothly, every time.

What's New: Updates to Event Intelligence, On-Call Management, Automation, Mobile, and More!

We’re excited to announce a new set of updates and enhancements to the PagerDuty platform. Recent updates from the product team range from On-Call Management, Event Intelligence, and Mobile Products to PagerDuty Community & Advocacy Events.

Reliability Through Automation for Your Infrastructure and Applications at Scale

As technology becomes more SaaS-based and organizations deploy applications in multiple clouds, they need more visibility into the cloud environment and better automation for incident response and resolution. Two elements are required to achieve this: integrations and workflows in an incident response software solution, and effective experimentation, research, and testing in the cloud and on-premises.

Intelligent Service Design

Hello and welcome to the fourth post in our EI Architecture series focusing on Intelligent Alert Grouping. Previously we talked about how to train Intelligent Alert Grouping using incident merges and how to configure your alert titles to improve default matching. In this post, we’re going to cover how service design can impact your experience with Intelligent Alert Grouping, as well as with the PagerDuty app in general.

Training Intelligent Alert Grouping

We’re continuing with our third piece about how to use and improve your Intelligent Alert Grouping (IAG)! In case you missed them, the first two blog posts describe the feature and explain how it uses merging to group alerts. We alluded to today’s topic at the end of the last post: today we’ll be discussing how to use alert titles to improve IAG matches.