
FYI: Email Alerting Isn't Enough

Email alerting is an inefficient way to receive and address critical alerts. Inboxes tend to fill with “clutter,” and irrelevant messages bury urgent incident notifications. Incident management procedures require an incident management system to ensure that urgent issues are addressed immediately. Yet some services are reluctant to say goodbye to email alerting and its inefficiencies. This is the case with Google Voice, which recently solidified its commitment to email alerting.

What is a Status Page? (& How Does It Benefit Companies/Customers)

There’s nothing worse than turning on your computer to start the work day and discovering the internet is down. We all know the frustration of tediously trying to figure out what’s wrong before finally breaking down, calling our service provider, and waiting on hold, only to discover that it’s a known issue that is already being addressed. What if there were a better way?

Keeping Your CMDB Up To Date in Distributed Times

The configuration management database (CMDB) is meant to be a single source of truth that links IT elements with the application processes that underlie business services. In the age of ITIL, a common repository for information about your hardware and software assets made sense. But with today's dynamic and distributed hybrid IT infrastructure, how do you keep your CMDB up to date? Should you even try?

NHS on Its Final Leg of Pager Replacement

If you’ve been following the U.K. healthcare landscape, you know that the country has long been considering replacing pagers. That replacement may soon materialize, partly accelerated by the challenges that doctors are facing during the COVID-19 pandemic. The pager replacement initiative not only signifies a pivotal shift away from aging infrastructure, but also indicates how poorly pagers have held up in today’s unprecedented times.

How to use check aggregates in Sensu Go

Aggregates, which allow you to monitor groups of checks or entities, were a much-beloved feature in Sensu Core (the predecessor to Sensu Go) — Ben Abrams describes them as “awesome” in his post on alert fatigue, noting that aggregates are like having “a bunch of nodes behind a load balancer where each node is healthchecked, and if a node drops out it may not be worth waking someone up in the middle of the night.”

Best practices for alerting on Kubernetes

A step-by-step cookbook of best practices for alerting on the Kubernetes platform and its orchestration, including PromQL alert examples. If you are new to Kubernetes and monitoring, we recommend that you first read Monitoring Kubernetes in production, in which we cover monitoring fundamentals and open-source tools. Interested in Kubernetes monitoring?

Deploy ChatOps with Microsoft Teams + Resolve Automation to Modernize Your Service Desk!

Looking for a chat capability for your ITSM tool, and already have Microsoft Teams? Why not use Resolve Automation to power ChatOps for your service desk transformation? Join Brent Hunter to see how you can quickly integrate common tasks.

Postmortems and More With J. Paul Reed

PagerDuty sat down with J. Paul Reed, a Senior Applied Resilience Engineer at Netflix, for an Ask Me Anything (AMA) to discuss best practices around postmortems. Reed is a prominent speaker and advocate of DevOps and operations complexity, and has over 15 years of experience in release engineering. His background in tech, along with his previous work at companies like Mozilla and VMware, gives him a unique perspective on the inner workings of innovative organizations.