What's New: Related Incidents, Business Response, Mobile Status Dashboard, & New Integrations

An always-on world requires a proactive and preventative approach to managing your digital operations. PagerDuty is proud to announce our latest release, which helps streamline remote remediation by providing an at-a-glance overview of your system’s health. While we’re known for on-call management and incident response, PagerDuty does much more, including providing visibility into the business impact of an incident.

Extracting Insights from Metrics with AIOps for Better Observability

In the second installment of this blog series, we’ll discuss the importance of analyzing metrics and how AIOps helps you with this fundamental pillar of observability. Without proper metrics analysis, you’re left blind to potential outages or, possibly worse, inundated with false positive anomalies, leading to alert fatigue and ultimately business impact. Automated discovery and analysis can’t be achieved with legacy tools, nor will it scale with humans.

Using observability tools to set SLOs for Kubernetes Applications

You deployed a service to your Kubernetes cluster. How do you know it is working as expected? In this blog, Gigi Sayfan, author of “Mastering Kubernetes,” talks about Kubernetes observability tools like Prometheus, Grafana, and Jaeger, and how to use them to set proper SLOs and make sure the service meets its objectives.

IT Teams Under "High Stress" Resolving Faster Than Ever Before

Seemingly simple digital moments, like checking into a flight, trigger a complex technical flow of events under the IT covers. A simple swipe or click relies on a complex IT ecosystem made up of millions of lines of code, spanning multiple software applications, hybrid and multi-cloud technologies, state-of-the-art IT infrastructure, security apps, and more.

Modern ITSM Solutions: Flexibility in Incident Response

We no longer live in a world where a few tools determine the way organizations structure their processes. From IT service delivery to incident response, modern IT operations solutions need to embody the flexibility that most enterprises require. The dynamic ITOps ecosystem has shifted to put choice back in the hands of the user, and IT solutions must now follow suit. Modern incident response platforms, in particular, need to be flexible enough to mirror an organization’s enterprise architecture.

Advice for On-call Teams During COVID-19

I’ve offered up some tips for folks who are on-call during the COVID-19 crisis, but I thought it would be helpful to get some more ideas from people with different perspectives. So I reached out to some people I trust to see what they had to say. They all have different viewpoints, but some themes emerge, like managing alerts, having empathy, and practicing self-care. The participants, in alphabetical order: Aaron Aldrich is a Developer Advocate at LaunchDarkly, with a focus on DevOps.