The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Synthetic monitoring as Code with Checkly and ilert

This post introduces Checkly, the synthetic monitoring solution, and its monitoring as code approach. This guest post was written by Hannes Lenke, the CEO and co-founder of Checkly. First, thanks to Birol and the ilert team for the opportunity to introduce Checkly. ilert recently announced that it is discontinuing its uptime monitoring feature and worked with us on an integration to ensure that existing customers could migrate seamlessly. So, what is monitoring as code, and what is Checkly?

Top 5 Use Cases for Custom Fields on Incidents

Chasing down critical information in disparate systems of record while trying to resolve an incident can make an already stressful situation even more taxing. Extra clicks, extra logins, copy/paste, socializing that information with other responders: it all wastes time and introduces more room for human error. Now PagerDuty customers can use Custom Fields on Incidents to enrich their incident data.

New related incidents functionality brings order to the chaos of highly complex incidents

We’ve all been there. You’re working through some rather frustrating blockers during an incident only to discover that you don’t own the dependency at fault. Or, you’ve been pounding away at an issue when a fellow engineer reaches out and asks if your service is affected by some particularly gnarly database failure they’re seeing. But then what? Do you merge efforts and work in parallel or head for a coffee break while the issue gets attacked upstream?

Kubernetes Simplified: Understanding its Inner Workings

Kubernetes has revolutionized the world of container orchestration, providing organizations with a powerful solution for deploying, managing, and scaling applications. However, the complexity of Kubernetes can be daunting for newcomers. In this blog, we will demystify Kubernetes by breaking down its core components, revealing its operational principles, and guiding you through the process of running a pod.

What is Zero Trust Security and Why Should You Care?

Automation has become a game changer for businesses seeking efficiency and scalability in a rather unclear and volatile macroeconomic landscape. Streamlining processes, improving productivity, and reducing the incidence of human error are just a few benefits that automation brings. However, as organizations embrace automation, it’s crucial to ensure modern security measures are in place to protect these new and evolving assets.

The Unplanned Show, Episode 2: Hadijah Creary Demystifies Customer Success vs Customer Service

In this episode, Hadijah Creary breaks down what Customer Service teams are versus Customer Success teams. What do they care about? How can they each get more proactive to improve the overall customer experience? And why is it PagerDuty Customer Service Operations and not Customer Success Operations?

We can now notify you through PagerDuty

When we detect a problem with your site, we can notify you via email, a Slack message, a webhook, or any of our other notification channels. This is enough for most of our users, but those who work in larger teams often need more flexibility. Today, we are launching our PagerDuty integration. PagerDuty is a cloud-based incident management platform that helps organizations improve operational reliability by providing real-time alerts, on-call scheduling, and incident tracking.

What is MTTR? Calculation and Reduction Strategies

In the fast-paced world of software development, every minute counts. When disruptions occur, whether they are minor or major system failures, organizations need to bounce back quickly to maintain seamless operations. That's where MTTR (Mean Time to Repair) steps onto the stage as a game-changing metric. Are you ready to unlock the secrets behind reducing downtime, boosting performance, and ensuring software reliability?