Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Site Reliability Engineer (SRE) Interview Questions

In this article we will cover the top 25 SRE interview questions to help you prepare for your next SRE interview. As customer demand for reliable and high-performing services continues to grow, so does the importance of Site Reliability Engineers (SREs). Whether you are a seasoned SRE or a recent graduate preparing for an SRE interview, these questions will be invaluable for determining your level of expertise and understanding where you need to grow.

The Engineer's Roadmap to Building Resilient Systems in High Growth Environments

In the past, software development was all about hitting deadlines and budgets. But times have changed. Today, users expect flawless, 24/7 experiences that drive business value. That's why building reliable and resilient systems is no longer a luxury - it's a necessity.

Build Operational Excellence with New Innovations on the PagerDuty Operations Cloud

The PagerDuty Operations Cloud empowers modern enterprises to tackle critical operations work and deliver on top strategic initiatives. From transforming incident management to modernizing NOC operations, streamlining automation, and improving customer experience, the PagerDuty Operations Cloud enables organizations to augment their workforce with AI and automation. This approach ensures our customers can operate more efficiently, accelerate innovation velocity, and sustain seamless digital experiences.

Drive Operational Excellence with PagerDuty

Build operational excellence with PagerDuty. Watch this demo to see how the latest innovations for the PagerDuty Operations Cloud come together to help a team tackle a major incident related to a database upgrade. You’ll see how PagerDuty Copilot capabilities work in concert with new functionality built for modernizing operations centers, standardizing automation at scale, and transforming incident management. The result? Improved innovation velocity, reduced operating costs, and better customer experiences.

May 2024 Update - New shift scheduling brings increased productivity and improved user experience, along with revamped stand-in functionality

Our May update includes newly revamped shift scheduling for your SIGNL4 teams. It is now much easier to run your shift model in SIGNL4 and schedule team members into shifts. The update also includes a new calendar view and a fundamental revision of our substitute function for scheduled colleagues on duty. As always, all details are available in this blog article.

Accelerate incident resolution with Advanced Insight

The common thread among teams responsible for maintaining IT services is their reliance on a deep understanding of the IT environment. Teams need access to all types of critical data to keep systems running. While this seems straightforward, ITOps teams face many challenges in locating, accessing, and synthesizing enough data to fully understand an incident’s cause and establish a remediation plan.

How to Build an Effective OnCall Schedule in 2024

When it comes to on-call scheduling, your enterprise must plan as much as possible. Fortunately, with the right processes and tools, you can effectively implement and manage an on-call schedule. You can also use this schedule to quickly identify and resolve incidents and prevent them from causing long-lasting damage to your organization and its stakeholders.