Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Beyond SLAs: Rethinking Service Level Objectives in Incident Response

Apr 24, 2024 By Vishal Padghan In Squadcast

In the context of IT service management, Service Level Agreements (SLAs) have long been the cornerstone for measuring and ensuring the quality of services provided to customers. However, as technology evolves and incidents become more complex, relying solely on SLAs may not be sufficient. This is where Service Level Objectives (SLOs) come into play, offering a more nuanced approach to Incident Response.

Operational Excellence at the New York Stock Exchange: Our Q&A with NYSE's President

Apr 24, 2024 By Jesse Purewal In PagerDuty

Mitigating the risk of operational failure is top of mind—and a top budget priority—for executives. A single unplanned event can have a disruptive effect across the organization, an outcome management teams work hard to avoid. For the New York Stock Exchange (NYSE), operational resilience is critical given the role it plays in the global economy and capital flows.

Process Automation Release Notes v5.2

Apr 24, 2024 By PagerDuty In PagerDuty

Chat with the PagerDuty Process Automation product management team. Join us to learn what's new in release v5.2 and what's coming next for automation!

Streamlining Incident Management with Squadcast's Workflows

Apr 24, 2024 By Squadcast In Squadcast

Watch this webinar to learn how automating with Squadcast's 'Workflows' can save your team more than 1,000 productive hours. Learn about the power of automation in the incident lifecycle and see a live demo of setting up and tailoring Workflows to boost efficiency. 🛠️

SRE and the Enterprise: Building a Culture of Reliability at Scale

Apr 23, 2024 By Vishal Padghan In Squadcast

As the digital landscape evolves at breakneck speed, enterprises face an increasingly complex challenge: how to ensure their systems remain reliable and available amidst the chaos of modern technology. In this journey, Site Reliability Engineering (SRE) emerges as a beacon of hope, offering a pragmatic approach to building a culture of reliability at scale.

Reduce MTTR with BigPanda Similar Incidents

Apr 23, 2024 By Elli Dugger In BigPanda

There’s wisdom in past experiences — if you can access it. During live incidents, teams often look for parallels to past situations in their investigation process. Finding the answers is a time-consuming and manual process. You first have to identify similar incidents, then review historical data for insights and details on how previous teams resolved them. There’s no time to waste when SLAs are at stake. Yet that’s how many operators spend their time.

Takeaways from BigPanda 24

Apr 23, 2024 By Assaf Resnick In BigPanda

Last week saw several big milestones for BigPanda. We launched new AI-driven capabilities, and we had the privilege of meeting more than 40 IT operations leaders from customers, including Disney, Nvidia, Autodesk, Lucid Motors, Intel, and Blue Shield, at our customer event, BigPanda 24. Representing some of the most innovative organizations in business and technology, these influencers joined us as part of our customer and technical advisory boards.

xMatters Vanguard Release

Apr 23, 2024 By xMatters In xMatters

When all systems are firing, managing your incident management processes can feel a little out of this world. For this release, we've packed in more features than can fit into the City of Mystery. But never fear! You don't need to be part of a space program to join this intergalactic quest. All xMatters instances now include powerful new features and updates from our latest release. Learn more about these features and all the other exciting updates in our Vanguard Release Overview.

Beginner's Guide to Kubernetes Troubleshooting

Apr 22, 2024 By Ritika Bramhe In OnPage

Kubernetes troubleshooting is a critical skill for developers and system administrators managing containerized applications. It involves diagnosing and resolving issues within a Kubernetes cluster, ensuring that applications run smoothly and efficiently. Troubleshooting can range from simple configuration errors to complex networking issues, requiring a deep understanding of Kubernetes architecture and components.

Status Page automation with Playbooks

Apr 19, 2024 By Spike In Spike

🚀 Automate Your Status Pages with Playbooks! 🚀 In this video, we're diving deep into the world of incident response automation. Join us as we explore how you can streamline your status page updates with Spike's powerful Playbooks feature. Learn step by step how to create and configure Playbooks to automate your status page notifications, ensuring your stakeholders are always kept in the loop during incidents. With a live demo and practical insights, you'll discover how easy it is to set up automated responses tailored to your organization's needs.