Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Bridging the ITIL vs DevOps Mindset: CI/CD Best Practices for ITIL Organizations

Oct 9, 2023 By Elik Eizenberg In BigPanda

DevOps practices in software development have revolutionized the way updates are released. However, many companies entrenched in ITIL practices find it challenging to seamlessly integrate with the DevOps practice of Continuous Integration and Continuous Delivery/Deployment (CI/CD). This is because ITIL focuses on stability, which suits older systems, while DevOps is ideal for modern setups with its agile, automated practices.

Revolutionizing your Grafana setup with intelligent alerting

Oct 9, 2023 By emily In SIGNL4

Once upon a time, in the bustling city of DataVille, lived a team of dedicated IT professionals tirelessly working to maintain the city’s digital heartbeat. Their mission was to ensure the smooth operation of their city’s digital infrastructure, which was not limited to the daytime operations but extended beyond business hours. They were the unsung heroes, the guardians of the city’s data. Their tool of choice? Grafana, a powerful open-source platform for observability.

What is HCAHPS: A Comprehensive Overview

Oct 9, 2023 By Halle Katz In OnPage

In the realm of hospitals and healthcare organizations, the term “HCAHPS survey” is a recurrent presence:

Hospital Administrator A: “The latest HCAHPS survey results just came out, and patients seem satisfied with…”
Hospital Administrator B: “Some of our past patients participated in the HCAHPS survey, but they expressed disappointment with…”

You might be left wondering, “What exactly is the HCAHPS survey?” Allow me to elucidate.

Unified Incident Management: Merits of Combined On-Call and Incident Response | Squadcast

Oct 6, 2023 By Squadcast In Squadcast

In this session, we explore the crucial aspects of effective on-call management and incident response in product organizations. Squadcast combines On-Call and Incident Response into a single platform, using automation capabilities for enhanced reliability, continuous learning, and better productivity.

Choosing the Right Career Path in Tech: Software Engineering vs. Site Reliability Engineering (SRE)

Oct 6, 2023 By Anjali Udasi In Zenduty

The tech industry is booming, and it offers many different career paths. Two of the most popular and in-demand roles are Software Engineering (SWE) and Site Reliability Engineering (SRE). SRE blends elements of software engineering with IT operations, focusing on reliability. Software Engineering, on the other hand, involves designing, developing, testing, and deploying software applications.

October 2023 Update - New layout, additional cross links, improved event filtering and much more

Oct 5, 2023 By René In SIGNL4

Our October update brings a new layout in the web portal, additional cross-references from Signl details to linked entities, and improved grouping options for conditions in the distribution rules. As always, all the details are in this blog article.

What is Mean Time Between Failures - and why does it matter for service availability

Oct 5, 2023 By Amy Brennen In BigPanda

Mean Time Between Failures (MTBF) measures the average duration between repairable failures of a system or product. MTBF helps us anticipate how likely a system, application or service is to fail within a specific period, or how often a particular type of failure may occur. In short, MTBF is a vital incident metric that indicates product or service availability (i.e. uptime) and reliability.
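
As a rough illustration of the metric described above, MTBF is commonly computed as total operating time divided by the number of failures in that period. The sketch below assumes that definition. It also pairs MTBF with Mean Time To Repair (MTTR), which the excerpt does not mention, to estimate availability using the usual MTBF / (MTBF + MTTR) ratio. The function names and sample figures are hypothetical, chosen only for the example.

```python
def mtbf_hours(total_uptime_hours: float, failure_count: int) -> float:
    """Average operating time between failures (hypothetical helper)."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_uptime_hours / failure_count


def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the service is up, assuming availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


# Made-up sample figures: 1,000 hours of operation, 4 failures, 2 hours average repair time.
mtbf = mtbf_hours(1000.0, 4)
uptime_fraction = availability(mtbf, 2.0)
print(f"MTBF: {mtbf:.1f} h, availability: {uptime_fraction:.2%}")
```

A longer MTBF or a shorter repair time both push the availability figure up, which is why the two metrics are usually tracked together.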

Enhance Your Customer Service with PagerDuty for ServiceNow CSM

Oct 5, 2023 By Hadijah Creary In PagerDuty

In today’s fast-paced, digital-first landscape, delivering exceptional customer experience is paramount to business success. For customer service teams, that means maintaining service level agreements (SLAs) and ensuring swift responses to customer issues that can make or break your company’s reputation. Fortunately, PagerDuty has improved the way companies handle customer service teams and has built applications into ServiceNow’s CSM platform.

The Rise of Generative AI

Oct 5, 2023 By Blameless In Blameless

Revolutionizing Business: The Rise of Generative AI - Actionable Strategies to Integrate Advanced AI Seamlessly into Your Engineering Operations.

Alerting, Incident Management and the SDLC | Better Incidents Podcast Ep. 8

Oct 5, 2023 By FireHydrant In FireHydrant

In this episode we chat with veteran cloud architect Masaru Hoshi about the challenges of alert fatigue, the importance of effective alerting systems, and fostering ownership in software teams. Masaru shares insights from his 30-year career, emphasizing the need for balance, trust, and collaboration in incident response.
