Automated Runbooks = Faster Recovery

Nov 11, 2019 By Shreyash Naithani In Squadcast

Traditional runbooks can become 10x more useful if they are automated or at least made executable (partly, if not fully). Shreyash Naithani from the Microsoft Azure SRE team, author of "Practical Site Reliability Engineering", talks about how to take advantage of runbooks to eliminate toil.

Severity Matrix Updates

Nov 11, 2019 By Bobby Tables In FireHydrant

We’re on a mission to make responding to incidents a bit less chaotic. One of the best features we offer (we’re definitely not biased, no way) is a simple way to define how a severity gets determined when you open an incident. We call it the severity matrix, and today it has a new look. Previously, we had a preset list of conditions and impact that allowed you to pick a severity that matched them.

From Mayhem to Modernization: The Evolution of Critical Incident Management

Nov 11, 2019 By Noam Morginstin In Exigence

Let’s face it, managing a critical incident has never been a walk in the park. Even in the “good old days,” before the great cloud revolution and the onslaught of digital transformations, an incident often meant mayhem. Processes were manual, time-consuming, and difficult to execute, document, and learn from. Getting all the right people in the “same room” at the right time was nearly impossible. Lots of time was wasted chasing down the right folks.

Three Essential Truths to Delivering Great Customer Experiences

Nov 8, 2019 By Adam Frank In Moogsoft

Employing AIOps for observability, monitoring and service assurance frees developers to focus on building better services.

Top 10 I&O Technologies for a successful 2020, 2021, 2022, 2023 & 2024

Nov 7, 2019 By Olaf Schouws In StackState

Each year there comes a time to look forward and think about next year and maybe even further. This can be a daunting task, especially in the fast-changing IT industry. Luckily, Gartner prepared a list of the top 10 technologies that will drive the future of Infrastructure and Operations up through 2024. This list might come in handy when you’re preparing your 2020 roadmap and beyond.

Smart SLO Alerting With Wavefront

Nov 7, 2019 By Pontus Rydin In PagerDuty

Back in the good old days of monolithic applications, most developers and application owners relied on tribal knowledge for what performance to expect. Although applications could be incredibly complex, the understanding of their inner workings usually resided within a relative few in the organization. Application performance was managed informally and measured casually. However, this model falls apart in a microservices world.

The State of Unplanned Work: Key Findings

Nov 6, 2019 By Evelyn Chea In PagerDuty

It’s a new world order: Skynet has taken over. Just kidding. But it sometimes feels that way, doesn’t it? In the words of Marc Andreessen, software is eating the world, and technology problems are now business problems. This means developers are now the architects of the digital experience and, by extension, the customer experience—and when said developers are unable to innovate quickly, companies are more exposed to competitive threats.

Why Escalations are Important to Clinical Communications

Nov 6, 2019 By Ritika Bramhe In OnPage

Unexpected events make the healthcare profession one of the most challenging industries to navigate and plan for. Sudden patient situations tend to occur, increasing the workload of healthcare providers. Similarly, process efficiency and productivity reflect the care team’s ability to communicate. When teams are on the same page, patient wait times are significantly reduced and results are improved.

RetroDuty: How We Scale Continuous Improvement Beyond Engineering at PagerDuty

Nov 5, 2019 By Derek Ralston In PagerDuty

If you’ve worked on a team that has adopted Agile techniques, you’ve probably heard of a retrospective. If not, here’s the TL;DR: A retrospective is a meeting in which a team connects regularly to reflect on what happens throughout a project and continuously improve how they work moving forward.

Meet Root Cause Changes from BigPanda - IT Ops, NOC and DevOps Teams' Best friend For Supporting Fast-Moving IT Stacks

Nov 5, 2019 By Mohan Kompella In BigPanda

TL;DR: Fast-moving IT stacks see frequent, long and painful outages. Thousands of changes – planned, unplanned and shadow changes – are one of the main reasons behind this. Until now, IT Ops, NOC & DevOps teams didn’t have an easy way to get a real-time answer to the “What Changed?” question – the answer that can help reduce the duration of outages and incidents in these fast-moving IT stacks. Now, with BigPanda Root Cause Changes, they do.
