The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Why SREs Need to Embrace Chaos Engineering

Jul 20, 2022 By xMatters In xMatters

Reliability and chaos might seem like opposite ideas. But, as Netflix learned in 2010, introducing a bit of chaos—and carefully measuring the results of that chaos—can be a great recipe for reliability. Although most software is created in a tightly controlled environment and carefully tested before release, the production environment is harsher and much less controlled.

Episode 5: Mooving to... Practical Postmortems

Jul 20, 2022 By BJ Maldonado In Moogsoft

Episode 5, Mooving to… Practical Postmortems, covers how to use postmortems to learn effectively from failure. Postmortems are commonplace and are now considered a best practice in most modern engineering teams. However, there’s still a lot of confusion about what postmortems should be and, more importantly, what they should NOT be. Thom Duran, Senior Manager of Productivity at Panther, walks us through all that and more in the latest Mooving to… episode!

Top Incident Response Metrics & How to Use Them

Jul 19, 2022 By Stephen Watts In Splunk

Data analysis is one way that your organization can improve the efficiency of incident management and overall application quality, two areas a software organization should always strive to improve. However, the questions remain: which metrics should be collected, and how can analysis of these metrics facilitate these improvements? Read on to learn about five key metrics essential to incident response.

Our fully-redesigned incident response experience delivers a more intuitive workflow

Jul 19, 2022 By Dylan Nielsen In FireHydrant

Today we’re releasing fully redesigned Slack and Command Center experiences for FireHydrant so anyone on your team can intuitively navigate the incident response process — in the app or on the web. There are many things you can do ahead of an incident to help things run smoothly: design and document your process, automate predictable steps, train the team, and run drills.

The Next Evolution in Customer Service

Jul 18, 2022 By Justin Shie In PagerDuty

“Customer service software has evolved so much these past ten years, but they all seem to be solving the same problems!” This was a statement made by a Customer Service leader in a recent brainstorming conversation around decreasing overall Response Times and Resolution Times.

Don't Let Outages Ruin Your Reputation - Prevent Them With AIOps

Jul 18, 2022 By Richard Whitehead In Moogsoft

The world is increasingly digital. The U.S. Census Bureau estimates e-commerce grew 14.2% from 2020 to 2021, for a total of $870.8 billion in sales. And just look at the trends in remote work. According to a FlexJobs and Global Workplace Analytics report, remote work has grown 44% over the last five years and an astonishing 159% over the last 12 years. Indeed, much of America relies on a slew of digital apps and services to get business done every day. So what does this mean for businesses?

SecOps tools - SecOps & incident management for 2022.

Jul 18, 2022 By AlertOps In AlertOps

Importance of SecOps tools: Threats in the cyber world are becoming more complicated and sophisticated with each passing day, while the rapid expansion of digital operations, with more nodes, networks, and servers, has resulted in more vulnerabilities. This situation demands efficient SecOps teams and practices so that threats are thwarted and networks and data are always protected. What is SecOps, and what are the best SecOps tools?

MTTR vs. MTTA vs. MTBF: A Complete Set of Common Incident Management Metrics

Jul 18, 2022 By ScienceLogic In ScienceLogic

There is a common set of key performance indicators for incident management, such as MTTR and MTTA. What do these metrics mean, and why are they important?

AWS outage? A better way to monitor outages in Amazon Web Services

Jul 17, 2022 By isDown In isDown

Amazon Web Services (AWS) needs no introduction. It's one of the most popular services in the world and, according to this study, the most popular cloud infrastructure provider (34%). As with any other service, there are outages. If you run your own infrastructure, there's a good chance that outages have impacted your business in the past. And the reality for AWS (or any other service) is that there's a good chance it will happen again.

A deeper dive into the Rogers outage

Jul 15, 2022 By Doug Madory In Kentik

Beginning at 8:44 UTC (4:44am EDT) on July 8, 2022, Canadian telecommunications giant Rogers Communications suffered a catastrophic outage taking down nearly all services for its 11 million customers in what is arguably the largest internet outage in Canadian history. Internet services began to return after 15 hours of downtime and were still being restored throughout the following day.
