
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

MTTJ - What is Mean Time to Join (MTTJ)?

Aug 5, 2022 By AlertOps In AlertOps

MTTJ: the time taken to join a meeting, and the delays caused in ensuring the right people are available, can be reduced using software automation and tools. This topic is not often talked about, but we are sure everyone is directly affected by it. We discuss it in detail here: what it is, why it matters, and how it can be avoided.
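
As a rough illustration of the metric, the sketch below treats MTTJ as the average delay between a responder being paged and that responder joining the incident call. This definition, the function name `mean_time_to_join`, and the sample timestamps are assumptions made for illustration; they are not taken from the AlertOps post.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_join(events):
    """Return the mean delay in seconds between paging and joining.

    `events` is a list of (paged_at, joined_at) datetime pairs,
    one pair per responder (a hypothetical input shape).
    """
    delays = [(joined - paged).total_seconds() for paged, joined in events]
    return mean(delays)


# Hypothetical sample data: three responders paged for one incident.
sample = [
    (datetime(2022, 8, 5, 9, 0, 0), datetime(2022, 8, 5, 9, 4, 0)),
    (datetime(2022, 8, 5, 9, 0, 0), datetime(2022, 8, 5, 9, 10, 0)),
    (datetime(2022, 8, 5, 9, 1, 0), datetime(2022, 8, 5, 9, 5, 0)),
]

mttj_seconds = mean_time_to_join(sample)
print(f"MTTJ: {mttj_seconds / 60:.1f} minutes")
```

Tracking this number per incident makes it easier to see whether automation actually shortens the time it takes to assemble the right people.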

Driving a customer-focused incident response process

Aug 4, 2022 By Martha Lambert In Incident.io

Deep into an incident, Slack firing, up to your ears in decisions, not sure where to turn next? It’s easy for external communication with your customers to fall far down the list of priorities in these moments. However, these are the exact situations where comms are vital, and where underestimating their importance can have damaging and lasting effects on your organisation.

New! Common Automated Diagnostics for AWS Users

Aug 3, 2022 By Jake Cohen In PagerDuty

Today’s modern cloud architectures centered on AWS are typically a composite of ~250 AWS services and workflows implemented by over 25,000 SaaS services, house-developed services, and legacy systems. When incidents fire off in these environments—whether or not a company has built out a centralized cloud platform—distinct expertise is often a necessity.

The Do's and Don'ts of Blameless Incident Postmortems

Aug 3, 2022 By xMatters In xMatters

When an incident inevitably occurs, many organizations have a well-prepared incident management team that springs into action. Whether it’s a power outage or security breach, an incident can damage your company’s operations if not handled properly. A strong incident response team is critical to mitigating any negative impacts successfully. Furthermore, once your team resolves the problem, you should initiate a postmortem to detail the incident and record any lessons learned.

RESOLVE '22: Incident management automation

Aug 3, 2022 By Ryan Taylor In BigPanda

“Make life easier” isn’t a mantra for the lazy—it’s a way to drill down on important automation in the IT Ops room. When Ryan Taylor, VP of solutions engineering at Transposit, talks about his experience and outlook in the IT Ops chair, people tend to listen.

Episode 6: Mooving to... Real release strategies with Jake Laverty

Aug 3, 2022 By Richard Whitehead In Moogsoft

Every product or application needs a release strategy. It’s how you can double check that everything in your deployment is appropriately tested, validated and verified. Having a standardized release strategy in place allows your team to follow a protocol and reduce the number of unknowns they must face in the product life cycle. However, there are a few considerations to make this critical process run smoothly.

Automate incident response workflows with Eventarc and Datadog

Aug 2, 2022 By Thomas Sobolik In Datadog

Eventarc is a Google Cloud offering that ingests and routes events between GCP products, such as Cloud Run, Cloud Functions, and Pub/Sub, making it easy to build automated, event-driven workflows in complex environments. By taking care of event ingestion, delivery, authorization, and error handling, Eventarc reduces the development overhead that is required to build and maintain these workflows and helps you improve application resilience.

What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today

Aug 2, 2022 By Vivian Chan In PagerDuty

Does your team deal with too much noise? Does your heart sink a bit when you think about how much your rulesets have sprawled in order to manage your event processing needs? That’s why we released Event Orchestration earlier this year to help teams reduce the amount of manual work that goes into event management. Event Orchestration is the next evolution of our Event Rules feature set, which helps to route, enrich, and modify events on ingest to remove noise and automate processes.

Dedicated Incident Channel Improvements for Slack on Webhooks V3 - Early Access

Aug 2, 2022 By Jorge Villamariona In PagerDuty

Today, we are excited to open Early Access for our improved Dedicated Incident Slack Channel. To take advantage of this feature, you need to upgrade to Slack on Webhooks V3 and request Early Access from PagerDuty support. Once you are on the right version and have early access, there are two ways to create a dedicated incident channel.

Tell the story of your incident with timeline curation

Aug 2, 2022 By Martha Lambert In Incident.io

It isn’t the first time you’ve heard us say this and it won’t be the last: getting your post-incident process right is a game-changer. Being able to run effective debriefs and create useful postmortems helps us learn from our mistakes, respond better to future incidents and identify how we can build resilience in our product and teams. In short, it’s the thing that shifts the dial from just “fixing” to actually improving.
