
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Adaptable Incident Response With Splunk Phantom Modular Workbooks

Splunk Phantom is a security orchestration, automation and response (SOAR) technology that lets customers automate repetitive security tasks, accelerate alert triage, and improve SOC efficiency. Phantom also has built-in case management features, including “workbooks” that let you codify your security standard operating procedures into reusable templates.

No More False Alerts at Night

Do you know this situation? You are on-call, and in the middle of the night you get a phone call. Loud enough to wake you up. Loud enough to wake your wife up, as well. You get up and check your emails to see what the problem is. OK, you got it. Then you log on to the console of your monitoring tool and – green. Green? False alert? Why did you get the call then? After double-checking, still a bit sleepy, you realize that the problem resolved itself automatically.

Better Than 'Business As Usual': Rethinking How PagerDuty Works in a Post-COVID-19 World

Earlier this year, as COVID-19 appeared, our global community of almost 800 employees became a fully remote workforce—effectively overnight. Now, all of us have had a taste of what it’s like to work from home all the time, from embracing the benefits of less time commuting and more time with our families, to the downsides of feeling isolated and missing seeing our colleagues in “real life.”

Using Observability to Inspect and Adapt CI/CD Pipelines

In this blog post series, I’ve explored the relationship between observability and a set of software delivery lifecycle practices that help organizations adopt DevOps and change their ways of working from project-centric to product-centric. I started with Site Reliability Engineering, then considered Value Stream Management (VSM), and finish with this post on Continuous Integration and Delivery (CI/CD).

Thales accelerates incident resolution & decreases downtime with Exigence

Thales Cloud Protection & Licensing, part of the Thales Group, was looking to improve how it handles critical incidents. Whenever an incident hit, just gathering the incident team was a cumbersome and time-consuming task that involved a lot of manual work. Multiple calendar invites would be sent to different people inside and outside of the organization, multiple times, urging them to join calls and meetings.

Improved PagerDuty Integration with Detailed Alerts

AppSignal now supports the next API version of PagerDuty. 🎉 One of our devs was on support rotation the other day, and a customer asked whether we could add support for the next API version of PagerDuty. We won’t tell you who it was, but this developer typically answers questions by solving things as quickly as he can. So, two days later, boom! The improved integration for PagerDuty went live.

More Chatbots - Slack, Mattermost, Microsoft Teams, and Google Chat

Today, we are excited to announce PagerTree has added 3 new chatbot services: Mattermost, Microsoft Teams and Google Hangouts Chat (this is in addition to our core Slack notification channel). Chatbots are available on all pricing tiers free of charge! :) If you don’t already have an account, sign up for a free trial now. Our chatbots will post alert details to a “channel” of your choice.

Let's Talk AIOps: Part 1: What IS AIOps, Exactly?

This is the first in a two-part blog series deconstructing AIOps for ITOps leaders. If you gave me a dollar for every company that claims that they use “A.I.,” I’d be doing pretty well. But as a marketer, I can’t help but be a little skeptical about those claims. Let me explain.

Working with multiple on-call teams using Zabbix and iLert

This post outlines how to use Zabbix and iLert with multiple on-call teams, where each team is responsible for a set of host groups in Zabbix, and therefore, will only receive alerts for the services it is responsible for. But first, let’s start with the basic needs when being on-call.
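The routing idea described here can be illustrated with a minimal sketch. It assumes nothing about the actual Zabbix or iLert configuration; the team names, host group names, and the `teams_for_alert` helper are all hypothetical and only show how an alert's host groups could decide which teams get notified.

```python
# Hypothetical mapping of on-call teams to the host groups they own.
# Team and host group names are illustrative, not taken from Zabbix or iLert.
TEAM_HOST_GROUPS = {
    "database-team": {"MySQL servers", "PostgreSQL servers"},
    "web-team": {"Web servers", "Load balancers"},
}


def teams_for_alert(host_groups):
    """Return the teams (sorted by name) that own at least one of the alert's host groups."""
    groups = set(host_groups)
    return sorted(
        team for team, owned in TEAM_HOST_GROUPS.items() if owned & groups
    )


if __name__ == "__main__":
    # An alert on a host in the "Web servers" group only reaches the web team.
    print(teams_for_alert(["Web servers"]))
```

An alert whose host belongs to no mapped group returns an empty list, so a real setup would likely want a fallback team for unowned hosts.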

Retail Industry Trends 2020: All-In on Digital Since COVID-19

This is the first in a series of posts we’ll be publishing on trends we’re seeing in the retail industry and how IT organizations tasked with deploying and maintaining flawless digital customer experiences can take advantage of PagerDuty to ensure always-on reliability. It’s been a tough year for retail.