Incident Management


2019 Hurricane Season: Solidify a Business Continuity Plan With a Mass Notification Solution

Summer is typically synonymous with beach days, outdoor barbecues and fulfilling weekend getaways. Unfortunately, the summer months aren’t only about enjoyable moments and exciting vacations. Summer is also tropical storm season, which brings higher risks of destruction, community displacement and disruption to business operations. With this potential for human and business peril, it’s important for organizations to implement a business continuity plan backed by a robust communication strategy.


ChatOps and Using Hubot for Incident Response

ChatOps is a method for using tools to execute commands, surface alert context and take action directly through chat. You can align human workflows with application and infrastructure health, making it easier to communicate and fix problems from a single tool. DevOps and IT teams are using chatbots like Hubot to execute commands directly from chat tools like Slack or Microsoft Teams.
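As a rough illustration of the idea, the sketch below routes chat messages to command handlers. It is not Hubot's API: the `command` decorator, the `HANDLERS` registry, the `!` prefix and the `status` and `ack` commands are all hypothetical names chosen for this example, and a real bot would receive messages from Slack or Microsoft Teams rather than from a plain function call.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry mapping chat command names to handler functions.
HANDLERS: Dict[str, Callable[[str], str]] = {}


def command(name: str):
    """Register a function as the handler for a chat command (illustrative only)."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[name] = func
        return func
    return register


@command("status")
def status(service: str) -> str:
    # Placeholder: a real bot would query a monitoring system here.
    return f"{service}: status lookup is not implemented in this sketch"


@command("ack")
def ack(alert_id: str) -> str:
    # Placeholder: a real bot would call the alerting tool to acknowledge.
    return f"alert {alert_id} acknowledged"


def handle_message(text: str, prefix: str = "!") -> Optional[str]:
    """Return the bot's reply, or None when the message is not a command."""
    if not text.startswith(prefix):
        return None
    name, _, arg = text[len(prefix):].partition(" ")
    handler = HANDLERS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(arg.strip())
```

Typing `!ack 42` in the channel would call `handle_message("!ack 42")` and post the reply back, so the action and its result stay visible to everyone in the conversation.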


Best Practices for Managing Multiple On-Call Teams

Alerting has come a long way, from the days of paging a single on-call administrator in the middle of the night to multiple on-call teams that manage incident response around the clock. As organizations grow and scale, responding to incidents gets more complex, and resolving an incident often takes more than one team.


Keep Stakeholders in the Know With Incident Timeline From Opsgenie

Technology is changing the world faster than ever. Thanks in part to the rise of the Software-as-a-Service (SaaS) model, customers have come to expect the apps they use to be accessible at all times. As a result, companies are transforming the way their teams operate in order to meet these demands. And perhaps no team experiences the impact of a transformation like this more than IT.


Serverless Event-Driven Workflows with PagerDuty and Amazon EventBridge

This week’s AWS Summit in New York was an exciting one for both AWS and PagerDuty. The AWS team rolled out Amazon EventBridge, a set of APIs for Amazon CloudWatch Events that makes it easy for AWS SaaS partners to inject events for their customers to process in AWS. PagerDuty is excited to continue and deepen our long partnership with AWS by supporting EventBridge as a launch partner.


No CMDB? No problem. Not for BigPanda.

I hear it all the time when talking to future BigPanda customers: “I’m not sure BigPanda can really help me correlate all these alerts together because our CMDB is very immature.” Or sometimes they don’t even have a CMDB, and incorrectly assume this disqualifies them from meaningful noise reduction and alert correlation. I’m happy to tell you the same thing I tell the folks who are looking at BigPanda for the first time: “No CMDB? No problem!”


Proactive Incident Response With Contextual Monitoring and Alerting

In a world of rapid software delivery and CI/CD (continuous integration and delivery), things break. Servers go down, third-party services fail and new code in production can cause unforeseen incidents. So, effective monitoring and alerting are imperative to maintaining highly reliable services. With context appended to alerts, on-call responders can quickly identify the services that are having issues and get those alerts to the right people.
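One way to picture context-driven routing is the minimal sketch below. It assumes a hypothetical `Alert` type, an `enrich` step that attaches the service name and environment, and a made-up `ROUTES` table mapping services to on-call teams; none of these names come from a specific monitoring product.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical mapping from service name to the team that owns it.
ROUTES: Dict[str, str] = {
    "payments": "payments-oncall",
    "search": "search-oncall",
}
DEFAULT_TEAM = "platform-oncall"  # Assumed fallback when no owner is known.


@dataclass
class Alert:
    message: str
    context: Dict[str, str] = field(default_factory=dict)


def enrich(alert: Alert, service: str, environment: str) -> Alert:
    """Attach context so responders can see what is affected and where."""
    alert.context.update({"service": service, "environment": environment})
    return alert


def route(alert: Alert) -> str:
    """Pick the on-call team from the alert's context, else the fallback."""
    return ROUTES.get(alert.context.get("service", ""), DEFAULT_TEAM)
```

An alert with no context can only go to the fallback team, while one enriched with `service="payments"` goes straight to the team that owns that service.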


Assessing the Per-Minute Cost of an Outage for YOUR Company

Software vendors and analysts love to rattle off scary numbers about how many thousands of dollars per minute or hour an infrastructure outage will cost the typical company. Those numbers can be alarming indeed; for example, Gartner quotes $5,400 per minute as the cost borne by a medium-to-large retailer. Your company, however, is most likely not identical to the “typical” company on which the numbers are based.
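A back-of-the-envelope version of that calculation might look like the sketch below. The model is an assumption made for illustration, not a method from the article or from Gartner: it adds the revenue lost per minute to the cost of staff who cannot work, and every parameter name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def outage_cost_per_minute(
    annual_revenue: float,
    revenue_share_affected: float,
    staff_affected: int,
    hourly_staff_cost: float,
) -> float:
    """Estimate the per-minute cost of an outage under a simple two-part model.

    Assumes revenue is earned evenly across the year, which is rarely true
    for businesses with strong peak hours or seasons.
    """
    # Revenue that would have been earned in one minute by the affected share.
    lost_revenue = annual_revenue * revenue_share_affected / MINUTES_PER_YEAR
    # Cost of one minute of time for staff who are blocked or pulled in.
    lost_productivity = staff_affected * hourly_staff_cost / 60
    return lost_revenue + lost_productivity
```

Plugging in your own revenue, the share of it that depends on the affected system, and your staffing costs gives a figure specific to your business rather than to an industry average.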