
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

New Event Source - Website Monitoring

Enterprise Alert is constantly evolving to provide our customers with new ways to implement event sources and use new features. With version 9, several new features have been implemented that make it easier for customers to create alerts for specific processes and events. These include the new “Website Monitoring” event source.

Self-Service for Teams in Enterprise Alert

A few days ago I had an insightful conversation with one of our customers, which inspired me to write this blog post. Like many other customers, he found that his Enterprise Alert management overhead grew with each new team he added, because he was also managing resources such as event sources, notification channels and alert policies for those teams. His question to us, therefore, was whether he could hand these management tasks over to the teams themselves.

SRE Availability Metrics

How available is your website, service, or platform? What must you monitor and measure to ensure availability? How do you translate uptime into availability? This chart has numbers that every Site Reliability Engineer (SRE) should know. Below the chart, you will find answers to commonly asked questions about SRE and associated metrics.
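As a rough illustration of how uptime translates into availability, here is a minimal Python sketch. It assumes availability is simply uptime divided by total time in the measurement window, and the function names (`availability_pct`, `allowed_downtime_seconds`) are hypothetical, chosen for this example only. It is not taken from the chart itself.

```python
SECONDS_PER_DAY = 86_400


def availability_pct(uptime_seconds: float, total_seconds: float) -> float:
    """Availability as a percentage: uptime divided by total time in the window."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= uptime_seconds <= total_seconds:
        raise ValueError("uptime_seconds must be between 0 and total_seconds")
    return 100.0 * uptime_seconds / total_seconds


def allowed_downtime_seconds(target_pct: float, period_days: float = 365.0) -> float:
    """Downtime budget (in seconds) that a target availability leaves over a period."""
    if not 0.0 <= target_pct <= 100.0:
        raise ValueError("target_pct must be between 0 and 100")
    return period_days * SECONDS_PER_DAY * (1.0 - target_pct / 100.0)


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_seconds(target) / 3600
        print(f"{target}% availability -> {hours:.2f} hours of downtime per year")
```

Under this definition, a 99.9% target over a 365-day year leaves roughly 8.76 hours of downtime. Real SRE practice often measures availability per request or per time slice rather than by wall-clock uptime, so treat this as a starting point only.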

Understanding a Microsoft Service Outage

Maintaining business continuity when an issue arises is a challenge many organizations struggle with. The global pandemic that began in Q1 of 2020 (one that many businesses are still navigating) introduced a new set of problems for both service providers and the businesses that rely on their services.

Enhance NOC Alerts With Incident Management and Alert Automation

In a network operations center (NOC), alerts originating from hundreds of servers, application monitoring systems, emails and ticketing services compete to catch a NOC analyst’s attention. NOCs face many challenges in parsing through alerts to identify actionable notifications and mobilize the right response team into action.

Celebrities Explain WTF is Incident Management

Our friends Felicia Day, Steve Wozniak, and Brian Baumgartner help us explain what the heck incident management is. FireHydrant is the only comprehensive incident management platform that lets you create consistency across the entire incident response lifecycle, so you can focus on fighting fires faster. From alert to retrospective, tracking, communicating, and reporting on results: FireHydrant will automate the process so you can focus on resolution. Visit firehydrant.io to learn how you can manage the mayhem.