
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

What the Big Brother Approach to IT Monitoring and Incident Management May Be Missing

We asked in a recent poll which popular TV show your IT team resembles the most. Big Brother came out on top, with almost 40% of respondents saying that their incident resolution process most resembled this show. Would you compare your incident management process to an episode of Big Brother? If so, it's likely that your IT environment is highly monitored, but incidents still seem to slip through the cracks.

SLA vs SLI vs SLO: Know the differences between them.

SLA stands for Service Level Agreement. It is a formal agreement between you and your customer that describes the reliability of your product or service. For example, an SLA might state that the product will be online 99 percent of the time annually, and that if you fail to achieve that objective you will give 30% of the customer's annual license fee back. As that example shows, SLAs also include penalties in the contract.
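To make the 99 percent example concrete, here is a minimal Python sketch of the arithmetic behind such an SLA. The function names, the 8,760-hour year (leap years ignored), and the flat refund rule are illustrative assumptions; a real contract defines its own measurement window and credit terms.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def allowed_downtime_hours(uptime_target: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in the period by an uptime target such as 0.99."""
    return (1.0 - uptime_target) * period_hours


def sla_credit(measured_uptime: float, uptime_target: float,
               annual_fee: float, credit_rate: float) -> float:
    """Refund owed under a flat-rate penalty: credit_rate of the fee if the target is missed."""
    if measured_uptime < uptime_target:
        return annual_fee * credit_rate
    return 0.0


if __name__ == "__main__":
    # A 99% annual target leaves roughly 87.6 hours of downtime per year.
    print(round(allowed_downtime_hours(0.99), 1))
    # Hypothetical customer paying 10,000 per year, measured at 98.5% uptime.
    print(round(sla_credit(0.985, 0.99, 10_000.0, 0.30), 2))
```

Under these assumptions, a 99 percent target allows roughly 87.6 hours of downtime per year, and any year that falls short triggers the 30% refund.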

The U.S. COVID Vaccine Distribution Plan: Challenges and Solutions

As coronavirus (COVID-19) continues to spread and new virus strains emerge, the public is frantically looking for answers regarding the U.S. government’s vaccine distribution plan. A sound vaccine distribution plan is especially crucial in times like these. States across the country, from coast to coast, are experiencing a vast number of COVID-related deaths and hospitalizations. The dire situation underscores the importance of having an effective, accelerated vaccine delivery process.

New Feature: Incident types

Incidents are inevitable, and the reality is that some of them are going to repeat themselves. FireHydrant has always strived to make the entire incident response lifecycle smooth, but up until today, declaring common types of incidents was slightly burdensome for our customers. We decided it was time to make declaring incidents easier with easy-to-use templates, which we’re calling Incident types.

Who Else Wants to Increase Development Velocity?

Implementing SRE is fundamentally about shifting culture, but it often means adding new tooling and processes to your team's workflows to support that cultural change. Teams add new steps and checks to incident response procedures. Incident responders write retrospectives and create new meetings to review them. Engineers consult new tools like monitoring dashboards and SLOs. In other words, SRE creates another layer of consideration in development and operations.