The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The Incident Retrospective Ground Rules

I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was especially excited about was the opportunity to see Honeycomb’s incident retrospective process from the inside. I wasn’t disappointed! The first retrospective I took part in was for our ingestion delays incident on September 8th.

Global bank transforms incident alert management & communications

The bank is one of the 10 largest financial services companies in the world, with more than 200,000 employees, tens of millions of customers, and operations in more than 60 countries. The Interlink Incident Alert Management app serves thousands of service owners and business stakeholders across more than 20 global markets.

Building a metrics backend (time series db) with PostgreSQL and Rust

At ilert, customers already benefit from our easy-to-set-up private and public status pages and auto-generated SLA uptime graphs for their business services. However, we decided to take graphs a step further with custom metrics. With ilert metrics, customers can showcase additional business data and insights into their services on their status pages.

Services with a Smile: Service Graph and Service Standards

Principal Product Manager Davis Godbout joined the HowTo Happy Hour to talk about the Service Graph and Service Standards features in the PagerDuty platform. Service Graph gives your teams a visual representation of the relationships among your technical and business services. Service Standards guides teams toward adopting PagerDuty features such as integrations and configurable incident urgency.

Integrations on Rails: How we build and deploy integrations at FireHydrant

Implementing integrations without accumulating a mountain of technical debt can be challenging. But shipping integrations at high volume doesn't have to mean bugs, burnout, and outages. At FireHydrant, we've found a pattern for rapidly building and releasing integrations without swiping the technical debt credit card each time, and that gave us a fast lane to building premier integrations.

New features + new CI: Metrics, Status Page Widget, PandoraFMS, Automation rules, Alert report export

This post highlights some of the features and improvements we have released in the last month. If you want to submit your own ideas or vote on existing feature requests, you can now use our new public roadmap at roadmap.ilert.com.