Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The Debrief: Build vs buy

Almost every organization around will eventually face an important crossroad: should I build the tooling I need, or buy it? But more often that not, the decision to buy is the most sensible one that'll save you the most time, effort, and even money. But there are some edge cases where building can be the right choice. In this chat with Isaac, product engineer at incident.io, we dive into this nuanced debate and explain why buying is your best bet...most of the time.

SLA vs. SLO vs. SLI: What's the Difference?

When it comes to managing services effectively, terms like SLA, SLO, and SLI are often thrown around like confetti at a parade. They’re in meetings, in documents, and even in casual office conversations. But if you’re new to the field or simply haven’t had the chance to dig into these acronyms, they can feel like a bewildering alphabet soup. And they can’t be missing on an uptime monitoring blog such as ours! So, what do these terms really mean?

A guide to post-mortem meetings and how we run them at incident.io

You've just made it through a particularly tough incident. It was a short outage affecting a subset of customers, so not exactly the end of the world, but bad enough that it involved multiple people across a number of teams to resolve. Either way, the incident was well managed, and the dust has settled. Now what? Most guidance would say that putting together a post-mortem document is a good idea, given the severity of the incident. You've also done this, so what's next?

Three Ways to Better Appreciate your SREs and DevOps Engineers

DevOps engineers and Site Reliability Engineers are vitally important to the continued health of your product and business. We all know it’s true, and yet people in these roles often feel underappreciated and undervalued. This sort of work runs into the issue of “when process and infrastructure break, it gets shoved in the spotlight; but when everything works perfectly, no one notices.” ‍

How AIOps modernizes CMDBs to drive accuracy and value

Maintaining your Configuration Management Database’s (CMDB) accuracy, keeping it fully updated, and improving its performance is a frustrating and elusive goal for ITOps and IT leaders. Aiming for this ‘golden’ CMDB standard can feel like running on a treadmill where you’re putting in a lot of work, but remain as distant as ever from your goal. Can IT leaders ever catch up?

Bridging the ITIL vs DevOps Mindset: CI/CD Best Practices for ITIL Organizations

DevOps practices in software development have revolutionized the way updates are released. However, many companies entrenched in ITIL practices find it challenging to seamlessly integrate with the DevOps practice of Continuous Integration and Continuous Delivery/Deployment (CI/CD). This is because ITIL focuses on stability, which suits older systems, while DevOps is ideal for modern setups with its agile, automated practices.

Revolutionizing your Grafana setup with intelligent alerting

Once upon a time, in the bustling city of DataVille, lived a team of dedicated IT professionals tirelessly working to maintain the city’s digital heartbeat. Their mission was to ensure the smooth operation of their city’s digital infrastructure, which was not limited to the daytime operations but extended beyond business hours. They were the unsung heroes, the guardians of the city’s data. Their tool of choice? Grafana, a powerful open-source platform for observability.