Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The Unplanned Show, Episode 5: DataOps with Snowflake

Long gone are the days when data is batch loaded into a data warehouse for business intelligence reports that are looked at periodically and if something is broken, a few internal people would have to wait. Today, data pipelines are “infinitely more complicated”, with more sources from cloud services to on premises systems, and supporting data applications that are critical parts of a business’ ecosystem.

Critical Incident Management - Roles and Responsibilities

Critical Incident Management is designed to handle disruptive and unexpected events that threaten to harm an organization or its stakeholders. These incidents range from cyber attacks and system failures to natural disasters and global pandemics. The importance of critical incident management cannot be overstated, as it is a pivotal process that maintains business continuity and ensures smooth operations despite adversities.

How we leverage our product responder role to push our pace of development

Like many of our own customers, at its heart, incident.io is a software company. Because of this, it means that our work is never truly “done." One of our primary goals is to help people coordinate their response to situations where things haven’t gone well, and make it easy to always do the right thing. But we know that there will always be bugs to fix, features to be introduced and improvements to be made, as evidenced by our changelog.

How Incident Tracking Can Benefit Your IT Organization

In the dynamic world of Information Technology (IT), incident tracking is a critical process within the realm of incident management that can significantly influence an organization’s operational efficiency and service quality. Incident management refers to the identification, recording, and management of incidents—unplanned events or disruptions—that can impact IT services.

How our engineering team uses Polish Parties to maintain quality at pace

It’s fair to say that delivering software faster has never been more relevant. But in doing so, it’s easy to let your bar for quality slip. Often, the guardrail to avoid this is to hire dedicated QA Engineers, whose sole job is to ensure your software works as it should and to spot any issues that arise. Seems sensible, right? Well, at incident.io, we take a different approach.

What Is Site Reliability Engineering? Understanding the complexities of this crucial function

Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.

How we achieved pixel-perfect polish during our Status Pages launch

A few months ago, we released Status Pages. This project was quite different from anything we’ve approached before, given that: And our goals were a departure from one's we had set in the past: With this in mind, we worked closely with our designer throughout the process of building Status Pages. Here is how we approached it and a few lessons we learned along the way!