Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Time to Upgrade? Why Traditional Pagers Are No Longer Enough

When it comes to time-sensitive events, instant, reliable communication is key. In the past, people relied on pagers for quick communication because they worked on the go, without access to a landline. Today, cellphones have made portability a standard feature of communication devices, and communication technology has advanced significantly, which raises the question: what use do pagers have today?

Create a service catalog that grows with you

When your incident response process is centered around a service catalog, responders can more quickly pinpoint the service or functionality that’s down, bring in the right team or experts, and start solving the problem sooner. Saving even a few minutes can have a big impact on the cost of incidents and outages, so having up-to-date service details at your fingertips can make all the difference.

Squadcast + HaloPSA Integration: Enabling Streamlined Incident Response & Alerting

HaloPSA is a modern and intuitive all-in-one professional services automation (PSA) solution, designed for service providers. HaloPSA’s cloud platform helps you manage your entire business, modernize customer experience and automate your service. If you use HaloPSA for PSA requirements, you can integrate it with Squadcast, an end-to-end Incident Response and Reliability Workflow platform, to route detailed alerts from HaloPSA to the right users in Squadcast.

Developer environments should be cattle, not pets

“Cattle, not pets” is a DevOps phrase referring to servers that are disposable and automatically replaced (cattle) as opposed to indispensable and manually managed (pets). Local development environments should be treated the same way, and your tooling should make that as easy as possible. Here, I’ll walk through an example from one of my first projects at incident.io, where I reset my local environment a few times to keep us moving quickly.

Admin Panel - General Settings - xMatters Support

You can define the details for a company using the General Settings page accessed via the Admin menu. Depending on your permission level, you may not be able to view the General Settings screen. In addition, the settings you see on this page depend on both your role permissions and the features available in your product plan.

Redundancy for IT resilience: The backup guide for a disaster-proof network

Around six years ago, on a Wednesday morning, software professionals worldwide were startled by a tweet from GitLab stating that it had accidentally deleted its production data, causing its site to go offline. At that point, the open-source code repository giant had no idea that it would take another 36 hours to restore its systems, only to learn that 5,000 projects and 700 new user accounts had been affected while the outage was being fixed.

The Guide to SRE Principles

Site reliability engineering (SRE) is a discipline in which automated software systems are built to manage the development operations (DevOps) of a product or service. In other words, SRE automates the functions of an operations team via software systems. The main purpose of SRE is to encourage the deployment and proper maintenance of large-scale systems.