Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

StackPulse

SRE Culture: How to Put Reliability First

Unreliable services can hurt a business in myriad ways: slower development velocity, unhappy users, and impacted revenue streams. Yet reliability often takes a backseat to feature releases and other business initiatives that drive development requirements. This post discusses key elements of SRE practice that you can use to instill a reliability-first culture in your organization while still meeting business requirements and keeping your users happy.

OnPage

Using OnPage to Deliver Exceptional Customer Support

The OnPage Customer Support team consists of knowledgeable, friendly technicians who offer 24/7 assistance. Support recognizes the importance of client relationships and always aims for maximum customer satisfaction. The OnPage incident management system is at the center of how Support delivers quality service: OnPage sends instant, critical mobile alerts to technicians whenever a customer creates a ticket.

AlertOps

What is DevOps?

What is DevOps? DevOps is a term for a cluster of concepts that has become a movement: "a cross-disciplinary practice dedicated to the study of building, evolving and operating, rapidly-changing resilient systems at scale" (Jez Humble). Because of the complex processes attached to the term, not everyone agrees on a single definition of DevOps; its benefits to teams, however, are universally agreed upon.

Blameless

SRE as Organizational Transformation: Lessons from Activist Organizers

In the software industry's recent past, the biggest disruptive wave was Agile methodologies. While Site Reliability Engineering is still early in its adoption, those of us who lived through the Agile transformation can see the writing on the wall: SRE will impact everyone. Any major transformation like this requires a change in culture, a catch-all term for changing people's principles and behaviors.

Sponsored Post

Top 5 Benefits of a Site Reliability Platform

One of the most important aspects of a software system is its reliability, and for good reason. With so many digital options available in every industry, customers have little reason to keep using applications or services that suffer frequent quality or availability problems. It is therefore critical that organizations invest in the processes and tools needed to ensure system reliability. Adopting a reliability platform is one way to maintain or improve the quality and reliability of an application. Keep reading for an overview of the functionality reliability platforms provide and the specific ways in which they deliver value to the business.

Datadog

Accelerate your logs investigations with Watchdog Insights

If you're investigating an incident, every minute means degraded performance or even downtime for customers. The cause of an issue often lies in parts of your systems and applications that you would not think to check, and the sooner you can bring these to light, the better.

Blameless

SRE2AUX: How Flight Controllers Were the First SREs

In the beginning, there were flight controllers. These were a strange breed. In the early days of the US manned space program, most American households, regardless of class or race, knew the names of the astronauts: John Glenn, Alan Shepard, Neil Armstrong. The manned space program was a unifying force of national pride. But no one knew the names of the anonymous men, and later women, who got the astronauts to orbit, to the Moon, and, most importantly, back to Earth.

StackPulse

Top 5 Tools for the Best SRE Stack

Site Reliability Engineering (SRE) can mean different things to different companies, and the operators responsible for reliability typically use a DevOps toolset. One thing is certain, however: SREs combine software engineering skills with production and operations management to achieve high reliability and ensure that SLO/SLA targets are met. SREs therefore need a firm grip not only on the technologies in the system but also on the intricacies of production deployments.

6 incident management hacks to implement using ServiceDesk Plus

Ever wondered how an enterprise like Zoho, with over 50 SaaS applications and more than 180,000 customers, handles the full spectrum of IT incidents it faces? Download this free e-book for an insider look at the incident response and management processes Zoho has perfected over the years.

6 incident management hacks to implement using ServiceDesk Plus Cloud

Ever wondered how an enterprise like Zoho, with over 50 SaaS applications and more than 180,000 customers, handles the full spectrum of IT incidents it faces? Download this free e-book for an insider look at the incident response and management processes Zoho has perfected over the years.