The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The 2024 List of Incident Management Resources

Nov 18, 2024 By Hrishikesh Barua In IncidentHub

This article is an attempt to list the best incident management material and guides available for free on the internet. If I've missed something you think should be here, do let me know and I'll be happy to add it.

Salesforce Outage Disrupts Services Globally: Updates and Timeline

Nov 15, 2024 By Nuno Tomas In isDown

Today, November 15, 2024, Salesforce customers worldwide faced significant disruptions due to a service outage that began early in the morning (UTC). The outage affected multiple Salesforce instances and a range of other production and sandbox environments. This incident has left many businesses unable to access critical services, causing widespread frustration and operational delays. Here’s a detailed breakdown of the situation, what’s being done, and where you can find the latest updates.

Enhance observability with AI-powered IT operations

Nov 14, 2024 By Sam Osborn In BigPanda

Your organization probably relies on a collection of observability tools to track specific elements of its IT stack. You’re not alone; a recent survey from Enterprise Strategy Group showed that most organizations have six or more observability solutions. Our research found that the average BigPanda customer uses 20 observability and monitoring data sources!

Ask the Expert: Insights from Paula Thrasher, Senior Director of Infrastructure and Platform, PagerDuty

Nov 14, 2024 By PagerDuty In PagerDuty

In this blog post, Paula Thrasher, Senior Director of Infrastructure and Platform at PagerDuty, provides her takes on the challenges and opportunities facing tech leaders today. From managing complexity to driving operational resilience, Thrasher shares expert insights on how executives can get ahead of disruptions.

The Ultimate Guide for Enterprise DevOps

Nov 14, 2024 By xMatters In xMatters

Speed and reliability in incident management have always been the formula for many businesses’ success. But what happens when this already demanding workflow needs to be done at scale? The answer is adopting enterprise DevOps methodologies to scale operations efficiently. The benefits of DevOps are magnified when they are correctly scaled across an entire enterprise. In this comprehensive guide, we’ll explore the fundamental principles, challenges, and components of enterprise DevOps.

How we handle sensitive data in BigQuery

Nov 14, 2024 By Lambert Le Manh In Incident.io

As a provider of incident management software, we at incident.io manage sensitive data regarding our customers. This includes Personally Identifiable Information (PII) about their employees, such as emails, first names, and last names, as well as confidential details regarding customer incidents, such as names and summaries. Consequently, we approach the management of this data with a great deal of care.

How to Configure a Remote Data Store for Prometheus

Nov 13, 2024 By Hrishikesh Barua In IncidentHub

The Prometheus monitoring tool can store its metrics either locally or remotely. You can configure a remote data store using the remote_write configuration. This article describes the various data store options available as well as how to set up a remote store.

New BigPanda features accelerate IT incident response

Nov 13, 2024 By Nathan Bao In BigPanda

ITOps teams are inundated with a significant volume of alerts each day. Sifting through these alerts to discern which ones are harmless and which could lead to major incidents is a time-consuming and tedious task. This process often involves hunting for information across disparate data sources, tools, and workflows. As a result, the investigation can slow down incident response times, negatively affecting service reliability and customer satisfaction.

3 Ways to Streamline Kubernetes Operations with PagerDuty Automation

Nov 11, 2024 By Joseph Mandros In PagerDuty

Kubernetes' popularity continues to grow, with over 60% of organizations maintaining multiple Kubernetes clusters across diverse environments and teams in some capacity. However, as clusters multiply, so do operational challenges: from monitoring hundreds of microservices to responding to and escalating incidents across distributed systems.

Building an AI Chatbot Playground with React and Vite

Nov 11, 2024 By Marko Simon In iLert

Read how we set up an experimental chatbot environment that allows us to switch LLMs dynamically and enhances the predictability of AI-assisted features' behavior within the ilert platform. The article includes a guide on how you can build something similar if you plan to add AI features with a chatbot interface to your product.
