Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Carrier reduced MTTR and gained visibility across multiple IT environments

Hear Rich Johnston, Director of Hosting Platforms, describe Carrier’s observability goals to create a unified view of their IT environment for predictive monitoring. Rich describes Carrier’s desire to see issues before customers complain, and how LogicMonitor implemented extensive visibility on a single platform, including multiple cloud platforms, networking, compute, storage, and more. LogicMonitor helped Carrier quickly and easily deploy dashboards to see how their technology performed, while reducing the time spent on root cause analysis and shortening resolution time.

Tips on making on-call manageable

On-call responsibilities are a crucial part of many industries, ensuring that businesses can provide round-the-clock support to their customers. However, the demanding nature of on-call duty can lead to burnout and reduced productivity if not managed effectively. In this article, we will explore various strategies and tips to make on-call more manageable, enabling professionals to maintain a healthy work-life balance and deliver exceptional service.

Docker Compose Logs: Guide & Best Practices

Docker Compose is a tool for defining and running multi-container Docker applications. It allows developers to streamline the process of configuring, building, and running multiple containers as a single unit with a docker-compose.yml. This configuration file specifies the services, networks, and volumes required for an application, and their relationships and dependencies. The docker-compose logs command displays the logs of all services defined in the docker-compose.yml file.
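As a rough illustration of the logs command described above, here is a minimal Python sketch that assembles a docker-compose logs invocation. The helper names compose_logs_command and run_compose_logs, and the service name "web", are hypothetical and do not come from the article. The sketch assumes the docker-compose binary is on the PATH and uses only the --follow, --timestamps, and --tail options.

```python
import subprocess
from typing import List, Optional


def compose_logs_command(
    services: Optional[List[str]] = None,
    tail: Optional[int] = None,
    follow: bool = False,
    timestamps: bool = False,
    compose_file: str = "docker-compose.yml",
) -> List[str]:
    """Build the argument list for a `docker-compose logs` call.

    With no services given, the command covers every service
    defined in the compose file.
    """
    cmd = ["docker-compose", "-f", compose_file, "logs"]
    if follow:
        cmd.append("--follow")        # keep streaming new log lines
    if timestamps:
        cmd.append("--timestamps")    # prefix each line with its timestamp
    if tail is not None:
        cmd.append(f"--tail={tail}")  # only the last N lines per container
    cmd += services or []             # optional subset of services
    return cmd


def run_compose_logs(**kwargs) -> str:
    """Run the command and return its output (requires Docker Compose)."""
    result = subprocess.run(
        compose_logs_command(**kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example: last 100 timestamped lines of a hypothetical "web" service.
print(" ".join(compose_logs_command(services=["web"], tail=100, timestamps=True)))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.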

How Schneider Electric reduced MTTI and alert noise by consolidating monitoring tools

Hear Observability and Monitoring Strategist, Arun Mandayam, describe challenges that Schneider Electric faced around data interpretation and difficulties when using multiple monitoring tools. Arun describes how LogicMonitor helped consolidate monitoring tools, enabled them to onboard new cloud accounts, network devices, and on-prem systems on a unified platform, and helped significantly reduce MTTI and alert noise.

Incident Management vs Problem Management

In the dynamic landscape of IT service management (ITSM), two concepts reign supreme: Incident Management and Problem Management. They might seem similar, and many people use the terms interchangeably, but they serve distinct purposes. In this article, we’ll navigate the nuanced differences between Incident Management and Problem Management, and apply these concepts in our own approach to incident management.

Synchronizing mental models

In the heat of an incident, having a clear and shared understanding of what’s going on is absolutely crucial to effective response. But often what actually happens is that the people involved in an incident each build their own picture and narrative of the event, shaped by their own expertise, their past experiences, and what they’re seeing and hearing as the incident develops. The picture and perspective each person builds is often referred to as a mental model.

Strengthen Your DORA Metrics with PagerDuty

For technical teams, the findings from DORA provide a model for measuring and improving performance. With almost a decade of data gathered from more than 33,000 professionals worldwide, the capabilities and frameworks detailed by the research help teams pinpoint areas for improvement and areas to celebrate. The team at DORA categorizes capabilities into three sections: Technical Capabilities, Process Capabilities and Cultural Capabilities.

The Art of Alert Management

With the ever-growing landscape of digital technology and the internet of things (IoT), businesses are becoming increasingly reliant on complex systems to monitor and manage their operations. This dependency has resulted in an explosion of alerts and notifications, overwhelming IT teams and affecting overall productivity. It’s never been more critical to have an effective alert management strategy in place to ensure the smooth running of your organization.

Announcing Catalog - the connected map of everything in your organization

One of the most painful parts of incident response is contextualizing the problem and understanding how and where it fits within your organization. If responders are unable to answer basic questions about the problem, they waste valuable time talking to the wrong people or solving the wrong problems, ultimately extending impact and hurting the response. It’s a common issue that, up until now, didn’t have a clear solution or workaround.

From Expense to Excellence: Transforming ITOps in 2023 through Strategic IT cost optimization

Most organizations view their tech and network operations center and their budgets as simply the cost of running their internal and external IT services. However, through IT cost optimization, you can improve how your Ops center team responds to service issues and save valuable resources too. So, what specifically is IT cost optimization?