Alerting

Using Observability to Inspect and Adapt CI/CD Pipelines

In this blog post series, I’ve explored the relationship between observability and a set of software delivery lifecycle practices that help organizations adopt DevOps and shift their ways of working from project-centric to product-centric. I started with Site Reliability Engineering, then considered Value Stream Management (VSM), and finish with this post on Continuous Integration and Delivery (CI/CD).

Thales accelerates incident resolution & decreases downtime with Exigence

Thales Cloud Protection & Licensing, part of the Thales Group, was looking to improve how it handles critical incidents. Whenever an incident hit, just gathering the incident team was a cumbersome, time-consuming task that involved a lot of manual work. Multiple calendar invites would be sent, multiple times, to different people inside and outside the organization, urging them to join calls and meetings.

How to maximize the value of SCOM - Monitoring, Alerts, Incidents & Visualization

In SCOM you can see the monitoring that generates your alerts (the contents of Health Explorer). While SCOM doesn’t always make it easy to get at the valuable context that this monitoring data provides, it is there and can help answer the "why" questions that often come up when looking at an alert in isolation.

Let's Talk AIOps: Part 1: What IS AIOps, Exactly?

This is the first in a two-part blog series deconstructing AIOps for ITOps leaders. If you gave me a dollar for every company that claims that they use “A.I.,” I’d be doing pretty well. But as a marketer, I can’t help but be a little skeptical about those claims. Let me explain.

Working with multiple on-call teams using Zabbix and iLert

This post outlines how to use Zabbix and iLert with multiple on-call teams, where each team is responsible for a set of host groups in Zabbix and therefore only receives alerts for the services it is responsible for. But first, let’s start with the basic needs of being on-call.

Alerts vs Incidents vs ITSM

To address production issues in your application effectively, you need a strong incident response strategy. Incident response starts with an alert, which leads to mobilization and response, and finally results in a record of everything that happened and was learned from addressing the issue. In this session of Dissecting DevOps, learn about the lifecycle of incidents from alert to postmortem and why incident response is as much a strategy as a process.

Retail Industry Trends 2020: All-In on Digital Since COVID-19

This is the first in a series of posts we’ll be publishing on trends we’re seeing in the retail industry and how IT organizations tasked with deploying and maintaining flawless digital customer experiences can take advantage of PagerDuty to ensure always-on reliability. It’s been a tough year for retail.

Fiserv Eliminates Ticket Overload with AIOps

Fiserv, the Fortune 500 payments and financial technology provider, needed to streamline and automate its IT incident management process to detect and fix issues earlier and more quickly. The incident management workflow was complex, primarily because mergers and acquisitions over the years had made Fiserv’s IT environment very heterogeneous. “The challenges we were facing were enormous,” IT Director Chris Kreps says.

Monitoring IoT devices using heartbeats and MQTT gateways

When working with IoT (Internet of Things) devices, one of the key issues is keeping track of the health of all installations. Most of the time, especially with smaller devices, the applications (firmware) are flashed once during setup and then stay untouched at their deployment location for a long while.
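
To make the heartbeat idea from the title concrete, here is a minimal sketch of one way to track device health: record the time of each device's last heartbeat and flag any device that has been silent for longer than a timeout. The class name `HeartbeatMonitor`, its methods, and the timeout value are hypothetical and chosen for illustration only; the excerpt does not describe the post's actual implementation, and the MQTT transport is left out entirely.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat per device (hypothetical helper, not from the post)."""

    def __init__(self, timeout_seconds: float) -> None:
        # A device is considered stale once it has been silent longer than this.
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, device_id: str, now: Optional[float] = None) -> None:
        # Store the arrival time of the heartbeat; `now` can be injected for testing.
        self._last_seen[device_id] = time.time() if now is None else now

    def stale_devices(self, now: Optional[float] = None) -> List[str]:
        # Return the IDs of devices whose last heartbeat is older than the timeout.
        current = time.time() if now is None else now
        return sorted(
            device_id
            for device_id, seen in self._last_seen.items()
            if current - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=60)
    monitor.record("sensor-a", now=0)
    monitor.record("sensor-b", now=50)
    # At t=100, sensor-a has been silent for 100 s and sensor-b for 50 s.
    print(monitor.stale_devices(now=100))
```

In a real setup, `record` would presumably be called from whatever handler receives heartbeat messages, and the result of `stale_devices` would feed an alerting tool on a periodic schedule.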