
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Production vs Local in engineering: Piyush Verma - The Reliability Podcast

The Reliability podcast speaks with engineers who have worked on large, complex systems and gleans lessons from their experience. What best practices should one adopt? What are the non-negotiable lessons for getting better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these questions and more by tracing the personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

The Link Between Early Detection and Internet Resilience: A Lesson from Salesforce's Outage

Almost every study examining the hourly cost of outages reaches the same conclusion: outages are expensive. According to a 2016 study, the average cost of downtime was estimated at approximately $9,000 per minute. In a more recent study, 61% of respondents stated that outages cost them at least $100,000, with 32% indicating costs of at least $500,000 and 21% reporting expenses of at least $1 million per hour of downtime.
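To put the 2016 per-minute figure on the same hourly basis as the survey thresholds, the conversion can be sketched as below. This is only illustrative arithmetic: it assumes cost scales linearly with duration, which real outages rarely do, and the function name `downtime_cost` is hypothetical.

```python
# Rough downtime cost estimate, assuming cost grows linearly with duration.
COST_PER_MINUTE_USD = 9_000  # average from the 2016 study quoted above
MINUTES_PER_HOUR = 60


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost in USD of an outage lasting `minutes`."""
    return minutes * cost_per_minute


hourly_cost = downtime_cost(MINUTES_PER_HOUR)
print(f"Estimated cost of a one-hour outage: ${hourly_cost:,.0f}")
```

At that rate, a one-hour outage lands just above the $500,000 threshold cited in the more recent survey.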

The Single Pane of Glass in Modern Observability

Recently I caught up with Jamie Allen on Episode 67 of the Slight Reliability podcast to discuss the idea of a single pane of glass (SPOG). Jamie had written an article titled The Single Pain of Glass, which coincidentally was also the title I gave Slight Reliability Episode 10. Given our shared use of puns and our interest in this topic, I thought it was worth a conversation! So, what is a single pane of glass? Is it an idea with practical application? How does it fit into the world of modern observability?

Harmonizing Digital Channels and Business Operations to Deliver a Good Customer Experience

In celebration of Customer Experience Day 2023, this post is part of a series on customer experience and the ways that Splunk strives to deliver superior customer experience at every level. Today, customers interact with brands through a variety of channels and platforms. In fact, 57% of customers prefer to engage with brands through digital channels first.

Simplifying Microsoft Teams Troubleshooting for IT Teams

Microsoft Teams has become the go-to platform for seamless collaboration and communication. However, like any technology, performance issues can arise, and these issues affect user experience and productivity. For IT teams tasked with Microsoft Teams troubleshooting, having access to comprehensive data is key. In this blog, we explore the challenges faced by IT teams and how harnessing more data can make the process significantly easier.

How We Did It: Data Ingest and Compression Gains in InfluxDB 3.0

A few weeks ago, we published benchmarks showing that InfluxDB 3.0 performs orders of magnitude better than previous versions of InfluxDB and, by extension, other databases as well. Two key factors drive these gains: 1. Data ingest, and 2. Data compression. This raises the question: just how did we achieve such drastic improvements in our core database? This post explains how we accomplished them for anyone interested.

Top 10 Tools to Monitor Core Web Vitals of Your Website

What guarantees the success of a website today isn’t just its content and design; delivering a seamless and efficient user experience (UX) is also critical. This is where Core Web Vitals come in: they provide a collection of performance metrics for evaluating the quality of a website’s user experience. Core Web Vitals are critical to attracting and retaining visitors because they directly impact a site’s visibility on Google.

Configuration Drift: Understanding, Avoiding, Managing and Resolving in Kubernetes

If you work with Kubernetes, you know that any number of issues can pose a serious threat to the stability and security of your deployments. One that's subtly damaging is configuration drift, which occurs when the actual state of your system's setup (its configuration) strays from the way you defined it. Configuration drift in Kubernetes can happen when people make changes manually, systems aren't synchronized properly, or monitoring falls short.
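The core idea of drift, a live state that no longer matches the declared one, can be illustrated with a minimal sketch. It compares two plain dictionaries and does not call any Kubernetes API; the function name `find_drift` and the sample values are hypothetical, and real tooling would read the declared state from version control and the live state from the cluster.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Return readable differences between a declared config and the live one."""
    drift: list[str] = []
    for key in sorted(set(desired) | set(actual)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in actual:
            drift.append(f"{path}: missing from live state")
        elif key not in desired:
            drift.append(f"{path}: not in declared config")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Recurse into nested sections so the report names the exact field.
            drift.extend(find_drift(desired[key], actual[key], path))
        elif desired[key] != actual[key]:
            drift.append(f"{path}: declared {desired[key]!r}, live {actual[key]!r}")
    return drift


# Hypothetical example: someone scaled the workload and raised a limit by hand.
desired = {"replicas": 3, "image": "web:1.4.2",
           "resources": {"limits": {"memory": "256Mi"}}}
actual = {"replicas": 5, "image": "web:1.4.2",
          "resources": {"limits": {"memory": "512Mi"}},
          "annotations": {"edited-by": "hand"}}

for line in find_drift(desired, actual):
    print(line)
```

Each reported line corresponds to one of the causes named above: a manual change (the replica count and memory limit) or state that was never captured in the declared configuration (the extra annotation).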