Monthly Archive

Sponsored Post

Innovating Security with Managed Detection & Response (MDR) and ChaosSearch

Sep 30, 2025 By David Bunting In ChaosSearch

Managed Detection and Response (MDR) services occupy an important niche in the cybersecurity industry, supporting SMBs and enterprise organizations with managed security monitoring and threat detection, proactive threat hunting, and incident response capabilities. In this week's blog, we're taking a closer look at the role of MDRs in cybersecurity, the biggest challenges they face, and how integrating ChaosSearch is helping MDRs manage complexity, reduce data retention costs, and enable long-term security analytics use cases that are critical for customer success.

Read Post

Sponsored Post

7 Downdetector Alternatives

Sep 30, 2025 By StatusGator In StatusGator

Downdetector is one of the best-known outage-tracking platforms, but its consumer-first approach has limitations for technical teams. Its reliance on user-submitted incident reports makes it prone to noise, false positives, and incomplete coverage of B2B and cloud-specific services. That's why we're exploring the best Downdetector alternatives available today, and highlighting which ones work best for businesses.

Read Post

Reducing Alert Fatigue in Microsoft SCOM

Sep 30, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Alert fatigue is one of the most common challenges organizations face when using Microsoft System Center Operations Manager (SCOM). The sheer volume of notifications from servers, applications, network devices, and cloud services can overwhelm IT teams, making it difficult to distinguish between critical incidents and low-priority events.

Read Post

AWS Health integration is here!

Sep 30, 2025 By Valeria Kurolapova In StatusGator

We’re excited to roll out a new addition to StatusGator’s Private Ingestion suite: AWS Health integration — now available for Enterprise customers.

Read Post

Your infrastructure Is more distributed than you think.

Sep 30, 2025 By Catchpoint In Catchpoint

An eCommerce platform, a banking app, even a simple user portal depends on a web of APIs, cloud tools, hosting services, and edge networks. Each one introduces another potential point of failure. And when those dependencies break? User experience suffers. Brand trust takes a hit. Millions in revenue are at risk. That’s why leading digital businesses, especially in eCommerce and banking, are expanding visibility beyond the application stack.

View Video

Resolve website transaction bottlenecks faster with Step Summary and Step Performance Reports

Sep 30, 2025 By ManageEngine Site24x7 In Site24x7

Ever wondered why some steps on your website feel slower than others? In this video, we’ll show you how to spot slow logins, delayed checkouts, and page load issues, and how to pinpoint their causes so you can fix them fast using the Step Summary and Step Performance reports. You’ll learn how to access these reports, what insights they provide, and how they help you quickly pinpoint performance bottlenecks to ensure a seamless user experience.

View Video

What Is RabbitMQ And How Do You Manage It With Kubernetes?

Sep 30, 2025 By Navdeep Sidhu In meshIQ

The world of Kubernetes and RabbitMQ evolves rapidly. Our popular 2022 post laid the groundwork for HA deployments; now, join us for the crucial 2025 update to ensure your architecture remains cutting-edge. Organizations continue to shift from monolithic architecture (where all the code building the application exists as a single entity) to microservices architecture.

Read Post

The Compliance Shortcut: Automation as the New Operating System for Resilience

Sep 30, 2025 By ScienceLogic In ScienceLogic

For years, compliance has been synonymous with checklists, manual reporting, and time-consuming audits. That definition no longer holds. In our September 2025 webinar, Patrick Hubbard, Technical Marketing Director, led a conversation with JB Baker, Vice President of Product Engineering, and Marc Jensen, Channel Sales Engineer. Together, they showed how automation is transforming compliance into something far more strategic: the foundation of modern resilience.

Read Post

How to Boost Revenue and Cut Network Spending with Kentik Traffic Costs

Sep 30, 2025 By Lauren Basile In Kentik

Network operators across the digital ecosystem are under pressure to cut costs while protecting revenue. This post explores three practical use cases where Kentik Traffic Costs helps turn traffic insight into commercial intelligence that helps teams negotiate smarter, protect margins, and boost profitability.

Read Post

Getting Started With Copilot Log Analysis - Demo

Sep 30, 2025 By Splunk In Splunk

The team also published a blog focusing on Audit Logs and Microsoft Office365 Copilot Activity Logs using Splunk Add-on for Microsoft Office 365. This Splunk Add-on allows Splunk to pull service status, service messages and management activity logs from Office 365 Management API.

View Video

From Logs to Insights: Accelerate Customer-Impact Analysis with Datadog Sheets

Sep 30, 2025 By Datadog In Datadog

Datadog Sheets helps you move from log exploration to actionable insights quickly and with no code required. In this demo, see how to enrich logs with Salesforce data, build pivot tables, uncover customer impact trends, and build shareable reporting, all within Datadog.

View Video

Kubernetes monitoring 101: Best practices to kickstart your journey

Sep 30, 2025 By Grace Nalini In Site24x7

Use this guide to build a solid observability foundation without getting overwhelmed and get started with best practices for practical Kubernetes management. Starting your Kubernetes journey can feel like diving into the deep end; with hundreds of metrics, endless logs, and a growing list of tools, it's easy to lose focus. But here's the good news: you don't need to monitor everything from day one. Instead, start small.

Read Post

Sentry Agent for Linear is in beta!

Sep 30, 2025 By Sentry In Sentry

Wouldn't that issues list be a little less intimidating if you had some help from Sentry? We're shipping our Sentry Agent for Linear, and you can use it to call Seer and get Root Cause Analysis and Solutions back. Check it out!

View Video

5 Tools for Monitoring WebSocket Connections in Real Time

Sep 30, 2025 By Super Monitoring In Super Monitoring

What if your app, website, or online platform suddenly starts crashing? Users cannot connect to the application, nothing is loading, and complaints start coming in. You contact your developer. They check the backend components, such as the API, server, and databases, and everything seems fine. So, what is the real problem here? In many real-time applications, the issue lies one layer deeper, in a place most people overlook: WebSocket connections.

Read Post

Paving the way for a new era: Mezmo's Active Telemetry

Sep 30, 2025 By Mezmo In Mezmo

The world of software development has fundamentally changed. We've moved from monthly releases to continuous delivery measured in minutes, and the rise of AI means velocity is no longer just a goal—it's a requirement for survival. But this relentless speed has exposed a critical flaw in how we approach observability. The industry relies on a "store first, ask questions later" model where you collect every log, metric, and trace, and then hope to find the root cause when something breaks.

Read Post

Datadog Feature Flags, track Claude costs, migrate historical logs, and more | This Month in Datadog

Sep 30, 2025 By Datadog In Datadog

See how you can reduce risk during feature rollouts in September’s This Month in Datadog. In this episode, we spotlight Datadog Feature Flags, which combines advanced targeting with built-in observability and guardrails to make rollouts safer and more controlled. This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

View Video

Docker Daemon Logs: How to Find, Read, and Use Them

Sep 30, 2025 By Faiz Shaikh In Last9

Sometimes Docker behaves in ways that catch you off guard—containers don’t start as expected, images pause during pull, or networking takes longer than usual to respond. In those moments, the Docker daemon logs are your best reference point. These logs capture exactly what the Docker engine is doing at any given time. They give you a running account of system state, performance signals, and events that help you understand what’s happening beneath the surface.

Read Post

September 2025 Product Updates

Sep 30, 2025 By Leo Baecker In Hyperping

September brings powerful new capabilities that give you more control over your status pages and make migrations smoother than ever. From fully custom domain support to streamlined Teams integration, here's what's new.

Read Post

Node.js Monitoring in Serverless Environments - A Complete Guide

Sep 30, 2025 By Pavithra Parthiban In Atatus

Serverless computing with Node.js is transforming how applications are built and scaled by removing the need to manage servers. However, serverless functions run for short durations and scale dynamically, making traditional monitoring ineffective. Effective monitoring is essential to track performance, detect errors, optimize cold starts, and control costs.

Read Post

What's New in InfluxDB 3.5: Explorer Dashboards, Cache Querying, and Expanded Control

Sep 30, 2025 By Peter Barnett In InfluxData

InfluxDB 3.5 is now available for both Core and Enterprise, along with updates to the new Explorer UI that make it easier to save, organize, and query your data. This release highlights the biggest updates since our 3.4 release, including Explorer Dashboards in beta, new cache querying capabilities, and stronger operational tools for managing clusters. InfluxDB 3 Core is free and open source, optimized for recent data, and licensed under MIT and Apache 2.

Read Post

How to boost observability ROI with continuous profiling and Grafana Drilldown

Sep 30, 2025 By Joey Bartolomeo In Grafana

For the longest time, observability was centered around logs, metrics, and traces, but the growth of more complex systems has made continuous profiling another essential part of maintaining healthy systems. It provides insights into resource usage and latency down to the code level, delivering key insights to improve performance.

Read Post

Ship features faster and safer with Datadog Feature Flags

Sep 30, 2025 By Paige Andrews In Datadog

Releasing new features is one of the highest-stakes moments in the software delivery life cycle. Even with CI/CD pipelines in place, plenty of things can still go wrong when a feature goes live for actual users. Most feature flagging tools operate in isolation from important observability tooling, forcing engineers to monitor changes across multiple disconnected systems to fully understand their impact. This slows down development and increases the chance of missing critical issues.

Read Post

Maximizing ROI of Microsoft SCOM

Sep 29, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Microsoft System Center Operations Manager (SCOM) is a robust, enterprise-ready platform that provides rich visibility into IT environments. From infrastructure and databases to enterprise applications and cloud services, SCOM delivers comprehensive monitoring capabilities out of the box.

Read Post

Build on Your Microsoft SCOM Foundation

Sep 29, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Enterprises that rely on Microsoft System Center Operations Manager (SCOM) as their monitoring backbone often share an everyday reality: the bigger the environment, the bigger the challenges. Noisy alert storms can bury critical issues. Management Packs (MPs) require ongoing care and expertise to deliver accurate insights. And without consistent reporting, teams risk slipping into reactive fire-fighting instead of strategic monitoring.

Read Post

How GenAI Is Empowering Elastic Workforce

Sep 29, 2025 By Elastic In Elastic

With over 10,000 questions answered and a 99% satisfaction rate in just 90 days, ElasticGPT, our internal generative AI assistant built on Elastic’s Search AI Platform, is transforming how our teams find information, make decisions, and complete day-to-day tasks. Matt Minetola, CIO, explains how ElasticGPT helps employees access company knowledge faster using natural language queries. Learn how we’re using retrieval augmented generation (RAG) and a secure, scalable architecture to deliver trusted, real-time AI experiences across the organization.

View Video

How We Built VictoriaLogs Cluster: A CTO's

Sep 29, 2025 By VictoriaMetrics In VictoriaMetrics

Go behind the scenes with the VictoriaMetrics team! In this special talk, Marc Sherwood is joined by our CTO, Alexander Marshalov, to explore our powerful, open-source logging solution, VictoriaLogs. This isn't just a feature showcase. This is a deep dive into the engineering mindset that drives our development. Alexander shares firsthand insights into why we built VictoriaLogs Cluster, the technical challenges of creating a distributed system for logs, and the core principles of simplicity and efficiency that guide our architecture.

View Video

Telemetry Now Teaser: "Turning Network Telemetry Into Financial Insight"

Sep 29, 2025 By Kentik In Kentik

Network operators prioritize cost, performance, security, and reliability as their core foundational needs. But how do they get the economic data to make tradeoffs when one of these needs suffers? Tune into the latest Telemetry Now with special guest Lauren Basile to learn how Kentik Traffic Costs is providing data-backed answers to these questions.

View Video

Why Does Your Node.js App Crash in Production and How Can You Fix it?

Sep 29, 2025 By Pavithra Parthiban In Atatus

Node.js has become one of the most popular platforms for building scalable and high-performance web applications. Its event-driven, non-blocking I/O model allows developers to efficiently handle thousands of concurrent connections with minimal overhead. However, many businesses still face a critical challenge: Node.js applications often crash unexpectedly in production environments, causing downtime, lost revenue, and damage to brand reputation.

Read Post

Improving Browser Tracing Step by Step

Sep 29, 2025 By Sentry In Sentry

The JavaScript SDK team has been cooking hard lately. In this video I'll show you all the browser tracing improvements the team has shipped recently.

View Video

Essential network reports for IT teams

Sep 29, 2025 By ManageEngine Site24x7 In Site24x7

Stay ahead of network issues before they impact your users. In this video, we explore Site24x7’s reports for your monitored networks. Next up: We’ll cover NetFlow and network configuration management reports.

View Video

Best Web Transaction Monitoring Tools in 2025

Sep 29, 2025 By Hasan Wajahat In Sematext

Websites are no longer static pages. They’re dynamic, transaction-heavy ecosystems where every click, form submission, and login matters. Whether you’re in e-commerce, SaaS, or finance, transaction failures can lead to revenue loss, frustrated customers, and even damage to your brand. That’s where web transaction monitoring tools come in — a critical component to make sure every interaction goes smoothly.

Read Post

What is AI-Native Monitoring? The Complete Guide for Developers

Sep 29, 2025 By Sarah Morgan In Scout

Before we talk about AI-native monitoring, let’s take a quick step back to make sure everyone is on the same page. In software engineering, monitoring is the continuous collection and analysis of data about a system’s health, performance, and behavior. Tools like Scout Monitoring, Datadog, and New Relic traditionally track server uptime, request latency, error rates, and database performance.

Read Post

Top 11 Java APM Tools: A Comprehensive Comparison

Sep 29, 2025 By Anjali Udasi In Last9

Are your Java applications running at their optimal performance, or is there room for improvement to make them faster and more efficient? With so many services depending on Java, keeping applications responsive and reliable is a core part of modern software engineering. This blog walks you through the leading Java Application Performance Monitoring (APM) tools, with a clear comparison to help you choose the right option for your needs.

Read Post

13 Best Windows Monitoring Tools in 2025

Sep 29, 2025 By Costas Pipilas In Sematext

It’s 2 AM, and your phone buzzes with an urgent alert—your primary server application is down, and users are flooding the support channels with complaints. As you dive into the logs, the cause is elusive, buried somewhere in the sea of system events. Is it a rogue service eating up memory? A failing disk? Or a network bottleneck? Without powerful Windows monitoring tools, you’re left troubleshooting in the dark.

Read Post

The telemetry time bomb - and what to do about it

Sep 29, 2025 By Bill Emmett In Cribl

Telemetry data is growing at an average of 29% a year — doubling costs every 18 months. That’s putting pressure on ITOps budgets, observability platforms, SecOps teams, and SIEM deployments alike. In this post, we’ll explore how unchecked data volumes, siloed tools, and aging architectures are creating a telemetry cost crunch that limits visibility, slows both troubleshooting and threat detection, and impacts business outcomes.

Read Post

Uptrends brings synthetic monitoring to DevOps

Sep 29, 2025 By Uptrends In Uptrends

Automated, codified, Linux-ready. For DevOps and Platform Engineers who need monitoring that ships with infrastructure.

Read Post

Model your architecture with custom entities in the Datadog Software Catalog

Sep 29, 2025 By Dan Green In Datadog

Every software organization has its own unique architecture and workflows. Beyond services and APIs, teams rely on internal libraries, CI/CD jobs, data pipelines, AI agents, and more to keep systems running smoothly. But as architectures grow more complex and interconnected, it can become difficult to keep track of all the structural dependencies and interactions in one place.

Read Post

Say hello to one-click call quality and SBC data correlation for Teams Phone

Sep 28, 2025 By Sara Purdon In Martello Technologies

For IT teams managing Teams Phone performance, the SBC is kind of a ‘last frontier’. Issues that occur there are hard to track down. Critical information slips out of view or can’t easily be associated with what users are experiencing. Our latest update to Vantage DX closes the Microsoft Teams Phone SBC monitoring visibility gap with automated correlation of SBC records and Teams Phone quality data — the first solution in the industry to do so.

Read Post

OpenMetrics vs OpenTelemetry - A guide on understanding these two specifications

Sep 28, 2025 By Bhupesh Varshney In SigNoz

OpenMetrics and OpenTelemetry are popular standards for instrumenting cloud-native applications. Both projects are part of the Cloud Native Computing Foundation (CNCF) and aim to simplify how we generate, collect, and monitor telemetry from services in a modern cloud-native distributed application environment. Let's look at how both standards aim to solve the observability conundrum.

Read Post

Grafana Campfire - UI Extensions: Enabling Cross-App Workflows (Grafana Community Call -Sept 2025)

Sep 27, 2025 By Grafana In Grafana

In this upcoming Grafana Campfire Community Call, we will talk about Grafana UI Extensions and discuss how the framework enables plugins to interoperate by adding links, components, or functions into defined places (extension points) in Grafana. Topics include (but are not limited to): what UI Extensions are and where to find the resources; how they can be leveraged to deliver fun new features; adding custom actions, links, and components to various parts of the UI; and much more.

View Video

How to Become an SRE Engineer

Sep 27, 2025 By Alexandr Bandurchin In Uptrace

Site Reliability Engineering has emerged as one of the most sought-after careers in tech, combining software engineering expertise with operational excellence. SRE engineers ensure that critical systems remain reliable, scalable, and performant while enabling rapid feature development. With the global SRE job market projected to grow by over 25% in 2025, skilled professionals in this field command competitive salaries and enjoy diverse career opportunities across industries.

Read Post

Synthetic Monitoring from Multiple Locations: Where to Run Tests (and Why It Matters)

Sep 26, 2025 By Dotcom-Monitor In Dotcom-Monitor

Most organizations think of monitoring as a checkbox: set it up once, confirm that it runs, and move on. If the tool says the website is “up,” then the job is done, right? Not quite. The truth is that where you run synthetic monitoring tests from can be just as important as the tests themselves. Synthetic monitoring works by simulating user actions from pre-defined probes or agents. Those probes might live in a cloud data center, a mobile network, or even inside a corporate office.

Read Post

Harnessing AppNeta's Browser- and HTTP-based Workflows to Track User Experience

Sep 26, 2025 By Alec Pinkham In Broadcom

These days, maintaining uptime of your servers and other infrastructure elements remains as critical as ever—but it’s not enough. Quite simply, even the best server reliability metrics won’t mean a thing if the user experience is poor. What truly matters is understanding the service levels your users experience, whether they’re accessing apps through a web browser or interacting with API-based services.

Read Post

Defining the Network Engineer of Tomorrow

Sep 26, 2025 By Yann Guernion In Broadcom

A little while ago, I wrote a piece with the provocative title, "The End of the Network Engineer as We Know It?" It struck a chord because it articulated a shift many of us feel in our bones: the ground is moving beneath our feet. The traditional, well-defined corporate network has dissolved into a sprawling, borderless ecosystem of public clouds, SaaS platforms, and the vast, untamed internet. The old role, focused on the care and feeding of devices within our four walls, is no longer sufficient.

Read Post

Grafana Labs Co-founder Woods: Market maturity, OpenTelemetry, and AI are reshaping observability

Sep 26, 2025 By Mikhail Kho In Grafana

As organizations navigate increasingly complex tech environments, unified observability practices have become essential. That was one of the main takeaways from Grafana Labs Co-founder Anthony Woods’ recent appearance on “Tech Keys by Mercari India,” a podcast hosted by Vaibhav Khurana, Head of Platform Engineering at Mercari India.

Read Post

Lighting up your dashboards: How to visualize the CheerLights IoT project in Grafana Cloud

Sep 26, 2025 By Simon Prickett In Grafana

I recently joined the Developer Advocacy team here at Grafana Labs, and have been exploring ways to accelerate my Grafana learning journey. Like many others in the Grafana community, my introduction to the open source project happened when I needed a way to easily visualize data that resided in external databases, mostly using SQL queries.

Read Post

Inside InfluxDB 3's Diskless Architecture

Sep 26, 2025 By InfluxData In InfluxData

InfluxDB 3 Enterprise uses a cloud-native, diskless architecture to eliminate traditional storage limits. Its stateless design simplifies operations, delivers instant failover with zero data loss, and lets you scale compute and storage independently to petabyte levels without re-architecting your system.

View Video

Why Healthcare CIOs Are Becoming Transformation Leaders, Not Just Tech Leaders

Sep 26, 2025 By Sofia Burton In LogicMonitor

The role of the healthcare CIO looks nothing like it did a decade ago. Running the EHR, keeping infrastructure online, and managing vendor contracts are still table stakes, but they’re no longer the whole story. Today’s healthcare CIOs are being asked to do something far bigger: lead enterprise-wide transformation.

Read Post

Monitor Kubernetes Hosts with OpenTelemetry

Sep 26, 2025 By Anjali Udasi In Last9

It’s 3 AM. API latency just spiked from 200ms to 2s. Alerts are firing, and users are frustrated. You SSH into the first server: top, free -h, iostat — nothing unusual. On to the next host. And the next. That’s how most of us learned to debug. The tools worked, and we got good at using them. But as infrastructure became distributed and dynamic, this approach started to break down. Modern monitoring needs more than SSH and top. It needs unified telemetry.

Read Post

Introducing Catchpoint Session Replay: See Digital Experience Through Your Users' Eyes

Sep 26, 2025 By Piril Kavlak In Catchpoint

When was the last time you really saw what your customers experience on your site? We're excited to introduce Session Replay, a new capability in our Internet Performance Monitoring (IPM) platform that lets you step directly into the user's journey. Session Replay is so much more than a platform upgrade. It’s an opportunity to understand, fix, and even prevent the issues that lead to churn, missed conversions, and frustrated users, all from their point of view.

Read Post

Integrations Overview

Sep 26, 2025 By Uptime Website Monitoring In uptime

This video provides a detailed tour of our integrations, including how to set up automated email, SMS, and phone call alerts. Learn how to connect with various trusted tools, tailor your alerts to your team's needs, and pass key data between Uptime.com and your favorite applications. Discover how to add and manage new integrations, create dedicated contacts, and assign integrations to specific checks. We also introduce our Zapier partnership, enabling connections to over 8000 additional services.

View Video

k8s-monitoring-helm Chart Office Hours (September 2025)

Sep 26, 2025 By Grafana In Grafana

In the September edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 3.4 and 3.5 releases as well as the plan for upcoming features.

View Video

How SSL Certificate Monitoring Prevents Man-in-the-Middle Attacks

Sep 26, 2025 By Simon Rodgers In WebSitePulse

Man-in-the-Middle (MITM) attacks remain one of the most dangerous cybersecurity threats. In these attacks, hackers secretly intercept and sometimes alter communication between two parties. Without proper encryption, sensitive data such as passwords, credit card details, and personal information becomes exposed. SSL/TLS certificates encrypt this communication, preventing unauthorized access. However, certificates can expire, become misconfigured, or become compromised, creating security gaps.

Read Post

Building a VictoriaMetrics PaaS: The What, Why, and "Easier Button" - Tech Talk #9

Sep 26, 2025 By VictoriaMetrics In VictoriaMetrics

Ready to tame your monitoring complexity? Join Mathias and Marc for the first episode of our brand-new series dedicated to building a robust, scalable, and user-friendly VictoriaMetrics Platform as a Service (PaaS)! As organizations grow, managing monitoring infrastructure becomes a major challenge. This series provides a practical, step-by-step guide to building your own VictoriaMetrics-based PaaS to reduce developer friction, improve reliability, and save on costs.

View Video

VictoriaMetrics

Monitoring

Read more about Building a VictoriaMetrics PaaS: The What, Why, and "Easier Button" - Tech Talk #9

Your First Steps in Sentry - A Hands-On Workshop

Sep 26, 2025 By Sentry In Sentry

It turns out, Sentry does A LOT these days. Errors, Logs, Replays, Traces, and even Agent and MCP monitoring… but all this observability works better with a good foundation. We’re going to step back to 0, and show you how to build a “not bad” Sentry implementation from the ground up.

View Video

Sentry

Monitoring

Read more about Your First Steps in Sentry - A Hands-On Workshop

Diagnose slow database queries in Node.js: Why Monitoring is Essential?

Sep 25, 2025 By Mohana Ayeswariya J In Atatus

Node.js is popular for building scalable applications because its non-blocking architecture can handle many requests at once. But when your app depends on a database, performance hinges on how efficiently queries run behind the scenes. Even a single slow database query can block the Node.js event loop, causing delayed responses, frustrated users, and cascading performance issues. Too often, teams only notice these problems after customers experience lag or timeouts.

Read Post

Atatus

Read more about Diagnose slow database queries in Node.js: Why Monitoring is Essential?

How Nexus BMS Uses Time Series and AI to Power Smarter Buildings

Sep 25, 2025 By Charles Mahler In InfluxData

Monitoring equipment isn’t enough for today’s smart buildings; true value comes from being able to predict issues, optimize performance, and take action automatically. Traditional building management systems often fall short, limited to dashboards and alarms that only notify you of an issue after the fact. With the rise of open source hardware, modern databases, and AI-driven diagnostics, facilities can now move from reactive to proactive management.

Read Post

InfluxData

Read more about How Nexus BMS Uses Time Series and AI to Power Smarter Buildings

Session Replay: Becoming your own digital secret shopper

Sep 25, 2025 By Kyle Tryon In Sentry

Retail stores have long relied on a secret weapon to measure and improve the shopping experience: the secret shopper. Posing as ordinary customers, they evaluate the customer experience, spotting friction points like hard-to-find items, gauging the quality of customer service, and testing how seamless the checkout process feels.

Read Post

Sentry

Read more about Session Replay: Becoming your own digital secret shopper

InfluxDB 3 Core: Open Source, Recent-Data Engine

Sep 25, 2025 By InfluxData In InfluxData

Dive into InfluxDB 3 Core, an open source (MIT/Apache 2 licensed), high-performance recent-data engine. It’s built for real-time monitoring, edge data collection and transformation, sensor alerting, and streaming analytics with simplicity and speed.

View Video

InfluxData

Read more about InfluxDB 3 Core: Open Source, Recent-Data Engine

Infinity Data Source Now Supports Auth for Actions | Grafana 12.2

Sep 25, 2025 By Grafana In Grafana

Grafana 12.2 introduces actions authentication with the Infinity data source — giving you more secure and flexible ways to trigger actions. Previously, actions were limited to browser-based HTTP requests subject to CORS. Now you can choose between browser requests or Infinity connections, leveraging preconfigured authentication settings. This update makes actions more powerful and reliable in Grafana 12.2.

View Video

Grafana

Read more about Infinity Data Source Now Supports Auth for Actions | Grafana 12.2

Introducing the UptimeRobot v3 API

Sep 25, 2025 By Tomas Koprusak In Uptime Robot

As you may have noticed, we released the latest version of the UptimeRobot API a few weeks ago. Don’t worry, the v2 will remain available; however, it will no longer receive support or updates. New features will be added only to v3. Built on a RESTful architecture, v3 unlocks more flexibility, cleaner workflows, and expanded capabilities for developers who want tighter control over their monitoring. Below, we’ll highlight what’s new and how it compares to the legacy v2 API.

Read Post

Uptime Robot

Read more about Introducing the UptimeRobot v3 API

New Grafana One-Page Report (Public Preview) | Grafana 12.2

Sep 25, 2025 By Grafana In Grafana

Grafana 12.2 introduces a redesigned reporting feature, now in public preview. The new one-page report creation flow replaces the old multi-step wizard, making it easier and more intuitive to schedule and share insights. Check out how the new reporting experience simplifies sharing data in Grafana 12.2.

View Video

Grafana

Read more about New Grafana One-Page Report (Public Preview) | Grafana 12.2

Complete Guide to HAProxy Visibility Using Promtail and Loki

Sep 25, 2025 By Benjamin Pitts In MetricFire

HAProxy is the workhorse in front of countless APIs and apps because it’s fast, lean, and flexible. Because it sits on the traffic hot path, it’s also your earliest warning system when something slows down or breaks entirely. This means that monitoring it isn’t optional. You need to see connection queues and retries, per-stage timings, health-check failures, and spikes in error statuses to catch incidents before users do.

Read Post

MetricFire

Read more about Complete Guide to HAProxy Visibility Using Promtail and Loki

Monitor your data pipelines with Airflow lineage

Sep 25, 2025 By Thomas Sobolik In Datadog

In complex data pipelines with dozens of jobs and intermediary datasets, it can be difficult to effectively monitor how data travels and changes through various steps. When tracking issues in these pipelines, you need visibility into upstream components where the root cause may originate from, as well as downstream datasets and consumers of data that may be experiencing further impacts.

Read Post

Datadog

Read more about Monitor your data pipelines with Airflow lineage

Soft navigations: The future of seamless browsing

Sep 25, 2025 By Daniel Zohar In Coralogix

In the ever-evolving world of web standards, a new experimental feature is quietly reshaping how browsers perceive navigation: Soft Navigations. While still in the early stages, this concept has the potential to redefine user experience metrics, improve performance monitoring, and better align browsers with the behavior of modern web applications. Let’s dive into what soft navigations are, why they’re important, and how you can start exploring them today.

Read Post

Coralogix

Read more about Soft navigations: The future of seamless browsing

Securing the Future: Responsible AI on AWS with Sumo Logic -- Customer Brown Bag -- Sept 25th, 2025

Sep 25, 2025 By Sumo Logic, Inc. In Sumo Logic

This session with Moumita Saha, Sr. Security Partner SA – WW Consulting Partners, AWS, and Adam White, Sr. Dir. Technical Marketer at Sumo Logic explores how AWS and Sumo Logic partner to deliver practical strategies for securing generative AI applications, ensuring they remain safe, compliant, and trustworthy.

View Video

Sumo Logic

Read more about Securing the Future: Responsible AI on AWS with Sumo Logic -- Customer Brown Bag -- Sept 25th, 2025

Visualize Jenkins CI/CD Pipelines: Introducing the New Jenkins Data Source Plugin in Grafana 12.2

Sep 25, 2025 By Grafana In Grafana

Grafana 12.2 introduces the new Jenkins data source plugin, giving you real-time insights into your Jenkins CI/CD pipelines. With easy setup, you can connect your Jenkins instance and explore two built-in dashboards. See how Jenkins data becomes instantly actionable inside Grafana.

View Video

Grafana

Read more about Visualize Jenkins CI/CD Pipelines: Introducing the New Jenkins Data Source Plugin in Grafana 12.2

How to analyze observability and monitoring tools for actionability

Sep 25, 2025 By BigPanda In BigPanda

Choosing the right observability tools is critical to ensure your teams get actionable insights. In this video, we explore how to evaluate observability platforms based on their ability to detect anomalies, link causes, and trigger effective responses.

View Video

BigPanda

Read more about How to analyze observability and monitoring tools for actionability

Create New Alert Rules Without PromQL Queries in Grafana 12.2 | Metrics Drilldown

Sep 25, 2025 By Grafana In Grafana

Grafana 12.2 makes alert creation simpler by integrating the Metrics Drilldown app with the Alert Rule Query Editor. Instead of writing PromQL from scratch, you can now use a queryless workflow: explore metrics, add labels, and generate queries directly from Drilldown. This helps teams move faster and makes alerting more accessible for those new to PromQL.

View Video

Grafana

Read more about Create New Alert Rules Without PromQL Queries in Grafana 12.2 | Metrics Drilldown

Grafana 12.2 release: LLM-powered SQL expressions, updates to canvas and table visualizations, simplified reporting, and more

Sep 25, 2025 By Grafana Labs Team In Grafana

Grafana 12.2 has arrived, delivering new features to help you and your team move from data to decisions faster than ever. Grafana 12.2: Download now! Below are just some of the highlights from the latest Grafana release.

Read Post

Grafana

Read more about Grafana 12.2 release: LLM-powered SQL expressions, updates to canvas and table visualizations, simplified reporting, and more

LLM Observability in the Wild - Why OpenTelemetry should be the Standard

Sep 25, 2025 By Pranay Prateek In SigNoz

A few days ago I hosted a live conversation with Pranav, co-founder of Chatwoot, about issues his team was running into with LLM observability. The short version: building, debugging, and improving AI agents in production gets messy fast. There are multiple competing standards for default libraries for LLM observability. And many such libraries, like OpenInference, which claim to be based on OpenTelemetry, don't strictly adhere to its conventions.

Read Post

SigNoz

Read more about LLM Observability in the Wild - Why OpenTelemetry should be the Standard

CriblCon sneak peek with AlphaSoc

Sep 25, 2025 By Cribl In Cribl

The countdown to CriblCon is on and we’re giving you an exclusive first look at the expert insights, innovative solutions, and success stories you’ll see on the big stage. Join us as we chat with Chris McNab, Founder of AlphaSOC, a security startup that processes network telemetry to uncover infected hosts, emerging threats, and targeted attacks.

View Video

Cribl

Read more about CriblCon sneak peek with AlphaSoc

Availability Summary Report in Site24x7

Sep 25, 2025 By ManageEngine Site24x7 In Site24x7

Track uptime and downtime at a glance with the Site24x7 Availability Summary Report. In this video, we break down each section of the Availability Summary Report when a single monitor is chosen, including monitor availability, suspension summary, outage details, Mean Time To Repair, Mean Time Between Failures, and location-based metrics. Learn how to use this report to validate downtime, analyze performance trends, and ensure service reliability.
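For readers unfamiliar with the metrics named above, here is a minimal sketch of how availability, Mean Time To Repair, and Mean Time Between Failures are commonly derived from a list of outages. It uses the usual textbook definitions (downtime per outage, uptime per failure); Site24x7's own calculation may differ, and the `Outage` and `summarize` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outage:
    start: float  # seconds from the start of the reporting period
    end: float    # seconds from the start of the reporting period


def summarize(period_seconds: float, outages: List[Outage]) -> Dict[str, float]:
    """Compute availability, MTTR, and MTBF over one reporting period."""
    downtime = sum(o.end - o.start for o in outages)
    uptime = period_seconds - downtime
    count = len(outages)
    return {
        "availability_pct": 100.0 * uptime / period_seconds,
        # MTTR: average time spent down per outage.
        "mttr_seconds": downtime / count if count else 0.0,
        # MTBF: average time spent up per failure; unbounded with no failures.
        "mtbf_seconds": uptime / count if count else float("inf"),
    }


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    print(summarize(one_day, [Outage(3600, 4200), Outage(50000, 50300)]))
```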

View Video

Site24x7

Monitoring

Read more about Availability Summary Report in Site24x7

MCP Design Principles

Sep 24, 2025 By Honeycomb In Honeycomb

You can give AI agents everywhere fingers & eyes into your tool or service, by implementing an MCP (Model Context Protocol) server. It’s a great idea! It’s also a new kind of design and engineering. Jessica describes how it’s different from implementing an API or a GUI, and why it’s more exciting than either.

View Video

Honeycomb

Read more about MCP Design Principles

Certificate Rotation with Progress-powered Solutions

Sep 24, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Don’t let expired certificates put your organization at risk! Progress WhatsUp Gold makes it easy to discover, manage, and automate certificate lifecycles across your network. With powerful automation from Progress Infrastructure solutions, you can rotate and manage certificates without a manual routine to maintain compliance and security. Schedule updates, push certificates to thousands of nodes and maintain governance with built-in traceability. Experience simplicity, scalability and seamless integration with Progress-powered solutions.

View Video

WhatsUp Gold

Read more about Certificate Rotation with Progress-powered Solutions

Telemetry Now Teaser: "What's the real cost of delivering this traffic?"

Sep 24, 2025 By Kentik In Kentik

Why is it so difficult to answer the age-old question CFOs are asking, "What's the real cost of delivering this traffic?" Complex billing structures and cost modeling are only part of it. Lauren Basile joins Phillip Gervasi to discuss turning network telemetry into financial insight in the latest episode of Telemetry Now.

View Video

Kentik

Read more about Telemetry Now Teaser: "What's the real cost of delivering this traffic?"

Top 3 MSP dashboards compared: SquaredUp, BrightGauge and MSPbots

Sep 24, 2025 By Squared Up In Squared Up

Managed Service Providers (MSPs) live and die by their data. Externally, clients expect clear reporting, fast responses, and visible proof of value. Internally, smooth operations and low overheads are essential to business success. But with so many tools, key data is scattered across multiple systems – PSA, RMM, cloud services, ticketing, monitoring, finance – and that causes blind spots. Dashboards fix this problem by consolidating data into a single view.

Read Post

Squared Up

Read more about Top 3 MSP dashboards compared: SquaredUp, BrightGauge and MSPbots

You don't need a real outage to find your weak spots.

Sep 24, 2025 By Catchpoint In Catchpoint

Modern digital services rely on complex systems, and chaos can strike at any layer. But the most effective teams don’t wait for failure to learn. They simulate it. By introducing controlled performance degradations, you can stress your systems, test your dependencies, and uncover hidden risks without touching production. In our latest webinar, Catchpoint experts walk through how teams are building resilience through proactive, safe failure testing, and why it’s become a cornerstone of digital reliability.

View Video

Catchpoint

Read more about You don't need a real outage to find your weak spots.

Mute timing vs. silences in Grafana Alerting: How to pick the best fit for your use case

Sep 24, 2025 By Kristie Grebe In Grafana

Have you ever been in a situation where you know your team is going to run their weekly maintenance window and you silence your notifications to prevent a flood of false positives from pinging your inbox? If you are associated with a team that uses any type of alert system, you know how easily alert fatigue can happen. The incessant and unpredictable (or even, at times, predictable) pings, emails, and notification alerts can drive even the most serene worker totally batty.

Read Post

Grafana

Read more about Mute timing vs. silences in Grafana Alerting: How to pick the best fit for your use case

What is SNMP Trap: Real-Time Alerts for Network Monitoring

Sep 24, 2025 By Jan Schuppik In Icinga

Why wait for the next poll? An SNMP trap is a real-time alert sent from a device to a monitoring system, without waiting for polling. Ever had a router die silently at 3 AM while your monitoring system was still polling away every 5 minutes? Yeah… not fun. That’s where SNMP traps step in. Think of them as the push notifications of network monitoring: instant, lightweight, and sometimes misunderstood.

Read Post

Icinga

Read more about What is SNMP Trap: Real-Time Alerts for Network Monitoring

Distinct Value Cache in InfluxDB 3

Sep 24, 2025 By InfluxData In InfluxData

The Distinct Value Cache in InfluxDB 3 speeds up metadata queries and tag value lookups for faster, more responsive UIs. The Distinct Value Cache in InfluxDB 3 delivers sub-30 ms lookups for tag values and series metadata, making exploratory queries and UI dropdowns quick and responsive. By reducing latency on these common operations, it allows developers to build real-time monitoring and analytics tools without extra complexity.

View Video

InfluxData

Read more about Distinct Value Cache in InfluxDB 3

Monitor and optimize your systems with Uptrace

Sep 24, 2025 By Uptrace In Uptrace

Uptrace is your single source of truth for monitoring, understanding, and optimizing complex distributed systems. Proven in production for over five years and trusted by more than a thousand installations worldwide, it lets you see your system like never before. What makes the difference is that Uptrace is pure OpenTelemetry, built natively from day one. This isn't a translation layer—it's a direct connection that eliminates friction and ensures zero vendor lock-in. Your homepage serves as your command center, providing complete visibility across your stack at a glance.

View Video

Uptrace

Read more about Monitor and optimize your systems with Uptrace

How to Push Prometheus Metrics to Splunk Observability Cloud with the OpenTelemetry Collector

Sep 24, 2025 By Splunk In Splunk

In this video, you’ll learn how to scrape Prometheus endpoints with the OpenTelemetry Collector’s Prometheus receiver and send metrics to Splunk Observability Cloud. We’ll walk through configuring three common data sources (a Python Flask app, node_exporter for host metrics, and the NGINX Prometheus exporter), show how to enrich metrics with resource attributes, and build simple charts in Splunk Observability Cloud. You’ll see how centralized scraping and consistent tagging make it easy to manage and visualize Prometheus metrics in Splunk Observability Cloud.

View Video

Splunk

Read more about How to Push Prometheus Metrics to Splunk Observability Cloud with the OpenTelemetry Collector

An overview of Context Propagation in OpenTelemetry

Sep 24, 2025 By Muskan Paliwal In SigNoz

To effectively manage modern applications, you need to understand how they work on the inside. Distributed tracing is the key to this, providing a detailed picture of a request's journey across every service. OpenTelemetry has emerged as the industry-standard framework for implementing tracing and achieving true observability in complex, distributed systems. In this article, we embark on a journey to explore the core concept of context propagation within OpenTelemetry.
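To make the idea of context propagation concrete, the sketch below parses and writes a W3C Trace Context `traceparent` header, which is the default wire format OpenTelemetry propagators use. It is a standalone illustration, not the OpenTelemetry SDK: the `SpanContext`, `extract`, `inject`, and `child_of` names are hypothetical, header keys are assumed to be lowercased already, and only the version-00 layout is handled.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the caller's span context from incoming request headers."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero ids are invalid per the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a new span in the same trace: keep the trace id, mint a new span id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the span context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


if __name__ == "__main__":
    incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    outgoing: Dict[str, str] = {}
    inject(child_of(extract(incoming)), outgoing)
    print(outgoing["traceparent"])
```

Each service repeats the extract, child, inject cycle, which is what lets a backend stitch spans from different processes into one trace.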

Read Post

SigNoz

Read more about An overview of Context Propagation in OpenTelemetry

Kubernetes monitoring explained: Key metrics, labels, and best practices

Sep 24, 2025 By ManageEngine Site24x7 In Site24x7

Monitoring Kubernetes and containers doesn’t have to be overwhelming. In this video, we’ll break down the essential metrics you need to track, why labels are critical for container visibility, and the best practices for Kubernetes monitoring at scale. You’ll learn how tools like Site24x7 simplify Kubernetes monitoring with auto-discovery, dashboards, anomaly detection, and forecasting. Whether you’re a DevOps engineer, SRE, or developer, this video gives you the practical knowledge to improve container monitoring and observability.

View Video

Site24x7

Read more about Kubernetes monitoring explained: Key metrics, labels, and best practices

Creating and using a Network Discovery Profile in Site24x7

Sep 24, 2025 By ManageEngine Site24x7 In Site24x7

Learn how to create and use a Discovery Profile in Site24x7 to simplify and automate network device onboarding. In this video, we walk you through setting up discovery parameters, applying filters and thresholds, grouping and tagging devices, configuring alerts, integrating with ITSM and collaboration tools, and scheduling periodic rediscovery. Whether you're managing a single site or multiple customer environments, Discovery Profiles help you.

View Video

Site24x7

Monitoring

Read more about Creating and using a Network Discovery Profile in Site24x7

Building and Monitoring AI Agents and MCP servers

Sep 24, 2025 By Sentry In Sentry

Whether you’re building agents in your applications, or standing up an MCP server because it’s the new cool thing, working with AI is just different. Trying to figure out why it does the weird things it does is hard.

View Video

Sentry

Read more about Building and Monitoring AI Agents and MCP servers

Creating a Sustainable Open Source Business Model - Introduction

Sep 24, 2025 By Jean-Jerome Schmidt-Soisson In VictoriaMetrics

Open source defies everything you’ve ever heard or learned about business before (author’s quote). Yes, open source software has been around since the 90s, but there’s still little else like it. If anything, as time has gone on, we’ve added adjacent concepts like “open core” and “source available” that have added complexity to a model that isn’t that straightforward to grasp to begin with. VictoriaMetrics is an open source company.

Read Post

VictoriaMetrics

Read more about Creating a Sustainable Open Source Business Model - Introduction

How to Responsibly and Effectively Contribute to Open Source Using AI

Sep 24, 2025 By Tyler Helmuth In Honeycomb

With the influx of AI tooling, it’s never been easier to contribute to open source communities. These tools are capable of gathering context quickly, “understanding” repositories faster than ever before. They provide instant summaries about repositories that, previously, would have meant reading lines and lines of code. They can fix bugs in programming languages you don’t know, and ultimately allow more contributors to get involved, which (almost) every open source project wants.

Read Post

Honeycomb

Read more about How to Responsibly and Effectively Contribute to Open Source Using AI

Two Decades of Microsoft SCOM & Monitoring Expertise

Sep 23, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

In today’s complex IT environments, reliable monitoring isn’t optional — it’s essential. From critical infrastructure in Government & Defense to highly regulated sectors like Healthcare, Energy, and Finance, organizations worldwide trust NiCE to deliver secure, future-ready monitoring solutions.

Read Post

NiCE IT Mgmt

Read more about Two Decades of Microsoft SCOM & Monitoring Expertise

Memory stall: the agony before OOM

Sep 23, 2025 By Nikolay Sivko In Coroot

When we set a memory limit for a container, the expectation is simple: if the app leaks memory, the OOM killer steps in, the container dies, Kubernetes restarts it, done. But reality is messier. As a container gets close to its memory limit, allocations don’t just fail instantly. They get slower. The kernel tries to reclaim memory inside the cgroup, and that takes time. Instead of being killed right away, your app just crawls.
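One way to observe this kind of slowdown on Linux is the kernel's pressure stall information (PSI), which cgroup v2 exposes per container in a `memory.pressure` file. The teaser does not name PSI, so treating it as the relevant signal is an assumption here. The sketch parses the PSI text format; the function names and the 10% threshold are hypothetical.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI output such as the contents of a cgroup v2 memory.pressure file.

    Each line looks like:
        some avg10=1.50 avg60=0.40 avg300=0.10 total=123456
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def is_stalling(psi: Dict[str, Dict[str, float]], threshold_pct: float = 10.0) -> bool:
    # "some avg10" is the share of the last 10 seconds in which at least one
    # task was stalled waiting on memory. threshold_pct is an arbitrary example.
    return psi.get("some", {}).get("avg10", 0.0) >= threshold_pct


if __name__ == "__main__":
    sample = (
        "some avg10=23.50 avg60=8.10 avg300=2.00 total=9876543\n"
        "full avg10=12.00 avg60=4.00 avg300=1.00 total=1234567\n"
    )
    print(is_stalling(parse_psi(sample)))
```

A rising `some avg10` while the container is still alive matches the symptom described above: the app has not been OOM-killed, but it is spending its time waiting on reclaim.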

Read Post

Coroot

Read more about Memory stall: the agony before OOM

The Architecture of Digital Resilience: IT Leadership in the New Physics of IT

Sep 23, 2025 By ScienceLogic In ScienceLogic

The New Physics of IT has revealed its second law: resilience isn’t reactive. It is the architecture of leadership itself.

Read Post

ScienceLogic

Read more about The Architecture of Digital Resilience: IT Leadership in the New Physics of IT

Key APM Metrics You Must Track

Sep 23, 2025 By Anjali Udasi In Last9

Application Performance Monitoring (APM) helps you understand how your software runs in production. When you track the right metrics, you see how requests move through your system, where slowdowns happen, and how resources are being used. With this knowledge, you can spot issues early and keep your applications reliable for your users. In this blog, we discuss the key APM metrics to monitor, grouped into categories, and why each one matters for performance and user experience.

Read Post

Last9

Read more about Key APM Metrics You Must Track

How to perform real-time DNS monitoring in Grafana Cloud

Sep 23, 2025 By Bukola Ayodele In Grafana

When DNS (Domain Name System) resolution fails or becomes sluggish, users can experience timeouts, connection errors, and degraded performance, often without a clear indication of the root cause. This is where DNS checks in Grafana Cloud Synthetic Monitoring come in, allowing you to proactively monitor domain name resolution, verify that domains resolve to the correct IP address, and even measure how quickly that resolution occurs.

Read Post

Grafana

Read more about How to perform real-time DNS monitoring in Grafana Cloud

Node.js Event Loop: Why Monitoring Matters

Sep 23, 2025 By Mohana Ayeswariya J In Atatus

Node.js has become a cornerstone for modern application development because of its non-blocking and asynchronous architecture. According to the Stack Overflow Developer Survey, Node.js remains among the most widely used technologies for web applications, powering millions of services globally. While this event-driven model provides scalability and efficiency, it also introduces challenges.

Read Post

Atatus

Read more about Node.js Event Loop: Why Monitoring Matters

Beyond Automation: The Rise of Agentic Networks

Sep 23, 2025 By Justin Ryburn In Kentik

Agentic AI is the next evolution in network management, moving beyond simple automation to intelligent systems that can reason, plan, and act autonomously. Justin Ryburn, Kentik Field CTO, highlights how this shift automates expertise, enables proactive problem-solving, and empowers human engineers for strategic innovation.

Read Post

Kentik

Read more about Beyond Automation: The Rise of Agentic Networks

InfluxDB 3 Enterprise: Deploy Your Way, Scale on Demand

Sep 23, 2025 By InfluxData In InfluxData

InfluxDB 3 Enterprise is engineered for performance and designed for flexibility, delivering high-scale, production-ready time series data management with operational simplicity. InfluxDB 3 Enterprise is built on a cloud-native, diskless architecture that removes the limits of traditional storage. It’s easy to deploy, scales effortlessly, and eliminates the complexity of managing clusters so you can deploy your way and meet the unique demands of your environment.

View Video

InfluxData

Read more about InfluxDB 3 Enterprise: Deploy Your Way, Scale on Demand

Automate Your Infrastructure Analysis with Scheduled AI Reports

Sep 23, 2025 By Shyam Sreevalsan In netdata

The least exciting part of an operations or SRE role is often the manual, repetitive task of generating reports. It’s the Monday morning scramble to summarize weekly infrastructure health for the team, or the end-of-quarter push to build a capacity planning document. This is boilerplate work that pulls you away from critical engineering tasks. We believe that if a process is repeatable, it should be automated. That’s why we’re introducing Scheduled AI Investigations and Insights.

Read Post

netdata

Read more about Automate Your Infrastructure Analysis with Scheduled AI Reports

Building Real-Time Data Pipelines with Kafka, Telegraf, and InfluxDB 3

Sep 23, 2025 By Suyash Joshi In InfluxData

When milliseconds matter and data never stops flowing, you need a pipeline that can handle high-velocity streaming data with reliability and scale. The modern streaming stack of Kafka, Telegraf, and InfluxDB 3 Core delivers exactly that. To give you a concrete example, this blog works with a fictitious use case: “Papa Giuseppe’s Pizzeria.” Every oven, prep station, and order in this pizza restaurant generates data. Our workflow looks like this.

Read Post

InfluxData

Read more about Building Real-Time Data Pipelines with Kafka, Telegraf, and InfluxDB 3

MCP Server Monitoring Has Shipped!

Sep 23, 2025 By Sentry In Sentry

See how you can get started using MCP Server Monitoring against your MCP Servers REALLY fast using Sentry. One line to wrap the code, and you're good to go. Check out this video where we set it up using Vercel's mcp-handler in Next.js.

View Video

Sentry

Read more about MCP Server Monitoring Has Shipped!

Sentry AI code review, now in beta: break production less

Sep 23, 2025 By Lindsay Piper In Sentry

This could’ve been prevented. This should have been prevented. This too. We all hate getting tagged in PRs. The time, the blame for when you inevitably miss something, and the constant “I wouldn’t have written it that way” feeling are just hard to shake. LLMs promised this would get easier. Promised they would do it for us. But as we’ve seen, we’re not there yet. But this is what Sentry does for a living. We catch bugs… in prod.

Read Post

Sentry

Read more about Sentry AI code review, now in beta: break production less

Your Next Observability RFP is All Wrong. Why AI Changes Everything

Sep 23, 2025 By Asaf Yigal In logz.io

AI-first observability addresses two of the most pressing troubleshooting challenges: complex IT environments and AI-generated code. But understanding how to implement AI in a way that brings ROI requires cutting through the hype and maintaining realistic expectations, while keeping a forward-thinking vision. In this blog post, we bring practical tips for including AI in your next observability RFP. The article is based on a webinar held with Logz.io founders, CEO Tomer Levy and CTO Asaf Yigal.

Read Post

logz.io

Read more about Your Next Observability RFP is All Wrong. Why AI Changes Everything

New Relic's CCU-based pricing is creating unpredictable costs, pushing teams to sample heavily

Sep 23, 2025 By Anushka Karmakar In SigNoz

We talked to 7 companies in August 2025 who were looking to switch from New Relic. One engineering director said they're paying $1,000 a month and only ingesting 10% of their traces. Teams are defaulting to aggressive sampling, some at 1%, others at 10%, to manage costs.

Read Post

SigNoz

Read more about New Relic's CCU-based pricing is creating unpredictable costs, pushing teams to sample heavily

OpenTelemetry and Jaeger | Key Features & Differences [2025]

Sep 23, 2025 By Ankit Anand In SigNoz

OpenTelemetry is a broader, vendor-neutral framework for generating and collecting telemetry data (logs, metrics, traces), offering flexible backend integration. Jaeger, on the other hand, is focused on distributed tracing in microservices. Earlier, Jaeger had its own SDKs based on OpenTracing APIs for instrumenting applications, but now Jaeger recommends using OpenTelemetry instrumentation and SDKs. Warning: The original Jaeger client SDKs (based on OpenTracing) are archived and no longer maintained.

Read Post

SigNoz

Read more about OpenTelemetry and Jaeger | Key Features & Differences [2025]

ICMP Monitoring: What Is ICMP & How It Works

Sep 23, 2025 By Alyssa Lamberti In Obkio

Ever “pinged” a server and wondered what those milliseconds actually mean? If you’re a network admin or IT pro, you already use ping as a quick sniff test. But ICMP is more than a green checkmark or a scary timeout. In this article, we’ll define ICMP, walk through how echo requests and replies work, and show how to turn basic pings into useful network and ICMP monitoring.
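As a small illustration of what a ping actually sends, the sketch below builds an ICMP echo request (type 8, code 0, as defined in RFC 792) and computes its Internet checksum in the RFC 1071 style. It only constructs the bytes; sending them needs a raw socket and usually elevated privileges, which is left out. The function names, identifier, and payload are assumptions made for the example.

```python
import struct

ICMP_ECHO_REQUEST = 8  # an echo reply comes back as type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the ICMP header plus payload for an echo request."""
    # The checksum is first computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
    print(packet.hex())
```

The round-trip time a ping reports is the delay between sending a packet like this and receiving the echo reply that carries the same identifier and sequence number.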

Read Post

Obkio

Read more about ICMP Monitoring: What Is ICMP & How It Works

Analyze alert rules and reduce alert fatigue with Last9 MCP

Sep 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Learn more: https://last9.io/mcp/
Get started: https://last9.io/docs/mcp/

View Video

Last9

Read more about Analyze alert rules and reduce alert fatigue with Last9 MCP

Introducing SquaredUp for MSPs: Dashboards That Do More

Sep 23, 2025 By SquaredUp In Squared Up

Introducing SquaredUp for MSPs: Dashboards That Do More.

View Video

Squared Up

Read more about Introducing SquaredUp for MSPs: Dashboards That Do More

10 Best Practices for Proactive Database Performance Monitoring to Prevent Downtime

Sep 23, 2025 By Pavithra Parthiban In Atatus

Databases are the core of modern applications, whether it is an e-commerce platform, a banking system, or a social media app. Slow database performance or unexpected downtime can cause serious problems, from lost revenue to poor customer experience. Proactive database performance monitoring helps teams identify issues before they escalate. Unlike reactive monitoring, which only addresses problems after they occur, proactive monitoring ensures your database remains fast, stable, and reliable.

Read Post

Atatus

Read more about 10 Best Practices for Proactive Database Performance Monitoring to Prevent Downtime

Zooplus Found Faster Root Cause Detection with Elastic Observability

Sep 23, 2025 By Elastic In Elastic

Zooplus Platform Engineering Lead Aram Hakobayan shares how Elastic Observability helps manage 3,000+ microservices and 15,000+ logs/sec across their AWS cloud. Learn how Elastic powers their French market, centralizes monitoring, simplifies root cause analysis, and avoids costly vendor migration. Ideal for DevOps, SREs, and cloud architects scaling fast.

View Video

Elastic

Read more about Zooplus Found Faster Root Cause Detection with Elastic Observability

Sponsored Post

Addressing Key Challenges in IBM Db2 Monitoring

Sep 22, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

From Performance Bottlenecks to High-Availability Failures and Alert Noise.

Read Post

NiCE IT Mgmt

Read more about Addressing Key Challenges in IBM Db2 Monitoring

The one where we talk about Cribl Guard

Sep 22, 2025 By Cribl In Cribl

Manual hunts for sensitive data are slow, error-prone, and expensive. Cribl Guard combines advanced AI with a human-in-the-loop control point to spot sensitive data, such as credit card, passport, and Social Security numbers, as it flows through Cribl Stream. Whether you’re fully cloud or hybrid, Cribl Guard puts you firmly in control of every piece of sensitive information that crosses your pipes.

View Video

Cribl

Read more about The one where we talk about Cribl Guard

AI in Server Monitoring: Why Human Context Still Matters in 2025

Sep 22, 2025 By Rebecca Grassing In Auvik

When Microsoft rolled out Windows Server 2025 last November, it marked a turning point in how IT teams think about monitoring. Suddenly, AI-powered features like anomaly detection, predictive resolution, and even self-healing aren’t ideas on a roadmap — they’re built into the very fabric of enterprise infrastructure.

Read Post

Auvik

Read more about AI in Server Monitoring: Why Human Context Still Matters in 2025

Reddit to Reality: Top 7 Omnissa Horizon Performance Issues and Fixes

Sep 22, 2025 By Gita Rao Prasad In eG Innovations

Slow logons, laggy VDI sessions, and poor Horizon performance are common pain points IT admins face frequently. When Omnissa Horizon environments slow down, both end-users and IT teams feel the pressure—users grow frustrated while admins struggle to troubleshoot without complete visibility. To uncover the real-world Horizon issues admins face, we turned to Reddit forums like r/VMwareHorizon, r/sysadmin, and r/Citrix, where IT professionals openly share their Horizon troubleshooting struggles.

Read Post

eG Innovations

Read more about Reddit to Reality: Top 7 Omnissa Horizon Performance Issues and Fixes

Burndown and burnup: Two charts every engineering dashboard needs

Sep 22, 2025 By Blog In Squared Up

As engineering organizations scale, project visibility becomes a real challenge. Engineering managers lose track of what's actually happening across multiple teams. Executives ask "are we on track?" and get conflicting answers. Status meetings multiply but clarity doesn't improve. The root problem isn't a lack of data: modern engineering teams generate tons of project information across JIRA, GitHub, CI/CD pipelines, and project management tools.

Read Post

Squared Up

Read more about Burndown and burnup: Two charts every engineering dashboard needs

How to Connect Jaeger with Your APM

Sep 22, 2025 By Anjali Udasi In Last9

Microservices make it tough to understand how applications behave end-to-end. Most teams already rely on an Application Performance Monitoring (APM) tool to track system health. But as requests move across many services, you also need distributed tracing. Jaeger gives you that visibility. The real value comes from connecting the two. Instead of running APM and Jaeger in silos, you can combine their strengths (metrics from your APM and traces from Jaeger) to get a clearer view of performance.

Read Post

Last9

Read more about How to Connect Jaeger with Your APM

Integrating JMX and OpenTelemetry

Sep 22, 2025 By Alex Boten In Honeycomb

The OpenTelemetry community and the contributors to the Java Special Interest Group (SIG) have spent a great deal of time integrating core Java technologies into the project. An integration that is particularly useful is Java Management Extensions (JMX). It has been around since J2SE 5, and has been mature for some time. Many of the most widely used Java applications have adopted it over time and support this extension.

Read Post

Honeycomb

Read more about Integrating JMX and OpenTelemetry

Introducing Request Mirror: a free micro-service to reflect HTTP requests

Sep 22, 2025 By Freek Van der Herten In Oh Dear

We have launched Request Mirror, a little free service to reflect HTTP requests. We've also open-sourced it: you can read the code in the ohdearapp/request-mirror.ohdear.app repo on GitHub. In this blog post I'd like to explain why we built it and how you can use it.

Read Post

Oh Dear

Read more about Introducing Request Mirror: a free micro-service to reflect HTTP requests

From Shadow AI to Strategy: The Six-Month AI Imperative (w/ Charlene Li)

Sep 22, 2025 By Nexthink In Nexthink

In this very special episode of The DEX Show, we welcome back one of the world’s most influential voices on digital transformation and the future of AI leadership: Charlene Li. Charlene is a bestselling author and trailblazing thinker who has helped leaders navigate disruption for over two decades. She returns to the show for an unmissable conversation on the realities of AI Transformation—and what it means for organizations, leaders, and employees at every level.

View Video

Nexthink

Read more about From Shadow AI to Strategy: The Six-Month AI Imperative (w/ Charlene Li)

OpenTelemetry Exporters - Types and Configuration Steps

Sep 22, 2025 By Favour Daniel In SigNoz

In this post, we will talk about OpenTelemetry exporters. OpenTelemetry exporters help in exporting the telemetry data collected by OpenTelemetry. OpenTelemetry frees you from any kind of vendor lock-in by letting you export the collected telemetry data to any backend of your choice. In modern distributed systems, efficiently collecting, transmitting, and analyzing telemetry data from diverse sources poses a significant challenge.

Read Post

SigNoz

Read more about OpenTelemetry Exporters - Types and Configuration Steps

Grafana & Friends Stockholm meetup at 0+X

Sep 22, 2025 By Grafana In Grafana

In this talk, we’ll introduce the Kafka Data Source plugin we developed for Grafana, which enables users to query and visualise Kafka topic data directly in their dashboards—without the need for intermediate storage or external services. We'll share how the idea came about, how we collaborated with the Grafana community and developers to bring it to life, and the challenges we faced along the way. We'll also discuss our vision for the plugin’s future and its role in the evolving observability landscape.

View Video

Grafana

Read more about Grafana & Friends Stockholm meetup at 0+X

How Nexthink Enables Smarter, Data-Driven Hardware Refresh Strategies

Sep 20, 2025 By Nexthink In Nexthink

Bob did—until he realized there was a smarter way. With real-time insights from Nexthink, he stopped guessing and started making data-driven decisions: keeping the high performers, replacing the real troublemakers, and upgrading the underperformers. The result? Happier employees, optimized budgets, and devices refreshed based on need—not age.

Read Post

Nexthink

Read more about How Nexthink Enables Smarter, Data-Driven Hardware Refresh Strategies

Notes from the Field: When a PVS Upgrade Broke XenServer Licensing

Sep 19, 2025 By GripMatix In GripMatix

Even a simple upgrade can cause unexpected issues. Here’s what happened when a PVS upgrade broke XenServer licensing, and how we fixed it.

Read Post

GripMatix

Read more about Notes from the Field: When a PVS Upgrade Broke XenServer Licensing

Synthetic Monitoring Frequency: Best Practices & Examples

Sep 19, 2025 By Dotcom-Monitor In Dotcom-Monitor

Synthetic monitoring is, at its core, about visibility. It’s the practice of probing your systems from the outside to see what a user would see. But there’s a hidden parameter that determines whether those probes actually deliver value: frequency. How often you run checks is more than a technical configuration—it’s a strategic choice that ripples through detection speed, operational noise, and even your team’s credibility.

Read Post

Dotcom-Monitor

Read more about Synthetic Monitoring Frequency: Best Practices & Examples

Instrumenting the Node.js event loop with eBPF

Sep 19, 2025 By Nikolay Sivko In Coroot

Recently, I was testing Coroot’s AI Root Cause Analysis on failure scenarios from the OpenTelemetry demo. One of them, loadgeneratorFloodHomepage, simulates a flood of excessive requests. As expected, it caused a latency degradation across the stack. Coroot’s RCA highlighted how the latency cascaded through all dependent services. At the same time, we noticed a moderate increase in CPU usage for the frontend service and the node itself.

Read Post

Coroot

Read more about Instrumenting the Node.js event loop with eBPF

AWS Prometheus: Production Patterns That Help You Scale

Sep 19, 2025 By Anjali Udasi In Last9

You've got Prometheus running in one cluster — maybe a dev environment, a single EKS cluster, or a proof-of-concept setup. The configuration is straightforward: node_exporter on a few EC2 instances, some service discovery for pods, and a single Prometheus server scraping everything. Storage is local, retention is 15 days, and you can keep all the default recording rules without worrying about costs.

Read Post

Last9

Read more about AWS Prometheus: Production Patterns That Help You Scale

How GenAI is Shaping Elastic Customer Support

Sep 19, 2025 By Elastic In Elastic

Discover how GenAI has accelerated Elastic's customer and support efficiency. Built on Elastic’s Search AI Platform, the Support Assistant delivers self-service in-product customer support and capacity gains within our support function. Julie Rudd, VP of Support at Elastic, shares how it speeds up issue resolution by combining generative AI with Elastic’s deep knowledge base. Hear directly from a support engineer how the Support Assistant streamlines case resolution and helps engineers and customers find answers faster.

View Video

Elastic

Read more about How GenAI is Shaping Elastic Customer Support

The Strategic Imperative: Transforming Platform Sunset into Competitive Advantage

Sep 19, 2025 By ScienceLogic In ScienceLogic

With innovation cycles accelerating, product end-of-life announcements have become an inevitable reality. Infoblox NetMRI, for example, has reached end of life with license sales ending April 2025 and support shutting off by early 2027. Whether it’s a network management platform, IT monitoring system, or enterprise application, the sunset of critical business tools forces organizations into what many view as disruptive, costly transitions.

Read Post

ScienceLogic

Read more about The Strategic Imperative: Transforming Platform Sunset into Competitive Advantage

I turned error messages into a sales machine (by accident)

Sep 19, 2025 By Dan Mindru In Sentry

Dan Mindru is a Frontend Developer and Designer who is also the co-host of the Morning Maker Show. Dan is currently developing a number of applications including PageUI, Clobbr, and CronTool. I find it remarkable that we’re getting so many AI startups every day. As software engineers, most of us like to know what our software is actually doing. We plan, review, and perform automatic tests to verify it’s working as expected. Then we do a round of manual testing for good measure. Not with AI.

Read Post

Sentry

Read more about I turned error messages into a sales machine (by accident)

SquaredUp Cloud + Dashboard Server

Sep 19, 2025 By Blog In Squared Up

SquaredUp Dashboard Server (DS) and SquaredUp Cloud both deliver cutting-edge data visualization for IT and engineering teams. The two products can be used independently, or together for complete operational visibility. This article explores how SquaredUp DS and Cloud differ, when to use each, and how they work together.

Read Post

Squared Up

Read more about SquaredUp Cloud + Dashboard Server

Sentry Seer and Performance with Ben and Cody

Sep 19, 2025 By Sentry In Sentry

Try Sentry for free: https://sentry.io
Docs: https://docs.sentry.io

View Video

Sentry

Read more about Sentry Seer and Performance with Ben and Cody

Solve Microsoft Teams Performance Troubles Before They Hit Your Inbox

Sep 19, 2025 By MartelloTech In Martello Technologies

Who Solves It Faster? Microsoft Native Tools vs. Vantage DX. Tickets piling up. Execs on your case. Teams acting up. Microsoft’s tools only show part of the story, leaving you stuck reacting. Watch our pros do a no-fluff, side-by-side showdown: Microsoft Native Tools vs. Martello Vantage DX. Watch them tackle real Teams issues and see who finds and fixes the problem faster.

View Video

Martello Technologies

Read more about Solve Microsoft Teams Performance Troubles Before They Hit Your Inbox

Elastic Cloud Serverless on Google Cloud doubles region availability

Sep 19, 2025 By Yuvraj Gupta In Elastic

We’re pleased to announce the availability of Elastic Cloud Serverless on Google Cloud in three new regions. This doubles the number of available regions on Google Cloud and dramatically increases serverless deployment options in the US. Elastic Cloud Serverless provides the fastest way to start and scale observability, security, and search solutions without managing infrastructure.

Read Post

Elastic

Read more about Elastic Cloud Serverless on Google Cloud doubles region availability

Chaos to Choreography: How To Automate IT Operations with Nexthink Flow

Sep 19, 2025 By Nexthink In Nexthink

Taylor proved it, turning license headaches, VPN chaos, SCCM continuity and patch pain into a smooth, confident performance. Here's how Taylor did it. In the end, IT isn’t just about fixing; it’s about flowing, scaling, and making work effortless. Request a demo today.

Read Post

Nexthink

Read more about Chaos to Choreography: How To Automate IT Operations with Nexthink Flow

How Nexthink Enables Data-Driven Software License Reclamation

Sep 19, 2025 By Radhika Mukesh In Nexthink

This was what Sarah was looking to solve. Managing software licenses isn’t just tracking installs; it’s about uncovering hidden usage and reclaiming wasted spend. When Sarah faced $12M in software costs, scattered licenses, and zero visibility, she needed a better way. With Nexthink, she gained real-time insights, smart user nudges, and automated reclamation. The result?

Read Post

Nexthink

Read more about How Nexthink Enables Data-Driven Software License Reclamation

The Blind Spots That Haunt Legal IT

Sep 19, 2025 By Teneo In Teneo

In a recent survey, Udacity’s team explored the evolving landscape of AI adoption by asking 2000 professionals (including those in the legal sector) if they used AI. Unsurprisingly, over 90% of respondents said they did. More concerning, 72% of managers reported personally paying out of pocket for AI tools to use at work, introducing uncontrolled risk into corporate environments.

Read Post

Teneo

Read more about The Blind Spots That Haunt Legal IT

How AI Turns Monitoring From "What Now?" Into "What's Next?"

Sep 19, 2025 By Howard Beader In Catchpoint

It's 3 AM. Your phone starts buzzing with alerts, and you stumble to your laptop only to be greeted by a dashboard that looks like the control panel of a nuclear reactor in meltdown: Red lights everywhere. Numbers that should be green are decidedly not green. And your brain, still foggy from sleep, is asking the most fundamental question in all of IT operations: "Okay, yes, there's clearly a problem... but, now what?".

Read Post

Catchpoint

Read more about How AI Turns Monitoring From "What Now?" Into "What's Next?"

OpenTelemetry Logs - A Complete Introduction & Implementation

Sep 19, 2025 By Ankit Anand In SigNoz

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) incubating project aimed at standardizing the way we instrument applications for generating telemetry data (logs, metrics, and traces). OpenTelemetry aims to be a vendor-agnostic observability framework that provides a set of tools, APIs, and SDKs to instrument applications.

Read Post

SigNoz

Read more about OpenTelemetry Logs - A Complete Introduction & Implementation

From Firefighting to Proactive Resolution: How Nexthink Transforms Service Desk Operations

Sep 19, 2025 By Radhika Mukesh In Nexthink

Level 1 engineers face incoming tickets without real-time visibility into endpoints. The result? Endless tool-switching, guesswork diagnostics, missed SLAs, and unnecessary escalations. Critical issues remain hidden until they impact productivity. Then came Nexthink. Now engineers see issues in real time, fix faster, and even prevent problems users don’t notice.

Read Post

Nexthink

Read more about From Firefighting to Proactive Resolution: How Nexthink Transforms Service Desk Operations

Why Has Network Management Missed Its Own Revolution?

Sep 19, 2025 By Yann Guernion In Broadcom

We love to talk about IT revolutions. We celebrate the leaps in innovation that change how we work and live. We look at the 1980s and see the personal computer, which turned computing from a command-line chore into an intuitive experience for everyone. We point to the 1990s as the decade the internet connected the world, the 2000s as the era when virtualization and the cloud broke the chains of physical hardware, and this decade as the dawn of mainstream AI. Each of these moments was transformative.

Read Post

Broadcom

Read more about Why Has Network Management Missed Its Own Revolution?

Modern E2E Testing with Playwright and AI

Sep 19, 2025 By Checkly In Checkly

Pair Playwright with LLMs to plan, generate, refactor, and monitor end-to-end tests, without shipping hallucinations. This webinar showcases a practical workflow: grounding models with fresh docs, driving the browser via Playwright MCP, auto-fixing failing tests, refactoring to POMs, adding API checks, and reusing the same suite for synthetic monitoring in Checkly.

View Video

Checkly

Read more about Modern E2E Testing with Playwright and AI

Modern Monitoring, Zero Blackouts: High Availability Reimagined

Sep 19, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Downtime is an expensive inconvenience. Yet many IT teams still face monitoring blackouts due to rigid licensing models and outdated failover strategies. In this session, we’ll introduce a smarter approach: High Availability by Design. Whether you're scaling operations or modernizing infrastructure, this session will equip you with the tools and insights to build a resilient, future-ready monitoring strategy.

View Video

WhatsUp Gold

Read more about Modern Monitoring, Zero Blackouts: High Availability Reimagined

Why IT Teams Still Struggle with Shadow IT in 2025

Sep 19, 2025 By OpsMatters In OpsMatters

Many businesses are still struggling with shadow IT. What is Shadow IT? Shadow IT is any software or hardware, including cloud services, that is used without the explicit knowledge of the company's IT department, and it is highly dangerous for any business. Not only does it pose significant security risks like data breaches and increased vulnerability to cyberattacks, but it also puts employees at risk.

Read Post

OpsMatters

Read more about Why IT Teams Still Struggle with Shadow IT in 2025

Enhance Your Microsoft SCOM Capabilities With Exclusive Benefits

Sep 18, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Start Smarter with NiCE Perks. Claim Now.

Read Post

NiCE IT Mgmt

Read more about Enhance Your Microsoft SCOM Capabilities With Exclusive Benefits

Automated BSoD (Blue Screen of Death) Monitoring and Troubleshooting

Sep 18, 2025 By Erik van Veenendaal In eG Innovations

Yes, BSoDs are still cropping up in high-impact ways in 2025, from flawed Windows updates (especially 24H2 patches) to driver rollouts and heavily-threaded server environments. It remains essential for IT admins to track event reports, test updates in staging, enable rollback strategies, and be prepared with recovery mechanisms.

Read Post

eG Innovations

Read more about Automated BSoD (Blue Screen of Death) Monitoring and Troubleshooting

Your Next Observability RFP Is All Wrong: Why AI Changes Everything

Sep 18, 2025 By Logz.io In logz.io

Watch how AI is reshaping observability for the years ahead. In this fireside chat, Logz.io founders Tomer Levy and Asaf Yigal reveal how the most innovative AI-first companies are breaking free from dashboards, avoiding common RFP mistakes, and building future-ready stacks. Watch and learn how autonomous AI eliminates noise, slashes costs, and gives engineering teams back their velocity.

View Video

logz.io

Read more about Your Next Observability RFP Is All Wrong: Why AI Changes Everything

Monitor and optimize your systems with Uptrace

Sep 18, 2025 By Uptrace In Uptrace

View Video

Uptrace

Read more about Monitor and optimize your systems with Uptrace

Observability Day San Francisco: The Future of AI and Observability Is Bright

Sep 18, 2025 By Ken Rimple In Honeycomb

AI and observability are no longer separate conversations—they’re deeply intertwined. Across keynotes, panels, and demos, speakers at Honeycomb's Observability Day San Francisco unpacked what that means for engineering teams today: faster insights, smarter tools, and new challenges to solve.

Read Post

Honeycomb

Read more about Observability Day San Francisco: The Future of AI and Observability Is Bright

OpenTelemetry Observability: An In-Depth Look at Features and Best Practices

Sep 18, 2025 By Rotem Froimovici In logz.io

OpenTelemetry (OTel) is a unified framework of APIs, SDKs, and tools for collecting, processing, and exporting telemetry data (logs, metrics, and traces) across applications and infrastructure. OTel is especially important in today’s cloud-native world, where applications run on microservices, Kubernetes, and distributed systems.

Read Post

logz.io

Read more about OpenTelemetry Observability: An In-Depth Look at Features and Best Practices

Database Monitoring Challenges Every DevOps Engineer Should Know

Sep 18, 2025 By Pavithra Parthiban In Atatus

Databases form the critical foundation of modern applications, and maintaining their performance and reliability is essential for operational efficiency and user satisfaction. Effective database monitoring, however, presents numerous challenges. Modern systems produce extensive metrics, operate across diverse environments, and must scale in line with growing workloads, all while ensuring compliance and security.

Read Post

Atatus

Read more about Database Monitoring Challenges Every DevOps Engineer Should Know

LLM app Observability: Opentelemetry as a standard

Sep 18, 2025 By SigNoz - Open Source Observability Platform In SigNoz

LLM observability is broken. There are too many new libraries floating around, and they don't accurately follow the OpenTelemetry conventions. OTel isn’t perfect for LLMs yet, but extending a proven standard beats inventing another one. Why not use the same standard (OTel) that works so well for the rest of the apps, and just build on top of it? This is what I was ranting about with Pranav Raj S, co-founder at Chatwoot, and we thought there must be other folks facing similar issues.

View Video

SigNoz

Read more about LLM app Observability: Opentelemetry as a standard

Internal SLAs for Third-Party Vendors: Complete Guide

Sep 18, 2025 By Nuno Tomas In isDown

Managing third-party vendors effectively requires clear expectations and measurable standards. Internal SLAs for third-party vendors provide the framework to track vendor performance, ensure compliance, and maintain service quality across your entire vendor ecosystem. This guide covers everything you need to establish and manage vendor SLAs that protect your business interests while fostering productive vendor relationships.

Read Post

isDown

Read more about Internal SLAs for Third-Party Vendors: Complete Guide

Datadog in the Era of AI

Sep 18, 2025 By Datadog In Datadog

AI is changing everything. At Datadog, our approach is two-fold: empower you with complete observability across your entire stack, including AI as you incorporate it, and harness emergent technologies to make Datadog even more powerful. Join VP of Product Michael Whetten to see how Datadog is pursuing both. He'll share the latest feature updates and new products designed to help you thrive in an AI-powered world. Plus, get a look at our long-term vision for the future of AI and its impact on your work.

View Video

Datadog

Read more about Datadog in the Era of AI

NVIDIA DCGM Monitoring: Setup, Metrics & Alerts | MetricFire

Sep 18, 2025 By Benjamin Pitts In MetricFire

Want a quick path to NVIDIA DCGM monitoring? This guide shows how to install the DCGM Exporter, scrape /metrics, visualize in Grafana, and set alerts for power, errors, and utilization.

Read Post

MetricFire

Read more about NVIDIA DCGM Monitoring: Setup, Metrics & Alerts | MetricFire

Proactively monitor Kerberos-authenticated web apps and APIs with Datadog Synthetics

Sep 18, 2025 By Lauren Zuniga In Datadog

When employee authentication fails or becomes unreliable, users can lose access to the critical systems they need. Authentication enables access to internal tools like HR applications, finance portals, and internal dashboards, so even short outages can interrupt day-to-day work, while persistent issues increase the risk of broader operational disruption.

Read Post

Datadog

Read more about Proactively monitor Kerberos-authenticated web apps and APIs with Datadog Synthetics

Single-tenant vs. multi-tenant architecture with Grafana Cloud: A guide to choosing the right approach

Sep 18, 2025 By Naima Alexander In Grafana

Grafana Cloud’s flexibility is one of its greatest strengths, but the breadth of choices can sometimes be overwhelming. We see this a lot when it comes to selecting the right architectural approach, with organizations unsure of how many stacks they need to host their environment. Grafana Cloud provides robust features for managing tenancy, enabling organizations to effectively handle diverse teams and projects.

Read Post

Grafana

Read more about Single-tenant vs. multi-tenant architecture with Grafana Cloud: A guide to choosing the right approach

Bridging the Network Cost Gap: Why Operators Need Real-Time, Traffic-Based Cost Intelligence

Sep 18, 2025 By Jezzibell Gilmore In Kentik

Jezzibell Gilmore’s latest blog dives into the critical challenge network operators face: bridging the gap between massive traffic growth and understanding its actual cost. Learn why real-time, traffic-based cost intelligence is no longer optional for maintaining margins and driving revenue in today’s complex network landscape.

Read Post

Kentik

Read more about Bridging the Network Cost Gap: Why Operators Need Real-Time, Traffic-Based Cost Intelligence

Sponsored Post

Implementing Agentic AI: A Technical Overview of Architecture and Frameworks

Sep 17, 2025 By Shailesh Manjrekar In Fabrix

As businesses strive for smarter, faster operations, Agentic AI redefines enterprise operations, introducing solutions for autonomous decision-making and tackling complex challenges with precision. Agentic AI introduces an intelligent, enterprise-focused approach to enhancing operational efficiency and adaptability, paving the way for innovation. Its ability to support operational scalability and streamline workflows positions it as a vital tool for modern IT ecosystems.

Read Post

Fabrix

Read more about Implementing Agentic AI: A Technical Overview of Architecture and Frameworks

Why AIX Automation Starts with Better Monitoring: How Galileo Powers Smarter Action

Sep 17, 2025 By Kristy Slimmer In Galileo

If your automation can’t trust the data it’s acting on, it’s not automation. It’s a guess. That’s why AIX automation monitoring is the foundation for success. Many teams encounter this gap when trying to automate AIX operations. Red Hat Ansible Automation Platform (AAP) and Event-Driven Ansible (EDA) can absolutely streamline routine tasks, like expanding filesystems or tuning adapters. But every playbook still depends on one thing: accurate, real-time monitoring.

Read Post

Galileo

Read more about Why AIX Automation Starts with Better Monitoring: How Galileo Powers Smarter Action

ManageEngine named in the 2025 Gartner Magic Quadrant for AI Applications in ITSM

Sep 17, 2025 By Alexandria Nisha In ManageEngine

We're proud to announce that ManageEngine has been recognized in the 2025 Gartner Magic Quadrant for AI Applications in ITSM. This recognition comes after Gartner's comprehensive evaluation of our Completeness of Vision and Ability to Execute. We believe this recognition reflects our commitment to making AI-driven ITSM cost-effective, easy to implement, and scalable to meet modern enterprises' growing needs.

Read Post

ManageEngine

Read more about ManageEngine named in the 2025 Gartner Magic Quadrant for AI Applications in ITSM

What does the EU Data Act mean for Observability?

Sep 17, 2025 By Chris Cooney In Coralogix

The EU Data Act entered into force on January 11th, 2024, and most of its provisions apply from September 12th, 2025. The EU Data Act is designed to give individuals and businesses more control over the data they generate, ensuring fair access, use, and sharing across sectors. For any data-generating platform that intends to operate in the European Union, this new legislation matters.

Read Post

Coralogix

Read more about What does the EU Data Act mean for Observability?

Early Outage Detection: Get Notified Before Outages Become Official Incidents

Sep 17, 2025 By Nuno Tomas In isDown

We've just launched one of our biggest feature updates yet - Early Outage Detection. This fundamentally changes how IsDown works, transforming it from a reactive monitoring tool into a proactive early warning system.

Read Post

isDown

Read more about Early Outage Detection: Get Notified Before Outages Become Official Incidents

Streamlining GitHub security alerts monitoring with SquaredUp

Sep 17, 2025 By SquaredUp In Squared Up

GitHub security alerts can get noisy very quickly, making it hard to see exactly what's going on across an organization. In this video, our Director of Engineering Tim Wheeler shows how you can quickly and easily get the visibility you need in a SquaredUp dashboard.

View Video

Squared Up

Read more about Streamlining GitHub security alerts monitoring with SquaredUp

Release webinar: Dashboard Server 7.0

Sep 17, 2025 By SquaredUp In Squared Up

In this release webinar, Graham Davies, SCOM veteran and Dashboard Server Product Manager, takes a look at the latest enhancements in DS 7.0.

View Video

Squared Up

Read more about Release webinar: Dashboard Server 7.0

Overview of Contacts, Integrations & Alert Escalations

Sep 17, 2025 By Uptime Website Monitoring In uptime

Learn how Uptime.com supports multi-channel alerting and escalations so you can get the alerts you need, where you need, when you need.

View Video

uptime

Read more about Overview of Contacts, Integrations & Alert Escalations

Announcing Dynamic Service Insights in LogicMonitor Envision

Sep 17, 2025 By Ismath Mohideen In LogicMonitor

If you’re in IT operations, you’ve likely faced the disconnect firsthand: your dashboards say everything’s green, but your business stakeholders are asking why the website is slow, the customer portal is timing out, or a regional service is underperforming. Your team is usually on top of things: monitoring infrastructure health, resolving alerts, and keeping systems online. But the business isn’t looking at device uptime.

Read Post

LogicMonitor

Read more about Announcing Dynamic Service Insights in LogicMonitor Envision

Redefining Resilient IT: Edwin AI, Service Intelligence, and What's Next for LogicMonitor

Sep 17, 2025 By Garth Fort In LogicMonitor

Downtime is more than an inconvenience these days, and it is not solely a problem for the ITOps team. Since every organization is a digital business, downtime can cost millions of dollars per hour, stall innovation, and erode customer trust. Yet most IT teams are still trapped in reactive mode, scrambling across fragmented tools and drowning in alert fatigue. That model no longer works. The future of IT is about foresight, not firefighting.

Read Post

LogicMonitor

Read more about Redefining Resilient IT: Edwin AI, Service Intelligence, and What's Next for LogicMonitor

Future-Proofing Your Historian with a Time Series Database

Sep 17, 2025 By Allyson Boate In InfluxData

As technology scales and data volumes accelerate, organizations face a pressing challenge: how can they modernize data infrastructure without putting daily operations at risk? Data historians, specialized databases that capture and store time-stamped machine and sensor data, have long been the foundation for reliability and compliance. However, they were not designed for the openness and advanced analytics that modern workloads demand.

Read Post

InfluxData

Read more about Future-Proofing Your Historian with a Time Series Database

From Monitoring to Meaning: Why Service Observability Platforms Are Essential for Modern Enterprises

Sep 17, 2025 By david.arrowsmith In Interlink

At Interlink, we believe the future of IT Operations (ITOps) is about Service Observability, incident prevention and automated remediation.

Read Post

Interlink

Read more about From Monitoring to Meaning: Why Service Observability Platforms Are Essential for Modern Enterprises

Track the performance of your HPC workloads with Datadog's AWS PCS integration

Sep 17, 2025 By Candace Shamieh In Datadog

AWS Parallel Computing Service (AWS PCS) is a managed service that helps users run and scale their high performance computing (HPC) workloads. AWS PCS uses Slurm, an open source workload manager, for scheduling and orchestrating simulations, which enables users to build their scientific and engineering models in a familiar HPC environment.

Read Post

Datadog

Read more about Track the performance of your HPC workloads with Datadog's AWS PCS integration

Single Sign-On Configuration Using SAML

Sep 17, 2025 By Uptime Website Monitoring In uptime

Learn how to configure Single Sign-On (SSO) with SAML on Uptime.com to streamline your login process and enhance security. This step-by-step guide will walk you through the setup!

View Video

uptime

Read more about Single Sign-On Configuration Using SAML

LibreNMS + VictoriaMetrics: The Ultimate Monitoring Duo

Sep 17, 2025 By VictoriaMetrics In VictoriaMetrics

Get the best of both worlds! Love LibreNMS but need more power for long-term metrics? Integrate it with VictoriaMetrics! We show you a surprisingly simple way to combine these two powerhouses for advanced querying and storage, and even how to A/B test your setup. Want more ways to power-up your monitoring stack? Subscribe for more integration guides and pro tips!

View Video

VictoriaMetrics

Monitoring

Read more about LibreNMS + VictoriaMetrics: The Ultimate Monitoring Duo

Monitoring Critical User Experiences In React Native Mobile Apps

Sep 17, 2025 By Sentry In Sentry

In this video, we show you how to instrument Sentry tracing in a React Native app.

View Video

Sentry

Read more about Monitoring Critical User Experiences In React Native Mobile Apps

You can't understand digital experience without monitoring from where your users actually are!

Sep 17, 2025 By Catchpoint In Catchpoint

If you’re monitoring your applications, you’re missing what your customers are actually seeing. Performance issues don’t happen in a vacuum. They happen at the edges, on mobile devices, over congested networks, in last-mile dead zones. Monitoring only works when it’s aligned with reality. And reality starts at the user.

View Video

Catchpoint

Read more about You can't understand digital experience without monitoring from where your users actually are!

Icinga Experience: Insights from Real-World Icinga Deployments Across Industries

Sep 17, 2025 By Angelika Bang In Icinga

Modern IT environments are hybrid, distributed, and constantly growing. To keep them reliable, organizations rely on monitoring that scales, automates, and integrates seamlessly into existing workflows. We collected 24 Icinga customer stories from industries including finance, telecom, manufacturing, and public services. What unites them is the choice of Icinga as a flexible and cost-efficient alternative to proprietary monitoring tools.

Read Post

Icinga

Read more about Icinga Experience: Insights from Real-World Icinga Deployments Across Industries

SQL performance improvements: finding the right queries to fix (part 1)

Sep 17, 2025 By Mattias Geniar In Oh Dear

A few weeks ago, we massively improved the performance of the dashboard & website by optimizing some of our SQL queries. In this post, we'll share how we identified the queries that needed work. In the next post, we'll explore how we fixed each of them. We'll cover the basics and gradually work our way up to the more advanced/complex ways of identifying slow queries. Let's go!

Read Post

Oh Dear

Read more about SQL performance improvements: finding the right queries to fix (part 1)

You can now connect your AI to Oh Dear

Sep 17, 2025 By Freek Van der Herten In Oh Dear

Today, we have launched our MCP server. MCP (Model Context Protocol) is a standardized way for AI models to connect with external data sources and tools. If you use a tool like Claude Code, you can now connect Oh Dear to it (you can create an API token in your account settings).

Read Post

Oh Dear

Read more about You can now connect your AI to Oh Dear

What is Asynchronous Job Monitoring?

Sep 17, 2025 By Anjali Udasi In Last9

Modern applications don’t process everything inside the request/response path. To keep APIs responsive, time-consuming work like image resizing, payment processing, or data syncs is moved into background queues. Workers then pick up these asynchronous jobs and run them outside the main thread. Asynchronous job monitoring is the practice of tracking these background tasks. Without this visibility, background workers become a blind spot.

Read Post

Last9

Read more about What is Asynchronous Job Monitoring?

Faster, more memory-efficient performance in Grafana Mimir: a closer look at Mimir Query Engine

Sep 17, 2025 By Jon Kartago Lamida In Grafana

Until recently, Grafana Mimir — our open source, horizontally scalable, multi-tenant time series database (TSDB) — has exclusively used Prometheus’ PromQL engine to evaluate queries. While the PromQL engine works great, it sometimes needs a lot of memory to run, specifically in the Mimir querier component. To address this memory consumption issue, we recently introduced Mimir Query Engine (MQE).

Read Post

Grafana

Read more about Faster, more memory-efficient performance in Grafana Mimir: a closer look at Mimir Query Engine

IT and OT Convergence: Defending Critical Infrastructure

Sep 17, 2025 By Nick Vlasov In Flowmon

We recently delivered a webinar titled IT/OT Convergence: Proactive Threat Detection for Industrial Control Systems (also available via Brighttalk).

Read Post

Flowmon

Read more about IT and OT Convergence: Defending Critical Infrastructure

Cribl.Cloud Government Is a New Era of Secure Cloud Telemetry for Federal Agencies

Sep 17, 2025 By Dritan Bitincka In Cribl

As a Co-founder and CPO at Cribl, I'm genuinely stoked that our new federal suite, Cribl.Cloud Government, has achieved an “In Process” designation under the Federal Risk and Authorization Management Program (FedRAMP). This isn’t any old milestone. We’re bringing all of Cribl’s kickass capabilities to government agencies, even those that require the strictest compliance and security standards. Because, who doesn’t love a good set of rules?

Read Post

Cribl

Read more about Cribl.Cloud Government Is a New Era of Secure Cloud Telemetry for Federal Agencies

Making the invisible visible: Are your cloud firewalls and DDoS protection really working?

Sep 17, 2025 By Sheikh Mursaleen In Catchpoint

Every business builds strong defences to keep attackers out. Firewalls and DDoS protection serve that purpose, standing guard over company apps and websites, like knights at the castle gate keeping out trolls (not just the ones on X). But here’s the problem: those defences only work if users actually walk through the front gate. Sometimes, people find hidden paths or side doors around your walls, so the guards never see them enter.

Read Post

Catchpoint

Read more about Making the invisible visible: Are your cloud firewalls and DDoS protection really working?

BYOS with Cribl Lake: Data ownership meets flexibility

Sep 17, 2025 By Rick Salsa In Cribl

Today, more than ever, organizations face a difficult balancing act: how to keep sensitive data fully under their control while still making it accessible and usable so teams can unlock the value and insights they need. Industries such as financial services, healthcare, and government agencies often must comply with strict regulations that require data to remain in environments they directly own and manage.

Read Post

Cribl

Read more about BYOS with Cribl Lake: Data ownership meets flexibility

Cribl.Cloud Goes to Washington: Cribl.Cloud Government FedRAMP Authority to Operate Milestone

Sep 17, 2025 By Andy Nortrup In Cribl

Way back in 2009, when I was serving as a second lieutenant in the U.S. Army, I worked in a network operations center for a deployed Army unit. Our mission was to provide network connectivity across central and northern Iraq. Our observability tools were incredibly limited. We had a network map that would turn nodes and network links red, yellow, and green when they were up or down. We had to write down in a physical logbook any status changes and what we did about them.

Read Post

Cribl

Read more about Cribl.Cloud Goes to Washington: Cribl.Cloud Government FedRAMP Authority to Operate Milestone

Top woes of IT pros: A third say they're only noticed when 'something explodes'

Sep 16, 2025 By SolarWinds In SolarWinds

Survey reveals the phrases, buzzwords and behaviours IT teams are sick of hearing.

Read Post

SolarWinds

Read more about Top woes of IT pros: A third say they're only noticed when 'something explodes'

CloudSpend for iOS 26 for sharper, smarter, and simpler cloud cost management

Sep 16, 2025 By Ramkumar Ramaswamy In ManageEngine

Experience seamless control, clarity, and cost optimization with the CloudSpend app on iOS 26. This update integrates Apple’s new Liquid Glass design and secure, on-device AI summaries to deliver instant insights into your cloud spending, empowering you to act decisively from anywhere.

Read Post

ManageEngine

Read more about CloudSpend for iOS 26 for sharper, smarter, and simpler cloud cost management

Unlock Real-Time AWS Observability With Streaming Ingestion in DX Operational Observability

Sep 16, 2025 By Ashish Aggarwal In Broadcom

In fast-paced cloud environments, traditional monitoring methods often fall short. This leaves teams with latency and data gaps. It’s time to gain near real-time visibility into your AWS telemetry, enabling faster incident response and deeper insights. With its new streaming ingestion capabilities, DX Operational Observability (DX O2) is revolutionizing cloud monitoring—enabling teams to leverage AWS CloudWatch Metric Streams and Amazon Kinesis Data Firehose.

Read Post

Broadcom

Read more about Unlock Real-Time AWS Observability With Streaming Ingestion in DX Operational Observability

Observability and IT Monitoring Governance (Part 4 of 4)

Sep 16, 2025 By Steve Danseglio In Broadcom

Following parts one, two, and three of this blog series, this post offers a short, real-world example that shines light on why strong monitoring governance is a must-have.

Read Post

Broadcom

Read more about Observability and IT Monitoring Governance (Part 4 of 4)

Observability and IT Monitoring Governance: Establishing Order (Part 3 of 4)

Sep 16, 2025 By Ravishu Arora In Broadcom

In our previous posts, we explored why robust IT monitoring governance is no longer a luxury but a strategic imperative. We highlighted how a disciplined framework prevents blind spots, reduces risk, and ensures the reliability and scalability of your critical business applications. But how do you translate these principles into practical, actionable governance within your IT environment?

Read Post

Broadcom

Read more about Observability and IT Monitoring Governance: Establishing Order (Part 3 of 4)

Logstash Alternative: Why Security Teams Are Choosing Modern Data Pipelines

Sep 16, 2025 By VirtualMetric In VirtualMetric

Logstash has been a workhorse in data processing pipelines for years, but it was not designed with today’s security operations in mind. Security teams now deal with massive telemetry volumes, rising SIEM costs, and diverse log formats that require constant normalization. In this environment, Logstash shows its age: manual configuration, outdated parsing, and scalability bottlenecks introduce fragility instead of efficiency.

Read Post

VirtualMetric

Read more about Logstash Alternative: Why Security Teams Are Choosing Modern Data Pipelines

The Evolution of Digital Adoption: Insights from Gartner's 2025 Market Guide

Sep 16, 2025 By Nexthink In Nexthink

The 2025 Market Guide for Digital Adoption Platforms (DAPs) marks an important point in the evolution of the category. Digital adoption has matured from a supporting role into a central part of enterprise strategy. Organizations are no longer asking if they need a DAP—they’re asking which one. In this latest research, Gartner establishes DAPs as essential to business transformation, efficiency, and employee experience. The takeaway is clear: digital adoption is no longer optional.

Read Post

Nexthink

Read more about The Evolution of Digital Adoption: Insights from Gartner's 2025 Market Guide

Speed improvements to the dashboard, website & job processing

Sep 16, 2025 By Mattias Geniar In Oh Dear

Over the past month, we dedicated time and resources to optimising the speed and experience of our public website, our dashboard, and the behind-the-scenes uptime checks that we perform. Overall, our website and dashboard feel about 2x to 3x faster. The biggest gains are for users that have more than 100 sites on their dashboard: they'll get a noticeably faster loading time. For those biggest users, the dashboard is quite literally 10x faster.

Read Post

Oh Dear

Read more about Speed improvements to the dashboard, website & job processing

Logs & Lattes: Episode 1 - Smart Logging Without the Price Trap

Sep 16, 2025 By Graylog In Graylog

How much value are you really getting from your logs, and what are you giving up to stay on budget? In this episode of Logs and Lattes, host Palmer Wallace sits down with Seth Goldhammer, VP of Product Management at Graylog, for a candid conversation about the hidden cost of traditional SIEM pricing. Seth explains how ingest-based and resource-heavy licensing models pressure security teams into tough tradeoffs, such as dropping logs, tuning down detections, or limiting retention just to avoid budget overages.

View Video

Graylog

Read more about Logs & Lattes: Episode 1 - Smart Logging Without the Price Trap

Govern Custom Metrics Volumes in Datadog

Sep 16, 2025 By Datadog In Datadog

Custom Metrics provide critical visibility into your environment and applications. In this video, we’ll show you how to govern Custom Metrics volumes in Datadog by building monitors to proactively catch usage spikes, identifying and attributing your largest cost drivers, and reducing costs on less valuable, unused metrics.

View Video

Datadog

Read more about Govern Custom Metrics Volumes in Datadog

Kubernetes Service Discovery Explained with Practical Examples

Sep 16, 2025 By Faiz Shaikh In Last9

In Kubernetes, applications are constantly changing — new pods start, old ones shut down, workloads shift across nodes. The challenge is making sure that different parts of your system, and even external clients, can still find each other when the actual locations keep moving. That’s what service discovery handles. It provides a stable way for applications to connect and communicate, no matter where they’re running or how often the underlying infrastructure changes.

Read Post

Last9

Read more about Kubernetes Service Discovery Explained with Practical Examples

Bridging the Gap Integrating Logs Metrics and Flow for Observability

Sep 16, 2025 By VictoriaMetrics In VictoriaMetrics

In this video, we discuss handling both old and new systems in IT environments. From legacy SNMP setups to modern telemetry, most organizations juggle multiple data sources, which can make observability feel overwhelming. We explore how to combine logs, metrics, and flow data into one system that provides actionable insights. You’ll see practical examples of simplifying scattered tools and making sense of complex, disparate information. Understanding how these different types of data work together is key to getting observability right.

View Video

VictoriaMetrics

Read more about Bridging the Gap Integrating Logs Metrics and Flow for Observability

Pastries with SREs: OTel me where the cronuts are

Sep 16, 2025 By Elastic In Elastic

In this episode of Pastries with SREs, we tackle a debated observability topic: do you need a Single Pane of Glass, or is OpenTelemetry a better strategy? About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

View Video

Elastic

Read more about Pastries with SREs: OTel me where the cronuts are

Frontend JavaScript performance testing: A comprehensive guide

Sep 16, 2025 By Anton Bjorkman In Sentry

When a page pauses for even a quarter-second, users feel it, and many will tab away before the spinner stops. Front-end performance testing lets us spot those delays on our own machines instead of reading about them in support tickets. The browser runs JavaScript, layout, painting, and every user interaction on a single main thread. If one task takes too long, everything else queues up behind it.

Read Post

Sentry

Read more about Frontend JavaScript performance testing: A comprehensive guide

Top Node.js Application Challenges and How Monitoring Solves Them

Sep 16, 2025 By Mohana Ayeswariya J In Atatus

Deploying a Node.js application may feel straightforward at first. Everything checks out in tests, staging runs smoothly, and early users run into no problems. But as real traffic ramps up, hidden problems start to appear in unexpected ways. Requests fail intermittently, latency spikes without warning, memory usage climbs silently, and logs are scattered across multiple processes making it nearly impossible to trace the root cause.

Read Post

Atatus

Read more about Top Node.js Application Challenges and How Monitoring Solves Them

Distributed performance testing for Kubernetes environments: Grafana k6 Operator 1.0 is here

Sep 16, 2025 By Olha Yevtushenko In Grafana

Performance testing is critical to building reliable applications, but testing at scale, especially inside modern Kubernetes environments, can be a challenge. For example, how do you coordinate tests across multiple nodes, test private services without compromising security, or even do both at once? And most importantly, how do you do all this without adding too much operational complexity to your stack?

Read Post

Grafana

Read more about Distributed performance testing for Kubernetes environments: Grafana k6 Operator 1.0 is here

OpenTelemetry Operator Complete Guide [OTel Collector + Auto-Instrumentation Demo]

Sep 16, 2025 By Favour Daniel In SigNoz

Manually deploying and managing OpenTelemetry components in a Kubernetes environment can be a complex and time-consuming task. It involves creating various Kubernetes resources, setting up configurations, and ensuring the components are properly integrated with the applications.

Read Post

SigNoz

Read more about OpenTelemetry Operator Complete Guide [OTel Collector + Auto-Instrumentation Demo]

Why Doctors Now Recommend Wearable Technology for Elderly Parents

Sep 16, 2025 By OpsMatters In OpsMatters

Wearable technology for elderly parents is becoming increasingly vital as we face an unprecedented demographic shift. By 2050, the global population of individuals aged 65 and older is projected to reach nearly 1.5 billion. In the United States alone, the same age group is projected to grow from 46 million in 2016 to over 98 million by 2060. These statistics highlight why we need innovative solutions for senior care.

Read Post

OpsMatters

Read more about Why Doctors Now Recommend Wearable Technology for Elderly Parents

Smoother, smarter observability with the updated Site24x7 iOS 26

Sep 15, 2025 By Ramkumar Ramaswamy In ManageEngine

Enjoy improved control, clarity, and communication using the Site24x7 app on iOS 26. This update blends Apple's dynamic Liquid Glass design language with fast, secure, on-device AI summaries that help you observe your IT stack instantly and act decisively, from anywhere.

Read Post

ManageEngine

Read more about Smoother, smarter observability with the updated Site24x7 iOS 26

How to connect ServiceNow to Grafana Cloud IRM incidents

Sep 15, 2025 By Matías Bordese In Grafana

Companies rely on a variety of services to streamline their workflows, which often requires data synchronization or information sharing across platforms. But are your tools flexible enough to connect with external systems? ServiceNow is widely recognized for its robust and complex workflow support for enterprises. However, it may not always offer the most intuitive or user-friendly experience when handling incidents.

Read Post

Grafana

Read more about How to connect ServiceNow to Grafana Cloud IRM incidents

What is Database Monitoring? A Guide for Developers, DevOps, and SREs

Sep 15, 2025 By Pavithra Parthiban In Atatus

Databases handle critical operations for applications, from online banking to e-commerce and streaming services. Any slowdown or failure can directly affect application performance and user experience. Database monitoring tracks performance, detects issues, and helps prevent downtime. It also ensures efficient use of resources, maintains security, and supports compliance requirements.

Read Post

Atatus

Read more about What is Database Monitoring? A Guide for Developers, DevOps, and SREs

Hand Code or no Code, Scout Keeps Error Monitoring Out of the Log Mess

Sep 15, 2025 By Sarah Morgan In Scout

You’ve built something, it’s live, and users are starting to show up. Maybe you programmed it from scratch, used a tool, or vibe-coded it into existence. No matter how it came to be, the fact that you’ve got users is great! But here’s a question every new developer must eventually ask: how do I know when my site is actually failing? The thing is, these failures aren’t always obvious.

Read Post

Scout

Read more about Hand Code or no Code, Scout Keeps Error Monitoring Out of the Log Mess

Introducing AI Drive: Closing the AI Value Gap

Sep 15, 2025 By Samuele Gantner In Nexthink

The enterprise is standing at the edge of a seismic shift: an AI revolution. In the next five years, the way work gets done will be fundamentally reshaped as workflows once handled by humans are increasingly replaced or enhanced by artificial intelligence. But here’s the reality: success won’t come from simply handing out Copilot or GPT licenses and hoping employees figure it out.

Read Post

Nexthink

Read more about Introducing AI Drive: Closing the AI Value Gap

Background Job Observability Beyond the Queue

Sep 15, 2025 By Anjali Udasi In Last9

Background jobs handle the critical work that happens outside the request path: processing payments, sending emails, generating reports, syncing data. They keep applications running smoothly, but the signals they produce look different from API endpoints. Most teams start with queue metrics—how many jobs are waiting and how quickly they complete. These metrics provide the foundation, but job health extends beyond throughput.

Read Post

Last9

Read more about Background Job Observability Beyond the Queue

The Performance Impact of Session Replay Scripts

Sep 15, 2025 By Rollbar In Rollbar

Session replay vendors love talking about features and pricing, but rarely publish the technical specs that matter most to developers. We analyzed the actual JavaScript bundles and their performance impacts across five major platforms. You know what's wild? Bundle sizes range from 36KB to over 550KB gzipped. That's the difference between imperceptible impact and noticeable slowdown for your users.

Read Post

Rollbar

Read more about The Performance Impact of Session Replay Scripts

Honeycomb Observability Day San Francisco

Sep 15, 2025 By Honeycomb In Honeycomb

Did you miss Honeycomb's Observability Day San Francisco? Here are some highlights of the day.

View Video

Honeycomb

Read more about Honeycomb Observability Day San Francisco

10 Most Common Network Devices & How to Monitor Them

Sep 15, 2025 By Andrii Kernitskyi In Obkio

When it comes to running a reliable IT infrastructure, network devices take center stage. They sit quietly in the background (routing packets, securing traffic, and keeping teams connected) until something goes wrong. The truth is, without them, nothing works. Every network device has a specific role: some connect users, some protect data, others balance traffic or bridge different protocols.

Read Post

Obkio

Read more about 10 Most Common Network Devices & How to Monitor Them

Reality Bytes: How AI Drive Will Help YOU Win the AI Race (w/ Matt Rose)

Sep 15, 2025 By Nexthink In Nexthink

In this special Reality Bytes, the team sits down with Matt Rose, Product Marketing Manager at Nexthink, to talk about the just-announced AI Drive. Matt breaks down how AI Drive helps organizations finally make sense of their scattered AI landscape—bringing visibility, usage insights, guidance, and measurable impact into one place inside the Nexthink Infinity platform. He also shares early customer feedback, the future roadmap, and why AI adoption has become a critical competitive differentiator.

View Video

Nexthink

Read more about Reality Bytes: How AI Drive Will Help YOU Win the AI Race (w/ Matt Rose)

The Agent-Gateway Model: Best Practice OpenTelemetry Collector Architecture Explained

Sep 15, 2025 By Bindplane In ObservIQ

View the recording of Bindplane's OTel Deep Dive Workshop in August 2025.

View Video

ObservIQ

Read more about The Agent-Gateway Model: Best Practice OpenTelemetry Collector Architecture Explained

LangChain Observability: Monitoring Guide for Production Apps

Sep 15, 2025 By Alexandr Bandurchin In Uptrace

LangChain applications fail differently than traditional web apps. A single user request can trigger 15+ LLM calls, cost $5 in tokens, and fail silently without throwing errors. One team discovered a $12,000 OpenAI bill caused by a recursive chain with no monitoring. This guide shows how to implement observability for LangChain applications, giving you complete visibility into performance, costs, and errors before they impact your users or budget.

Read Post

Uptrace

Read more about LangChain Observability: Monitoring Guide for Production Apps

OpenTelemetry Collector Agent-Gateway Troubleshooting & High Availability Explained

Sep 15, 2025 By Bindplane In ObservIQ

View the recording of Bindplane's OTel Deep Dive Workshop in August 2025.

View Video

ObservIQ

Read more about OpenTelemetry Collector Agent-Gateway Troubleshooting & High Availability Explained

The Personalization Paradox: When Tailored UX Turns "Creepy"

Sep 14, 2025 By Germain UX Team In Germain UX

“Stop watching me.” That’s an actual message a user typed into a search bar, captured during session monitoring. They weren’t talking to customer support. They were talking to the algorithm. It sounds absurd until you realize how common this is. When users believe a human is behind your personalization system, attributing consciousness to your automated algorithms, everything changes. Their behavior becomes erratic. Your conversions tank. And nobody talks about it.

Read Post

Germain UX

Read more about The Personalization Paradox: When Tailored UX Turns "Creepy"

Fixing the Reconciliation Gap: Why Order to Cash Breaks Across Industries and How to Close It

Sep 13, 2025 By Andrew Mallaband In meshIQ

Whether you sell consumer goods, ship freight, manufacture vehicles, process payments, underwrite insurance, or manage hospital claims, your business depends on the same thing: order to cash. Orders are created, fulfilled, invoiced, and paid. In principle, it should be simple. In practice, the process is riddled with breaks. Most companies believe they are covered. They run ERP systems like SAP. They use EDI gateways such as Sterling.

Read Post

meshIQ

Read more about Fixing the Reconciliation Gap: Why Order to Cash Breaks Across Industries and How to Close It

Cost Control | SigNoz Launch Week 5.0 | Day 5

Sep 13, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Take control of your observability spending with complete transparency into usage patterns across logs, metrics, and traces. No more surprise bills or blind cost optimization - get the visibility you need to manage budgets effectively.

View Video

SigNoz

Read more about Cost Control | SigNoz Launch Week 5.0 | Day 5

You can now choose the frequency of checks

Sep 13, 2025 By Freek Van der Herten In Oh Dear

As part of our big deploy that added ping and TCP monitoring, we’ve also shipped a small but often requested feature: you can now choose the frequency of the checks we run. By default, we check your website for uptime every minute. The Lighthouse check runs daily. Using our new feature, you can now, for instance, choose that the uptime check should run every 2 minutes, and the Lighthouse check every 5 days. You can choose the frequency in the settings of the check.

Read Post

Oh Dear

Read more about You can now choose the frequency of checks

Trace Operators: Demo

Sep 12, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Map service dependencies and validate architectural patterns without manually analyzing trace flows. Trace Operators let you query relationships between services within distributed traces using simple, intuitive syntax.

View Video

SigNoz

Read more about Trace Operators: Demo

Query Builder v5 | SigNoz Launch Week 5.0 | Day 2

Sep 12, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Query Builder v5 brings familiar SQL-like syntax to observability data with expression-based querying that works across logs, metrics, and traces. Write complex queries using the syntax you already know.

View Video

SigNoz

Read more about Query Builder v5 | SigNoz Launch Week 5.0 | Day 2

13 Proven Node.js Monitoring Best Practices You Need

Sep 12, 2025 By Mohana Ayeswariya J In Atatus

What if your Node.js application suddenly froze during peak hours? Imagine thousands of users trying to log in, make payments, or send messages; instead, they’re stuck waiting. Every second feels like a countdown to frustration, churn, and bad reviews. The truth is, Node.js is powerful but unforgiving. It runs on a single-threaded event loop, meaning just one poorly optimized task or slow dependency can bottleneck your entire app. When performance slips, it affects every customer simultaneously.

Read Post

Atatus

Read more about 13 Proven Node.js Monitoring Best Practices You Need

Trace Operators | SigNoz Launch Week 5.0 | Day 4

Sep 12, 2025 By SigNoz - Open Source Observability Platform In SigNoz

View Video

SigNoz

Read more about Trace Operators | SigNoz Launch Week 5.0 | Day 4

OSS Improvements | SigNoz Launch Week 5.0 | Day 3

Sep 12, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Self-hosting SigNoz just got significantly easier with community-focused improvements that remove deployment friction and give you more flexibility in how you run your observability stack.

View Video

SigNoz

Read more about OSS Improvements | SigNoz Launch Week 5.0 | Day 3

Cost Meter: Demo

Sep 12, 2025 By SigNoz - Open Source Observability Platform In SigNoz

View Video

SigNoz

Read more about Cost Meter: Demo

What's Really Happening in Your Branch Office Network?

Sep 12, 2025 By Yann Guernion In Broadcom

The great return to the office is in full swing, but the office doesn't look like it used to. Today's enterprise is a fluid entity, with employees collaborating across home offices, corporate headquarters, and geographically dispersed branch locations. This has elevated the branch office from a simple satellite to a critical hub of productivity and innovation.

Read Post

Broadcom

Read more about What's Really Happening in Your Branch Office Network?

Website Monitoring by Error Type: DNS, TCP, TLS, and HTTP

Sep 12, 2025 By Dotcom-Monitor In Dotcom-Monitor

When a website goes down, the failure often feels like a black box. Visitors see a spinning wheel, a cryptic error code, or a blank page. For the people responsible for keeping that site online, the first question is always the same: what broke? The truth is that there is no single way a website “goes down.” Instead, a request from a browser passes through multiple steps—DNS resolution, TCP connection, TLS negotiation, and HTTP response. Each step depends on the ones before it.
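The layered sequence described above can be sketched in a few lines of Python. This is an illustration only, not Dotcom-Monitor's implementation: the names `first_failing_stage` and `build_steps` are hypothetical, and the HTTP stage simply assumes that any status below 500 counts as a healthy response.

```python
import socket
import ssl
from typing import Callable, List, Optional, Sequence, Tuple

# A stage is a label plus a zero-argument callable that raises on failure.
Step = Tuple[str, Callable[[], None]]


def first_failing_stage(steps: Sequence[Step]) -> Optional[str]:
    """Run the stages in order; return the first one that raises, else None."""
    for name, run in steps:
        try:
            run()
        except Exception:
            # Later stages depend on this one, so stop at the first failure.
            return name
    return None


def build_steps(host: str, port: int = 443, timeout: float = 5.0) -> List[Step]:
    """Hypothetical probe: each stage reuses what the previous stage produced.

    A real probe should also close the socket when it is done.
    """
    state = {}

    def dns() -> None:
        # Resolve the host name to a socket address.
        info = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        state["addr"] = info[0][4]

    def tcp() -> None:
        # Open a TCP connection to the resolved address.
        state["sock"] = socket.create_connection(state["addr"][:2], timeout=timeout)

    def tls() -> None:
        # Negotiate TLS and verify the certificate against the host name.
        context = ssl.create_default_context()
        state["tls"] = context.wrap_socket(state["sock"], server_hostname=host)

    def http() -> None:
        # Send a minimal request and inspect the status line.
        request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        state["tls"].sendall(request.encode("ascii"))
        status_line = state["tls"].recv(1024).split(b"\r\n", 1)[0]
        parts = status_line.split()
        # Assumption: treat 5xx (or an unparsable status line) as a failure.
        if len(parts) < 2 or not parts[1].isdigit() or int(parts[1]) >= 500:
            raise RuntimeError(f"bad HTTP response: {status_line!r}")

    return [("DNS", dns), ("TCP", tcp), ("TLS", tls), ("HTTP", http)]
```

Calling `first_failing_stage(build_steps("example.com"))` would return `None` when every stage succeeds, or the label of the first stage that failed. Because the stages run in dependency order, a TLS failure in this sketch implies that DNS and TCP already worked.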

Read Post

Dotcom-Monitor

Read more about Website Monitoring by Error Type: DNS, TCP, TLS, and HTTP

How External Dependencies Affect SLAs: Managing Third-Party Risk

Sep 12, 2025 By Nuno Tomas In isDown

Modern applications rely heavily on external services to function properly. From payment processors to CDN providers, these external dependencies can significantly impact your ability to meet Service Level Agreements. Understanding how external dependencies affect SLAs is crucial for maintaining reliable services and managing customer expectations.

Read Post

isDown

Read more about How External Dependencies Affect SLAs: Managing Third-Party Risk

What is Service Catalog Observability and How Does It Work?

Sep 12, 2025 By Faiz Shaikh In Last9

A service catalog gives teams a shared view of their systems—what services exist, who owns them, how dependencies are structured, and the SLAs that guide expectations. It’s an important part of development infrastructure because it helps everyone speak the same language about services. Service catalog observability builds on that foundation.

Read Post

Last9

Read more about What is Service Catalog Observability and How Does It Work?

Why Open Source Is Important: Accessibility to Innovation for Everyone

Sep 12, 2025 By Coroot In Coroot

Valkey OSS Developer Advocate Roberto Luna Rojas shares why open source matters to him - and the world.

View Video

Coroot

Read more about Why Open Source Is Important: Accessibility to Innovation for Everyone

Introducing Cost Meter - Proactive Observability Cost Control with Per-Hour Granularity

Sep 12, 2025 By Anushka Karmakar In SigNoz

The irony isn't lost on us - observability platforms are built to be proactive about system health, yet when it comes to managing observability costs themselves, teams are forced to be reactive. Today, that changes with Cost Meter, now live in our platform. Cost Meter transforms observability spend management from a monthly billing surprise into a proactive, data-driven process with hourly aggregated metrics that give you complete visibility into your telemetry ingestion patterns.

Read Post

SigNoz

Read more about Introducing Cost Meter - Proactive Observability Cost Control with Per-Hour Granularity

Trace Operators - Define Relationships Between Spans in a Trace

Sep 12, 2025 By Anushka Karmakar In SigNoz

If you're thinking something and you can think it in a generic English sentence, you can write that query and execute it with trace operators. That's basically it.

Read Post

SigNoz

Read more about Trace Operators - Define Relationships Between Spans in a Trace

Microservices Failures and Cascading Outages: Prevention Guide

Sep 12, 2025 By Nuno Tomas In isDown

Microservices architecture offers tremendous benefits for scalability and flexibility, but it also introduces new failure modes that can quickly spiral out of control. When one service fails in a distributed system, the impact can cascade across services like dominoes falling, creating widespread outages that affect your entire application. Understanding how these cascading failures occur and implementing the right defensive patterns is crucial for building resilient microservices.

Read Post

isDown

Read more about Microservices Failures and Cascading Outages: Prevention Guide

How to Ensure Regulation Compliance as a Government Contractor

Sep 12, 2025 By OpsMatters In OpsMatters

The government contracting sector is a highly regulated business environment. Entering this sector requires transparency, accountability and expertise. You must also familiarize yourself with regulatory bodies and their standards to boost your reputation in the eyes of federal agencies. Discover how you can ensure compliance with regulations as a new government contractor.

Read Post

OpsMatters

Read more about How to Ensure Regulation Compliance as a Government Contractor

Early Warning Signals now in Webex

Sep 11, 2025 By Valeria Kurolapova In StatusGator

We’re happy to announce that Early Warning Signals are now available in Webex! With Webex now supported, Early Warning Signals are available across all chat integrations—including Microsoft Teams, Slack, Google Chat, Discord, Webhooks and now Webex—plus email and SMS. No matter where your team communicates, you’ll never miss the early signs of an outage.

Read Post

StatusGator

Read more about Early Warning Signals now in Webex

Debug, query, and build faster with AI: How we use Grafana Assistant at Grafana Labs

Sep 11, 2025 By Ivana Huckova In Grafana

We recently released Grafana Assistant into public preview for Grafana Cloud, and we’ve been excited to see how our customers have already made it part of their daily observability routines. At the same time, Assistant is becoming a go-to companion for developers right here at Grafana Labs, whether they’re debugging on-call issues, helping customers, or trying to remember tricky PromQL syntax.

Read Post

Grafana

Read more about Debug, query, and build faster with AI: How we use Grafana Assistant at Grafana Labs

APM vs Observability: Observing beyond APM

Sep 11, 2025 By Leon Adato In Catchpoint

In my previous post I made a bold, sweeping statement that APM is not - in the most specific sense - a subset of observability. I stand by that because words matter and - like many "monitoring engineers" (IT folks who make monitoring and observability their specialty) - I, too, bear scars from the flame-wars on Twitter back in the 2020s where we fought internecine battles over the proper definition of (and number of pillars in) “observability”.

Read Post

Catchpoint

Read more about APM vs Observability: Observing beyond APM

The Cost of Ignoring Expired SSL Certificates for Businesses

Sep 11, 2025 By Simon Rodgers In WebSitePulse

SSL certificates secure the digital backbone of businesses. They encrypt data, protect customer trust, and ensure compliance with strict regulations. Yet many companies still face the cost of ignoring expired SSL certificates every year. When a certificate expires, the consequences hit hard: websites go offline, users see security warnings, and revenues drop. Let's break down the risks, costs, and ways to prevent expired SSL certificates from damaging your business.

Read Post

WebSitePulse

Read more about The Cost of Ignoring Expired SSL Certificates for Businesses

Introducing Honeycomb Intelligence Canvas

Sep 11, 2025 By Honeycomb In Honeycomb

Canvas is an AI-guided workspace inside Honeycomb that combines an AI assistant with an interactive notebook for visualizing query results and traces. You can ask a natural language question about your data and Canvas will immediately start exploring your traces, through multiple queries and other tools, to find the right next steps. Instead of having to write each query yourself, Canvas automatically proposes relational queries, comparisons, and visualizations that explain why an SLO fired or what changed after a deploy.

View Video

Honeycomb

Read more about Introducing Honeycomb Intelligence Canvas

Have you read the latest SolarWinds IT Trends report?

Sep 11, 2025 By solarwindsinc In SolarWinds

Hear what SolarWinds Product Evangelist Sascha Giese has to say about the findings. Plus, learn how your peers are managing AI adoption and other IT challenges.

View Video

SolarWinds

Read more about Have you read the latest SolarWinds IT Trends report?

Detect Cloud Cost Anomalies with Datadog

Sep 11, 2025 By Datadog In Datadog

Unexpected cloud cost spikes can derail your budget. Datadog’s Cloud Cost Anomalies automatically surfaces unusual spending — whether it’s from a team, account, service, cluster, or region. Quickly identify the source, pivot into Cost Explorer, and take action before costs spiral out of control.

View Video

Datadog

Read more about Detect Cloud Cost Anomalies with Datadog

Driving Customer Success Beyond Deployment

Sep 11, 2025 By ScienceLogic In ScienceLogic

In the rapidly evolving landscape of IT operations, businesses face the constant challenge of staying ahead of emerging technologies and shifting market demands. Implementing new systems or solutions is not just about the initial setup. It is about ensuring long-term success, reducing risk, and unlocking sustained value for the organization. That is exactly what SL360, our comprehensive customer success framework, was designed to deliver.

Read Post

ScienceLogic

Read more about Driving Customer Success Beyond Deployment

Detect Email Delays Before They Hit Users - Monitor O365 with eG Enterprise

Sep 11, 2025 By Srividhya Seshachalam In eG Innovations

Email downtime or email delays can significantly disrupt business operations, making proactive monitoring essential to avoid problems. In today’s hybrid work environments, email remains a critical communication channel for customer interactions, internal collaboration, and workflow approvals. Even brief outages or delays in email delivery can lead to missed opportunities, poor customer experience, SLA (Service Level Agreement) breaches and reputational damage.

Read Post

eG Innovations

Read more about Detect Email Delays Before They Hit Users - Monitor O365 with eG Enterprise

Understanding OpenTelemetry Spans in Detail

Sep 11, 2025 By Favour Daniel In SigNoz

Debugging errors in distributed systems can be a challenging task, as it involves tracing the flow of operations across numerous microservices. This complexity often leads to difficulties in pinpointing the root cause of performance issues or errors. OpenTelemetry provides instrumentation libraries in most programming languages for tracing.

Read Post

SigNoz

Read more about Understanding OpenTelemetry Spans in Detail

Pastries with SREs: Limitless observability and uncompromised donuts

Sep 11, 2025 By Elastic In Elastic

In this episode of Pastries with SREs, we dig into Limitless Observability with a sweet side of unified observability strategy. If you're tired of siloed tools, fractured data, and swivel-chair investigations, this one’s for you. We explore: Why are silos still the norm in modern observability? What’s the true cost of inefficiencies across logs, metrics, and traces? How can SREs, IT operations, and dev teams shift to a no-compromise, unified observability model?

View Video

Elastic

Read more about Pastries with SREs: Limitless observability and uncompromised donuts

How we used Sentry's User Feedback widget to shape Logs throughout beta

Sep 11, 2025 By Jasmin Kassas In Sentry

At Sentry, we build in public and we move fast. But moving fast means we don’t always get everything right on the first try. That’s where feedback comes in: it helps us validate what’s working, spot what’s missing, and catch issues we wouldn’t always see through error tracking alone.

Read Post

Sentry

Read more about How we used Sentry's User Feedback widget to shape Logs throughout beta

Debugging issues with Sentry's MCP

Sep 11, 2025 By Sentry In Sentry

Turns out, this MCP thing is pretty solid. We've built the MCP server to tap into all the different areas of context within Sentry and make it easy to bring these into your editor client to help debug your application. Want to know the most fixable issues in your environment? Easy. Want to see your query performance for your backend? Just ask it.

View Video

Sentry

Read more about Debugging issues with Sentry's MCP

DevOps Guide to Monitoring in Serverless Applications

Sep 11, 2025 By Pavithra Parthiban In Atatus

Serverless computing helps teams move faster by removing the need to manage servers. Code runs only when needed, scaling up or down automatically. For DevOps engineers, this means quicker deployments and less infrastructure work. But serverless also brings new challenges. Functions run for short periods, making it hard to track errors, performance, and costs.

Read Post

Atatus

Read more about DevOps Guide to Monitoring in Serverless Applications

Meet Canvas: Your AI-guided Workspace Within Honeycomb

Sep 11, 2025 By Morgante Pell In Honeycomb

Modern systems are wonderfully capable, but relentlessly complex. Debugging across microservices, frontends, and cloud edges often means switching between five or more tools, trying to stitch together “what changed” and “why it broke.” Honeycomb’s wide events model has proven to be a superpower for taming that complexity, by allowing you to easily observe and query end-to-end traces without worrying about how much granular data you attach to your events.

Read Post

Honeycomb

Read more about Meet Canvas: Your AI-guided Workspace Within Honeycomb

Breaking Free from SQLite - Why We Added PostgreSQL Support to SigNoz

Sep 11, 2025 By Anushka Karmakar In SigNoz

"Let us support different relational databases apart from SQLite. Nobody likes to run SQLite in production." This was one of the most requested features from our community. Your requests have been heard, and we've added support for different relational databases, starting with PostgreSQL. If you're self-hosting SigNoz, you no longer need to worry about SQLite's limitations. Let's dive into what we've built and why it matters for your production deployments.

Read Post

SigNoz

Read more about Breaking Free from SQLite - Why We Added PostgreSQL Support to SigNoz

Behind the Dashboard: How to monitor your LLM integrations

Sep 11, 2025 By Catchpoint In Catchpoint

Behind the Dashboard is an ongoing series where we look under the hood of a specific Catchpoint feature. Each episode breaks down the technology itself, what’s challenging about using it for monitoring, and how we removed friction and toil to make it a valuable part of the Catchpoint platform. In this episode Leon, Mursi, and Rahul take a look at Catchpoint’s LLM monitoring capabilities, including ensuring your integrated LLMs are up and performing optimally; as well as knowing if you’re using the most effective (accurate) and economical (cheapest per query) option in your suite.

View Video

Catchpoint

Read more about Behind the Dashboard: How to monitor your LLM integrations

How AISPM Helps Achieve Continuous Cybersecurity Monitoring

Sep 11, 2025 By OpsMatters In OpsMatters

Cybersecurity threats evolve at breakneck speed. What worked yesterday might fail tomorrow. Organizations need monitoring systems that never sleep, never blink, and never miss a beat. This is where AI-powered Security Performance Management (AISPM) transforms how we protect digital assets.

Read Post

OpsMatters

Read more about How AISPM Helps Achieve Continuous Cybersecurity Monitoring

Reducing Compliance Gaps with Continuous Monitoring Solutions

Sep 11, 2025 By OpsMatters In OpsMatters

Organizations face an increasingly complex web of regulatory requirements that demand strict adherence to security protocols. Among these challenges, maintaining firewall compliance stands as a critical yet often overlooked aspect of cybersecurity strategy. Many companies struggle with compliance gaps that leave them vulnerable to breaches, regulatory penalties, and operational disruptions.

Read Post

OpsMatters

Read more about Reducing Compliance Gaps with Continuous Monitoring Solutions

Sentry Hacktime with Bete and Cody

Sep 10, 2025 By Sentry In Sentry

Today we will be implementing application Logs live.

View Video

Sentry

Read more about Sentry Hacktime with Bete and Cody

APM for Kubernetes: Monitor Distributed Applications at Scale

Sep 10, 2025 By Anjali Udasi In Last9

When a payment service runs across 12 pods — each serving different customer segments — and an authentication layer spans three namespaces, performance issues can originate in both the application code and the orchestration layer. The challenge is linking request-level performance data with what’s happening inside the cluster: container CPU limits, pod scheduling decisions, and node-level events.

Read Post

Last9

Read more about APM for Kubernetes: Monitor Distributed Applications at Scale

Monitor Cloud-Native & Hybrid Apps and Business Transactions With Observability Cloud APM

Sep 10, 2025 By Wei Li In Splunk

As organizations modernize, most applications don’t fit neatly into one category—they span both traditional three-tier architectures and cloud-native microservices. To monitor these hybrid environments effectively, teams need APM tools that can seamlessly connect the two worlds.

Read Post

Splunk

Read more about Monitor Cloud-Native & Hybrid Apps and Business Transactions With Observability Cloud APM

Debugging and logging in Laravel applications

Sep 10, 2025 By Kyle Tryon In Sentry

Logic errors, failed HTTP requests, background jobs that ghost silently—software breaks in all kinds of fun ways. The difference between resilient systems and fragile ones isn’t about avoiding errors altogether. It’s about how fast and clearly you can see what went wrong, and fix it. Laravel gives you a solid foundation: structured logging, real-time introspection, and built-in performance monitoring.

Read Post

Sentry

Read more about Debugging and logging in Laravel applications

Introducing Anomaly Detection: Your Early Warning System for Service Health

Sep 10, 2025 By Matt Ransford In Honeycomb

Modern engineering teams face a persistent challenge: knowing when something goes wrong before their customers do. With microservices architectures sprawling across dozens or hundreds of services, creating comprehensive alerting becomes an overwhelming task. You're left playing whack-a-mole with manual alert configurations, often missing critical issues or drowning in false positives.

Read Post

Honeycomb

Read more about Introducing Anomaly Detection: Your Early Warning System for Service Health

It broke... let's fix it with Sentry MCP and Seer

Sep 10, 2025 By Sentry In Sentry

Real debugging starts in the editor where you're probably digging through the last commits wondering what random thing changed. Fortunately, you're probably using Sentry and it's going to give you that information. Sentry's MCP is the best way to bring all that context of what broke and how, into your editor so you can fix broken things faster. With Seer, you can bring in the root cause, and solution, and have tools like Cursor or Claude Code go fix it. We'll show you how.

View Video

Sentry

Read more about It broke... let's fix it with Sentry MCP and Seer

Full-Stack Observability with VictoriaMetrics in the OTel Demo

Sep 10, 2025 By Diana Todea In VictoriaMetrics

The OpenTelemetry Astronomy Shop is a widely used demonstration environment designed to illustrate the concepts and practical implementation of observability in distributed systems. Built as a microservice-based e-commerce application, the demo provides developers with a near real-world environment where they can explore how telemetry data—metrics, logs, and traces—can be collected, processed, and visualized.

Read Post

VictoriaMetrics

Read more about Full-Stack Observability with VictoriaMetrics in the OTel Demo

The Real ROI of Using an APM Tool for SaaS Businesses

Sep 10, 2025 By Mohana Ayeswariya J In Atatus

For every SaaS leader, engineer, and operations professional, growth is always the main goal. You’re expected to release features quickly, keep user experiences smooth, and manage everything within a limited budget. But behind the scenes, your application may have hidden issues such as slow performance, unnoticed errors, and laggy transactions that quietly eat away at revenue, reduce customer trust, and exhaust your engineering team.

Read Post

Atatus

Read more about The Real ROI of Using an APM Tool for SaaS Businesses

Serverless Monitoring for Modern Industries: Compliance, Scalability, and User Experience

Sep 10, 2025 By Pavithra Parthiban In Atatus

Serverless computing has changed the way developers build and scale applications. With event-driven execution, automatic scaling, and a pay-as-you-go model, it removes the need to manage servers and helps teams move faster. This is why industries like FinTech, e-commerce, and media streaming are adopting serverless at a rapid pace. But serverless also brings new monitoring challenges. Functions are short-lived, run in different places, and are triggered by many types of events.

Read Post

Atatus

Read more about Serverless Monitoring for Modern Industries: Compliance, Scalability, and User Experience

Upgrade your monitoring lists with icon images

Sep 10, 2025 By Florian Strohmaier In Icinga

Recently I was importing an Icinga configuration for testing purposes. Working with this configuration, I found that there were icon images assigned to the objects. Sadly, those didn’t display, because I didn’t have the icon set installed. So I thought of creating my own.

Read Post

Icinga

Read more about Upgrade your monitoring lists with icon images

AI Wrote Your Bugs, AI Will Fix Your Bugs

Sep 10, 2025 By Todd H. Gardner In TrackJS

There’s a lot of JavaScript developers these days not actually writing code. They whisper sweet prompts to our AI tools and hope for the best. Is it really any worse than copy-pasting from StackOverflow? Welcome to the era of vibe coding, where understanding your code is optional and “it works on my machine” has evolved into “the AI said it would work.”

Read Post

TrackJS

Read more about AI Wrote Your Bugs, AI Will Fix Your Bugs

AppSignal Launch Week Recap

Sep 10, 2025 By Connor James In AppSignal

Last week, we were busy at AppSignal with our first-ever Launch Week and Rails World 2025! In case you missed it or spent the time snacking on our stroopwafels in Amsterdam, here's a quick recap of what we launched, and what that means for your monitoring setup!

Read Post

AppSignal

Read more about AppSignal Launch Week Recap

Prevention Cure: The Happiness Factor in IT Health (w/ EDGE Solutions)

Sep 10, 2025 By Nexthink In Nexthink

In this episode Tim and Tom sit down with Sean Thomas, Managing Director of EUEM Business at Edge Consulting, to explore how happiness, health, and IT success are all connected. They dive into the parallels between personal health and IT health, the evolution of proactive and preventative IT, and how organizations can better operationalize digital employee experience (DEX). From vendor partnerships and process investments to the role of AI in shaping the next generation of IT management, Sean offers valuable insights for leaders navigating today’s complex digital workplace.

View Video

Nexthink

Read more about Prevention Cure: The Happiness Factor in IT Health (w/ EDGE Solutions)

Introducing Honeycomb Intelligence Anomaly Detection

Sep 10, 2025 By Honeycomb In Honeycomb

Modern teams face a persistent challenge: knowing when something goes wrong before their customers do. With architectures sprawling across dozens or hundreds of services, creating comprehensive alerting becomes an overwhelming task. You're left playing whack-a-mole with manual alert configurations, often missing critical issues or drowning in false positives. Today, we're excited to announce our solution to this challenge: Anomaly Detection (currently in alpha), Honeycomb's proactive approach to understanding and acting on service health.

View Video

Honeycomb

Read more about Introducing Honeycomb Intelligence Anomaly Detection

Data Historians vs. Time Series Databases: A Practical Path Forward

Sep 10, 2025 By Allyson Boate In InfluxData

Industrial data strategy often feels like a choice: keep legacy systems or replace them outright. But neither extreme is ideal. Full replacements are disruptive and costly, while avoiding change leaves businesses stuck with tools that limit growth. The better path is incremental. Each organization has different needs, and modernization works best when you build on proven systems while adding new capabilities.

Read Post

InfluxData

Read more about Data Historians vs. Time Series Databases: A Practical Path Forward

Instrumentation Your Way: Introducing a Combined Splunk AppDynamics Agent

Sep 10, 2025 By Courtney Gannon In Splunk

In 2025, microservices are everywhere and Kubernetes is the de facto standard for operating cloud native applications. But not all apps are built in microservices architectures. For most enterprises, hybrid environments are the reality, with their business run on a mix of three-tier and cloud native applications.

Read Post

Splunk

Read more about Instrumentation Your Way: Introducing a Combined Splunk AppDynamics Agent

Query Builder v5 - Two Years of Technical Debt, 80 Closed Issues, and a Fundamental Rethinking

Sep 10, 2025 By Anushka Karmakar In SigNoz

In 2022, we had three different query interfaces. Logs had a custom search syntax with no autocomplete. Traces only had predefined filters - no query builder at all. Metrics had a raw PromQL input box where you'd paste queries from somewhere else and hope they worked. Each system spoke a different language. An engineer debugging a production issue had to context-switch not just between data types, but between entirely different mental models of how to query data.

Read Post

SigNoz

Read more about Query Builder v5 - Two Years of Technical Debt, 80 Closed Issues, and a Fundamental Rethinking

What is the Internet Stack... and why should you care?

Sep 10, 2025 By Catchpoint In Catchpoint

We talk a lot about the application stack, the code and services you build. However, just as critical is the infrastructure that delivers that code to your users. That’s the Internet Stack: a complex chain of technologies and services, from DNS and BGP to CDNs and ISPs, that every digital experience depends on. It’s separate from your application stack. It’s different for every user, in every geography. And most importantly, it still impacts your users—even if you don’t directly own it.

View Video

Catchpoint

Read more about What is the Internet Stack... and why should you care?

Monitor Windows Certificate Store with Datadog

Sep 10, 2025 By Shanel Huang In Datadog

The Windows Certificate Store is a critical component of any modern Windows environment. Certificates enable TLS encryption for Internet Information Services (IIS)-hosted applications, support certificate-based authentication in Active Directory, and help validate the identity of trusted Windows services. But if a certificate in your store expires, is revoked, or is part of a broken certificate chain, you risk instability and security gaps in your Windows environment.

Read Post

Datadog

Read more about Monitor Windows Certificate Store with Datadog

Visually identify observability gaps with Cloudcraft in Datadog

Sep 10, 2025 By Jace Harker In Datadog

Modern cloud environments are highly complex and dynamic, with critical services relying on large numbers of ephemeral resources. Ensuring observability coverage across this landscape is essential for troubleshooting, maintaining reliability, optimizing performance, and enforcing security standards. But as environments grow more elaborate and their ownership more dispersed, tracking observability coverage becomes increasingly challenging.

Read Post

Datadog

Read more about Visually identify observability gaps with Cloudcraft in Datadog

Logs vs. Metrics: Why You Need Both for Observability

Sep 10, 2025 By Patrick Sites In LogicMonitor

Picture this: Your dashboards are calm. CPU load is steady. Error rates are low. Everything looks fine. That is, until the alarms go off. Now what? Metrics tell you something’s wrong, but not what, where, or why. They reveal symptoms, not root causes, and in high-stakes environments, that’s only half the story. Say your API response times spike. Metrics raise the flag, but they don’t tell you if it’s a code deployment, a database hang, or a traffic surge.

Read Post

LogicMonitor

Read more about Logs vs. Metrics: Why You Need Both for Observability

A smarter filter for Grafana Alerting: Introducing a new way to find your alerts

Sep 10, 2025 By Lauren Armstrong In Grafana

At Grafana Labs, we believe that effective alerting is the cornerstone of any robust observability strategy. That’s why we’re constantly listening to your feedback and working to improve the Grafana user experience so it’s easier for you to manage and interact with your alert rules. Today, we’re excited to tell you about an update in Grafana Alerting that’s built to address some of your biggest pain points.

Read Post

Grafana

Read more about A smarter filter for Grafana Alerting: Introducing a new way to find your alerts

Visualize Logs Alongside Metrics: Complete Observability Elasticsearch Performance

Sep 10, 2025 By Benjamin Pitts In MetricFire

Elasticsearch is a distributed search and analytics engine that powers everything from log management platforms to e-commerce search bars. It excels at indexing and retrieving large volumes of data quickly, but like any complex system it can slow down under heavy load or inefficient queries.

Read Post

MetricFire

Read more about Visualize Logs Alongside Metrics: Complete Observability Elasticsearch Performance

Why Real-Time Network Monitoring Is Critical for Modern Business Resilience

Sep 10, 2025 By OpsMatters In OpsMatters

Business operations are more interconnected and technology-driven than ever before. Networks form the backbone of communication, data exchange, and service delivery. A single failure can disrupt productivity, weaken customer trust, and even result in financial losses.

Read Post

OpsMatters

Read more about Why Real-Time Network Monitoring Is Critical for Modern Business Resilience

MetrixInsight Alerting Beyond Citrix Director/Monitor

Sep 9, 2025 By GripMatix In GripMatix

MetrixInsight for Citrix VAD/DaaS surpasses Citrix Director/Monitor in many ways. Director/Monitor is useful for day-to-day visibility, but it is not a true enterprise monitoring platform. MetrixInsight adds much more, including in alerting and enterprise monitoring. It closes important gaps that can directly impact user experience when capacity or performance issues slip through unnoticed. One clear example is how operational VDA capacity is handled.

Read Post

GripMatix

Read more about MetrixInsight Alerting Beyond Citrix Director/Monitor

Custom OpenTelemetry Collectors: Build, Run, and Manage at Scale

Sep 9, 2025 By Adnan Rahic In ObservIQ

I tried thinking back to the last time I read an actual tutorial that did not include a bunch of em (—) dashes, semicolons, normal dashes, and an unnervingly large quantity of phrases like “XYZ-thing Alert” and “Exciting News!”. Well, hold on to your suspenders, folks, here we go again. Part 2 is up and it’s a controversial one.

Read Post

ObservIQ

Read more about Custom OpenTelemetry Collectors: Build, Run, and Manage at Scale

Creating Calculated Fields with Honeycomb AI

Sep 9, 2025 By Honeycomb In Honeycomb

Did you know you can define a calculated field in your Honeycomb queries? You can, and with the power of Honeycomb AI you can ask it to write the calculated field definition for you. Find out how in this short video.

View Video

Honeycomb

Read more about Creating Calculated Fields with Honeycomb AI

Introducing Honeycomb Intelligence MCP Server - Now GA!

Sep 9, 2025 By Honeycomb In Honeycomb

In the months since we launched our public beta, we’ve been hard at work making Honeycomb MCP more useful and capable for agents and human operators alike. Our goal with this project has been, from the start, to allow AI to engage in the same kind of investigatory loops that we guide users towards. Many of the new features are designed expressly with this in mind, the most exciting of which is BubbleUp, now available in.

View Video

Honeycomb

Read more about Introducing Honeycomb Intelligence MCP Server - Now GA!

Observability and Monitoring Governance (Part 2 of 4)

Sep 9, 2025 By Steve Danseglio In Broadcom

“How did we fail to monitor xyz prior to this incident?” “We should monitor everything.” “Are we vetting applications prior to deployment, including security apps that may adversely affect application performance and responsiveness?”

Read Post

Broadcom

Read more about Observability and Monitoring Governance (Part 2 of 4)

DX UIM Hub Interconnectivity and the Benefits of Static Hubs

Sep 9, 2025 By Scott Pieter In Broadcom

In DX Unified Infrastructure Management (DX UIM), there are multiple elements that need to work in harmony to achieve a high level of observability. Understanding the architecture of DX UIM can help you make configuration decisions that minimize resource consumption, without sacrificing the volume and granularity of observability data collected. In addition, using static hubs is a simple and particularly powerful option for specific situations.

Read Post

Broadcom

Read more about DX UIM Hub Interconnectivity and the Benefits of Static Hubs

4 Ways AppNeta Enhances Cost-Focused Cloud Planning

Sep 9, 2025 By Nestor Falcon Gonzalez In Broadcom

Enterprises are hemorrhaging their cloud budget: respondents in 49% of organizations estimate their cloud spending is wasted due to unchecked provisioning and lack of predictive cost governance. This cost inefficiency stems not just from financial blind spots, but also from operational gaps: poor visibility into network reliability and user experience. Real-time, end-to-end visibility is the foundation of cloud optimization.

Read Post

Broadcom

Read more about 4 Ways AppNeta Enhances Cost-Focused Cloud Planning

Introducing Event iQ: Smarter Event Correlation in Splunk IT Service Intelligence (ITSI)

Sep 9, 2025 By Annette Sheppard In Splunk

Every day, IT teams are flooded with alerts—thousands of messages about performance issues, service outages, or suspicious activity. With so many notifications, it’s easy to get overwhelmed, miss critical problems, or waste time chasing false alarms. Correlating related alerts into groups can help reduce the noise and make sense of everything, but setting up those correlations takes time, experience, and a lot of both system and historic knowledge.

Read Post

Splunk

Read more about Introducing Event iQ: Smarter Event Correlation in Splunk IT Service Intelligence (ITSI)

Overview of Contacts, Integrations, and Alert Escalations

Sep 9, 2025 By Uptime Website Monitoring In uptime

Learn how Uptime.com supports multi-channel alerting and escalations so you can get the alerts you need, where you need, when you need.

View Video

uptime

Monitoring

Read more about Overview of Contacts, Integrations, and Alert Escalations

Observability and Monitoring Governance (Part 1 of 4)

Sep 9, 2025 By Steve Danseglio In Broadcom

In contrast to the many flavors of governance used for IT, such as data governance, audit and compliance, and governance and security, IT monitoring governance lacks a definition in many organizations. This is true even as teams have decades of experience monitoring the health, performance, and availability of applications, infrastructures, networks, and user experience. Good monitoring governance “just sort of happens—naturally, organically.” Not exactly!

Read Post

Broadcom

Read more about Observability and Monitoring Governance (Part 1 of 4)

Subsea Cables Parted in Red Sea Again

Sep 9, 2025 By Doug Madory In Kentik

This past weekend saw the latest round of submarine cable cuts to impact internet connectivity between Europe and Asia. And once again they took place in the Red Sea, an historic problem area for subsea cables. In this post, I review some of the impacts that we observed in both the loss of transit in affected countries as well as increased latencies between public cloud regions using Kentik’s Cloud Latency Map.

Read Post

Kentik

Read more about Subsea Cables Parted in Red Sea Again

Measuring service response time and latency: How to perform a TCP check in Grafana Cloud Synthetic Monitoring

Sep 9, 2025 By Bukola Ayodele In Grafana

When your database stops accepting connections or your mail server becomes unreachable during business hours, the impact is immediate and costly. Fortunately, the right monitoring strategy can help you detect these TCP connection failures early on, and prevent them from impacting the user experience.

Read Post

Grafana

Read more about Measuring service response time and latency: How to perform a TCP check in Grafana Cloud Synthetic Monitoring

Interactive Dashboards - Click Any Panel to Start Debugging

Sep 9, 2025 By Anushka Karmakar In SigNoz

Your dashboard shows a latency spike. To investigate it, you copy the query, open logs in a new tab, paste and modify the query, lose your dashboard filters, and repeat for traces. By the time you find the issue, you have 15 tabs open. Starting today, you can click any panel and investigate right there. All your filters and variables carry over. No more tab juggling.

Read Post

SigNoz

Read more about Interactive Dashboards - Click Any Panel to Start Debugging

How Time Works in Elasticsearch (Explained in 90 Seconds)

Sep 9, 2025 By Elastic In Elastic

Time in Elasticsearch is more than a timestamp; it's a powerful lens. In this 90-second guide, see how Elasticsearch lets you bucket logs by minute, hour, day, or even millisecond. You can zoom out to see patterns or zoom in to investigate exact spikes, all without recalculating.

View Video

Elastic

Read more about How Time Works in Elasticsearch (Explained in 90 Seconds)

Monitor the Health, Performance, and Security of Your AI Application Stack with AI Agent and AI Infrastructure Monitoring

Sep 9, 2025 By Ashish Mehta In Splunk

At this year’s .conf25, we introduced an exciting new chapter in observability at Splunk, one that is unified, AI-powered, and agentic, to ensure ITOps and engineering teams are digitally resilient in the AI era.

Read Post

Splunk

Read more about Monitor the Health, Performance, and Security of Your AI Application Stack with AI Agent and AI Infrastructure Monitoring

Why it's time to move beyond APM: Monitoring from the user's perspective

Sep 9, 2025 By Howard Beader In Catchpoint

For years, organizations have relied on Application Performance Monitoring (APM) as the backbone of their observability strategy. The idea was simple: collect as many logs, metrics, and traces as possible, then sift through the data to uncover insights. But as applications have shifted to the cloud and become increasingly API-driven, that model has broken down.

Read Post

Catchpoint

Read more about Why it's time to move beyond APM: Monitoring from the user's perspective

The Answer to SRE Agent Failures: Context Engineering

Sep 9, 2025 By Mezmo In Mezmo

AI agents for SREs were supposed to slash mean time to resolution and eliminate alert fatigue. Instead, most teams got expensive, unreliable tools that burn through tokens without delivering insights. But what if the problem isn't the AI models themselves? Recent benchmarking reveals the real bottleneck: context engineering. When we tested our context engineering approach against conventional methods, the results were dramatic. Scroll down for our benchmark results to see the full comparison.

Read Post

Mezmo

Read more about The Answer to SRE Agent Failures: Context Engineering

Honeycomb MCP Is Now In GA With Support for BubbleUp, Heatmaps, and Histograms

Sep 9, 2025 By Austin Parker In Honeycomb

If you’ve been following my public journey with LLMs this year, it probably won’t surprise you to learn that this blog post is an announcement about the general availability of Honeycomb’s hosted MCP server. I want to share a few updates about what’s new in the GA release, discuss some interesting learnings from building it, and share examples of how we’re using MCP internally. First: if you're still in the dark about MCP and AI agents, go read the earlier blogs I linked.

Read Post

Honeycomb

Read more about Honeycomb MCP Is Now In GA With Support for BubbleUp, Heatmaps, and Histograms

Early Warning Signals now in Discord

Sep 8, 2025 By Valeria Kurolapova In StatusGator

We’re rolling out Early Warning Signals to yet another place your team works every day: Discord. With this release, nearly all of our chat integrations now deliver Early Warning Signals—bringing you proactive outage alerts no matter where your team collaborates. Already available by email, SMS, Slack, Microsoft Teams, Google Chat, and webhooks, Early Warning Signals are now live in Discord too—closing the gap and making sure your team is covered wherever you communicate.

Read Post

StatusGator

Read more about Early Warning Signals now in Discord

How to Use AI for Operational Excellence

Sep 8, 2025 By Nexthink In Nexthink

Organizations are under immense pressure to do more with less – streamline operations, reduce costs, all whilst improving both the outcomes of the business and their employees. For IT and end-user computing (EUC) professionals, this challenge is especially prevalent. Systems are becoming increasingly complex, the digital employee experience is now directly tied to customer satisfaction, and the role of technology teams extends much further than solely keeping the lights on.

Read Post

Nexthink

Read more about How to Use AI for Operational Excellence

Broadcom Recognized as a Leader: Engineering the Future of Service Orchestration

Sep 8, 2025 By Kaj Wierda In Broadcom

In our digitally transforming world, the pace of change is relentless. Businesses are tasked with managing increasingly complex hybrid environments, from core mainframes to dynamic cloud services. The pressure is on, not only to keep the lights on, but to innovate faster, deliver flawless services, and fuel business growth. In this high-stakes environment, service orchestration and automation platforms are no longer just a tool—they are the central nervous system of the modern enterprise.

Read Post

Broadcom

Read more about Broadcom Recognized as a Leader: Engineering the Future of Service Orchestration

Crash reporting for gaming consoles is now Generally Available

Sep 8, 2025 By Bruno Garcia In Sentry

TL;DR: Error monitoring and crash reporting for all major gaming consoles is now generally available (plus, the v1.1 of our Unreal Engine SDK). Already convinced? Jump to the ‘What’s In The Release?’ section. Over a decade ago, a customer hacked Sentry into their PlayStation 3 games. Fast forward to today, Sentry now supports thousands of game developers across web, mobile, and desktop. The missing piece? Consoles. Developers asked for it. We built it.

Read Post

Sentry

Read more about Crash reporting for gaming consoles is now Generally Available

Serverless Monitoring Made Simple: Challenges and Solutions with Atatus

Sep 8, 2025 By Pavithra Parthiban In Atatus

Serverless computing has revolutionized the way applications are built and deployed by eliminating infrastructure management and enabling automatic scaling. However, the dynamic and distributed nature of serverless architectures presents unique monitoring challenges that can impact application performance and user experience.

Read Post

Atatus

Read more about Serverless Monitoring Made Simple: Challenges and Solutions with Atatus

How PHP Monitoring Handles Response Times?

Sep 8, 2025 By Mohana Ayeswariya J In Atatus

Every millisecond matters when users interact with your PHP application. If a page lags or a request takes too long, most people will leave without a second thought. For DevOps teams, these slowdowns are frustrating because the root cause is rarely obvious. Developers are left combing through logs and traces, often realizing too late that poor response times are already hurting user trust and business outcomes. The pain point: slow PHP response times frustrate users and create hidden costs for teams.

Read Post

Atatus

Read more about How PHP Monitoring Handles Response Times?

Monitoring Claude Code Usage with OpenTelemetry and SigNoz

Sep 8, 2025 By SigNoz - Open Source Observability Platform In SigNoz

In this video, we’ll walk you through how to monitor Claude Code activity using OpenTelemetry and SigNoz. You’ll learn how to instrument your usage, capture telemetry data, and visualize it with SigNoz to get better insights into your system performance. Whether you’re exploring observability for AI workloads or looking for an open-source solution to monitor your LLM activity, this guide will help you get started.

View Video

SigNoz

Read more about Monitoring Claude Code Usage with OpenTelemetry and SigNoz

Observability Journey Panel - Dell x TekStream

Sep 8, 2025 By Grafana In Grafana

Join Dell Technologies, TekStream Solutions, and Grafana Labs for a candid panel on scaling observability. Learn how enterprise teams scale observability, balance centralized vs. decentralized models, and accelerate adoption. The panel explores challenges with culture, governance, tool sprawl, and how AI is reshaping monitoring and incident response.

View Video

Grafana

Read more about Observability Journey Panel - Dell x TekStream

Interactive Dashboards | SigNoz Launch Week 5.0 | Day 1

Sep 8, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Interactive Dashboards eliminate the current workflow of opening new tabs and manually recreating queries every time you need to investigate a spike or anomaly. Click directly on any data point to drill down and explore. Built for developers who need to debug production issues efficiently, not juggle multiple tabs.

View Video

SigNoz

Read more about Interactive Dashboards | SigNoz Launch Week 5.0 | Day 1

Interactive Dashboards: Demo

Sep 8, 2025 By SigNoz - Open Source Observability Platform In SigNoz

View Video

SigNoz

Read more about Interactive Dashboards: Demo

Managing access in Grafana: a single stack journey with teams, roles, and real-world patterns

Sep 8, 2025 By Sarah Constant In Grafana

When multiple teams use Grafana, it can start to feel a bit messy. Dashboards pile up, permissions become unclear, and teams accidentally overwrite each other’s work. To help you and your organization stay clear, collaborative, and secure, we recommend putting all users in a single Grafana Cloud stack and managing access with teams, roles, and folders. To illustrate this, I’ll share a hypothetical example of how you can put this into practice across three teams. Let’s dive in!

Read Post

Grafana

Read more about Managing access in Grafana: a single stack journey with teams, roles, and real-world patterns

A practical guide to error handling in Go

Sep 8, 2025 By Wojciech Gancarczyk In Datadog

When you first start coding in Go, you quickly learn how error handling in the language differs from error handling in languages such as Java, Python, JavaScript, or Ruby. In those languages, throwing an exception automatically generates a stack trace. Go, by contrast, provides no built-in error tracing to reveal an error’s origin.

Read Post

Datadog

Read more about A practical guide to error handling in Go

Beyond Wearables: How Remote Patient Monitoring Shapes Care Delivery

Sep 8, 2025 By OpsMatters In OpsMatters

The healthcare industry has entered an era where the boundaries between in-person visits and digital health interactions are becoming increasingly blurred. Remote Patient Monitoring (RPM) sits at the forefront of this transformation, enabling physicians to track, analyze, and respond to patient health data in real time. While wearables such as smartwatches laid the groundwork, RPM has moved beyond step counts and heart-rate alerts. It now serves as a critical pillar of value-based care, reshaping how medical professionals manage chronic conditions, enhance patient engagement, and improve clinical outcomes.

Read Post

OpsMatters

Read more about Beyond Wearables: How Remote Patient Monitoring Shapes Care Delivery

What Are Buckets in Elasticsearch? (Explained in 60 Seconds)

Sep 6, 2025 By Elastic In Elastic

Overwhelmed by raw data? In this short video, we demonstrate how Elasticsearch utilizes buckets to group and organize data by time, value, region, or any other shared trait. Whether you're tracking error codes or hourly sales trends, buckets and nested aggregations help turn chaos into clarity. Additionally, discover how time-based bucketing enables you to spot patterns and zoom in on valuable insights quickly.

View Video

Elastic

Read more about What Are Buckets in Elasticsearch? (Explained in 60 Seconds)

Navigating cloud databases in the AI era

Sep 6, 2025 By Google Cloud Tech In Google Operations

Cut through the hype and focus on the most important basics for data-driven application development. Learn how to choose the right database for your application and AI needs.

View Video

Google Operations

Read more about Navigating cloud databases in the AI era

Empowering an MCP server with a telemetry pipeline

Sep 5, 2025 By Mezmo In Mezmo

This blog was authored by Jason Bloomberg, Managing Director, Intellyx BV. Observability depends upon telemetry – the data streaming from various applications, services, and systems that indicate their internal state in real-time. Various tools consume such telemetry to enable both operational and cybersecurity tasks.

Read Post

Mezmo

Read more about Empowering an MCP server with a telemetry pipeline

How Teams Are Using AI to Tackle Observability Challenges (2025 Survey Insights) | Grafana Labs

Sep 5, 2025 By Grafana In Grafana

In Grafana’s 3rd annual Observability Survey, over 1,000 engineers and leaders shared their challenges — tool sprawl, complexity, rising costs, and nonstop alerts — and their hopes for how AI can help.

View Video

Grafana

Read more about How Teams Are Using AI to Tackle Observability Challenges (2025 Survey Insights) | Grafana Labs

Kubernetes Monitoring Metrics That Improve Cluster Reliability

Sep 5, 2025 By Anjali Udasi In Last9

A Kubernetes cluster can generate more than 1,400 metrics out of the box. That’s a lot of numbers to sift through, especially when you’re troubleshooting a production slowdown in the middle of the night. The key is knowing which metrics tell you the most, with the least noise. These are the signals worth paying attention to when you need answers fast.

Read Post

Last9

Read more about Kubernetes Monitoring Metrics That Improve Cluster Reliability

Understanding dbt: basics and best practices

Sep 5, 2025 By Nicholas Thomson In Datadog

Data Build Tool (dbt) is an open source analytics engineering framework that enables teams to transform raw data that has been loaded into a warehouse like Snowflake, BigQuery, Redshift, or Databricks using SQL-based workflows. dbt is available in two main forms: dbt Core, the free and open source CLI tool, and dbt Cloud, a managed platform that adds scheduling, UI support, collaboration tools, and native integrations.

Read Post

Datadog

Read more about Understanding dbt: basics and best practices

Synthetic Monitoring for Vibe Coded Apps: Why You Need It

Sep 5, 2025 By Dotcom-Monitor In Dotcom-Monitor

Not all software is the product of rigid planning, extensive documentation, and carefully designed test pipelines. Some of it emerges in bursts of intuition, created by small teams or individuals who prioritize momentum over process. This is what many engineers call vibe coding: development driven by flow and creativity, where the goal is to get something working quickly rather than ensuring every edge case is accounted for.

Read Post

Dotcom-Monitor

Read more about Synthetic Monitoring for Vibe Coded Apps: Why You Need It

Landing Page Monitoring: Why, When and How to Do It Right

Sep 5, 2025 By Dotcom-Monitor In Dotcom-Monitor

Landing pages are the lifeblood of modern marketing campaigns. They’re not the homepage, not the product catalog, not the blog—they’re the sharp end of the funnel where traffic from ads, emails, and social clicks is supposed to turn into revenue. A landing page is where a $50,000 media buy either pays off or evaporates.

Read Post

Dotcom-Monitor

Read more about Landing Page Monitoring: Why, When and How to Do It Right

How to Transform Telemetry Data with the OpenTelemetry Transformation Language

Sep 5, 2025 By Splunk In Splunk

This demonstration shows how to use the OpenTelemetry Transformation Language (OTTL) to transform, filter, and enrich telemetry in the OpenTelemetry Collector without changing application code. We walk through a sample Python application and OpenTelemetry configuration file, generate real traffic, and then analyze the results in Splunk Observability Cloud.

View Video

Splunk

Read more about How to Transform Telemetry Data with the OpenTelemetry Transformation Language

Tiger teams: How we tackle urgent, cross-functional challenges at Grafana Labs

Sep 5, 2025 By Summer Wollin In Grafana

A year ago, we hit a wall. Our Grafana OSS releases were excruciating to execute. The process was confusing and hard to follow, security patches were non-trivial, and many engineering hours were lost to an overly manual process. We needed to move fast, cut through ambiguity, and pull in just the right people without waiting on roadmaps or org charts.

Read Post

Grafana

Read more about Tiger teams: How we tackle urgent, cross-functional challenges at Grafana Labs

How to Improve MariaDB Performance: Track Slow Queries with Logs and Metrics

Sep 5, 2025 By Benjamin Pitts In MetricFire

Database latency rarely starts in your app layer; it’s almost always a query doing more work than it should. Metrics tell you when that happens, but slow-query logging tells you which statement did it and how. That’s gold for tracking down missing indexes, inefficient filters, or accidental full scans. Pair the logging with some lightweight counter metrics, and you get both an early warning and a clear path to a fix.

Read Post

MetricFire

Read more about How to Improve MariaDB Performance: Track Slow Queries with Logs and Metrics

Top tips for hassle-free software updates and patching

Sep 4, 2025 By Nandini Malhotra In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world and share ways to stay ahead. This week, we’re focusing on simple strategies to make software updates and patch management more efficient.

Read Post

ManageEngine

Read more about Top tips for hassle-free software updates and patching

VMware vSphere 7 Approaches End-of-Service

Sep 4, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

NiCE Introduces Next-Gen Monitoring.

Read Post

NiCE IT Mgmt

Read more about VMware vSphere 7 Approaches End-of-Service

The Next Evolution of AI: Forget Smarter Models - It's All About the Data

Sep 4, 2025 By Teneo In Teneo

It’s been a noisy summer in the AI world. Headlines have been filled with doom and gloom: for example, OpenAI’s ChatGPT-5 landing with a thud, and an MIT report claiming 95% of AI pilots are failing. For the sceptics, this is “proof” that AI is just hype. I don’t buy it. The MIT study looked at just 50 projects, a sample size so small you’d fail a basic stats exam for using it. And as someone who uses AI every single day, I can tell you the benefits are real.

Read Post

Teneo

Read more about The Next Evolution of AI: Forget Smarter Models - It's All About the Data

Database Performance Analyzer Overview

Sep 4, 2025 By solarwindsinc In SolarWinds

The cross-platform solution for performance monitoring for both cloud and on-premises databases. Anomaly detection powered by machine learning combined with forensic-level wait-time analysis gives you the power to diagnose performance issues in a matter of minutes, not days. Both real-time and historical data give you down-to-the-second answers to resolve critical problems, while expert advice via query and table tuning advisors allows you to proactively optimize your enterprise.

View Video

SolarWinds

Read more about Database Performance Analyzer Overview

Getting Started with SolarWinds Network Topology Mapper

Sep 4, 2025 By solarwindsinc In SolarWinds

The video provides a quick walkthrough of the SolarWinds Network Topology Mapper, starting with the welcome screen that prompts users to run a new scan. It guides viewers through a wizard for adding SNMP credentials, emphasizing the importance of specifying IP addresses for accurate mapping. The tool can identify unmanaged switches and allows for both one-time scans and scheduled updates, maintaining a clean map by archiving previous scans. Users can hover over connections to view port numbers, although serial numbers remain unavailable. The presenter encourages viewers to reach out with any questions.

View Video

SolarWinds

Read more about Getting Started with SolarWinds Network Topology Mapper

The Public Internet Is Not Your WAN

Sep 4, 2025 By Yann Guernion In Broadcom

Within many organizations, there’s been a strategic imperative to abandon MPLS in favor of SD-WAN and direct internet access, particularly when it comes to branch office connectivity. The benefits of this move are undeniable and compelling. Organizations can establish direct cloud connectivity and realize cost savings and improved agility.

Read Post

Broadcom

Read more about The Public Internet Is Not Your WAN

SvelteKit observability just got 10x better, and we're here for it

Sep 4, 2025 By Lukas Stracke In Sentry

The Svelte Team recently announced full observability and tracing support for SvelteKit! This is great news for SvelteKit and Sentry users, since Sentry is already compatible with the new feature! In addition, this is even greater news for the JavaScript ecosystem as a whole because SvelteKit just became the first ESM-based meta-framework to support instrumentation and tracing out of the box.

Read Post

Sentry

Read more about SvelteKit observability just got 10x better, and we're here for it

Introducing the StatusPage.io Import Tool: Migrate Your Incident History to Hyperping in Minutes

Sep 4, 2025 By Leo Baecker In Hyperping

Switching status page providers shouldn't mean losing years of valuable incident history. Your service timeline tells the story of your reliability journey—outages you've overcome, maintenance windows you've scheduled, and the trust you've built with transparent communication. Yet most migrations force you to choose: start fresh with a clean slate or manually recreate years of historical data.

Read Post

Hyperping

Read more about Introducing the StatusPage.io Import Tool: Migrate Your Incident History to Hyperping in Minutes

What Are Vector Embeddings? (Explained in 2 Minutes)

Sep 4, 2025 By Elastic In Elastic

In under 2 minutes, we explain what vector embeddings are, how they work, and how to use them in real-world applications like text expansion. We'll also show how Elasticsearch supports vector search with two powerful models: E5, open-source text embedding models designed for multilingual search, and ELSER, a sparse embeddings model from Elastic.

View Video

Elastic

Read more about What Are Vector Embeddings? (Explained in 2 Minutes)

Grafana Cloud: Beyond "Just" Observability

Sep 4, 2025 By Grafana In Grafana

From AWS to Zendesk, and across all teams, Ove at CLAAS explains a variety of approaches that go beyond traditional app monitoring. By focusing on the data itself, you can deliver deeper insights and a fuller understanding of your entire system.

View Video

Grafana

Read more about Grafana Cloud: Beyond "Just" Observability

Useful resources available at Grafana Labs to get help

Sep 4, 2025 By Grafana In Grafana

Here are some useful resources for getting help at Grafana. Thanks for watching!

View Video

Grafana

Read more about Useful resources available at Grafana Labs to get help

What's new in the Infinity data source for Grafana: support for JQ parser, additional HTTP methods, and more

Sep 4, 2025 By Ivana Huckova In Grafana

Since its launch in 2020, the Infinity data source for Grafana has become the go-to solution to seamlessly query and visualize data from JSON, CSV, XML, and GraphQL endpoints within Grafana. Allowing users to integrate diverse data formats via HTTP-based APIs, the Infinity data source has enabled a wide range of use cases within our community over the years — from visualizing cloud computing costs to popular Pokémon games.

Read Post

Grafana

Read more about What's new in the Infinity data source for Grafana: support for JQ parser, additional HTTP methods, and more

Visually identify and prioritize security risks using Cloudcraft

Sep 4, 2025 By Colten Woo In Datadog

As cloud infrastructure becomes more dynamic and distributed, DevOps and security teams need to quickly detect risks and understand their context: where those risks live, how critical they are, and how to respond effectively. By surfacing misconfigurations, vulnerabilities, sensitive data risks, and identity threats directly on a real-time diagram of your infrastructure, Cloudcraft helps teams identify, prioritize, and remediate security issues at scale.

Read Post

Datadog

Read more about Visually identify and prioritize security risks using Cloudcraft

Full Session Simulation - Simulate Anything, Everything, Anywhere

Sep 4, 2025 By Swaminathan J In eG Innovations

Full Session Simulation is a powerful troubleshooting strategy. Have you ever been in a situation where everything on your dashboards looks green, but users are still encountering issues and raising support tickets? The clichéd “everything is fine on our side” moment is not just frustrating for everyone. It’s risky! Because when you can’t replicate what the user is experiencing, you’re flying blind.

Read Post

eG Innovations

Read more about Full Session Simulation - Simulate Anything, Everything, Anywhere

What is Infrastructure Monitoring? How it Works, Key Metrics & Use Cases

Sep 4, 2025 By Logz.io In logz.io

Infrastructure monitoring is the process of continuously collecting, analyzing, and visualizing data from an organization’s IT infrastructure. With infrastructure monitoring, DevOps teams can maintain system health, meet SLAs, reduce downtime, and detect and resolve issues proactively. This ensures optimal performance, availability, and reliability. Key network components infrastructure monitoring typically covers.

Read Post

The Critical Role of Networks in AI Data Centers

Sep 4, 2025 By Phil Gervasi In Kentik

The fastest GPU is only as fast as the slowest packet. In this post, we examine how the network is the primary bottleneck in AI data centers and offer a practical playbook for network operators to optimize job completion time (JCT).

Read Post

Transform your public sector organization with embedded GenAI from Elastic on AWS

Sep 4, 2025 By Marianna Jonsdottir In Elastic

Elastic featured in AWS Generative AI Hub for public sector: Elastic is proud to be featured in the new AWS Generative AI Content Hub for public sector, a destination showcasing the most impactful ways agencies can securely adopt and scale generative AI (GenAI).

Read Post

Sharpening My React Hooks Knowledge With ChatGPT

Sep 4, 2025 By Kat Telles In Honeycomb

I’m a product engineer at Honeycomb. While my work spans the stack, I’m currently focused on deepening my frontend expertise. To support this, I’ve been using ChatGPT as a study assistant. It’s helped me break down complex topics with clear explanations, real-world examples, and—critically—interactive practice. The most effective formats I’ve found.

Read Post

What If You Could Roll Back Any Network Change in Seconds?

Sep 4, 2025 By ScienceLogic In ScienceLogic

If you’ve worked in network operations, this scenario is all too familiar. Even the most seasoned teams and robust processes can’t escape reality. Changes fail, misconfigurations happen, and the fallout is real: lost productivity, unhappy customers, compliance headaches, and hours (or days) of cleanup. But what if it didn’t have to be that way?

Read Post

Bringing Observability to Claude Code: OpenTelemetry in Action

Sep 3, 2025 By Goutham Karthi In SigNoz

AI coding assistants like Claude Code are becoming core parts of modern development workflows. But as with any powerful tool, the question quickly arises: how do we measure and monitor its usage? Without proper visibility, it’s hard to understand adoption, performance, and the real value Claude brings to engineering teams. For leaders and platform engineers, that lack of observability can mean flying blind when it comes to understanding ROI, productivity gains, or system reliability.

Read Post

The New Physics of IT: Service-Centric Observability, AI-Driven Operations, and Intelligent Automation

Sep 3, 2025 By ScienceLogic In ScienceLogic

Why the traditional model of monitoring and manual operations is collapsing, and what enterprises must do to survive. The digital universe is expanding at a pace no enterprise can keep up with through traditional methods. Dependencies pull at each other in ways even experts can’t predict. What once could be managed with dashboards and siloed monitoring tools has become too vast, too interdependent, and too fast-moving. A new operating model is needed to master such complexity.

Read Post

Why database monitoring is critical for application performance

Sep 3, 2025 By Grace Nalini In Site24x7

When an application slows down, users rarely think about the database, but in many cases that’s where the bottleneck lies. Databases sit at the core of nearly every application, storing, retrieving, and processing the information that powers business transactions, analytics, and user interactions. A minor inefficiency in query execution or a spike in resource usage can cascade into multiple issues: degraded application performance, service interruptions, or even downtime.

Read Post

The Role of Service Maps in Optimizing PHP Application Performance

Sep 3, 2025 By Mohana Ayeswariya J In Atatus

Modern PHP applications rarely exist in isolation. They run across distributed environments, connect to MySQL or PostgreSQL databases, interact with Redis or Memcached, rely on APIs, and communicate with microservices. This interconnected web brings power but also enormous complexity. When performance issues arise, finding the root cause can feel like searching for a needle in a haystack. Is it the database? A caching layer? A failing third-party API?

Read Post

How to Reduce Serverless Costs with Smart Monitoring

Sep 3, 2025 By Pavithra Parthiban In Atatus

Serverless architecture has changed how applications are built and run. It removes the need to manage servers, letting developers focus on writing code while automatically scaling with demand. But even with its pay-as-you-go model, serverless apps can get expensive if not monitored and optimized. In this blog, let’s see how smart serverless monitoring helps developers and DevOps engineers lower serverless costs, boost performance, and keep operations running smoothly.

Read Post

The Fourth Pillar of Observability

Sep 3, 2025 By Lily Waldorf In Coralogix

Your application is only as reliable as the infrastructure it runs on. Most commonly, that means Kubernetes is doing the job by managing fleets of containers, scaling services on demand, and keeping workloads distributed across nodes. Traditional dashboards weren’t built to scale with this reality. They give you snapshots of raw metrics. They don’t scale to multi-cluster environments. They don’t map relationships between resources.

Read Post

When metrics mislead: Inside the 2025 Retail Web Performance Benchmark

Sep 3, 2025 By Denton Chikura In Catchpoint

Over the past few years at Catchpoint, we’ve benchmarked the digital performance of banks, airlines, hotels, travel aggregators, GenAI platforms, athletic footwear brands, and even ad hoc events like the Super Bowl, Olympics, and Election Day. Each time, our approach focused on the technical metrics performance professionals live and breathe: DNS resolution times, Time to First Byte, page load speeds, and six other core measurements that we'd dissect, analyze, and use to rank companies.

Read Post

What is APM Tracing?

Sep 3, 2025 By Faiz Shaikh In Last9

APM tracing records the complete execution path of a request as it travels through your system, including database queries, external API calls, cache lookups, message queue events, and inter-service requests. Each step is captured with precise start and end timestamps, duration, and context such as service name, operation name, and relevant attributes. This lets you pinpoint where latency or errors originate without piecing together metrics and logs manually.

Read Post

Top 3 Jira reporting tools: SquaredUp vs Power BI vs Jira

Sep 3, 2025 By Blog In Squared Up

A recent survey revealed that developers and engineering teams waste 8+ hours a week on inefficiencies in their role. Poor reporting tools are a main contributor, with Jira being regarded as a frequent source of friction. But since Jira is so deeply embedded in most organizations' infrastructure and processes, replacing it is not really an option. Instead, the solution lies in optimizing how users interact with it rather than abandoning it altogether.

Read Post

Cutting through Kubernetes Complexity with Lumigo

Sep 3, 2025 By Orr Weinstein In Lumigo

Effectively monitoring Kubernetes environments remains one of the most challenging aspects of modern application management. As applications grow more complex and distributed, the need for comprehensive visibility becomes paramount. We have continued to deliver major advancements in our Kubernetes monitoring, providing you with deeper insights and more powerful tools to tackle these challenges head-on.

Read Post

Visualize Logs Alongside Metrics: Complete Observability for Slow MongoDB Operations

Sep 3, 2025 By Benjamin Pitts In MetricFire

MongoDB’s strengths, a flexible schema and fast iteration, can also hide costly queries until they surface as user-facing latency, replica lag, or spiky CPU. A handful of slow operations can impact the cache, starve other workloads, and cascade into timeouts across services. Monitoring slow queries gives you an early warning system for index gaps and query-plan regressions introduced by code deploys, schema changes, or shifting data shapes.

Read Post

AWS metric ingestion for less: Save money and get near real-time stream into Grafana Cloud

Sep 3, 2025 By Karsten Jeschkies In Grafana

There’s a new way to ingest AWS metrics into Grafana Cloud that makes observing your AWS resources more cost-effective, easier to operate, and more accurate. You can now stream metrics into the AWS Observability app in Grafana Cloud in near real-time thanks to our new integration with Amazon CloudWatch and Amazon Data Firehose. We’re already using it internally, and we’re finding that it’s not only easier to operate—it’s at least five times more cost-effective.

Read Post

Logs are Generally Available (Still logs, just finally useful)

Sep 3, 2025 By Dhrumil Parekh In Sentry

When we started building Logs in Sentry we had one goal: make them useful for real debugging, not just another high-volume text storage. This meant making them "trace connected" from day one. This let us ensure they were tightly connected to the actions and performance happening in your application, right where developers already go to investigate errors, performance, and latency issues. Now, Logs is out of beta and generally available to everyone.

Read Post

Sentry's got logs.

Sep 3, 2025 By Sentry In Sentry

Still logs, just more useful.

View Video

Setting up Sentry logs

Sep 3, 2025 By Sentry In Sentry

Sentry logs is now generally available for all users. Serge walks us through how to quickly set it up in a Next.js app.

View Video

Kentik Traffic Costs Workflow Demo

Sep 3, 2025 By Kentik In Kentik

Learn how Kentik's automated traffic cost workflow provides instant visibility into network traffic costs, enabling you to optimize spend, improve margins, and make smarter business decisions. In this demo, you'll see practical examples like evaluating costs by AS group and downstream customer, helping network, finance, and commercial teams take immediate, actionable steps to reduce costs and boost efficiency.

View Video

This Month in Datadog - August 2025

Sep 3, 2025 By Datadog In Datadog

In the August episode of This Month in Datadog, Jeremy shares how you can make more informed cloud cost decisions, gain insights into your LiteLLM-powered applications, and secure Kubernetes infrastructure with Datadog Workload Protection. Later in the episode, Danny puts the spotlight on Datadog Kubernetes Autoscaling, which helps you deliver cost savings without sacrificing performance.

Read Post

Bridging the Gap: Legacy Systems and Modern Observability

Sep 3, 2025 By Datadog In Datadog

Technology moves quickly and while the spotlight has shifted to dynamic, cloud-based systems, many organizations have legacy applications and infrastructure that they must maintain. In this fireside chat, Datadog’s Matt Moore (Principal Observability Strategist) will host James Flores (Enterprise Systems Engineer) at Australian Community Media to discuss their journey of modernization and bridging legacy systems with the cloud using a bit of ingenuity and observability.

View Video

The hidden costs of shadow AI: CPU drain, data risk, and network bottlenecks

Sep 3, 2025 By Ben Botti In Auvik

The risk of headline-grabbing incidents, like Samsung’s ChatGPT data leak, related to AI usage outside of the authorization and control of IT (a.k.a. shadow AI) is clear. Most IT teams recognize that a high-profile incident can have serious repercussions. However, the risk of shadow AI goes well beyond the risk of a single incident. In fact, the recent Komprise IT Survey indicates that 79% of organizations have experienced negative outcomes from sending corporate data to AI.

Read Post

OnlineOrNot updates from August 2025

Sep 3, 2025 By Max Rozen In OnlineOrNot

A bit more behind-the-scenes work than usual this month, but I still managed to ship some public-facing features you might be interested in. Logging in and clicking around the dashboard just got 60% faster, and we're just getting started.

Read Post

Dashboards say green. Users say It's broken.

Sep 3, 2025 By Catchpoint In Catchpoint

Your infrastructure metrics are all green. The code is clean. But support tickets are rolling in. What’s going on? The problem: traditional monitoring tools stop at your infrastructure. They don’t tell you if the user can actually complete their task. As @gerardo explains, the objective of a car is not to have the correct tire pressure or gas levels... it’s to get from point A to point B. User experience works the same way. What’s the point of having green metrics when your users are not experiencing the same thing?

View Video

Weaving AppNeta Experience Insights into DX NetOps: A Step-by-Step Guide

Sep 3, 2025 By Robert Kettles In Broadcom

Today’s enterprise networks aren’t constrained to a single location—they span continents, clouds, and providers, and they’re relied upon by users who can work from anywhere. For network operations teams, that means every issue is a potential scavenger hunt. Is it the app? The WAN? The cloud provider? The ISP? The stakes are high and your tools need to evolve. That’s why the integration of DX NetOps and AppNeta is such a game-changer.

Read Post

Introducing Kentik Traffic Costs: Real-Time Network Cost Intelligence

Sep 3, 2025 By Lauren Basile In Kentik

Introducing Kentik Traffic Costs, an industry-first automated workflow delivering instant cost estimates for network traffic slices. Learn how this exciting new feature gives network, financial, and sales teams actionable insights to optimize spend, improve margins, and drive revenue.

Read Post

Lessons from Building Datadog's Australian Datacenter

Sep 3, 2025 By Datadog In Datadog

Data sovereignty is crucial for businesses today, driven by government regulations and increasing customer demand for control over their online privacy and information. To better serve our Australian customers, Datadog recently opened our AP2 datacenter in Australia.

View Video

Cost Controls and so Much More: Issue Detection Through Usage Analysis

Sep 3, 2025 By Datadog In Datadog

Keeping tabs on cloud spending across multiple organizations and vendors, including Datadog, can be tough and costly. If you're not tracking expenses, you're also missing other critical insights. The Flight Centre Travel Group (FCTG) faced this when moving to Datadog, needing to monitor costs across numerous organizations and over 180 Azure subscriptions. After a rapid migration, new cost reports quickly revealed more than just financial benefits. Unusual spending patterns often highlighted incidents, bugs, or security issues, offering early warnings about internal system problems.

View Video

Sponsored Post

AI realism (part two)

Sep 2, 2025 By JD Trask In Raygun

Emotions are running high about AI technologies. In this 2-parter, I do my best to make a rational case for the state of AI, and how we can respond to it.

Read Post

Vendor consolidation-the key to IT cost optimization in 2026

Sep 2, 2025 By Jeremy Spence In ManageEngine

IT departments are no strangers to complexity. With businesses navigating a range of cloud services, cybersecurity tools, and automation technologies, the modern IT ecosystem can resemble a medley of vendors, software, and services sewn together. While these solutions aim to improve performance, the sheer volume of suppliers can muddy the waters when it comes to efficiency and cost management.

Read Post

August product updates

Sep 2, 2025 By Valeria Kurolapova In StatusGator

August was a big month at StatusGator. From major improvements to Early Warning Signals to new privacy controls and expanded service coverage, we shipped a range of updates to make monitoring even more powerful for your team. Here’s a recap of everything we rolled out last month.

Read Post

August Early Warning Signals: detected before providers

Sep 2, 2025 By Colin Bartlett In StatusGator

In August, StatusGator’s Early Warning Signals detected hundreds of global service outages before official provider acknowledgments were published. Our alerts notified users early on—often minutes before providers confirmed issues—giving IT teams the critical lead time to respond. Below, we highlight three of the most significant outages we tracked in August, followed by a curated selection of other notable disruptions.

Read Post

The Debugging Bottleneck: A Manual Log-Sifting Expedition

Sep 2, 2025 By Mezmo In Mezmo

Imagine a developer at a fast-growing company. A customer support agent reports a critical issue: a user's recent order is stuck in a "pending" state. The agent provides a customer ID and a request ID. The developer's typical process is a familiar, painful dance. This process is slow, tedious, and prone to human error. The Mean Time to Resolution (MTTR) is measured in hours, not minutes, and it's a huge drain on engineering resources.

Read Post

Serverless Monitoring: Essential Metrics Every Developer Should Track

Sep 2, 2025 By Pavithra Parthiban In Atatus

Serverless applications have become one of the most efficient ways to build and deploy software. With platforms like AWS Lambda, Azure Functions, and Google Cloud Functions, teams can focus on writing code while the provider handles infrastructure, scaling, and availability. But going serverless doesn’t mean monitoring stops being important. In fact, monitoring becomes even more critical because you don’t have direct control over the servers, containers, or VMs.

Read Post

What is Vector Search? (Ft. Symphonic Metal)

Sep 2, 2025 By Coroot In Coroot

Valkey OSS Developer Advocate Roberto Luna Rojas explains what a vector is, how Valkey (and vector) search works, and gives us an example using an interesting combination of music tastes.

View Video

Everything You Ever Wanted to Know About DEXOps (But Were Afraid to Ask)

Sep 2, 2025 By Nexthink In Nexthink

Reality Bytes is back! In this episode, Tim, Tom, Megan, and Sean dive deep into DEXOps—the practice of operationalizing Digital Employee Experience. Building on insights from the show's recent webinar series, the team explores how IT leaders can shift from reactive firefighting to a proactive, structured approach that drives measurable business value. They cover the key pillars of DEXOps—from people development and process rigor to technology selection, communication strategies, and leadership alignment. You’ll hear why DEXOps isn’t a side project or “hobby,” but a mission-critical discipline, as essential as security or uptime.

View Video

Azure Data Factory Monitoring Integration

Sep 2, 2025 By Babu Sundaram In eG Innovations

Microsoft Azure Data Factory is a cloud-based data integration service provided by Microsoft Azure. It enables you to create, manage, and automate data workflows that move and transform data from different sources to various destinations. Essentially, ADF allows you to design, orchestrate, and manage data pipelines, making it easier to work with large volumes of data across on-premises and cloud environments.

Read Post

Actionable insights into the end-user experience: an overview of Grafana Cloud Frontend Observability dashboards

Sep 2, 2025 By Bukola Ayodele In Grafana

One of the biggest challenges in frontend development is identifying when and why users encounter performance issues, whether it’s slow page loads, JavaScript errors, or failed HTTP requests. With Grafana Cloud Frontend Observability — a hosted service for real user monitoring (RUM) — you get immediate, clear, and actionable insights into the end-user experience of your web applications.

Read Post

Database monitoring for beginners

Sep 2, 2025 By Grace Nalini In Site24x7

Understand what's happening inside your database before your users do. Modern applications live and breathe through their databases. But when slow queries, connection spikes, or failed transactions start to pile up, the impact isn't just technical; it's customer-facing. That's why tracking your databases matters: it gives you visibility into how they are performing under the hood.

Read Post

Netdata AI Troubleshooting is Now Generally Available with On-Demand Credits

Sep 2, 2025 By Netdata Team In netdata

Since launching our AI investigations and insights in a research preview, one thing has become clear: automated root cause analysis delivers a significant return on investment. Teams have confirmed that instant insights don’t just save a few minutes; they fundamentally shorten incident response cycles, free up valuable engineering hours, and reduce the business impact of downtime.

Read Post

Upgrading from InfluxDB 3 Core to InfluxDB 3 Enterprise

Sep 2, 2025 By Suyash Joshi In InfluxData

InfluxDB 3 Enterprise builds on Core with powerful features for production workloads, such as high availability, long-range query support, and advanced security. The good news? Upgrading is seamless, thanks to InfluxDB 3’s modern architecture and easy installation.

Read Post

What Are Hits and Aggregations in Elasticsearch? (Explained)

Sep 2, 2025 By Elastic In Elastic

Most people think search is only about finding results, but Elasticsearch delivers much more. In this quick guide, discover the difference between hits (the documents that match your query) and aggregations (the analytics that reveal patterns, counts, and trends beyond the results page).

View Video

Your Network Disaster Recovery Plan is Only as Good as its Execution

Sep 2, 2025 By Yann Guernion In Broadcom

A disaster recovery plan (DRP) is the strategic backbone of your organization’s resilience. It defines your objectives, outlines responsibilities, and sets the critical promise you make to the business: your recovery time objective (RTO). This plan is indispensable. However, a strategy is worthless without the tactical ability to implement it.

Read Post

What is Single Pane of Glass Monitoring and How Can Enterprises Leverage It for Enhanced Visibility?

Sep 2, 2025 By david.arrowsmith In Interlink

Large enterprises today grapple with increasingly complex IT environments spanning multiple cloud services, hybrid infrastructures, and countless applications. Exacerbated by technology silos, the sheer volume of data generated in such environments can quickly overwhelm IT teams, impairing their ability to identify and respond to customer-impacting issues before outages strike.

Read Post

IT Monitoring News | September '25 Edition

Sep 1, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Latest updates, insights, events, and more regarding Microsoft SCOM, SCOM MI, and Azure Monitor.

Read Post

kubectl logs: How to View & Tail Kubernetes Pod Logs

Sep 1, 2025 By Favour Daniel In SigNoz

When debugging containerized applications in Kubernetes, kubectl logs serves as your primary command-line tool for accessing container logs directly. Understanding how to effectively retrieve, filter, and analyze logs becomes essential for maintaining application health and resolving issues quickly, especially in multi-container environments where correlation across services can make or break your troubleshooting efforts.

Read Post

Top 10 Serverless Monitoring Tools in 2025

Sep 1, 2025 By Pavithra Parthiban In Atatus

Monitoring serverless applications is critical to ensure optimal performance, reduce errors, and maintain end-to-end observability. Choosing the right serverless monitoring tools can help track serverless performance metrics, cold starts, and distributed traces across cloud functions. Below, we explore the top 10 cloud-native and third-party serverless monitoring solutions, highlighting their features, pros, cons, and best use cases.

Read Post

Ecommerce Security Incidents: Stripe, Pandora, and OpenCart

Sep 1, 2025 By Georgina Grant-Muller In RapidSpike

Cyberattacks against ecommerce businesses are accelerating, and recent incidents show just how many different angles attackers are exploiting. Whether it’s phishing campaigns, third-party data breaches, or malware injections, ecommerce stores are a prime target. Here are three recent incidents making headlines, and what they mean for ecommerce operators.

Read Post

AIOps Is Consolidating Fast, Here's Where HEAL Delivers Results

Sep 1, 2025 By renuka In HEAL Software

As of September 2025, the Artificial Intelligence for IT Operations (AIOps) market is a rapidly expanding and dynamic sector, projected to surpass $20 billion. The landscape is defined by a major consolidation trend, with large enterprise technology vendors acquiring key AIOps capabilities to integrate into their broader portfolios.

Read Post

How to Reduce Errors and Improve Reliability in High-Traffic Node.js Applications with APM?

Sep 1, 2025 By Mohana Ayeswariya J In Atatus

Node.js has become the go-to runtime for building modern, high-performance applications. Its event-driven, non-blocking I/O model makes it particularly well-suited for apps that demand speed and scalability, such as real-time chats, gaming backends, streaming platforms, fintech dashboards, and e-commerce systems. It’s no surprise that some of the world’s largest companies, like Netflix, PayPal, LinkedIn, and Walmart, rely on Node.js to deliver services at scale.

Read Post

Simpler Access and Broader Vendor Support - New in DataStream

Sep 1, 2025 By VirtualMetric In VirtualMetric

This month we’re rolling out several new capabilities designed to simplify daily work for SOC teams, MSSPs, and enterprise security operations. The focus is on easier access control, streamlined log management, and faster vendor onboarding.

Read Post

Get started with Grafana Alerting: Template your alert notifications

Sep 1, 2025 By Grafana In Grafana

In this tutorial you will learn how to template your alerts. Don't miss the rest of the "Get started with Grafana Alerting" series! Each part dives into a different feature to help you get the most out of alerting in Grafana.

View Video

Technical Blog: Remote Debugging for RTOS Firmware: How Continuous Observability Changes the Game

Sep 1, 2025 By Percepio In Percepio

Debugging embedded software has never been easy, but today’s systems are more complex and interconnected than ever. Real-time operating systems (RTOS) and continuous integration pipelines can make development faster—but certain classes of bugs are hard to reproduce and diagnose. These elusive issues often appear only under rare conditions, such as timing-sensitive race conditions or field-only failures. This is where Continuous Observability, powered by Percepio Detect, changes the game.

Read Post

The Essential Guide to Azure Infrastructure, Monitoring, and Management Tools

Sep 1, 2025 By Mélanie Dallé In Qovery

Master Azure infrastructure management with this comprehensive guide. Learn the four critical pillars—governance, cost control, security, and operations—and discover the essential native and third-party tools needed to scale your cloud strategy effectively.

Read Post

A Single Hub for Telemetry: OpenTelemetry Gateway

Sep 1, 2025 By Anjali Udasi In Last9

The OpenTelemetry Gateway (OTel Gateway) is a centralized service that collects, processes, and routes telemetry data—metrics, traces, and logs—across your infrastructure. In a typical setup, each service pushes telemetry directly to an observability backend. While this approach works well for small environments, it becomes increasingly difficult to manage as systems grow.

Read Post
