Monthly Archive

Monitoring OAuth 2.0 Client Credentials Flows in Web APIs

Dec 31, 2025 By Dotcom-Monitor In Dotcom-Monitor

OAuth 2.0 client credentials flows are a core mechanism for machine-to-machine API authentication. They enable background jobs, microservices, and system integrations to securely access APIs without user interaction. However, while most teams spend time configuring these flows, far fewer ensure they are continuously monitored in production. This creates a critical blind spot: OAuth failures often surface only after dependent services begin failing.

Read Post

Why High-Cardinality Metrics Break Everything

Dec 31, 2025 By Prathamesh Sonpatki In Last9

High-cardinality metrics are one of those ideas that sound obviously right - until you try to use them in production. In theory, they promise precision. Instead of averages and rollups, you get specificity: per-request, per-userid, per-container, per-feature insights. The kind of detail we all immediately want when something is on fire. And then things start breaking. Not immediately. Not loudly. But quietly.
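As a rough illustration of why per-user and per-container detail gets expensive, here is a minimal sketch of how label combinations multiply into time series. The label names and counts are made-up assumptions, not figures from the post, and the product is an upper bound (not every combination necessarily occurs in practice).

```python
from math import prod


def series_count(label_cardinalities: dict) -> int:
    """Upper bound on distinct time series for one metric:
    the product of the number of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets (illustrative numbers only).
coarse = {"service": 20, "status_code": 5}
detailed = {**coarse, "container": 200, "user_id": 10_000}

print(series_count(coarse))    # 100
print(series_count(detailed))  # 200000000
```

Adding just two high-cardinality labels turns a hundred series into hundreds of millions in this hypothetical case.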

Read Post

The Python Backend Framework Decision Guide for 2026

Dec 31, 2025 By Rollbar In Rollbar

Three frameworks dominate Python backend development in 2026: Django, FastAPI, and Flask. This guide helps you choose between them (plus specialized alternatives like Falcon, Tornado, and Litestar) using a simple decision tree. Answer three questions about your project, understand each framework's strengths, and pick the right tool for your needs.

Read Post

Beep boop: How to visualize Grafana Cloud IRM alerts in the real world

Dec 31, 2025 By Joe McManus In Grafana

You know the situation: You're in a meeting and your alerts start to go off, but no one on the other side of the camera knows why you have to abruptly drop from the call. What if, instead, you had a robot in the background of your Zoom meeting that started to blink when those same alerts went off? You could just point to it, type in the chat "I have to drop," and off you'd go.

Read Post

Real User Monitoring Check Setup

Dec 31, 2025 By Uptime Website Monitoring In uptime

Learn how to set up an Uptime.com Real User Monitoring check. There’s more to website performance than “up/down” statuses. Use RUM to leverage the data collected from past user experiences to optimize for future ones.

View Video

AI-generated media: What's the point?

Dec 30, 2025 By Eric Roshaan In ManageEngine

If you have even a minor social media presence, you've probably been unfortunate enough to come upon the wonderfully disturbing world of AI slop content. We're talking wrestling matches featuring controversial mustached historical figures and Formula One-style races featuring Stephen Hawking in his wheelchair (if you have no idea what I'm talking about, I genuinely envy you).

Read Post

Zero code tracing: Kubernetes observability with Logz.io and eBPF

Dec 30, 2025 By Yotam Loewenbach In logz.io

Distributed tracing is a core tool for operating modern microservices platforms. For SREs and DevOps teams, it is often the fastest way to understand latency issues, service dependencies, and unexpected failure modes. But achieving comprehensive tracing coverage is resource-intensive and time-consuming. It usually requires application changes, language-specific instrumentation, agent lifecycle management, and ongoing coordination with development teams.

Read Post

There's No Place Like the Homelab for the Holidays: A Proxmox Deep Dive

Dec 30, 2025 By VictoriaMetrics In VictoriaMetrics

Resources for Further Learning.

View Video

Normalize any logs for Cloud SIEM with Datadog's OCSF processor

Dec 30, 2025 By Vera Chan In Datadog

Security teams need visibility across every system they defend, including cloud platforms, SaaS applications, security controls, identity providers, and custom services. But those systems all produce logs in different formats, with inconsistent field names and structures. That lack of standardization makes it harder to correlate events, write reusable detections, and investigate incidents quickly.

Read Post

Shorten your 'inner loop' as a new hire and get past imposter syndrome with Grafana Assistant

Dec 30, 2025 By Julius Vogt In Grafana

Let's talk about being new. Four months ago, I joined Grafana Labs as a senior solutions engineer. It wasn’t just a new company, it was a new industry. I came from the visual workspace provider Miro, where I was comfortable doing discovery and talking about visual collaboration and innovation. But stepping into observability? I was in the deep end. And let me tell you, the imposter syndrome was real. Everyone around me was fluent in this language of metrics, logs, and traces.

Read Post

Episode 4 - 2025 AI Retrospective and What's Next for 2026

Dec 30, 2025 By Digitate In Digitate

In this special holiday episode of The Intelligent Enterprise, host Tom Stoneman takes a step back from the day-to-day pace of enterprise life to look at where AI has been in 2025 and where it might be heading next. To do it, he sits down with his colleague VS Joshi, Global Head of Product Marketing at Digitate, for a year-end retrospective and a 2026 outlook.

View Video

5 Reasons Why Website Design is Now an Operational Concern

Dec 30, 2025 By OpsMatters In OpsMatters

There was a time when website design lived entirely in the marketing department. It was all about how your brand looked, how long visitors stayed, and how credible you seemed. A beautiful site meant trust, and a bad one meant lost sales. Simple as that. But that version of "web design" doesn't exist anymore. With the rise of JavaScript-heavy frameworks, cloud infrastructure, and performance-driven SEO, design has become an operational concern.

Read Post

December updates

Dec 29, 2025 By Valeria Kurolapova In StatusGator

As we wrap up the year, we wanted to share a few highlights from StatusGator – along with some useful updates you may have missed. From new API capabilities to expanded service coverage and an overview of the biggest outages in 2025, we’re closing out the year strong and grateful for your continued support.

Read Post

GitHub Outage Tracker: 5 Real-Time Monitoring Methods

Dec 29, 2025 By Nuno Tomas In isDown

When GitHub goes down, everything stops. Your developers can't push code. CI/CD pipelines hang indefinitely. Pull requests pile up. Deployments freeze. And if you're like most engineering teams, you find out about it when your Slack channel explodes with "Is GitHub down for everyone?" The average GitHub outage could cost teams 2-4 hours of developer productivity. For a 50-person engineering org, that's 100-200 hours of lost work — assuming you catch the outage immediately. Most teams don't.
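The 100-200 hour figure follows from simple multiplication. Here is a minimal sketch, assuming the 2-4 hours applies to each engineer (that reading is an assumption; the teaser does not spell it out).

```python
def lost_hours(engineers: int, hours_per_engineer: float) -> float:
    """Total productivity lost if every engineer loses the same number of hours."""
    return engineers * hours_per_engineer


# Figures from the teaser: a 50-person org, 2 to 4 hours lost per person.
low = lost_hours(50, 2)
high = lost_hours(50, 4)
print(f"{low}-{high} hours")  # 100-200 hours
```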

Read Post

Top server monitoring tools for 2026: A comprehensive comparison guide

Dec 29, 2025 By Geoffrin Edwin In Site24x7

IT infrastructure is now hyper-distributed. We are in a scale-in-seconds era, which means a typical IT landscape is spread across on-premises data centers, public clouds (AWS, Azure, GCP), containerized environments, and edge locations. With many components come more points of failure. A single server outage can cascade into customer-facing incidents, SLA violations, and revenue loss measured in thousands per minute.

Read Post

Unified Observability: What It Is and Why It Matters for Large Enterprises

Dec 29, 2025 By david.arrowsmith In Interlink

Modern enterprises operate within a digital ecosystem of staggering complexity - spanning on-premises systems, private and public clouds, APIs, containers and SaaS platforms. Business-critical services often rely on a mix of legacy infrastructure and modern applications, each producing huge volumes of metrics, log messages, traces and events.

Read Post

High Bandwidth Usage Detected - Causes, Impact, and Response

Dec 29, 2025 By Greg Collins In WhatsUp Gold

You log into your network monitoring dashboard and see the alert: “High bandwidth usage detected.” This is not just a routine message; it’s a sign that something is putting pressure on your network. Bandwidth is the backbone of modern connectivity, and when usage spikes unexpectedly, the consequences can be severe. Applications slow down, cloud costs rise, and in some cases, spikes may point to a security threat.

Read Post

From Firefighting to Foresight: Bright Beginnings for a New Year of IT Confidence

Dec 29, 2025 By Teneo In Teneo

When I was invited to join one of our customer’s end-of-year team wrap-up sessions, it came as no surprise when the meeting opened with a familiar refrain: “Next year will be different. Next year, we’ll get ahead of the noise. Next year, tickets won’t pile up while we’re still triaging yesterday’s issues.”

Read Post

Smarter Slack Alerts with Rollbar + Zapier AI

Dec 29, 2025 By Rollbar In Rollbar

For many engineering teams, Slack is the nerve center of daily work. It’s where incidents are discussed, decisions are made, and collaboration happens in real time. But when it comes to error alerts, Slack can quickly turn from helpful to overwhelming with noisy, context-poor notifications that developers learn to ignore. By integrating Rollbar with Zapier AI, teams can transform raw error data into clear, actionable, and meaningful Slack messages, resulting in faster triage, less alert fatigue, and smoother developer workflows.

View Video

Authorization Code Flow & redirect_uri_mismatch Errors: Monitoring & Fixing

Dec 29, 2025 By Dotcom-Monitor In Dotcom-Monitor

If you’ve implemented OAuth 2.0 using the Authorization Code Flow, chances are you’ve encountered the redirect_uri_mismatch error at least once. It’s one of the most common (and most misunderstood) OAuth failures teams face when integrating authentication into web applications. On paper, the error is simple. The authorization server compares the redirect URI sent in the request with the redirect URIs registered for the application.
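The comparison described above can be sketched as follows. This assumes exact string matching, which is a common policy, though a given authorization server may behave differently. The function name and URIs are hypothetical.

```python
def redirect_uri_allowed(requested: str, registered: list) -> bool:
    """Exact string comparison: no normalization of case,
    trailing slashes, scheme, or port (an assumed policy)."""
    return requested in registered


# Hypothetical registration for illustration.
registered = ["https://app.example.com/callback"]

print(redirect_uri_allowed("https://app.example.com/callback", registered))   # True
print(redirect_uri_allowed("https://app.example.com/callback/", registered))  # False
print(redirect_uri_allowed("http://app.example.com/callback", registered))    # False
```

Under this policy, a single trailing slash or an `http` versus `https` difference is enough to trigger a mismatch.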

Read Post

Smarter Slack Alerts with Rollbar + Zapier AI

Dec 29, 2025 By Rollbar In Rollbar

Read Post

JSONPath & JSON Validation for Web API Monitoring Assertions

Dec 29, 2025 By Dotcom-Monitor In Dotcom-Monitor

Most API monitoring setups still rely on a narrow definition of success: Did the endpoint respond, and did it return a 200 status code? While availability is essential, it’s no longer enough for modern, API-driven systems. In real production environments, APIs frequently return successful HTTP responses with incorrect or incomplete payloads. Authentication endpoints may issue tokens missing required fields. Business-critical APIs may return empty objects instead of valid data.
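To show what checking beyond the status code might look like, here is a minimal sketch of a payload assertion. The dotted-path lookup is a simplified stand-in for JSONPath, not a JSONPath implementation, and the helper names and sample payload are hypothetical.

```python
import json


def get_path(doc, path: str):
    """Resolve a dotted path such as 'data.items.0.id' against parsed JSON.
    Simplified stand-in for JSONPath (assumption, for illustration only)."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


def check_response(status: int, body: str, required_paths: list) -> list:
    """Return a list of failure messages; an empty list means the check passed."""
    if status != 200:
        return [f"unexpected status {status}"]
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return ["body is not valid JSON"]
    failures = []
    for path in required_paths:
        try:
            value = get_path(doc, path)
        except (KeyError, IndexError, ValueError, TypeError):
            failures.append(f"missing {path}")
            continue
        # Treat null, empty string, empty list, and empty object as failures.
        if value in (None, "", [], {}):
            failures.append(f"empty {path}")
    return failures


# A 200 response whose token payload lacks a required field.
body = '{"access_token": "abc", "token_type": "Bearer"}'
print(check_response(200, body, ["access_token", "expires_in"]))
# ['missing expires_in']
```

A status-only check would pass this response. The payload assertion flags it.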

Read Post

Application Monitoring 101: How to Correlate Average Response Time With Other Metrics

Dec 29, 2025 By Sarah Morgan In Scout

Average response time has become the default metric on many dashboards. It's easy to compute, easy to explain, and provides a single number to track over time. Of all the metrics available in application monitoring, this one feels closest to the actual user experience. But this simplicity can create a trap if you treat the average as a complete picture of system health. In fact, it’s really the starting point for a deeper investigation.
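A small sketch of why the average alone can mislead, using made-up response times (the sample data is an assumption, not from the post). It compares the mean with the median and 99th percentile using Python's standard `statistics` module.

```python
from statistics import mean, quantiles

# Hypothetical response times in ms: 95 fast requests and 5 very slow ones.
samples = [100] * 95 + [3000] * 5

avg = mean(samples)
cuts = quantiles(samples, n=100)  # 99 cut points: cuts[i] is percentile i + 1
p50 = cuts[49]
p99 = cuts[98]

# The average lands at 245 ms, a latency no request actually experienced:
# the median is 100 ms and the 99th percentile is 3000 ms.
print(avg, p50, p99)
```

Here the average sits between two very different user experiences, which is why it works better as a starting point than as a summary.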

Read Post

Observability for Feature Flags

Dec 29, 2025 By Lewis Isaac In Coralogix

Some of your users are having a party; dancing away, having a great time. But a couple of users are stuck outside in the rain, knocking on the door, trying to get in. Unfortunately, you can’t hear them because of all the noise happening inside. That’s what it feels like when you gradually roll out new features across your user base without the right monitoring.

Read Post

The Year in Making - Fabrix.ai 2025: From CloudFabrix to Agentic AI Leadership

Dec 28, 2025 By Shailesh Manjrekar In Fabrix

Just as NASA’s Artemis II mission represents humanity’s return to the Moon after more than 50 years, marking a pivotal moment in space exploration, Fabrix.ai has embarked on its own transformative journey in 2025. Artemis II, targeted for launch in February 2026, completed its crucial countdown demonstration test in December 2025, symbolizing humanity’s readiness to venture beyond Earth for deep space exploration and eventually return to the lunar surface.

Read Post

Sponsored Post

2026: The Year Agentic AI Disrupts Observability, Security, and Enterprise SaaS

Dec 28, 2025 By Shailesh Manjrekar In Fabrix

The enterprise technology market is at an inflection point. 2026 will be the year agentic AI fundamentally disrupts how organizations approach observability, security, and IT automation. The traditional SaaS model—with its sprawling ecosystem of disconnected point solutions—is collapsing under complexity. What’s replacing it is a consolidated platform layer powered by autonomous agents that operate across systems, consolidate data, and execute workflows autonomously.

Read Post

Online HTTP Clients vs Web API Monitoring: When Each Makes Sense

Dec 28, 2025 By Dotcom-Monitor In Dotcom-Monitor

When teams talk about online HTTP clients, they’re usually referring to quick, browser-based ways to send requests, especially HTTP POST requests, without standing up local tooling or infrastructure. These tools are popular for good reason. They make it easy to submit payloads, test headers, and inspect responses in real time. For developers, QA engineers, and DevOps teams, they’re often the fastest way to answer a simple question: Does this request work?

Read Post

Monitoring JWT Tokens & OAuth Token Endpoints: How to Catch Authentication Failures Before APIs Break

Dec 28, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern APIs rarely fail because the application logic is down. More often, they fail because authentication breaks upstream, silently. OAuth token endpoints and JWT-based authentication sit at the front of nearly every protected API. When they degrade, misconfigure, or stop issuing valid tokens, every dependent API call fails, even if the API itself is healthy.
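One check a monitor could run on an issued token is whether it is still valid for long enough. Below is a minimal sketch that reads the `exp` claim from a JWT payload using only the standard library. It deliberately does not verify the signature, and the helper names and sample token are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds remaining before the token's 'exp' claim; negative if expired."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now


def make_unsigned_token(claims: dict) -> str:
    """Build a sample token for the demo; the signature part is a dummy."""
    def enc(obj) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none', 'typ': 'JWT'})}.{enc(claims)}.sig"


token = make_unsigned_token({"sub": "svc-a", "exp": 1_000_000_600})
print(seconds_until_expiry(token, now=1_000_000_000))  # 600
```

A monitor built on this idea might alert when the remaining lifetime is negative or unexpectedly short. Real validation would also verify the signature, issuer, and audience.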

Read Post

Monitoring OAuth 2.0 & Secure Web API Authentication Flows

Dec 27, 2025 By Dotcom-Monitor In Dotcom-Monitor

OAuth 2.0 is often treated as a solved security problem; configured once, then forgotten. In reality, OAuth-based authentication is one of the most fragile dependencies in modern API ecosystems. When OAuth breaks, APIs don’t just degrade gracefully; they often fail completely. For DevOps and engineering teams, OAuth 2.0 authentication sits before application logic, before business rules, and before observability inside the service itself.

Read Post

Introducing Dossinth AI (formerly Silicon Sage)

Dec 27, 2025 By Lucian Daniliuc In Monitive

The AI saga continues. I feel like every business felt like they had to add some AI into their product to stay relevant, but reality shows us that's not the case. Same thing at Monitive. Not looking to add AI to our service until it proves its usefulness. However, using it as an internal tool is all doable. I am still not 100% convinced of its usefulness, but I am hoping to radically improve my relationship with our customers, and - thus - the service itself.

Read Post

API Testing vs Web API Monitoring: Postman, Online Tools, and WebView

Dec 27, 2025 By Dotcom-Monitor In Dotcom-Monitor

APIs sit at the core of modern applications. They power mobile apps, connect microservices, and enable third-party integrations, making them critical to performance, reliability, and revenue. That’s why most teams invest heavily in API testing tools like Postman, automated test suites, and online API testers. And yet, production outages still happen. This disconnect (“our APIs were tested, so why did they fail?”) is where confusion between API testing and Web API monitoring begins.

Read Post

Using Mobot with Query Agent (How to)

Dec 26, 2025 By Sumo Logic, Inc. In Sumo Logic

This video shows how to access Mobot using the Sumo Logic UI and demonstrates how Mobot helps you write log search queries in natural language.

View Video

VictoriaMetrics Anomaly Detection: 2025 Roadmap & Features (vmanomaly)

Dec 26, 2025 By VictoriaMetrics In VictoriaMetrics

Discover the latest advancements in AI-driven monitoring with VictoriaMetrics. Fred Navruzov, Lead of the Anomaly Detection team, presents a comprehensive year-in-review for vmanomaly (part of the VictoriaMetrics Enterprise suite). This session dives into how we are making machine learning more accessible for SREs through new interactive tools and protocol integrations.

Key Highlights:
2025 Recap: A look back at the major releases and improvements in vmanomaly.
Interactive Playgrounds: A demo of our new environment for testing anomaly detection models before deployment.
MCP Server Integration.

View Video

From Postman Collections to 24/7 Web API Monitoring (Step-by-Step)

Dec 26, 2025 By Dotcom-Monitor In Dotcom-Monitor

Postman API test automation is a critical part of modern API development. Teams rely on Postman collections, scripts, and automated tests to validate endpoints, catch functional issues early, and ensure APIs behave correctly during development and CI/CD pipelines. But as APIs move into production, test automation alone leaves important gaps.

Read Post

A 2026 essential features checklist for choosing the best synthetic monitoring tools

Dec 26, 2025 By Dotcom-Monitor In Dotcom-Monitor

Advanced synthetic monitoring has become essential for any strong digital plan, not simply a nice-to-have feature. The reason is that users expect websites and apps to work well all the time. As we approach 2026, the top synthetic monitoring solutions have greatly improved, going from just checking if a service is online to ensuring a complete and positive digital experience.

Read Post

Engineering robust monitoring scripts using advanced synthetic monitoring software

Dec 26, 2025 By Dotcom-Monitor In Dotcom-Monitor

Synthetic monitoring evolved from simple uptime checks to a complex technical field in modern digital operations. The real challenge for organizations that use synthetic monitoring software isn’t implementing the monitoring; it’s writing scripts that stay accurate, simple to maintain, and resistant to changes in the application.

Read Post

What is high cardinality, and is it as scary as people make it out to be?

Dec 26, 2025 By Dawid Dębowski In Grafana

Dawid Dębowski is a software engineer at G2A.COM and a Grafana Champion. Holding an MS in Computer Science, Dawid’s main fields of interest related to observability are PromQL and data visualizations using Grafana. If you’ve ever worked with custom metrics in a Prometheus environment, you've probably heard about something called "high cardinality", or at least I hope you have.

Read Post

Optimizing Datadog at scale: Cost-efficient observability at Zendesk

Dec 26, 2025 By Anatoly Mikhaylov In Datadog

This guest blog post is authored by Anatoly Mikhaylov, a Principal Engineer at Zendesk and Datadog Ambassador, and by Nick Hefty, a Senior Engineer at Zendesk.

Read Post

How Browser Hijackers Impact Enterprise Observability and Monitoring Tools

Dec 26, 2025 By OpsMatters In OpsMatters

The browser is an essential component for enterprise execution. Given the browser's importance, observability relies on accurate, trustworthy telemetry. Browser hijackers are a dangerous threat because they operate below the radar and introduce operational risks that undermine monitoring reliability, degrade signal quality, and affect decision-making and telemetry across an enterprise's ecosystem.

Read Post

Top Tips for Building a Knowledge-Sharing Culture

Dec 25, 2025 By Monideepa Mrinal Roy In ManageEngine

Top Tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we’re tackling a critical challenge for modern organizations: creating a culture where knowledge flows freely. Let’s be honest—a team that doesn’t share knowledge eventually hits a wall. In today’s fast-paced environment, keeping information siloed only leads to slow decisions, repeated errors, and missed chances to improve.

Read Post

Using the Uptime.com Transaction Recorder

Dec 25, 2025 By Uptime Website Monitoring In uptime

Learn how to easily set up a Transaction Check with Uptime.com's Google Chrome-based Transaction Recorder. This guide covers navigating the main checks page, recording and editing steps, and testing configurations.

View Video

How Advanced Synthetic Monitoring Ensures Compliance and 24/7 Uptime in Financial Services

Dec 25, 2025 By Dotcom-Monitor In Dotcom-Monitor

Advanced synthetic monitoring has gone from being a technical convenience to a regulatory and operational necessity in today’s dynamic financial services ecosystem, where there is no scope for error. Traditional uptime testing just checks to determine if systems are accessible.

Read Post

Site24x7 wrapped 2025: A year of growing together

Dec 24, 2025 By Sri Varalakshmi In Site24x7

The business world doesn’t expect its best gifts to be wrapped in ribbons. Gifts come in the form of quiet moments when everything you’ve invested in and count on comes through. For example, when you leave the office knowing your systems are in good hands. That's what 2025 was about for us at Site24x7. We evolved towards building the kind of observability that doesn’t just feel like surveillance, but more like consistent support by your side.

Read Post

Simplify hybrid network monitoring with OpManager Plus

Dec 24, 2025 By Allan In ManageEngine

Enterprise networks have evolved from simple on-premises networks into sprawling ecosystems of on-premises infrastructure, cloud platforms, virtual machines, and containerized applications. Organizations are embracing hybrid and multi-cloud strategies to gain agility, scalability, and resilience. But with that evolution comes greater complexity and a new set of challenges for IT teams in maintaining the performance of hybrid networks.

Read Post

How to perform HTTP checks in Grafana Cloud Synthetic Monitoring

Dec 24, 2025 By Bukola Ayodele In Grafana

Your users should not be the first to know when your application goes down. When HTTP endpoints fail or respond sluggishly, users experience timeouts, connection errors, and degraded performance — often without clear indication of the root cause. This is where HTTP checks in Grafana Cloud Synthetic Monitoring come in, allowing you to proactively monitor your endpoints, verify they're online, measure response times, and ensure they're returning the correct status codes.

Read Post

Heartbeat behind the metrics | The people behind Site24x7

Dec 24, 2025 By ManageEngine Site24x7 In Site24x7

The best products aren't built in isolation—they're built in conversation. This video brings together voices from across the Site24x7 leadership team for an honest conversation about what it takes to build an observability platform that teams rely on every single day. You'll hear from our product managers and support team heads who've spent years thinking about one persistent question: how do you turn data into clarity when systems continue to become more complex?

View Video

Why Use a Purpose-Built Time Series Database

Dec 24, 2025 By Cole Bowden In InfluxData

A time series database has a straightforward definition: it’s a database purpose-built for efficiently ingesting, storing, and querying time series data. Time series data is any data with a timestamp, collected regularly or periodically, that you’ll often visualize on graphs where the X-axis is time. This definition doesn’t quite tell you what sets it apart from other types of databases, though.

Read Post

2025: The year of the global cloud outage

Dec 23, 2025 By Colin Bartlett In StatusGator

StatusGator has been monitoring the world’s cloud services for more than 10 years now. We’ve seen outages, big and small, affect companies of all sizes for more than a decade. Yet as we close out 2025, it feels like the last 12 months brought us some of the biggest outages in the history of the internet. In fact, by our data, this is true! Never before in history have so many huge outages taken down so much of the internet, in such a short time.

Read Post

Blameless Postmortem: Foundation of Site Reliability

Dec 23, 2025 By Nuno Tomas In isDown

When systems fail, the instinct to find someone to blame runs deep. But what if assigning fault actually makes your systems less reliable? A blameless postmortem culture transforms how teams learn from incidents, creating stronger systems and more effective incident response processes.

Read Post

Sponsored Post

Avantra 25.2: Enhancing Security and Reducing Complexity in Hybrid SAP Landscapes

Dec 23, 2025 By Oliver Rogers In Avantra

I am pleased to announce the release of Avantra 25.2! While 25.2 is a service release focused on software stability, it introduces several powerful new features designed to streamline SAP automation and improve operational resilience. Let's break down the key deliverables and benefits for Avantra users in this release.

Read Post

Is Northern Virginia Still the Least Reliable AWS Region in 2025? We Analyzed the Data

Dec 23, 2025 By Andy Libby In StatusGator

This updated analysis is based on StatusGator outage data collected from January 1 to December 9, 2025. We decided to revisit our 2022 analysis of AWS outages due to several new AWS incidents, especially another widely discussed AWS outage in us-east-1 (N. Virginia) that occurred on October 20, 2025. We’ve expanded the report with fresh 2025 regional data as well as a new breakdown of affected AWS services.

Read Post

Why DPDP compliance must include network configuration governance

Dec 23, 2025 By Rama Venkatesan In Site24x7

India’s Digital Personal Data Protection (DPDP) Act places accountability on how organizations collect, process, and store personal data to help organizations stay steps ahead of threat actors. Forrester’s CIO roadmap highlights a clear shift: compliance is no longer limited to policies and consent workflows. CIOs must extend governance deeper into the technology stack, including infrastructure that directly impacts data security.

Read Post

Empowering IT teams: Site24x7's mobile app updates in 2025

Dec 23, 2025 By Jyotsna R In Site24x7

Modern IT infrastructure requires flexibility, speed, and accessibility. This year marked a significant milestone for Site24x7 as we launched our enhanced mobile application, transforming how IT teams manage their infrastructure while on the go. Whether you're responding to critical alerts during your commute or reviewing performance metrics between meetings, Site24x7's mobile app puts its entire suite of monitoring capabilities directly in your hands.

Read Post

Powering modern IT with a smarter observability platform

Dec 23, 2025 By Janani Sekar In Site24x7

Since its inception, the Site24x7 platform has been the central pillar of monitoring. In 2025, it evolved beyond monitoring to become a comprehensive decision-making layer for modern IT operations. With a strong focus on usability, intelligence, governance, and scalability, this year’s enhancements were designed to help teams see clearly, act decisively, and plan confidently for the future.

Read Post

Site24x7

Read more about Powering modern IT with a smarter observability platform

Component statuses: Now in the API

Dec 23, 2025 By Valeria Kurolapova In StatusGator

The StatusGator API continues to expand with new endpoints to help support the wide variety of use cases our customers have. We just released two new APIs. In case you missed it, component filtering is one of StatusGator’s most important features, allowing you to filter your service monitor to just the specific products, regions, or features you use. It’s an essential setup step that helps minimize noise.

Read Post

StatusGator

Read more about Component statuses: Now in the API

Splunk Pricing & Costs: Free vs Enterprise

Dec 23, 2025 By Alexandr Bandurchin In Uptrace

Understanding Splunk pricing is crucial for organizations evaluating SIEM solutions. This guide examines licensing models, actual costs, and essential pricing factors to help you make an informed investment decision for your security and monitoring needs.

Read Post

Uptrace

Read more about Splunk Pricing & Costs: Free vs Enterprise

Driving AI ROI: How Datadog connects cost, performance, and infrastructure so you can scale responsibly

Dec 23, 2025 By Patrick Krieger In Datadog

AI innovation has accelerated faster than most organizations’ ability to monitor and manage it. The shift from experimentation to production-scale workloads has driven a new class of operational challenges: rising GPU costs, opaque model performance, and the difficulty of linking spend to business value. As AI investments grow, executives need a unified way to measure efficiency and return without slowing down innovation.

Read Post

Datadog

Read more about Driving AI ROI: How Datadog connects cost, performance, and infrastructure so you can scale responsibly

How to Integrate App Synthetic Monitoring into Your CI/CD Pipeline for Flawless Deployments

Dec 23, 2025 By Dotcom-Monitor In Dotcom-Monitor

In today’s age of continuous delivery, a failed deployment or a drop in performance can affect thousands of users in just a few minutes. Traditional testing happens before deployment, but what about after the code is live? This is where app synthetic monitoring becomes a critical part of your CI/CD pipeline. Integrating synthetic monitoring into CI/CD transforms your pipeline from a simple delivery mechanism into a proactive quality and performance gatekeeper.

Read Post

Dotcom-Monitor

Read more about How to Integrate App Synthetic Monitoring into Your CI/CD Pipeline for Flawless Deployments

2026 Observability Predictions: What Lies Ahead?

Dec 23, 2025 By Asaf Yigal In logz.io

What remains of the 2025 AI hype? After a year of “AI will fix everything” promises, engineering teams in 2025 hit a wall of reality: AI is a tool, not a magic bullet. We’re now seeing a more practical approach: identifying broken workflows and tasks where AI can help and leveraging AI strengths like data analysis at speed and scale to derive meaningful, valuable insights. Looking ahead, 2026 will reward organizations that combine AI innovation with a practical approach.

Read Post

logz.io

Read more about 2026 Observability Predictions: What Lies Ahead?

Part 3: What If IT Stopped Reacting to Incidents and Started Predicting Them?

Dec 23, 2025 By ScienceLogic In ScienceLogic

Enterprises are experiencing a turning point. Systems scale faster than teams can, AI is rewriting the rhythms of operations, and the cost of downtime grows heavier every quarter. In this new landscape, reacting is no longer enough. Teams need foresight. They need to get ahead of the issue. They need a different model entirely. This third installment centers on a simple but transformative idea. What if IT operations could finally step out of reaction mode and move into anticipation?

Read Post

ScienceLogic

Read more about Part 3: What If IT Stopped Reacting to Incidents and Started Predicting Them?

Detect, diagnose, and resolve network issues easily with CNM Network Health

Dec 23, 2025 By Kai Cai In Datadog

In many organizations, developers, SREs, network engineers, and security teams work in specialized domains, which can make it hard to establish a shared view of network health. As a result, engineers often struggle to determine when a network problem that originates outside of their domain of expertise is the root cause of an incident. This lack of visibility slows investigations and delays remediation.

Read Post

Datadog

Read more about Detect, diagnose, and resolve network issues easily with CNM Network Health

Platform Engineering: Error Budgets Explained Simply #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Platform engineering provides powerful tools that handle a lot under the hood. Learn how to calculate your remaining error budget with a simple formula using real numbers and objective statements.
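As a rough illustration of the kind of formula the video refers to, here is a minimal sketch of a remaining-error-budget calculation. The function name, the event-based framing, and the sample numbers are assumptions made for this example and are not taken from the video.

```python
def remaining_error_budget(slo_target: float, total_events: int, bad_events: int) -> float:
    """Return the unspent fraction of the error budget (negative once it is overspent).

    slo_target is the objective as a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the number of bad events the objective tolerates in this window.
    allowed_bad = (1.0 - slo_target) * total_events
    if allowed_bad == 0:
        raise ValueError("no events in the window, so the budget is undefined")
    return (allowed_bad - bad_events) / allowed_bad


# Hypothetical numbers: a 99.9% objective over 1,000,000 requests tolerates
# roughly 1,000 failures; 250 failures so far leaves about 75% of the budget.
print(f"{remaining_error_budget(0.999, 1_000_000, 250):.0%}")
```

A time-based budget (minutes of allowed downtime per window) follows the same shape, with minutes in place of event counts.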

View Video

Last9

Read more about Platform Engineering: Error Budgets Explained Simply #shorts

Automate Cloud Cost Optimization Across Workloads

Dec 23, 2025 By Datadog In Datadog

In this short demo, see how Datadog Cloud Cost Management combines observability and cost data with actionable automation to help teams optimize spend. With Datadog Cloud Cost Management, cost optimization is built into the same platform engineers use every day.

View Video

Datadog

Read more about Automate Cloud Cost Optimization Across Workloads

OpenTelemetry Metrics: Traces, Logs & Prometheus Integration #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

OpenTelemetry aims to link metrics to traces and logs, offering OpenCensus users a seamless migration path. Work with existing protocols like Prometheus. Leverage existing tooling without learning something completely new.

View Video

Last9

Read more about OpenTelemetry Metrics: Traces, Logs & Prometheus Integration #shorts

OpenTelemetry: Components, SDKs, and Middleware Explained #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

OpenTelemetry explained: standards, SDKs for various languages (Ruby, Python, Go), and middleware tools. Deploy these to pre-process data and send it to your destination.

View Video

Last9

Read more about OpenTelemetry: Components, SDKs, and Middleware Explained #shorts

Sampled analysis of 10 billion spans with Coralogix highlight comparison

Dec 23, 2025 By Coralogix Team In Coralogix

The CNCF reported that between 39% and 56% of organizations surveyed are now ingesting traces as part of their observability strategy. Tracing has become a cornerstone of any modern observability operation. Customers are regularly handling tens of billions of spans every day, but with billions of spans, how can teams quickly figure out what is changing, what’s breaking, or what’s slowing down?

Read Post

Coralogix

Read more about Sampled analysis of 10 billion spans with Coralogix highlight comparison

Implementing SLOs: Our Scale Mistakes and Successes #shorts

Dec 23, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

30 minutes of eating crow! Learn from our SLO mistakes at Weave. Discover pitfalls and shortcuts to doing it right the first time. Avoid our wrong, wrong, wrong, wrongs!

View Video

Last9

Read more about Implementing SLOs: Our Scale Mistakes and Successes #shorts

Introducing Real-Time Conversations with Netdata AI

Dec 23, 2025 By Shyam Sreevalsan In netdata

Over the past few months, we’ve seen incredible adoption of our AI Investigations and Insights reports. Teams are using them to automate the deep, thoughtful analysis required for complex post-mortems, capacity planning, and performance optimization. These comprehensive reports are fantastic when you need a well-researched, shareable document. But what about the moments during an investigation?

Read Post

netdata

Read more about Introducing Real-Time Conversations with Netdata AI

Grafana community dashboards: Memorable use cases of 2025

Dec 23, 2025 By Colin Steele In Grafana

Every year, Grafana dashboards surface in new corners of the world. And this year, they even reached beyond this world—helping one team land on the moon and another monitor the planet’s health with orbiting satellites. Meanwhile, back here on Earth, the community used Grafana to track everything from wind turbines and wastewater to March Madness and Taylor Swift’s worldwide tour. Here’s a look back at some of the most memorable Grafana community dashboards of 2025.

Read Post

Grafana

Read more about Grafana community dashboards: Memorable use cases of 2025

Drive business outcomes with Unit Economics in Datadog Cloud Cost Management

Dec 23, 2025 By Datadog In Datadog

See how Datadog turns cloud usage and performance data into actionable business insights by helping teams calculate unit economics to measure and optimize the efficiency of every service. Datadog bridges the gap between cloud costs and business value, helping organizations get the most value out of their cloud investment.

View Video

Datadog

Read more about Drive business outcomes with Unit Economics in Datadog Cloud Cost Management

From Waste to Asset: Transforming Inefficient Systems into Strategic Business Power

Dec 23, 2025 By OpsMatters In OpsMatters

Is your technology working for you or against you? For many business leaders, the answer feels obvious. You see the symptoms every day: frequent downtime, slow performance that grinds productivity to a halt, and a constant stream of frustrating disruptions that pull your team away from their real work. These aren't just minor annoyances; they are significant financial liabilities.

Read Post

OpsMatters

Read more about From Waste to Asset: Transforming Inefficient Systems into Strategic Business Power

CloudSpend in 2025: Making cloud cost management easier at scale

Dec 22, 2025 By Sinjan Ballav In ManageEngine

In 2025, cloud environments became more distributed, and cloud costs followed suit. Managing spend across multiple providers, teams, and business units required a more deliberate, governed approach, because visibility alone was no longer enough. Organizations needed clearer ownership, better structure, and tools that could scale alongside their cloud usage.

Read Post

ManageEngine

Read more about CloudSpend in 2025: Making cloud cost management easier at scale

Sponsored Post

Cloud Outages Are Rising: How Early Signals Help IT Teams Respond Faster in 2026

Dec 22, 2025 By StatusGator In StatusGator

Cloud outages used to be rare, headline-making events. Today, they're part of the daily reality of running digital operations. Whether triggered by a configuration error, network routing issue, API failure, or global infrastructure disruption, cloud incidents now occur frequently, propagate quickly, and affect more services than ever before. In 2025, one trend has become undeniable: Teams that detect cloud outages early experience less downtime, respond faster to incidents, and avoid unnecessary internal chaos.

Read Post

StatusGator

Read more about Cloud Outages Are Rising: How Early Signals Help IT Teams Respond Faster in 2026

Digital Risk Analyzer 2025: Digital security fortified

Dec 22, 2025 By Janani Sekar In Site24x7

As digital adoption accelerates, the attack surface expands just as rapidly. Digital Risk Analyzer consistently evolves to secure your digital frontiers, and 2025 is no exception. This year-end recap highlights the key enhancements introduced in past months and how we continue to deliver tangible value to end users.

Read Post

Site24x7

Read more about Digital Risk Analyzer 2025: Digital security fortified

Why OTT platforms crash and what it teaches us about traffic surges

Dec 22, 2025 By Rama Venkatesan In Site24x7

Minutes after the newest episodes of a beloved series dropped, a well-known streaming OTT (over-the-top) platform crashed. The impact was instant: streams wouldn’t load, logins failed, and users across regions started refreshing their screens, wondering if the issue was on their end. Outages like this don’t often happen, especially for an engineered and distributed platform—which is precisely why this incident caught attention.

Read Post

Site24x7

Read more about Why OTT platforms crash and what it teaches us about traffic surges

Faster resolution, better outcomes: Site24x7's digital experience monitoring innovation recap

Dec 22, 2025 By Jyotsna R In Site24x7

Another year, another leap forward in digital experience monitoring. As we wrap up 2025, we're thrilled to reflect on the transformative capabilities we've brought to Site24x7—innovations designed around one core mission: helping you deliver a flawless user journey worldwide. This year, we focused on closing visibility gaps, eliminating blind spots, and putting effective insights directly into your hands.

Read Post

Site24x7

Read more about Faster resolution, better outcomes: Site24x7's digital experience monitoring innovation recap

New Option: Preserve URL Casing

Dec 22, 2025 By Todd H. Gardner In Request Metrics

Most web servers treat URLs as case-insensitive. A request to /About-Us lands on the same page as /about-us or /ABOUT-US. So when Request Metrics captures your traffic, we normalize all URLs to lowercase to prevent these duplicates from cluttering your reports. But not every system works that way. Some web frameworks (looking at you, Node and Python) treat URL casing as meaningful. /User/Profile and /user/profile might be completely different routes.
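To make the trade-off concrete, here is a minimal sketch of lowercase normalization with an opt-out. The function name `normalize_url` and the `preserve_case` flag are hypothetical names chosen for this example; they are not Request Metrics' actual implementation or setting.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str, preserve_case: bool = False) -> str:
    """Lowercase the path unless the caller asks to keep its casing."""
    parts = urlsplit(url)
    path = parts.path if preserve_case else parts.path.lower()
    # Scheme and network location are lowercased either way in this sketch;
    # the query string and fragment are left untouched.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, parts.fragment)
    )


hits = ["/About-Us", "/about-us", "/ABOUT-US"]

# Default behaviour: the three spellings collapse into a single report row.
print(Counter(normalize_url(h) for h in hits))

# With casing preserved, each spelling is counted as its own route.
print(Counter(normalize_url(h, preserve_case=True) for h in hits))
```

Which default is right depends on the server: collapsing is safer when routes are case-insensitive, while preserving matters when `/User/Profile` and `/user/profile` really are different pages.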

Read Post

Request Metrics

Read more about New Option: Preserve URL Casing

Get started with Grafana Alerting: Link alerts to visualizations

Dec 22, 2025 By Grafana In Grafana

In this tutorial you will learn how to link alert rules to time series panels for better visualization. Don't miss the rest of the "Get started with Grafana Alerting" series! Each part dives into a different feature to help you get the most out of alerting in Grafana.

View Video

Grafana

Read more about Get started with Grafana Alerting: Link alerts to visualizations

How microservice architectures have shaped the usage of database technologies

Dec 22, 2025 By Bowen Chen In Datadog

In the late 2000s, the big question in database design was SQL or NoSQL. While relational databases had long held their ground, document and key-value stores were emerging as serious alternatives. Many predicted a zero-sum, winner-take-all outcome. But when we look at how organizations are using database technologies today, no single tool or category has dominated the landscape.

Read Post

Datadog

Read more about How microservice architectures have shaped the usage of database technologies

Securing customer logins with breach intelligence

Dec 22, 2025 By Gaëtan Piquenot In Datadog

Account takeovers (ATOs) are one of the most common threats facing online platforms. Attackers buy leaked usernames and passwords on underground markets then test them at scale across websites, hoping that password reuse will give them easy access. Today, ATOs have grown so sophisticated and fast-moving that manual incident response often can’t keep pace, requiring intelligent defense systems for detecting compromised credentials and preventing misuse at scale.

Read Post

Datadog

Read more about Securing customer logins with breach intelligence

The Ultimate Blueprint for Successful Synthetic Monitoring Implementation

Dec 22, 2025 By Dotcom-Monitor In Dotcom-Monitor

In today’s digital world, the performance of websites and apps has a direct effect on sales, customer satisfaction, and brand reputation. Synthetic performance monitoring provides the proactive intelligence needed to ensure your application is always performing optimally. By simulating real user interactions from global locations before issues affect actual users, you transform from reactive problem-solving to proactive performance excellence.

Read Post

Dotcom-Monitor

Read more about The Ultimate Blueprint for Successful Synthetic Monitoring Implementation

Christmas in the Cloud

Dec 22, 2025 By solarwindsinc In SolarWinds

'Tis the season for love... and optimization.

View Video

SolarWinds

Read more about Christmas in the Cloud

Top 3 Trends Defining Network Observability in 2026

Dec 22, 2025 By Yann Guernion In Broadcom

As we enter 2026, the dust has settled on the initial explosion of hybrid work and cloud adoption. The "new normal" is no longer new; it is simply operations as usual. However, the tools we use to manage this ecosystem are undergoing a massive correction. The fragmented, tool-sprawl approach of the early 2020s is proving unsustainable in the face of growing network complexity. Network operations teams are no longer looking for more data; they are looking for better answers.

Read Post

Broadcom

Read more about Top 3 Trends Defining Network Observability in 2026

Reducing OpenTelemetry Bundle Size in Browser Frontend

Dec 22, 2025 By Elizabeth Mathew In SigNoz

When I was building applications, I used to always rely on the DevTools console of my web browser to examine logs in the frontend. But, with UI log messages only being accessible within your browser rather than forwarded to a file somewhere, which is the common pattern with backend services, losing visibility of this resource when triaging user issues was a real dilemma.

Read Post

SigNoz

Read more about Reducing OpenTelemetry Bundle Size in Browser Frontend

Simplifying Microsoft Sentinel Integration: VirtualMetric DataStream Connectors in Content Hub

Dec 22, 2025 By VirtualMetric In VirtualMetric

Microsoft Sentinel adoption often introduces unexpected complexity. While the platform delivers powerful SIEM and XDR capabilities, organizations frequently struggle with manual DCR configuration, inconsistent data quality, rising ingestion costs, and security risks associated with credential-based integrations. VirtualMetric DataStream is now available in the Microsoft Sentinel Content Hub, reducing the effort required to deploy normalized and cost-optimized data ingestion.

Read Post

VirtualMetric

Read more about Simplifying Microsoft Sentinel Integration: VirtualMetric DataStream Connectors in Content Hub

OTel Updates: OpenTelemetry Deprecates Zipkin Exporters

Dec 22, 2025 By Anjali Udasi In Last9

OpenTelemetry is deprecating the Zipkin exporter specification. Zipkin now supports OTLP ingestion natively, so the custom exporter logic in OTel SDKs is no longer necessary.

Read Post

Last9

Read more about OTel Updates: OpenTelemetry Deprecates Zipkin Exporters

IoT Sensor Data into Graylog: A Lab Guide

Dec 22, 2025 By Jeff Darrington In Graylog

Graylog has always been associated with log management, metrics, SIEM and security monitoring—but it’s also a great tool for creative, low-cost experiments in a home lab. I wanted to use it for real-world sensor data, so I built a DIY temperature and humidity monitor using an ESP-WROOM-32 development board and a DHT22 sensor.

Read Post

Graylog

Read more about IoT Sensor Data into Graylog: A Lab Guide

How to Audit AI-Written Pull Requests Without Burning Out

Dec 22, 2025 By Rollbar In Rollbar

If it feels like your GitHub notifications are a targeted DDoS attack on your brain, you aren't imagining it. Data from GitHub's Octoverse 2025 report shows an average of 43.2 million pull requests merged every month, a 23% jump from just a year ago. This surge in activity coincides with the widespread adoption of AI tools to write code. The temptation to just click "Approve" on a well-formatted AI-written pull request is higher than ever.

Read Post

Rollbar

Read more about How to Audit AI-Written Pull Requests Without Burning Out

Calm Under Pressure: Ending the Year Without the Fire Drills

Dec 22, 2025 By Teneo In Teneo

From the outside looking in, I have seen that year end in financial services is not for the faint-hearted. Markets tighten, trading volumes swell, payment systems hit their annual peak, and regulatory reporting deadlines stack up like dominoes. In this environment, even a few seconds of lag can mean missed trades, delayed transactions, frustrated clients, or worse, financial loss and reputational damage. This is precisely when IT needs to be at its calmest.

Read Post

Teneo

Read more about Calm Under Pressure: Ending the Year Without the Fire Drills

Configuration as Intelligence: The New Operating System of Resilience

Dec 22, 2025 By ScienceLogic In ScienceLogic

Modern IT operations live in constant flux. New tools appear, workloads shift to the cloud, architectures fragment, and every device, application, and user brings its own update rhythm. In this state of constant motion, reliability isn’t a static condition; it’s a dynamic discipline. For years, organizations have relied on observability and monitoring to keep systems running. But those tools only tell half the story.

Read Post

ScienceLogic

Read more about Configuration as Intelligence: The New Operating System of Resilience

5 Best Proxies for Geo-Targeted Campaign Monitoring (2026)

Dec 22, 2025 By OpsMatters In OpsMatters

Geo-targeted campaign monitoring is only as accurate as the vantage point you test from. If your checks always come from the same IP range, country, or data center footprint, you'll miss the exact issues you're trying to catch: region-specific ad delivery problems, localized SERP differences, mismatched landing pages, price inconsistencies, affiliate compliance issues, and unnecessary blocks triggered by anti-bot systems.

Read Post

OpsMatters

Read more about 5 Best Proxies for Geo-Targeted Campaign Monitoring (2026)

A Deep Dive into Synthetic API Monitoring

Dec 21, 2025 By Dotcom-Monitor In Dotcom-Monitor

Consider this scenario: Your mobile app shows “Network Error” to 30% of users. Your dashboard shows that all of your servers are green. Your support team is quite busy. After four hours of feverish searching, you discover an issue. One of your 47 microservices is responding with a 200 OK status but returning malformed JSON that crashes client applications.
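The failure in this scenario slips past status-code checks, so a synthetic check also has to inspect the body. The sketch below shows one way to do that; the function name and the `required_keys` parameter are assumptions for illustration, not a Dotcom-Monitor API.

```python
import json
from typing import Iterable, Tuple


def validate_api_response(
    status: int, body: str, required_keys: Iterable[str] = ()
) -> Tuple[bool, str]:
    """Return (passed, reason) for a single API response."""
    if status != 200:
        return False, f"unexpected status {status}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # A 200 OK with an unparseable body is the case a status-only check misses.
        return False, f"malformed JSON: {exc.msg}"
    if not isinstance(payload, dict):
        return False, "payload is not a JSON object"
    missing = [key for key in required_keys if key not in payload]
    if missing:
        return False, "missing keys: " + ", ".join(missing)
    return True, "ok"


# A truncated body fails the check even though the status code is 200.
print(validate_api_response(200, '{"id": 1', ["id"]))
```

In a real check the status and body would come from an HTTP client call made on a schedule; they are passed in directly here to keep the sketch self-contained.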

Read Post

Dotcom-Monitor

Read more about A Deep Dive into Synthetic API Monitoring

How to Monitor SSL Certificate Expiration - Complete 2025 Guide

Dec 21, 2025 By Dotcom-Monitor In Dotcom-Monitor

Keeping your website secure is essential. One of the simplest yet most overlooked ways to protect it is to monitor SSL certificate expiration. Many website owners do not realise how quickly an SSL certificate can expire, and an expired certificate can damage the site. Imagine waking up one morning to find that visitors are seeing a red “Not SECURE” warning. That makes a bad impression on your audience.
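As a minimal sketch of what such monitoring can look like, the following uses only Python's standard library. The helper names and the 30-day threshold mentioned afterwards are assumptions for this example, not part of the guide.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    not_after uses the format returned by getpeercert(), e.g. "Jan  1 00:00:00 2026 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate a live server presents.

    The default context verifies the certificate, so an already expired one
    raises ssl.SSLCertVerificationError here instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Offline example with a fixed "now", so no network access is needed.
fixed_now = ssl.cert_time_to_seconds("Dec 22 00:00:00 2025 GMT")
print(days_until_expiry("Jan  1 00:00:00 2026 GMT", now=fixed_now))
```

A scheduled job could call `days_until_expiry(fetch_not_after("example.com"))` and raise an alert when the result drops below a chosen threshold, such as 30 days.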

Read Post

Dotcom-Monitor

Read more about How to Monitor SSL Certificate Expiration - Complete 2025 Guide

The Ultimate Guide to Kafka Monitoring Best Practices, Metrics, and Tools

Dec 21, 2025 By Staff Member In SolarWinds

If you’re operating modern, data-driven applications—which, let’s face it, you likely are—Kafka serves as the central streaming platform, delivering data in real-time. It’s impressive, extremely fast, and exceptionally powerful for achieving high throughput and scalability. But here’s the catch: with significant power comes the need for vigilant oversight. Neglecting your Kafka environment is like driving a racecar with your eyes closed. It’s bound to end badly.

Read Post

SolarWinds

Read more about The Ultimate Guide to Kafka Monitoring Best Practices, Metrics, and Tools

How AI Security Cameras Transform Traditional CCTV into Proactive Protection

Dec 21, 2025 By OpsMatters In OpsMatters

Here are the key takeaways on how AI is revolutionising security.

Read Post

OpsMatters

Read more about How AI Security Cameras Transform Traditional CCTV into Proactive Protection

Microsoft Teams outage on December 19, 2025

Dec 20, 2025 By Andy Libby In StatusGator

On December 19, 2025, Microsoft Teams experienced a performance degradation that affected communication for various users. Despite a significant volume of reports from the community, official health dashboards remained in a normal status throughout the event. This incident serves as a case study for why IT teams benefit from secondary monitoring sources.

Read Post

StatusGator

Read more about Microsoft Teams outage on December 19, 2025

Looking back at 2025: Innovations that shaped DevOps and observability

Dec 19, 2025 By Sangavi Dass In Site24x7

The year 2025 has been exciting for Site24x7, packed with innovations designed to make monitoring smarter, faster, and more intuitive. From enhanced APM insights and deeper database observability to a more powerful log management experience and AI-driven plugin enhancements, we’ve focused on giving teams the tools they need to troubleshoot faster, gain clearer insights, and manage complex environments with ease. Let’s rewind and see our 2025 highlights.

Read Post

Site24x7

Read more about Looking back at 2025: Innovations that shaped DevOps and observability

ManageEngine recognized in the 2025 Gartner® Magic Quadrant for Digital Experience Monitoring!

Dec 19, 2025 By Applications Manager In ManageEngine

ManageEngine has been recognized in the Gartner® Magic Quadrant™ for Digital Experience Monitoring (DEM), affirming our commitment to delivering superior enterprise service and user experience.

Read Post

ManageEngine

Read more about ManageEngine recognized in the 2025 Gartner® Magic Quadrant for Digital Experience Monitoring!

Five Worthy Reads: Greener IT Starts Here: How AI Is Transforming Operational Efficiency

Dec 19, 2025 By meghna.menon@zohocorp.com In ManageEngine

Five worthy reads is a regular column on five noteworthy items we’ve discovered while researching trending and timeless topics. This week, we learn more about how AI transforms sustainability in the enterprise. As enterprises accelerate their digital journeys, IT teams are under pressure to deliver faster, smarter, and more sustainable operations.

Read Post

ManageEngine

Read more about Five Worthy Reads: Greener IT Starts Here: How AI Is Transforming Operational Efficiency

ManageEngine Recognized in IDC MarketScape for Observability 2025

Dec 19, 2025 By OpManager Plus In ManageEngine

ManageEngine earns Major Player status for helping enterprises achieve resilient, high-performance digital ecosystems through unified full-stack observability.

Read Post

ManageEngine

Read more about ManageEngine Recognized in IDC MarketScape for Observability 2025

Sponsored Post

Free Versus Paid Monitoring Tools

Dec 19, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Choosing the right monitoring strategy is critical in today's hybrid IT environments. This whitepaper explores open-source, commercial, and hybrid approaches through real-world scenarios, highlighting trade-offs in cost, flexibility, compliance, and operational efficiency. Learn how organizations of all sizes optimize observability, integrate legacy and cloud-native systems, and scale monitoring with confidence.

Read Post

NiCE IT Mgmt

Read more about Free Versus Paid Monitoring Tools

Elastic and Google Cloud's powerful partnership in 2025

Dec 19, 2025 By Brian Bergholm In Elastic

In 2025, Elastic and Google Cloud created a powerhouse of AI-driven insights, providing an end-to-end search, observability, and security journey for our joint customers. We continue to partner on many opportunities for success and have made even further progress this year to empower all our users, especially around generative AI (GenAI). This blog highlights our collaboration with Google Cloud to help you harness the power of data at scale as well as our top moments from Google Cloud Next ‘25.

Read Post

Elastic

Read more about Elastic and Google Cloud's powerful partnership in 2025

Synthetic Monitoring & WooCommerce: Detecting Hidden Failures

Dec 19, 2025 By Dotcom-Monitor In Dotcom-Monitor

WooCommerce powers a massive portion of the internet’s commerce layer, largely because it looks simple. Install a plugin, connect Stripe, choose a theme, and suddenly WordPress becomes a store. That perceived simplicity is also what makes WooCommerce fragile in production. WooCommerce stores are not single systems.

Read Post

Dotcom-Monitor

Read more about Synthetic Monitoring & WooCommerce: Detecting Hidden Failures

Accelerating Sentinel data lake deployment | Webinar | VirtualMetric & Microsoft

Dec 19, 2025 By VirtualMetric | Telemetry Pipeline In VirtualMetric

Microsoft Sentinel data lake is becoming a core component of modern security architectures. In this on-demand webinar, Microsoft and VirtualMetric discuss how security teams can approach Sentinel data lake adoption to improve visibility, control cost, and prepare their data for AI-driven security workflows.

View Video

VirtualMetric

Read more about Accelerating Sentinel data lake deployment | Webinar | VirtualMetric & Microsoft

Extended Check Settings

Dec 19, 2025 By Uptime Website Monitoring In uptime

Learn how to configure extended check settings for Uptime.com checks that influence how you are alerted. Build alert and escalation structures, set maintenance windows, even define target SLA metrics.

View Video

uptime

Read more about Extended Check Settings

A FinOps engineer's guide to governing custom metrics

Dec 19, 2025 By Dieter Matzion In Datadog

This guest blog post is authored by Dieter Matzion, a seasoned cloud practitioner who has operated exclusively in public cloud environments since 2013, with experience at leading technology companies including Google, Netflix, Intuit, and Roku. Custom metrics play a crucial role in enabling teams to monitor their applications and businesses. The flexibility of these metrics allows engineers to measure what matters most to their domain.

Read Post

Datadog

Read more about A FinOps engineer's guide to governing custom metrics

Turning errors into product insight: How early-stage teams can connect engineering data to user impact

Dec 19, 2025 By Candace Shamieh In Datadog

Early-stage engineering teams ship fast and learn in production. While speed is a competitive advantage, it can also lead to a high volume of noisy signals, like stack traces, timeouts, and dashboards full of red. Some of those problems can affect your users and revenue, but many don’t.

Read Post

Datadog

Read more about Turning errors into product insight: How early-stage teams can connect engineering data to user impact

Inside Prometheus 3.0: Native Histograms, UTF-8, and the Future of Metrics | Big Tent S3E2

Dec 19, 2025 By Grafana In Grafana

From a SoundCloud side project to a global standard, Prometheus has come a long way — and 3.0 marks a major new chapter. In this episode of Grafana’s Big Tent, hosts Mat Ryer and Tom Wilkie sit down with.

View Video

Grafana

Read more about Inside Prometheus 3.0: Native Histograms, UTF-8, and the Future of Metrics | Big Tent S3E2

Why You Need "Always-On" Website Tracking This Holiday Season

Dec 19, 2025 By Pingdom In SolarWinds

Holiday shoppers are notoriously impatient, and in 2025 they have even less tolerance for slow websites. Keywords like “website downtime tracking” and “ecommerce site reliability” are often trending because businesses are realizing that slow is the new down. This holiday season, the goal is to safeguard your website against business-critical slowdowns without adding “manual monitoring” to your already busy plate.

Read Post

SolarWinds

Read more about Why You Need "Always-On" Website Tracking This Holiday Season

How Spotify R&D Migrated from Heroic to VictoriaMetrics at Massive Scale

Dec 19, 2025 By VictoriaMetrics In VictoriaMetrics

How do you monitor tens of trillions of data points without building your own database?

View Video

VictoriaMetrics

Read more about How Spotify R&D Migrated from Heroic to VictoriaMetrics at Massive Scale

How to Set Up Private Probe | Grafana Synthetic Monitoring

Dec 19, 2025 By Grafana In Grafana

Learn how to set up a private probe using Grafana Cloud Synthetic Monitoring. In this video, we walk through the steps you need to create a private probe.

View Video

Grafana

Read more about How to Set Up Private Probe | Grafana Synthetic Monitoring

How to share and analyze survey data (or other business metrics) in Grafana

Dec 19, 2025 By Tobias Skarhed In Grafana

Our annual Observability Survey provides some great insights on the state of the industry and all things observability. And for the third edition of the survey, published last March, we wanted to bring the results into a Grafana dashboard, not just because we could, but because it was quite a nice way to interact with the data. After all, Grafana isn't just for IT observability. You can use it to monitor everything from BI data to lunar landings to pet pythons, and now, survey data.

Read Post

Grafana

Read more about How to share and analyze survey data (or other business metrics) in Grafana

Site24x7 2025 wrapped: Full-stack monitoring scale

Dec 19, 2025 By ManageEngine Site24x7 In Site24x7

2025 was Site24x7’s year of scale and intelligence. We stood by IT teams everywhere, keeping their world running with visibility that never misses a beat. We’ve evolved monitoring into a heartbeat, one that pulses with the rhythm of your business. AI-powered. Built for the future. Security built IN, not AROUND. Full-stack visibility that scales with you. At Site24x7, we’re committed to redefining what observability means for modern IT operations — unifying metrics, traces, logs, and real-user insights across clouds, containers, networks, and applications.

View Video

Site24x7

Monitoring

Read more about Site24x7 2025 wrapped: Full-stack monitoring scale

VictoriaMetrics 2025 Developer Experience: A Year in Review

Dec 19, 2025 By Diana Todea In VictoriaMetrics

2025 was a landmark year for VictoriaMetrics — defined not only by product improvements, new capabilities, and wider adoption, but by a strong and consistent presence across the global open-source and cloud-native ecosystem. Our mission has always been clear: to build open-source monitoring and observability solutions that are simple, reliable, and efficient for metrics, logs, and traces.

Read Post

VictoriaMetrics

Read more about VictoriaMetrics 2025 Developer Experience: A Year in Review

ServiceNow Integration for Obkio's Network Monitoring Tool: Coming Soon

Dec 19, 2025 By Andrii Kernitskyi In Obkio

We're launching a ServiceNow integration in early January that brings Obkio's network monitoring capabilities directly into your IT service management workflow.

Read Post

Obkio

Read more about ServiceNow Integration for Obkio's Network Monitoring Tool: Coming Soon

Spotify outage on December 17, 2025

Dec 18, 2025 By Andy Libby In StatusGator

On December 15, 2025, Spotify experienced a widespread outage that disrupted playback, logins, and app functionality for users around the world. While Spotify’s official status page remained silent throughout the incident, StatusGator detected the problem early using real user signals and issued an Early Warning Signal within minutes.

Read Post

StatusGator

Read more about Spotify outage on December 17, 2025

Reflecting on a year of smarter network monitoring: 2025

Dec 18, 2025 By Rama Venkatesan In Site24x7

This year, the world leaned heavily on words like reimagine, rebuild, renew, reshape, and reinvent, and the same spirit defined our journey. As promised last year, we reimagined key capabilities, reshaped workflows, and restructured critical parts of our network monitoring tool to meet modern demands. At the same time, we reinforced the core foundation you've trusted for more than a decade: delivering reliable, usable features with uncompromising security.

Read Post

Site24x7

Read more about Reflecting on a year of smarter network monitoring: 2025

Elevate your MSP operations: Key Site24x7 features you shouldn't miss in 2025

Dec 18, 2025 By Jyotsna R In Site24x7

Managing multiple customer accounts as an MSP can be overwhelming. With the constant demands of configuring monitors, generating reports, and maintaining security across numerous customers, efficiency becomes critical. Throughout this year, we've focused on making your life easier with powerful new features that automate repetitive tasks, enhance security, and give you better visibility into customer health.

Read Post

Site24x7

Read more about Elevate your MSP operations: Key Site24x7 features you shouldn't miss in 2025

Notes from the Field: Migrating from VMware to XenServer

Dec 18, 2025 By GripMatix In GripMatix

The customer was already using Citrix Provisioning Services (PVS) to deliver Virtual Delivery Agent (VDA) machines. Rather than attempting to migrate existing VMware-based VDAs, which often introduces driver conflicts and legacy dependencies, we followed a proven best-practice approach. We provisioned new VDA machines directly on XenServer using the PVS Virtual Desktop Setup Wizard. This ensured clean builds, free from VMware-specific components, and fully optimized for the XenServer platform.

Read Post

GripMatix

Read more about Notes from the Field: Migrating from VMware to XenServer

Confessions of a software engineer who enjoyed being paged at 5am

Dec 18, 2025 By Annie Freeman In Coralogix

It’s 5:14am, and I wake up to the squawking geese sound of my PagerDuty alert (anyone else have this sound? No?). I’m four months into working for my new team as a junior software engineer, and this is my first time being paged in the middle of the night. Most software engineers probably dread this moment, but I kind of love it. Agile ceremonies and Jira tickets suddenly don’t matter, and you’re fully focussed on stopping a customer-impacting fire.

Read Post

Coralogix

Read more about Confessions of a software engineer who enjoyed being paged at 5am

Day 2 with Cilium: Small configurations that keep large clusters boring

Dec 18, 2025 By Candace Shamieh In Datadog

Operating Cilium at a small scale is straightforward. You install the Helm chart, choose a routing mode, and apply a few network policies. Day 1 is about getting packets to flow. Day 2 is about keeping them boring. At Datadog, we run Cilium across hundreds of Kubernetes clusters, tens of thousands of nodes, and hundreds of thousands of pods in multiple clouds. When operating at this scale, small configuration choices stop being minor details and start becoming risk multipliers.

Read Post

Datadog

Read more about Day 2 with Cilium: Small configurations that keep large clusters boring

Last9 integration with TrueFoundry AI Gateway

Dec 18, 2025 By Sahil Khan In Last9

If you're using TrueFoundry to manage your LLM traffic, you can now send those traces directly to Last9 and view them alongside your existing infrastructure telemetry.

Read Post

Last9

Read more about Last9 integration with TrueFoundry AI Gateway

Elastic at AWS re:Invent: Concluding a year of partnership in agentic AI innovation

Dec 18, 2025 By Brian Bergholm In Elastic

Highlights of another laudable year of customer-centric collaboration. The integration of Elastic’s capabilities, including vector databases and context engineering, with AWS services helps customers build intelligent, scalable, and secure applications faster and with greater flexibility. Our ongoing collaboration has resulted in another year of notable innovation with AWS. This blog highlights our continued collaboration with AWS throughout 2025 to help you capitalize on the power of AI.

Read Post

Elastic

Read more about Elastic at AWS re:Invent: Concluding a year of partnership in agentic AI innovation

Python memory profiling: Common pitfalls and how to avoid them

Dec 18, 2025 By Bowen Chen In Datadog

Continuous profiling has established itself as a core observability practice, so much so that we’ve referred to it as the fourth pillar of observability. But despite the capabilities and growing adoption of continuous profiling, it can still be confusing to approach profiling as a newcomer and correctly apply it to different troubleshooting scenarios.

Read Post

Datadog

Read more about Python memory profiling: Common pitfalls and how to avoid them

Application Monitoring 101: Queue Time Can Alert Before a Breakdown

Dec 18, 2025 By Aspen Clevenger In Scout

Regular monitoring practices can emphasize application response time, but queue time is also often an early and important warning sign. If it rises, you’ll quickly see downstream effects: tail latency, timeouts, and error spikes. This means that this metric can give you a head start tackling app issues before they become user problems. In this post, we’ll discuss queue time, how things can go off track, and practical steps to turn it around.

Read Post

Scout

Read more about Application Monitoring 101: Queue Time Can Alert Before a Breakdown

Gartner I&O and Cloud Strategies Conference 2025: From Observability to Outcome-Driven Operations

Dec 18, 2025 By ScienceLogic In ScienceLogic

This year’s Gartner IT Infrastructure, Operations and Cloud Strategies Conference made one thing abundantly clear: the industry is moving beyond reactive monitoring and isolated dashboards toward autonomous, outcome-driven IT operations. While AI and agentic automation dominated keynotes and vendor messaging, conversations on the show floor reflected a more grounded reality.

Read Post

ScienceLogic

Read more about Gartner I&O and Cloud Strategies Conference 2025: From Observability to Outcome-Driven Operations

Centrally set up and scale monitoring of your infrastructure and apps with Datadog Fleet Automation

Dec 18, 2025 By Ethan Debnath In Datadog

Setting up and scaling observability across large, distributed environments often requires platform and SRE teams to coordinate access to infrastructure hosts and switch between configuration management tools and product-specific documentation. These tasks increase setup time and create delays in establishing visibility of critical services in Datadog. As teams expand their infrastructure, they need to coordinate Datadog configuration changes in a consistent and auditable way.

Read Post

Datadog

Read more about Centrally set up and scale monitoring of your infrastructure and apps with Datadog Fleet Automation

Text-to-Alert: Generating Netdata Alerts from Natural Language

Dec 18, 2025 By Shyam Sreevalsan In netdata

Netdata has an incredibly powerful alerting engine. But this can sometimes be a double-edged sword: the flexibility to build incredibly specific, intelligent alerts is immense, but mastering its syntax can feel like learning a new language. We’ve heard this from so many of you. You tell us that configuring alerts is often the steepest part of the learning curve, a task that falls to the one “Netdata expert” on the team who has spent the time digging through the documentation.

Read Post

netdata

Read more about Text-to-Alert: Generating Netdata Alerts from Natural Language

A Year in Internet Analysis: 2025

Dec 18, 2025 By Doug Madory In Kentik

This year-end wrap-up covers topics from BGP security (including ASPA and excessive AS-SETs) and the geopolitical (Ukraine’s IPv4 exodus, the Iran internet shutdown, and Red Sea cable cuts) to the year’s most significant outages (TikTok, the Spain/Portugal blackout, and cloud failures at AWS, Azure, and Cloudflare). Plus, we explore Starlink’s new Community Gateways, and revisit the evolving landscape of AS ranking and OTT service tracking.

Read Post

Kentik

Read more about A Year in Internet Analysis: 2025

Tail sampling vs. head sampling in distributed tracing

Dec 18, 2025 By Grafana In Grafana

In this video, Grafana Labs' Robin Gustafsson (CEO for K6 + VP, Product) and Sean Porter (Distinguished Engineer) discuss the differences between head sampling and tail sampling approaches in distributed tracing. They explore why head sampling often amounts to sampling randomly and hoping for the best, while tail sampling — the approach used by Adaptive Traces in Grafana Cloud — allows you to intelligently capture the traces that actually matter to you.

View Video

Grafana

Read more about Tail sampling vs. head sampling in distributed tracing

Logging Best Practices (Grafana OpenTelemetry Community Call)

Dec 18, 2025 By Grafana In Grafana

We’re back with a new Grafana OpenTelemetry Community Call episode, and this time we’re diving into logging with OpenTelemetry and Grafana Loki! Even better, we’re joined by two fantastic guests: Jack Berg, OTel logging expert, and Ed Welch, Loki guru. Getting both of them in one conversation makes for an amazing deep-dive into all things logging. Logs come in every shape and size, from simple CLI output to massive distributed systems generating petabytes of structured data. In this episode, we’ll talk about.

View Video

Grafana

Read more about Logging Best Practices (Grafana OpenTelemetry Community Call)

Building a Code Review system that uses prod data to predict bugs

Dec 18, 2025 By Giovanni Guidini In Sentry

This post takes a closer look at how Sentry’s AI Code Review actually works. As part of Seer, Sentry’s AI debugger, it uses Sentry context to accurately predict bugs. It runs automatically or on-demand, pointing out issues and suggesting fixes before you ship. We know AI tools can be noisy, so this system focuses on finding real bugs in your actual changes—not spamming you with false positives and unhelpful style tips.

Read Post

Sentry

Read more about Building a Code Review system that uses prod data to predict bugs

Datadog's MCP Server connects AI agents by ingesting prompts and mapping them to Datadog resources.

Dec 18, 2025 By Datadog In Datadog

Learn more by watching the full episode of our latest This Month in Datadog series.

View Video

Datadog

Read more about Datadog's MCP Server connects AI agents by ingesting prompts and mapping them to Datadog resources.

VictoriaMetrics Virtual Meet Up - December 2025

Dec 18, 2025 By VictoriaMetrics In VictoriaMetrics

Warm up. VictoriaMetrics roadmap updates. Spotify - Customer Story: "How & why we use VictoriaMetrics." Presenter: Lauren Roshore, Engineering Manager, Observability at Spotify.

View Video

VictoriaMetrics

Monitoring

Read more about VictoriaMetrics Virtual Meet Up - December 2025

Site24x7's Kubernetes monitoring | Proactive, scalable, AI-powered

Dec 18, 2025 By ManageEngine Site24x7 In Site24x7

Kubernetes drives modern cloud-native applications, but its distributed nature creates visibility and performance challenges at scale. In this video, discover how Site24x7 provides real-time monitoring, AI-powered anomaly detection, and scalability for Kubernetes environments, helping you to proactively manage resources and resolve issues faster. Whether you're running a single Kubernetes cluster or managing multiple environments, Site24x7 helps you ensure peak performance and faster decision-making with minimal manual intervention.

View Video

Site24x7

Read more about Site24x7's Kubernetes monitoring | Proactive, scalable, AI-powered

What is DEX? And Why DEX is Important

Dec 18, 2025 By Rachel Berry In eG Innovations

Digital Employee Experience (DEX) refers to how employees interact with the digital tools, systems, and technologies they use at work, and how those interactions affect their productivity, satisfaction, and overall work experience. DEX encompasses the quality of the digital interactions and services that employees encounter while using workplace technologies. It includes various factors such as application performance, network connectivity, device usability, and overall user satisfaction.

Read Post

eG Innovations

Read more about What is DEX? And Why DEX is Important

Debug Faster with Chrome + Rollbar Debugging Assistant

Dec 18, 2025 By Rollbar In Rollbar

Context switching is one of the biggest hidden productivity killers in debugging. Jumping between multiple open browser tabs slows momentum and increases cognitive load, especially when you’re trying to diagnose an issue under pressure. Google Chrome's new split screen feature, paired with Rollbar Debugging Assistant, enables a faster, more focused way to troubleshoot errors without constantly losing your place.

View Video

Rollbar

Read more about Debug Faster with Chrome + Rollbar Debugging Assistant

From Zero to Open Source Contributor

Dec 18, 2025 By Datadog In Datadog

Never contributed to open source and feeling intimidated? Same. Before joining Datadog, Alessandro had zero open source experience. Now he's a regular contributor to Apache Iceberg. Here's exactly how he got started. Step 1: Join the Slack community and answer user questions. Step 2: Look for "good first issue" tags in the repo. Step 3: Remember that opening bug reports and doing code reviews count as contributions too.

View Video

Datadog

Read more about From Zero to Open Source Contributor

The Observability Stack is Collapsing: Why Context-First Data is the Only Path to AI-Powered Root Cause Analysis

Dec 18, 2025 By Mezmo In Mezmo

By Bill Balnave, VP of Customer Success at Mezmo. The core promise of modern observability is simple: cut Mean Time To Resolution (MTTR). Yet, despite a boom in tooling and investment over the last four years, the data tells a sobering story: our industry is actually getting worse at finding and resolving issues. Dashboards, once our trusted guide, have become the starting point for a chaotic "dashboard hunt" that rarely leads to the definitive root cause.

Read Post

Mezmo

Read more about The Observability Stack is Collapsing: Why Context-First Data is the Only Path to AI-Powered Root Cause Analysis

What's New in InfluxDB 3.8: Linux Service Management, Kubernetes Helm Chart, and Smarter Ask AI

Dec 18, 2025 By Peter Barnett In InfluxData

InfluxDB 3.8 is now available for both Core and Enterprise, alongside the 1.6 release of the InfluxDB 3 Explorer UI. This release is focused on operational maturity and making InfluxDB easier to deploy, manage, and run reliably in production. InfluxDB 3 Core remains free and open source under MIT and Apache 2 licenses, optimized for recent data. InfluxDB 3 Enterprise builds on that foundation with long-range querying, clustering, security, and full operational tooling.

Read Post

InfluxData

Read more about What's New in InfluxDB 3.8: Linux Service Management, Kubernetes Helm Chart, and Smarter Ask AI

How To Connect Your Prometheus Server to a Grafana Datasource

Dec 18, 2025 By Benjamin Pitts In MetricFire

Prometheus is one of the most popular open-source monitoring systems in the world. It’s lightweight, easy to deploy, and pairs beautifully with Grafana for dashboards and alerting. If you're running applications or infrastructure on Linux, Prometheus plus one of many Exporters (Redis, NVIDIA GPU, Nginx, etc.) gives you deep visibility into service performance - quickly and reliably.

Read Post

MetricFire

Read more about How To Connect Your Prometheus Server to a Grafana Datasource

Why OpenTelemetry instrumentation needs both eBPF and SDKs

Dec 18, 2025 By Fabian Stäber In Grafana

As a vendor-neutral open standard, OpenTelemetry has become the default choice for application instrumentation. However, it’s important to remember that OpenTelemetry isn’t a single technology — it’s an ecosystem. Under the hood, it provides multiple options for instrumenting your applications. In this blog post, we explore two instrumentation approaches: OpenTelemetry eBPF Instrumentation and runtime-specific OpenTelemetry SDKs, like the OpenTelemetry Java agent.

Read Post

Grafana

Read more about Why OpenTelemetry instrumentation needs both eBPF and SDKs

Episode 3 - Where AI Meets Legacy Systems

Dec 18, 2025 By Digitate In Digitate

In this episode of The Intelligent Enterprise, host Tom Stoneman gets inside a challenge many enterprises are facing right now: how to integrate AI with complex legacy systems without breaking what already works. This week, Tom sits down with Yael Gómez, Fractional Chief Technology Officer and Chief Information Officer at Pet Madness, and former technology leader at Walgreens Boots Alliance.

View Video

Digitate

Read more about Episode 3 - Where AI Meets Legacy Systems

Capture high-value traces without managing a pipeline: Tail sampling with Adaptive Traces

Dec 18, 2025 By Chris Marchbanks In Grafana

Tracing is the richest observability signal in common use today. In distributed systems, it reveals how requests flow across multiple services, allowing you to uncover and address performance bottlenecks. Teams often scale back or abandon tracing altogether, however, because most successful requests produce redundant data that’s noisy and expensive to store.

Read Post

Grafana

Read more about Capture high-value traces without managing a pipeline: Tail sampling with Adaptive Traces

7 Strategies for IT Ops Teams to Monitor and Optimize Real-Time Commodity Pricing Systems for Financial Reliability

Dec 18, 2025 By OpsMatters In OpsMatters

Real-time commodity pricing systems have become mission-critical infrastructure for financial institutions, trading desks, and enterprise resource planning operations. As of December 2025, with 72% of trading firms migrating to cloud-native CTRM and ETRM platforms, IT Ops teams face mounting pressure to maintain pricing accuracy, minimize latency, and ensure system resilience during volatile market conditions.

Read Post

OpsMatters

Read more about 7 Strategies for IT Ops Teams to Monitor and Optimize Real-Time Commodity Pricing Systems for Financial Reliability

Unlock Reliable ONTAP Monitoring on Microsoft SCOM

Dec 17, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

NiCE is excited to announce the general availability of NiCE NetApp ONTAP Management Pack (MP) version 1.2, delivering enhanced monitoring, expanded ONTAP version support, and improved visibility across NetApp environments.

Read Post

NiCE IT Mgmt

Read more about Unlock Reliable ONTAP Monitoring on Microsoft SCOM

Apple TV+ outage: StatusGator detected issues before provider acknowledgment

Dec 17, 2025 By Colin Bartlett In StatusGator

On the evening of December 12, 2025, Apple TV+ experienced a significant service disruption during prime streaming hours that left thousands of users unable to access content.

Read Post

StatusGator

Read more about Apple TV+ outage: StatusGator detected issues before provider acknowledgment

Top SaaS Vendors DevOps Teams Should Monitor in 2025

Dec 17, 2025 By Nuno Tomas In isDown

Modern applications rely on dozens of third-party services to function properly. When these services fail, your application fails too. DevOps teams need to identify and monitor the top SaaS vendors that could impact their infrastructure and user experience. This guide covers the essential SaaS vendors DevOps teams should monitor, organized by category and criticality. We'll explore why each vendor matters and what specific aspects require monitoring.

Read Post

isDown

Read more about Top SaaS Vendors DevOps Teams Should Monitor in 2025

IT infrastructure monitoring: Leaner, stronger, more intelligent, and a huge progression

Dec 17, 2025 By Geoffrin Edwin In Site24x7

IT infrastructure as a technology has leapfrogged in 2025 and Site24x7 is no exception. CTOs, SREs, sysadmins, and other IT personnel wanted more from server monitoring and observability tools—and we stepped up. We listened to you, the industry leaders in your respective spaces, and re-envisioned our product platforms. The result?

Read Post

Site24x7

Read more about IT infrastructure monitoring: Leaner, stronger, more intelligent, and a huge progression

StatusIQ in 2025: Reliability and credibility, strengthened

Dec 17, 2025 By Janani Sekar In Site24x7

2025 has been a milestone year for StatusIQ, focused on strengthening resilience, accessibility, and security. Together, these enhancements reinforce our commitment to secure, transparent, and always-available status communication for teams and their users.

Read Post

Site24x7

Read more about StatusIQ in 2025: Reliability and credibility, strengthened

Cloud observability in focus: How Site24x7 strengthened cloud monitoring in 2025

Dec 17, 2025 By Elizebeth JB In Site24x7

Cloud monitoring became more important as enterprises scaled distributed systems, multi-region deployments, and hybrid environments. Teams needed better cloud performance insights, clearer resource usage visibility, and stronger automation to prevent outages and control costs. This year, Site24x7 delivered a rich set of cloud monitoring updates across AWS, Azure, GCP, and OCI, helping teams stay ahead of issues and optimize their cloud footprint.

Read Post

Site24x7

Read more about Cloud observability in focus: How Site24x7 strengthened cloud monitoring in 2025

Datadog re:Invent recap 2025

Dec 17, 2025 By Datadog In Datadog

View Video

Datadog

Read more about Datadog re:Invent recap 2025

The Hidden Costs and Concerns of Iceberg Maintenance

Dec 17, 2025 By Datadog In Datadog

Everyone talks about how great Apache Iceberg is, but nobody warns you about this: without proper maintenance, your tables will bloat, queries will slow down, and your catalog will run out of memory. Here are the 4 critical operations you MUST run regularly. Expiring snapshots prevents metadata bloat (Datadog learned this the hard way with catalog memory pressure). Deleting orphan files cleans up failed writes. Compacting data files keeps streaming workloads fast. Compacting manifests optimizes query planning.

View Video

Datadog

Read more about The Hidden Costs and Concerns of Iceberg Maintenance

Improve log utilization with Datadog log exclusion filters | Datadog Tips & Tricks

Dec 17, 2025 By Datadog In Datadog

Want to make your logs easier to work with? Excluding unneeded logs from indexing reduces noise and may reduce log management costs. In this video, you’ll learn how to improve log utilization with Datadog Log Patterns and log exclusion filters, then set up an alert to track ingestion spikes.

View Video

Datadog

Read more about Improve log utilization with Datadog log exclusion filters | Datadog Tips & Tricks

From Prompt to Production: Deploying AI Agents for Better Code

Dec 17, 2025 By Lightrun In Lightrun

We’re entering a new era of AI-accelerated software development. Teams that successfully integrate AI coding assistants into their daily workflows are already seeing significant productivity gains, while those that don’t risk falling behind.

View Video

Lightrun

Read more about From Prompt to Production: Deploying AI Agents for Better Code

The Lonely Little Packet: A Kentik Holiday Story

Dec 17, 2025 By Kentik In Kentik

Happy Holidays from your friends at Kentik! We hope you enjoy our traditional, heartwarming tale of a data packet struggling to reach its destination.

View Video

Kentik

Read more about The Lonely Little Packet: A Kentik Holiday Story

Creating the IPM Category: Catchpoint's Journey to Leadership and the LogicMonitor Era

Dec 17, 2025 By Catchpoint Team In Catchpoint

On December 15, 2022, Catchpoint launched Internet Performance Monitoring (IPM) as a new category for monitoring solutions with our foundational article, “What is Internet Performance Monitoring and How is it Different from APM?” In it, we said: How prophetic those words turned out to be.

Read Post

Catchpoint

Read more about Creating the IPM Category: Catchpoint's Journey to Leadership and the LogicMonitor Era

The 2026 VMUG Report: Why Network Observability is the Heart of the New VCF Era

Dec 17, 2025 By Alec Pinkham In Broadcom

The cloud landscape is no longer just about "getting to the cloud"—it is about mastering the complexity once you are there. For organizations using VMware Cloud Foundation (VCF), the stakes have never been higher. As infrastructure converges, the margin for error shrinks, and the need for precision grows. To understand how the industry is navigating these changes, we dive into the VMUG Cloud Operations and VCF User Experience Report 2026.

Read Post

Broadcom

Read more about The 2026 VMUG Report: Why Network Observability is the Heart of the New VCF Era

Why 2025 Shattered the Old Rules of Network Management

Dec 17, 2025 By Yann Guernion In Broadcom

December has arrived. The change freeze is looming, and the holiday requests are likely piling up in your inbox right now. It is the natural time for you to look back at the last twelve months, not just to measure your team's performance, but to consider how much the game itself has changed. If you look at the trajectory of your industry this year, a clear pattern emerges. You didn't just face new technical challenges; you faced a genuine shift in what it means to manage a network. The old metrics broke.

Read Post

Broadcom

Read more about Why 2025 Shattered the Old Rules of Network Management

Migrating from SolarWinds to WhatsUp Gold: The Ultimate Guide

Dec 17, 2025 By Jason Alberino In WhatsUp Gold

Looking for a reliable SolarWinds alternative? If you’ve been comparing SolarWinds and WhatsUp Gold, you’re not alone. Many IT teams are evaluating which network monitoring solution offers better performance, flexibility and cost efficiency. In this guide, we’ll walk you through a step-by-step migration from SolarWinds to WhatsUp Gold, highlighting key differences, benefits and best practices to maintain a smooth transition.

Read Post

WhatsUp Gold

Read more about Migrating from SolarWinds to WhatsUp Gold: The Ultimate Guide

About us - Sumo Logic

Dec 17, 2025 By Sumo Logic, Inc. In Sumo Logic

Security teams are flooded with thousands, or even millions, of signals every day. Sumo Logic’s entity-based SIEM and Dojo AI agents automate the manual work of detection, triage, and remediation so you can act faster on the alerts that matter. Discover how Sumo Logic simplifies security operations, helping you cut through the noise and protect your digital world.

View Video

Sumo Logic

Read more about About us - Sumo Logic

OpenTelemetry Agents - The Complete Beginner's Guide (2025)

Dec 17, 2025 By Dhruv Ahuja In SigNoz

If you search for “OpenTelemetry Agent”, you will likely encounter two completely different definitions. This ambiguity often leads to confusion between infrastructure teams and application developers. SREs and DevOps engineers would describe it as a component deployed as a sidecar, whereas application developers would understand it as a language-specific library. Let’s break it down in the next section.

Read Post

SigNoz

Read more about OpenTelemetry Agents - The Complete Beginner's Guide (2025)

.NET Web API Monitoring: REST, ASP.NET & WCF Compared

Dec 17, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern .NET applications rely on three primary Web API architectures: lightweight REST APIs, middleware-driven ASP.NET Core Web APIs, and contract-heavy WCF SOAP services. Each exposes functionality over HTTP, but each behaves very differently in production. More importantly, each architecture fails in different ways, which means teams must monitor them differently to maintain reliability, uptime, and predictable performance.

Read Post

Dotcom-Monitor

Read more about .NET Web API Monitoring: REST, ASP.NET & WCF Compared

Easy-to-Use SSL Certificate Management Tool: The Complete Guide

Dec 17, 2025 By Dotcom-Monitor In Dotcom-Monitor

Managing SSL certificates can feel complicated for many teams. It often involves remembering renewal dates, keeping track of certificate details, ensuring that websites stay secure, and avoiding costly downtime caused by expired or invalid certificates. While many people call these solutions “SSL certificate management tools,” most organizations do not need a full PKI automation platform.

Read Post

Dotcom-Monitor

Read more about Easy-to-Use SSL Certificate Management Tool: The Complete Guide

Synthetic Application Monitoring: Proactive Strategy to Prevent Downtime

Dec 17, 2025 By Dotcom-Monitor In Dotcom-Monitor

Imagine this: It’s three in the morning on Black Friday. Your phone lights up with alerts: your online store’s checkout isn’t functioning properly. Your team is in a panic, sales are dropping by the minute, and social media is full of complaints from your clients. By the time you determine that the problem is an expired third-party payment gateway, you’ve lost hours of sales and your customers’ trust.

Read Post

Dotcom-Monitor

Read more about Synthetic Application Monitoring: Proactive Strategy to Prevent Downtime

Mobile App Synthetic Monitoring enables proactive testing across devices and networks

Dec 17, 2025 By Dotcom-Monitor In Dotcom-Monitor

In the mobile-first digital economy, your application’s performance is your brand’s frontline. Your backend is fast. Your APIs respond in milliseconds. Yet, somewhere on a slow network in a bustling city center, a user is staring at a frozen login screen. This scenario highlights a critical truth. App synthetic monitoring is the proactive discipline of simulating real user interactions—like app launches, logins, searches, and checkouts—from real devices and networks worldwide.

Read Post

Dotcom-Monitor

Read more about Mobile App Synthetic Monitoring enables proactive testing across devices and networks

How to Connect Your MySQL Instance to a Grafana Datasource

Dec 17, 2025 By Benjamin Pitts In MetricFire

Grafana’s MySQL datasource makes it easy to turn raw database rows into clean, interactive dashboards. Whether you're testing out a new monitoring setup or experimenting with time-series data, MySQL + Grafana gives you a powerful foundation for building visualizations quickly.

Read Post

MetricFire

Read more about How to Connect Your MySQL Instance to a Grafana Datasource

Gisual Enters the Stack: Power, AI, and the Next Phase of Observability

Dec 17, 2025 By ScienceLogic In ScienceLogic

ScienceLogic recently partnered with Gisual—a leader in AI power intelligence—to bring real-time power insight directly into the ScienceLogic AI Platform. On the surface, that might sound like a straightforward integration story. In reality, it signals something much bigger: observability continues to expand well beyond the digital stack, and operators now treat power as a first-class operational signal.

Read Post

ScienceLogic

Read more about Gisual Enters the Stack: Power, AI, and the Next Phase of Observability

[Workshop] Building and Monitoring AI Agents and MCP servers

Dec 17, 2025 By Sentry In Sentry

See how Agent Monitoring gives you a better look at all things model usage, call duration, prompting, and more. Go under the hood with MCP Monitoring - and learn how to debug client connection issues, tool call performance, transports, and all things MCP. When things start breaking, use Seer, Sentry's AI Debugging Agent, to troubleshoot those vague issues that are crashing, and get help from a team of robots using Sentry’s AI PR Review.

View Video

Sentry

Read more about [Workshop] Building and Monitoring AI Agents and MCP servers

Monitor and reduce your mobile app size with Size Analysis (beta)

Dec 17, 2025 By Max Topolsky In Sentry

Note: This blog post was originally published for the Early Access of Size Analysis. If you're already familiar with Size Analysis in Sentry, go to the section titled What's new in the beta. If you're not familiar with Size Analysis, start at the section titled The curious case of man.jpg.

Read Post

Sentry

Read more about Monitor and reduce your mobile app size with Size Analysis (beta)

Instrumentation Hub: a guided, scalable way to roll out observability coverage without losing control

Dec 17, 2025 By Paschalis Tsilias In Grafana

Getting started with observability in a modern, fast-moving environment is harder than it should be. Open-standards-based observability promises flexibility and vendor neutrality, but in practice it often introduces significant complexity and delays meaningful coverage by months or even years. Each layer of the stack requires its own instrumentation approach, and every technology, runtime, and library version comes with unique setup steps, tradeoffs, and rough edges.

Read Post

Grafana

Read more about Instrumentation Hub: a guided, scalable way to roll out observability coverage without losing control

The year in AI at Grafana Labs

Dec 17, 2025 By Trevor Jones In Grafana

2025 was the year we at Grafana Labs went all-in on AI—and boy, what a year it was. Not only did we establish and start to execute our overarching strategy (build actually useful AI), we also took one of our most exciting new features (Grafana Assistant) from idea to general availability in just nine months! Yes, there's no shortage of articles singing the praises of AI these days, but let's dispense with the hyperbole and focus on some actually useful content.

Read Post

Grafana

Read more about The year in AI at Grafana Labs

Nexthink Recognized as a Customers' Choice in Gartner Peer Insights Voice of the Customer for Digital Employee Experience Management Tools

Dec 16, 2025 By Ebba Kalderén In Nexthink

We’re thrilled to share the exciting news that Nexthink has been recognized as a Customers’ Choice in the inaugural 2025 Gartner Peer Insights Voice of the Customer for DEX Tools. In our view, what makes this recognition truly special is that it comes directly from the people who know our platform best – the IT leaders who use Nexthink every single day. Apart from this, we are recognised as a Leader in the Gartner Magic Quadrant for DEX Management Tools for the second consecutive year.

Read Post

Nexthink

Read more about Nexthink Recognized as a Customers' Choice in Gartner Peer Insights Voice of the Customer for Digital Employee Experience Management Tools

Spotify's performance & control across large monitoring environments with VictoriaMetrics

Dec 16, 2025 By Adam Yates In VictoriaMetrics

When your active time series is in the billions and the total number of data points you need to monitor runs into the tens of trillions, you need a high-performance observability solution with operational simplicity. Streaming behemoth Spotify is one such case. Their observability team chose VictoriaMetrics as the fastest monitoring and observability solution on the market.

Read Post

VictoriaMetrics

Read more about Spotify's performance & control across large monitoring environments with VictoriaMetrics

Tech Talk - Splunk Observability for AI

Dec 16, 2025 By Splunk In Splunk

In this Tech Talk, we’ll show you how Splunk’s agentic, AI observability delivers end-to-end visibility of the entire AI stack, from agents and large language models (LLMs) to the underlying infrastructure. You’ll see how AI Infrastructure Monitoring provides teams with data-dense dashboards and detectors for surfacing trends, patterns, and outliers to correlate application health with underlying AI infrastructure performance.

View Video

Splunk

Read more about Tech Talk - Splunk Observability for AI

Tracealyzer v4.11 now available

Dec 16, 2025 By Percepio In Percepio

We are delighted to release Percepio Tracealyzer version 4.11. The main news are: Users with an active subscription may download the new version from the update page. New users may sign up for free evaluation here.

Read Post

Percepio

Read more about Tracealyzer v4.11 now available

Grafana Assistant Generally Available + Assistant Investigations in Public Preview

Dec 16, 2025 By Grafana In Grafana

During ObservabilityCON 2025, we announced the GA of Grafana Assistant. It's free to use until Dec. 31, 2025. Assistant Investigations is now in Public Preview.

View Video

Grafana

Read more about Grafana Assistant Generally Available + Assistant Investigations in Public Preview

Tech Talk - Take action automatically on Splunk alerts with Red Hat Ansible Automation Platform

Dec 16, 2025 By Splunk In Splunk

As digital and AI applications become more prevalent, the need for fast, efficient, and consistent management of IT operations is critical. This session will show you how to automate responses to Splunk Observability Platform alerts using Red Hat Ansible Automation Platform's Event-Driven Ansible.

View Video

Splunk

Read more about Tech Talk - Take action automatically on Splunk alerts with Red Hat Ansible Automation Platform

Tech Talk - Observe and Secure All Apps with Splunk

Dec 16, 2025 By Splunk In Splunk

In this session, you will learn how to.

View Video

Splunk

Read more about Tech Talk - Observe and Secure All Apps with Splunk

Setting up OpenTelemetry Demo in Kubernetes with Splunk Observability Cloud

Dec 16, 2025 By Splunk In Splunk

Are you looking to explore the power of OpenTelemetry and Splunk Observability Cloud in a Kubernetes environment? This video provides a comprehensive, step-by-step walkthrough on how to deploy the OpenTelemetry Demo application in Kubernetes and seamlessly integrate it with Splunk Observability Cloud for metrics, traces, and logs! In this tutorial, you'll learn.

View Video

Splunk

Read more about Setting up OpenTelemetry Demo in Kubernetes with Splunk Observability Cloud

Building visibility and resilience across Kubernetes

Dec 16, 2025 By Hemant Bansal In Coralogix

Kubernetes has transformed how modern applications are deployed and scaled. Its flexibility and automation power innovation but also expand the attack surface. From control plane access to runtime drift, Kubernetes introduces layers of complexity that can obscure visibility if not properly monitored. For security leaders, Kubernetes is both an opportunity and a risk. While it enables agility, it also decentralizes security responsibility across teams, tools, and cloud layers.

Read Post

Coralogix

Read more about Building visibility and resilience across Kubernetes

Heroku vs. Kubernetes

Dec 16, 2025 By Muhammed Ali In Honeybadger

If you are deciding where to deploy a web app, you will almost always run into a choice between a platform like Heroku and running on Kubernetes. This article compares Heroku and Kubernetes, two popular platforms for deploying and managing applications. It breaks down the key differences in architecture, use cases, complexity, cost, and scalability to help engineers choose the right go-to platform for their needs.

Read Post

Honeybadger

Read more about Heroku vs. Kubernetes

Introducing the Databricks Destination: Powering governed, scalable analytics from day one

Dec 16, 2025 By Ryan Conway and In Cribl

Modern enterprises are generating more high-volume observability and security data than ever, which means the cost and complexity of getting analytics-ready data into Databricks are only growing. With the new Databricks Destination for Cribl Stream, organizations finally have a governed, scalable, and cost-efficient way to take full control of their data pipelines, accelerate AI-driven analytics, and unlock real business value from their Databricks investment.

Read Post

Cribl

Read more about Introducing the Databricks Destination: Powering governed, scalable analytics from day one

ServiceNow and Grafana: How to receive Grafana alert payloads via ServiceNow's scripted REST API

Dec 16, 2025 By George Reyes In Grafana

When you integrate Grafana-managed alert rules with ServiceNow, you can automatically capture and process alerts in ServiceNow’s events table—a common entry point for incident workflows, escalations, and ticket creation. And if you configure ServiceNow to receive Grafana Alerting payloads using ServiceNow’s scripted REST API, you can parse Grafana’s JSON alert payloads and insert them into a ServiceNow table.

Read Post

Grafana

Read more about ServiceNow and Grafana: How to receive Grafana alert payloads via ServiceNow's scripted REST API

GEOFF WRIGHT RETURNS: 2025 EOY SPECIAL EPISODE!

Dec 16, 2025 By Nexthink In Nexthink

In our traditional end-of-year DEX Show special episode, Mondelez’s Geoff Wright returns to unpack a wild 2025 for IT, AI and employee experience. Tim, Tom and Geoff riff on AI agents that shop, plan travel and work across your browser tabs, the coming street fight between Windows and Chromebooks, and why younger workers just want a browser and to be left alone. Geoff explores shadow AI, culture and the human resistance to change, plus his Q1 predictions: Google’s big enterprise push, soaring laptop costs, and why experience, empathy and a good laugh still matter more than any shiny new model.

View Video

Nexthink

Read more about GEOFF WRIGHT RETURNS: 2025 EOY SPECIAL EPISODE!

DevOps AI Tools: Root Cause Analysis + eBPF + Clickhouse

Dec 16, 2025 By Coroot In Coroot

Watch Coroot’s Root Cause Analysis AI pinpoint the exact cause of an incident and suggest fixes in seconds.

View Video

Coroot

Read more about DevOps AI Tools: Root Cause Analysis + eBPF + Clickhouse

Synthetic End User Monitoring simulates complex user journeys across global environments

Dec 16, 2025 By Dotcom-Monitor In Dotcom-Monitor

While traditional monitoring solutions provide valuable infrastructure metrics, they fundamentally lack the capability to understand what users actually experience. There is a significant technical gap between server-side metrics and client-side experience. Research shows that traditional monitoring fails to detect 52–68% of user-facing errors since they happen outside of the server infrastructure.

Read Post

Dotcom-Monitor

Read more about Synthetic End User Monitoring simulates complex user journeys across global environments

Setup and Explore OpenTelemetry Demo Application (with Examples)

Dec 16, 2025 By Elizabeth Mathew In SigNoz

Everyone knows that debugging is twice as hard as writing a program in the first place. So, if you’re as clever as you can be when you write it, how will you ever debug it? — Brian W. Kernighan and P. J. Plauger, The Elements of Programming Style, 2nd ed. Maybe you can let SigNoz do some heavy lifting for you!

Read Post

SigNoz

Read more about Setup and Explore OpenTelemetry Demo Application (with Examples)

Best Certificate Monitoring Solutions With Slack/Teams Integration: The Complete Guide

Dec 16, 2025 By Dotcom-Monitor In Dotcom-Monitor

SSL certificates expire silently. When they do, websites instantly break. Users see warnings. Traffic drops. Security trust is damaged. This is why businesses now rely on certificate monitoring solutions that send alerts before a certificate expires. A growing number of teams want these alerts directly inside Slack or Microsoft Teams, because that’s where their operations already work every day.

Read Post

Dotcom-Monitor

Read more about Best Certificate Monitoring Solutions With Slack/Teams Integration: The Complete Guide

Training Foundation Models on a Trillion Data Points with Apache Iceberg

Dec 16, 2025 By Datadog In Datadog

Training an AI foundation model on over a trillion data points sounds impossible without hitting your production systems. Here's how Datadog did it with Apache Iceberg for their time series forecasting model TOTO. The key challenge: extracting massive historical observability data (metrics spanning years) and running incremental preprocessing pipelines without overwhelming production services. Iceberg solved this by providing schema governance, consistency guarantees, and seamless integration with ML tools like Ray and PyTorch.

View Video

Datadog

Read more about Training Foundation Models on a Trillion Data Points with Apache Iceberg

Why Monitoring the Physical Environment Matters: From Data Centers to Factory Floors

Dec 16, 2025 By OpsMatters In OpsMatters

Physical environment monitoring is the practice of measuring and tracking environmental conditions that directly affect equipment, people, and operational continuity. While digital systems dominate modern operations, physical conditions still determine whether those systems perform reliably or fail unexpectedly. A single temperature spike, humidity imbalance, or power fluctuation can undo layers of software redundancy.

Read Post

OpsMatters

Read more about Why Monitoring the Physical Environment Matters: From Data Centers to Factory Floors

What broke during the Trello outage on December 12

Dec 15, 2025 By Colin Bartlett In StatusGator

In the early hours of December 12, 2025, Trello experienced a disruption that affected teams around the world. Users began reporting that boards would not load, workspaces were inaccessible, and error messages appeared without warning. For a period of time, Trello’s official status page continued to show normal operations, even as real world usage indicated otherwise.

Read Post

StatusGator

Read more about What broke during the Trello outage on December 12

Save the logs, save the planet: How to make your observability stack greener

Dec 15, 2025 By Coralogix Team In Coralogix

If data centres were a country, they’d rank fifth in electricity consumption by 2026. Over the past few years, the resulting carbon footprint of the technology industry has sparked the fast-growing green software movement, led by the Green Software Foundation. How can we continue to innovate software in a way that also minimises its impact on the environment? This has been a fascinating problem I’ve been exploring for a few years now.

Read Post

Coralogix

Read more about Save the logs, save the planet: How to make your observability stack greener

Bright Ideas: Measuring the ROI of AI Adoption in Financial Services

Dec 15, 2025 By Teneo In Teneo

If there is one truth I have learned working with financial services firms in 2025, it is this: AI is no longer optional, it is operational. From risk modeling to customer experience, algorithmic trading to automated compliance checks, AI is now embedded into the fabric of modern finance. But there is a second, quieter truth. AI only creates value when it is used responsibly, measurably, and at scale.

Read Post

Teneo

Read more about Bright Ideas: Measuring the ROI of AI Adoption in Financial Services

VictoriaMetrics Achieves Red Hat OpenShift Operator Certification

Dec 15, 2025 By Adam Yates / Vadim Rutkovsky In VictoriaMetrics

VictoriaMetrics has achieved Red Hat OpenShift Certification, awarded to Red Hat partners who meet requirements for delivering a scalable, supported, and secure operator designed for enterprise cloud deployments. VictoriaMetrics is available on the Red Hat OpenShift OperatorHub. The program certified VictoriaMetrics as a solution that allows for portability and operational efficiency across hybrid and multi-cloud environments.

Read Post

VictoriaMetrics

Read more about VictoriaMetrics Achieves Red Hat OpenShift Operator Certification

OpenTelemetry Metrics with 5 Practical Examples

Dec 15, 2025 By Elizabeth Mathew In SigNoz

Picture this: your observability tool already nails the basics like request rates, latency and memory usage, but you need more insight. Think user churn rates, engagement spikes, or even how many carts get abandoned mid-checkout. That’s where OpenTelemetry steps in, providing a way to track those critical custom metrics with ease.

Read Post

SigNoz

Read more about OpenTelemetry Metrics with 5 Practical Examples

How Inkeep Monitors Their AI Agent Framework with SigNoz

Dec 15, 2025 By Anushka Karmakar In SigNoz

AI agents are fundamentally different beasts to monitor compared to traditional applications. A single user request can trigger a cascade of 10+ internal operations: sub-agent transfers, tool executions, LLM calls, API requests, each with unpredictable latency and failure modes. When something goes wrong (and with LLMs, things go wrong in creative ways), you need to see the entire execution flow to debug effectively.

Read Post

SigNoz

Read more about How Inkeep Monitors Their AI Agent Framework with SigNoz

What Broken Checkouts Really Cost: Why Transaction Monitoring Pays For Itself

Dec 15, 2025 By Richa Gupta In WebSitePulse

Broken checkouts lead to lost transactions, drain revenue, undermine customer trust, and damage brand credibility. Unfortunately, most companies don't realize their checkout is failing until sales drop or customers start complaining. According to statistics, technical issues cause checkout abandonment in at least 17% of cases. This means nearly one-fifth of lost conversions are preventable. For any online business, even a small checkout failure can result in significant revenue loss.

Read Post

WebSitePulse

Read more about What Broken Checkouts Really Cost: Why Transaction Monitoring Pays For Itself

How to Use MCP to Optimize Your Graylog Security Detections

Dec 15, 2025 By The Graylog Product Team In Graylog

Security teams face a critical question: “What logs should we collect, and what detections should we enable to protect against threats targeting our industry?” For a bank in the northeast, this isn’t academic. Threat groups like FIN7, Lazarus Group, and Carbanak specifically target financial institutions with sophisticated attacks ranging from SWIFT compromise to ransomware.

Read Post

Graylog

Read more about How to Use MCP to Optimize Your Graylog Security Detections

Overcoming ClickHouse's JSON constraints to build a high-performance JSON log store

Dec 15, 2025 By Elizabeth Mathew In SigNoz

Customer log data is always messy. Being (and building!) an observability platform, we get to see all the beautiful, creative ways it can be messy, every single day. And yet, our customers expect, quite fairly, I might add, perfect query results and peak performance. SigNoz is an open-source observability platform that can be your one-stop solution for logs, metrics and traces.

Read Post

SigNoz

Read more about Overcoming ClickHouse's JSON constraints to build a high-performance JSON log store

How to Track Cloud Costs in Real-Time Instead of Waiting Days

Dec 15, 2025 By Datadog In Datadog

Tired of waiting days to see your AWS bill spike? Datadog solved this problem using Apache Iceberg to deliver real-time cloud cost visibility - updating every 15 minutes instead of waiting for billing data. Here's how it works: They sync real-time resource inventory (EC2 instances, Kubernetes pods) into Iceberg tables, then use Trino to join those snapshots with unit pricing data. The result? FinOps teams can catch cost anomalies before they become budget disasters.

View Video

Datadog

Read more about How to Track Cloud Costs in Real-Time Instead of Waiting Days

AI Observability in 2026: Why the data layer means everything

Dec 15, 2025 By Coralogix Team In Coralogix

If there was ever a year for AI observability, it was 2025. Vendors released assistants to cover a variety of use cases. Coralogix released the first agent (distinct from assistants!), Olly, an autonomous, multi-agent observability platform. The direction of travel is clear, but many vendors and users are about to run into some significant problems with their data layer.

Read Post

Coralogix

Read more about AI Observability in 2026: Why the data layer means everything

Complete web Performance Strategy with Web Synthetic Monitoring

Dec 15, 2025 By Dotcom-Monitor In Dotcom-Monitor

You’ve optimized your code, implemented caching strategies, and configured your CDN perfectly. Your analytics dashboard shows respectable load times, and your development team reports everything is running smoothly. Yet, conversion rates remain stagnant, bounce rates climb during peak hours, and your competitors consistently outperform you in user experience metrics. What’s missing?

Read Post

Dotcom-Monitor

Read more about Complete web Performance Strategy with Web Synthetic Monitoring

AI can do what now?! Multiply your defense impact

Dec 15, 2025 By Elastic In Elastic

With AI at your side, you can multiply your defense impact. GenAI acts as your smart assistant, helping you distill and summarize alerts, create and convert detection logic, build threat hunting queries, expedite investigations, support incident response, and more!

View Video

Elastic

Read more about AI can do what now?! Multiply your defense impact

Top OpenTelemetry Backends for Storage & Visualization

Dec 15, 2025 By Vladimir Mihailenco In Uptrace

OpenTelemetry backends provide storage, analysis, and visualization for telemetry data (traces, metrics, logs). This guide lists available OpenTelemetry-compliant backend options, categorized by use case: APM platforms, storage backends, visualization tools, and distributed tracing systems. For detailed comparison, see OpenTelemetry Backend Comparison.

Read Post

Uptrace

Read more about Top OpenTelemetry Backends for Storage & Visualization

Accelerating IT Transformation with Agentic AI

Dec 15, 2025 By Digitate In Digitate

As enterprises face increasing pressure to manage vast and complex IT environments, the demand for faster and more efficient IT management is rising. Traditional operating methods are proving insufficient, making the adoption of Agentic AI essential for organizations aiming to achieve truly autonomous IT operations. This innovative technology enhances decision-making and enables businesses to remain agile in a rapidly evolving digital landscape.

Read Post

Digitate

Read more about Accelerating IT Transformation with Agentic AI

From performance to impact: Bridging frontend teams through shared context

Dec 15, 2025 By Addie Beach In Datadog

Connecting day-to-day development work to real user outcomes can be challenging. As a result, engineers and product teams often struggle to effectively prioritize projects together. While the goal of improving user experience (UX) is the same, each team relies heavily on different—and often siloed—forms of monitoring to understand their app, creating a disconnect in metrics and visualizations that can be hard to communicate.

Read Post

Datadog

Read more about From performance to impact: Bridging frontend teams through shared context

Monitor your Kubernetes operators to keep applications running smoothly

Dec 15, 2025 By David Lentz In Datadog

The performance of your Kubernetes operators often influences the behavior of the applications they manage. Operators automate the day-to-day management of your applications by executing critical activities, which may include scaling replicas, performing upgrades, and recovering from failures. For example, a PostgreSQL operator can ensure that standby servers are always deployed, that the database’s failover is correctly configured, and that data is backed up on schedule.

Read Post

Datadog

Read more about Monitor your Kubernetes operators to keep applications running smoothly

Beyond the Dashboard: Integrating Network Monitoring with Your IT Ecosystem

Dec 15, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Discover how Progress WhatsUp Gold network monitoring can be extended with built-in and community-driven integrations by joining us for our webinar, Beyond the Dashboard: Integrating Network Monitoring with Your IT Ecosystem. Our product experts will showcase: NetBox-WUG Sync for automated asset management; the WhatsUp Gold PS PowerShell module for scripting with the REST API; and native and custom integrations with ServiceNow, Microsoft Teams and Slack.

View Video

WhatsUp Gold

Read more about Beyond the Dashboard: Integrating Network Monitoring with Your IT Ecosystem

Reporting Exceptions to Honeycomb with Frontend Observability

Dec 15, 2025 By Ken Rimple In Honeycomb

So you've built a client application and you've started sending telemetry. The information sent back by this client is vital to you, and one of the first things you care about is capturing and reporting errors. There are at least two ways to report error details in OpenTelemetry. Web applications generally place exceptions in trace spans as span events, and mobile applications send exceptions as log messages instead.

Read Post

Honeycomb

Read more about Reporting Exceptions to Honeycomb with Frontend Observability

Lean Operations for a Fragmented Middleware World: Why Efficiency, Resilience and Compliance Now Depend on a New Model

Dec 14, 2025 By meshIQ In meshIQ

Fragmented middleware estates create hidden costs, operational drag, and growing compliance risk. Learn why lean operations, unified visibility, and built-in auditability are now essential for modern messaging and streaming environments.

Read Post

meshIQ

Read more about Lean Operations for a Fragmented Middleware World: Why Efficiency, Resilience and Compliance Now Depend on a New Model

The latest in Cribl Packs

Dec 13, 2025 By Cribl In Cribl

Join us to chat about the latest Cribl Packs release, how AI comes into play, and a demo to see it all in action.

View Video

Cribl

Read more about The latest in Cribl Packs

Graylog Guided Demo

Dec 13, 2025 By Graylog In Graylog

Have a sneak peek at Graylog V7.0. Graylog V7.0 introduces a major step forward in speed, usability, and visibility across your entire security and operations workflow. In this demo, we walk through the newest capabilities designed to help teams detect, investigate, and respond faster than ever. You’ll see how the updated interface streamlines daily tasks, how the enhanced search and pipeline tools simplify complex data handling, and how powerful additions like built-in correlation and modernized dashboards give you clearer insight with less effort.

View Video

Graylog

Read more about Graylog Guided Demo

Scrapers Take Down GitHub: December 11 Outage Timeline

Dec 12, 2025 By Colin Bartlett In StatusGator

On December 11, 2025, GitHub experienced intermittent disruptions that frustrated users across the globe. Developers everywhere started seeing random errors, 503s, unicorns, and CI pipeline failures. Very quickly it became clear something was wrong, even though GitHub’s status page still said ALL SYSTEMS OPERATIONAL. After the incident was over, GitHub published a postmortem that revealed the cause: scrapers. Automated tools hit GitHub with enough traffic to overwhelm key backend systems.

Read Post

StatusGator

Read more about Scrapers Take Down GitHub: December 11 Outage Timeline

The Impact of Network Downtime on Enterprise Productivity - and How Monitoring Helps

Dec 12, 2025 By Arpit Sharma In Motadata

Enterprise IT teams operate under relentless pressure to maintain seamless connectivity, yet many business leaders underestimate the financial gravity of Network Downtime. Studies consistently show that even a brief outage can cost enterprises hundreds of thousands of dollars per hour, positioning downtime as one of the most disruptive threats to business continuity.

Read Post

Motadata

Read more about The Impact of Network Downtime on Enterprise Productivity - and How Monitoring Helps

Grafana Tempo: Upcoming 2.10/3.0 Releases (Community Call December 2025)

Dec 12, 2025 By Grafana In Grafana

Upcoming 2.10/3.0 releases. New maintainer, Oleg. Have questions? Please bring them! Can't comment in the chat? You may need to create a channel -- you can do this by clicking your photo in the top right corner. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, traces, and profiles. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more.

View Video

Grafana

Read more about Grafana Tempo: Upcoming 2.10/3.0 Releases (Community Call December 2025)

k8s-monitoring-helm Chart Office Hours (December 2025)

Dec 12, 2025 By Grafana In Grafana

In the December edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 3.6 release, the upcoming releases and features, and we include special guest, Matt Nolf, to tell us about Database Observability.

View Video

Grafana

Read more about k8s-monitoring-helm Chart Office Hours (December 2025)

Obkio 2025 Year in Review

Dec 12, 2025 By Andrii Kernitskyi In Obkio

2025 was big! This year, we stopped talking about what Obkio could be and started showing what it is: a full network observability platform built for the networks you actually run. We released features that solve real problems. We showed up where network pros gather. And we proved that a Canadian-built tool can compete with anyone. Here's what happened.

Read Post

Obkio

Read more about Obkio 2025 Year in Review

Using AI + Rollbar's Session Replay to Understand Complex Errors

Dec 12, 2025 By Rollbar In Rollbar

Front‑end bugs are notoriously hard to reproduce. By the time an error shows up in your monitoring tool, the most important context is already gone: what the user actually did. Session replay helps, but only if someone has the time and patience to scrub through recordings, correlate events, and form a hypothesis. That’s where Rollbar’s MCP server, paired with an AI agent like GitHub Copilot, changes the game.

Read Post

Rollbar

Read more about Using AI + Rollbar's Session Replay to Understand Complex Errors

OTel Updates: OpenTelemetry Proposes Changes to Stability, Releases, and Semantic Conventions

Dec 12, 2025 By Anjali Udasi In Last9

Over the past year, the Governance Committee ran user interviews and surveys with organizations deploying OpenTelemetry at scale. A few patterns came up consistently: Stability levels aren't always obvious. When you install an OTel distribution, some components might be experimental or alpha without clear markers. This makes it harder to evaluate what's production-ready. Instrumentation libraries sometimes wait on semantic conventions.

Read Post

Last9

Read more about OTel Updates: OpenTelemetry Proposes Changes to Stability, Releases, and Semantic Conventions

How to Handle Cloud Monitoring Overload?

Dec 12, 2025 By Anjali Udasi In Last9

Reduce alert noise by 70% through intelligent aggregation, clear ownership boundaries, and filtering metrics that don't map to user-facing issues. Monitoring starts with a straightforward goal: understand your system's health and identify issues before users notice them. You set up metrics, create dashboards, and configure some alerts. At first, it works well. Over time, your stack gets bigger and more complicated. New services get added.

Read Post

Last9

Read more about How to Handle Cloud Monitoring Overload?

Let's Encrypt 45-Day Certificate Expiration: Monitoring & More

Dec 12, 2025 By Dotcom-Monitor In Dotcom-Monitor

The move by Let’s Encrypt from 90-day certificates to 45-day certificates is more than a policy shift. It changes how teams must manage renewals, detect failures, and validate that certificates are deployed consistently across distributed systems. A shorter lifecycle compresses the margin of error. Automation that previously limped along unnoticed now breaks on a far tighter schedule. And every misconfiguration hits users faster.

Read Post

Dotcom-Monitor

Read more about Let's Encrypt 45-Day Certificate Expiration: Monitoring & More

How AI Agents automate incident response #ai #cybersecurity #telemetry

Dec 12, 2025 By Cribl In Cribl

Clint Sharp demonstrates how Cribl Search leverages AI to streamline incident investigation. Starting from a Slack channel, the AI builds an interactive notebook, analyzes order processing logs, and identifies suspicious traffic spikes. It connects high CPU usage to a recent Jenkins deployment, hypothesizing a supply chain attack, and ultimately recommends a rollback. This isn't a far-off concept. It is the future of operations arriving right now.

View Video

Cribl

Read more about How AI Agents automate incident response #ai #cybersecurity #telemetry

Why AI agents need a common data model #ai #telemetry

Dec 12, 2025 By Cribl In Cribl

Clint Sharp explains why a common model like OCSF is critical for the future of AI. Agents need standardized data to analyze information effectively on your behalf. He contrasts the traditional manual workflow of checking Slack, tickets, and wikis while asking colleagues with a future where AI fuses this human context with machine data. Instead of just search results, AI agents will hand you examined hypotheses so you know exactly where to take your investigation.

View Video

Cribl

Read more about Why AI agents need a common data model #ai #telemetry

How to use AI to analyze and visualize CAN data with Grafana Assistant

Dec 12, 2025 By Martin Falch In Grafana

Note: A version of this post originally appeared on the CSS Electronics blog. Martin Falch, co-owner and head of sales and marketing at CSS Electronics, is an expert on CAN bus data. Martin works closely with end users, typically OEM engineers, across diverse industries, including automotive, maritime, and industrial. He is passionate about data visualization and AI—and he’s been working extensively with Grafana Assistant.

Read Post

Grafana

Read more about How to use AI to analyze and visualize CAN data with Grafana Assistant

Using AI + Rollbar's Session Replay to Understand Complex Errors

Dec 12, 2025 By Rollbar In Rollbar

Front‑end bugs are notoriously hard to reproduce. By the time an error shows up in your monitoring tool, the most important context is already gone: *what the user actually did*. By letting an AI agent like Copilot analyze Rollbar's session replay data directly, teams can move from *“something broke”* to *“here’s exactly why it broke”* in minutes, not hours.

View Video

Rollbar

Read more about Using AI + Rollbar's Session Replay to Understand Complex Errors

What else is new in #kubernetes 1.35? SPDY gets replaced with #websockets #kubernetesdeployment

Dec 12, 2025 By Sysdig In Sysdig

View Video

Sysdig

Read more about What else is new in #kubernetes 1.35? SPDY gets replaced with #websockets #kubernetesdeployment

How Aerospace Companies Use InfluxDB

Dec 12, 2025 By Charles Mahler In InfluxData

Over the past two decades, we’ve witnessed the instrumentation of virtually everything in the aerospace industry, from manufacturing floors to satellites orbiting Earth. And it’s no longer just NASA and other government organizations leading the charge. The commercial space industry has grown exponentially, with private companies developing everything from GPS satellites to electric VTOL aircraft.

Read Post

InfluxData

Read more about How Aerospace Companies Use InfluxDB

Elastic and Microsoft partnership achievements in 2025

Dec 12, 2025 By Jake Pollock In Elastic

Highlights of another successful year of customer-centric collaboration. Once again, our partnership delivered an impressive year of innovation with Microsoft Azure, Azure AI Foundry, and Azure OpenAI. This blog highlights our continued collaboration with Microsoft to better serve customers throughout 2025 and our key moments at Microsoft Ignite.

Read Post

Elastic

Read more about Elastic and Microsoft partnership achievements in 2025

Major Cloud Outages of 2025

Dec 12, 2025 By Hrishikesh Barua In IncidentHub

Cloud outages in 2025 ranged from minor ones affecting some sections of users to major ones affecting hundreds or thousands of users. Services like Cloudflare and AWS, on which many other services depend, experienced outages whose effects cascaded to many downstream users. Let's look at some of the major cloud outages in 2025.

Read Post

IncidentHub

Read more about Major Cloud Outages of 2025

Google SecOps Forwarder Deprecation: Migrate to Bindplane and OpenTelemetry

Dec 12, 2025 By Bindplane In ObservIQ

Google Cloud Security Operations is deprecating the legacy SecOps Forwarder, and OpenTelemetry with Bindplane is the official telemetry ingestion method. In this workshop, you’ll learn how to migrate from the SecOps Forwarder to Bindplane and OpenTelemetry Collectors, the officially supported ingestion model for Google SecOps going forward. We walk through the why, the what, and the how — with practical guidance you can apply immediately.

View Video

ObservIQ

Read more about Google SecOps Forwarder Deprecation: Migrate to Bindplane and OpenTelemetry

Agentic AI demands a new data architecture #ai #telemetry

Dec 12, 2025 By Cribl In Cribl

Clint Sharp explains why traditional schema-on-read systems cannot handle the query loads of the future. Agentic telemetry requires a 360-degree view, but structuring data only when you read it is too slow for AI-driven workloads. The solution is using LLMs to drive the cost of building parsers to near zero. Tools like Copilot Editor allow teams to map data to OCSF instantly, effectively building factories of parsers to handle the scale of agentic AI.

View Video

Cribl

Read more about Agentic AI demands a new data architecture #ai #telemetry

Microsoft Teams outage - December 10th, 2025

Dec 11, 2025 By Colin Bartlett In StatusGator

On the morning of December 10, 2025, Microsoft Teams experienced a service disruption affecting users across Australia. Although Microsoft 365 users reported issues across several apps, the hardest-hit service was Microsoft Teams, which became completely unusable for many organizations. While Microsoft did not acknowledge the incident until 03:46 UTC, StatusGator identified the issue at 02:52 UTC through incoming outage reports and delivered an Early Warning Signal at 03:01 UTC.

Read Post

StatusGator

Read more about Microsoft Teams outage - December 10th, 2025

AI-Powered Observability: From Reactive to Predictive

Dec 11, 2025 By Rox Williams In Honeycomb

If there’s one thing clear from our AI-powered observability webinar, it’s that observability has officially graduated from a “nice-to-have” to a business-critical discipline, and AI is helping lead that charge. Our webinar brought together guest speaker Stephen Elliott, Group VP at IDC, and Ranbir Chawla, former SVP of Engineering at RB Global, for an hour of insights that mixed data, experience, and hard-won lessons from the trenches.

Read Post

Honeycomb

Read more about AI-Powered Observability: From Reactive to Predictive

Home Assistant Hardware: Requirements and Recommendations

Dec 11, 2025 By Community In InfluxData

Choosing the proper Home Assistant hardware can be overwhelming. Whether you’re new to home automation or a seasoned pro, the hardware you select can make or break your experience. This comprehensive guide will demystify the requirements, delve into the various options, and help you make an informed decision. From the compact Raspberry Pi to the powerful Intel NUC, we’ve got you covered. So, strap in, and let’s dive into the world of Home Assistant hardware!

Read Post

InfluxData

Read more about Home Assistant Hardware: Requirements and Recommendations

How to Build a Clear AI Implementation Strategy

Dec 11, 2025 By Nexthink In Nexthink

Organizations see AI’s transformative potential, but success requires more than technology – it demands a clear strategy led by IT. A structured AI implementation roadmap aligns initiatives with business goals, establishes governance, and enables measurable ROI, while improving employee and customer experiences. Yet while 66% of organizations view AI as critical, only 38% report meaningful competitive advantage, highlighting the need for disciplined adoption.

Read Post

Nexthink

Read more about How to Build a Clear AI Implementation Strategy

Why Web Synthetic Monitoring essential for Modern Web Performance

Dec 11, 2025 By Dotcom-Monitor In Dotcom-Monitor

Your analytics dashboard is green, which indicates that your application is up 99.9% of the time, pages load in under three seconds on average, and conversion rates are stable. But here’s the uncomfortable reality: you’re probably missing 40% to 60% of the actual performance problems that impact real customers every day.

Read Post

Dotcom-Monitor

Read more about Why Web Synthetic Monitoring essential for Modern Web Performance

Bindplane Community Call in December 2025

Dec 11, 2025 By Bindplane In ObservIQ

Join us live on Wednesday, December 10th at 11am EDT for the December Community Call. We’ll cover hands-on demos of the new Bindplane features you’ve been asking for, recaps of KubeCon+CloudNativeCon NA in Atlanta, and new Bindplane feature guides and blog posts. As always, we’ll wrap with an interactive Q&A, so bring your questions!

View Video

ObservIQ

Read more about Bindplane Community Call in December 2025

Application Monitoring 101: Decoding Throughput: Understanding the Signals Between Spikes and Drops

Dec 11, 2025 By Aspen Clevenger In Scout

Throughput is one of the most foundational metrics in application performance monitoring. It tells you how many requests your app is handling over time and offers a direct look at system load, responsiveness, and scalability. But throughput rarely speaks for itself. The key is knowing how to interpret it, and when to act. In this post, we’ll look at how throughput works in the real world: what healthy looks like, what broken looks like, and what lives in between.

Read Post

Scout

Read more about Application Monitoring 101: Decoding Throughput: Understanding the Signals Between Spikes and Drops

#kubernetes 1.35 is nearly out! Let's see what's new. #kubernetessecurity #kubernetesdeployment

Dec 11, 2025 By Sysdig In Sysdig

View Video

Sysdig

Read more about #kubernetes 1.35 is nearly out! Let's see what's new. #kubernetessecurity #kubernetesdeployment

Writing High Performance Queries in Sumo Logic - Customer Brown Bag - December 11th, 2025

Dec 11, 2025 By Sumo Logic, Inc. In Sumo Logic

Join us as Diego teaches how to build, optimize, and refine high-performing queries, including using key operators and the new Query Agent for natural-language searches.

View Video

Sumo Logic

Read more about Writing High Performance Queries in Sumo Logic - Customer Brown Bag - December 11th, 2025

Secure SSL Monitoring Software: A Complete Guide to Safe & Automated Certificate Management

Dec 11, 2025 By Dotcom-Monitor In Dotcom-Monitor

Secure SSL monitoring software has become essential for every business that depends on websites, web applications, APIs, or cloud services. With increasing security threats, expired certificates, and hidden configuration mistakes, companies need reliable tools to ensure their SSL certificates stay valid, updated, and fully compliant. The right monitoring solution helps avoid service outages, failed transactions, and data breaches caused by unmanaged or forgotten certificates.

Read Post

Dotcom-Monitor

Read more about Secure SSL Monitoring Software: A Complete Guide to Safe & Automated Certificate Management

HTTP API vs REST API vs Web API: Architectures & How to Monitor Them

Dec 11, 2025 By Dotcom-Monitor In Dotcom-Monitor

APIs power everything. From login flows to checkout systems to internal microservice communication. But as teams scale, so does the confusion around the terminology: HTTP API vs REST API vs Web API. Many articles treat these as interchangeable, but the differences are real, and they affect reliability, performance, caching behavior, authentication flows, and ultimately how you monitor your endpoints.

Read Post

Dotcom-Monitor

Read more about HTTP API vs REST API vs Web API: Architectures & How to Monitor Them

AI Advisor: Automating Network Troubleshooting with AI Runbooks

Dec 11, 2025 By Phil Gervasi In Kentik

Kentik AI Runbooks are machine-readable instructions that codify tribal knowledge into specific diagnostic workflows. By guiding AI Advisor’s reasoning and tool selection, Runbooks turn alerts into actionable, automated investigations, dramatically accelerating MTTR.

Read Post

Kentik

Read more about AI Advisor: Automating Network Troubleshooting with AI Runbooks

Grafana Labs: Top 10 moments of 2025

Dec 11, 2025 By Kristin Knapp In Grafana

For Grafana Labs, 2025 was a year defined by innovation, growth, and the power of our community. We celebrated the release of Grafana 12 at our 10th annual GrafanaCON event, and marked major milestones across open source projects, including Mimir, k6, Beyla, Faro, and Alloy. It was also a year of taking bold steps forward in how teams interact with their systems and data.

Read Post

Grafana

Read more about Grafana Labs: Top 10 moments of 2025

How to Monitor VPN Performance for Remote Users

Dec 11, 2025 By Andrii Kernitskyi In Obkio

Remote workers depend on VPNs to access corporate resources. When VPN performance tanks, productivity stops. The problem? Most IT teams troubleshoot blindly. They can't tell if slow performance is caused by VPN encryption overhead, ISP issues, or corporate infrastructure problems. Here's the reality: Your remote workers are calling the help desk, saying "the VPN is slow", but you have no visibility into what's actually happening on their end. You're guessing. Maybe you ask them to restart their router.

Read Post

Obkio

Read more about How to Monitor VPN Performance for Remote Users

A better way to monitor your AI agents in .NET apps

Dec 11, 2025 By Alex Sohn In Sentry

We launched agent monitoring earlier this year, allowing our users to instrument LLM usage and tool calls in their applications. However, we only had Agent Monitoring support for Python and JavaScript. We’ve been working on creating an Agent Monitoring SDK for .NET, specifically for Microsoft.Extensions.AI.Abstractions.

Read Post

Sentry

Read more about A better way to monitor your AI agents in .NET apps

This Month in Datadog - December 2025

Dec 11, 2025 By Datadog In Datadog

For our last episode of 2025, we’re focusing on Datadog releases announced at AWS re:Invent. Join Jeremy to see how you can manage logs at petabyte scale in your infrastructure, eliminate unneeded costs in Amazon S3 buckets, build agentic workflows, and detect credential leaks. Later in the episode, Scott spotlights how you can connect your AI agents to Datadog tools and context with our MCP Server.

Read Post

Datadog

Read more about This Month in Datadog - December 2025

Highlights from AWS re:Invent 2025: Making sense of applied AI, trust, and going faster

Dec 11, 2025 By Andrew Krug In Datadog

After four days of AWS re:Invent—a 65,000-step marathon that included 60,000 attendees spread across five Las Vegas campuses—and navigating the latest installment of this 13-year-old cloud pilgrimage, we’re all a little dehydrated but significantly wiser. The volume of announcements felt less like a single flood and more like a river branching into three powerful currents. Making sense of this massive technological convergence requires zooming out.

Read Post

Datadog

Read more about Highlights from AWS re:Invent 2025: Making sense of applied AI, trust, and going faster

Planning a Smooth Cutover When You Change Critical Business Tools

Dec 11, 2025 By OpsMatters In OpsMatters

A cutover is when you change from an old IT system to a new one. With technology advancing at a rapid rate, more and more businesses are learning about and implementing cutovers. If now's the time for you to execute a cutover, it's important that you plan everything effectively so that it goes smoothly. Planning a smooth cutover is easier said than done, however. There are some important things you need to know first. Until you conduct extensive online research, you're never going to be able to effectively plan a cutover.

Read Post

OpsMatters

Read more about Planning a Smooth Cutover When You Change Critical Business Tools

3 Questions I Expect You to Ask Me

Dec 10, 2025 By Yann Guernion In Broadcom

As a product specialist, I’ve had countless conversations about network observability. I’ve seen the good, the bad, and the downright confusing. The market is flooded with vendors, all claiming to have the magic bullet for your network woes. Everywhere I go, the story is the same. The neat and tidy world of the on-premises data center is gone, replaced by a sprawling environment that stretches across multiple clouds, your own facilities, and out to the edge.

Read Post

Broadcom

Read more about 3 Questions I Expect You to Ask Me

Automate Weekly Rollbar Reports with Zapier + Google Sheets

Dec 10, 2025 By Rollbar In Rollbar

In this video, we cover how you can use Rollbar, Zapier AI, and Google Sheets to create a completely automated reporting pipeline—one that generates weekly reports of Rollbar occurrences, organizes them in Sheets, and arms PMs with insights they can use to guide roadmap decisions, reduce risk, and improve user experience.

View Video

Rollbar

Monitoring

Read more about Automate Weekly Rollbar Reports with Zapier + Google Sheets

Docker Logs Command Reference: tail, follow, since Options

Dec 10, 2025 By Alexandr Bandurchin In Uptrace

Managing Docker container logs is essential for debugging and monitoring application performance. Tailing Docker logs allows for real-time insights, quick issue resolution, and optimized performance. This guide focuses on efficient methods for tailing Docker logs, with clear examples and command options to streamline log management.

Read Post

Uptrace

Read more about Docker Logs Command Reference: tail, follow, since Options

Observability trends for 2026: Maturity, cost control, and driving business value

Dec 10, 2025 By Elastic Observability Team In Elastic

The observability landscape has undergone a fundamental transformation over the past several years. A recent report, The Landscape of Observability in 2026: Balancing Cost and Innovation, conducted by Dimensional Research and sponsored by Elastic, surveyed over 500 IT decision-makers. It revealed that observability has definitively transitioned from an optional capability to a mission-critical business function.

Read Post

Elastic

Read more about Observability trends for 2026: Maturity, cost control, and driving business value

Imposter Syndrome in Tech

Dec 10, 2025 By solarwindsinc In SolarWinds

Jon Collins of GigaOm has a nugget of wisdom everyone in tech needs to hear.

View Video

SolarWinds

Read more about Imposter Syndrome in Tech

Lightrun 'Runtime Context' Empowers AI Coding Agents to Build Software That Works in the Real World

Dec 10, 2025 By Gideon Freud In Lightrun

Safe, Direct Access to Runtime Code Across Staging, Pre-prod and Production via MCP Enables Fundamental Step Forward in Autonomous Software Delivery and Reliability for Enterprises. NEW YORK, December 10, 2025 – Lightrun, a leader in software reliability, today launched its new Model Context Protocol (MCP) solution, enabling the industry’s first fully integrated Runtime Context for AI coding agents.

Read Post

Lightrun

Read more about Lightrun 'Runtime Context' Empowers AI Coding Agents to Build Software That Works in the Real World

Monitoring Node.js Express Application Performance with AppSignal

Dec 10, 2025 By Damilola Olatunji In AppSignal

As your application scales to serve hundreds, thousands, or even millions of users, understanding its performance becomes essential. Performance monitoring helps you make informed decisions based on data instead of guesswork or user complaints. Imagine users reporting that your app feels "slow". Without proper instrumentation and monitoring, you're left troubleshooting blindly.

Read Post

AppSignal

Read more about Monitoring Node.js Express Application Performance with AppSignal

SSL Certificate Management: A Complete Guide to Monitoring SSL Expiry, Validity & Certificate Health

Dec 10, 2025 By Dotcom-Monitor In Dotcom-Monitor

Managing SSL certificates is essential for maintaining trust, security, and uptime across any website or online service. While many people think SSL certificate management refers to renewing or issuing certificates, one of the most critical aspects, often overlooked, is monitoring certificates for expiry, validity, and unexpected changes. That’s the area where monitoring platforms provide their highest value.

Read Post

Dotcom-Monitor

Read more about SSL Certificate Management: A Complete Guide to Monitoring SSL Expiry, Validity & Certificate Health

Automate Weekly Rollbar Reports with Zapier + Google Sheets

Dec 10, 2025 By Rollbar In Rollbar

Product Managers thrive on clarity. But when it comes to understanding application errors and trends, Rollbar’s rich occurrence data can sometimes feel overwhelming. With AI by Zapier + Google Sheets, you can turn this into a completely automated reporting pipeline—one that generates weekly reports of Rollbar occurrences, organizes them in Sheets, and arms PMs with insights they can use to guide roadmap decisions, reduce risk, and improve user experience.

Read Post

Rollbar

Read more about Automate Weekly Rollbar Reports with Zapier + Google Sheets

Sage AI: Dashboard, events, knowledge base

Dec 10, 2025 By Lucian Daniliuc In Monitive

It's starting to take shape. We have a dashboard, we're collecting some metrics, and I'm getting a daily briefing every morning. Also, I have an event log that all the events go into (the spine of the system), and there's a knowledge base consisting of a GitHub repository that is vectorized and indexed. Its first use is adding context to Herald, the agent that sends me the morning briefing. More details to come.

Read Post

Monitive

Read more about Sage AI: Dashboard, events, knowledge base

Prioritizing Bugs with Sentry Logs

Dec 10, 2025 By Sentry In Sentry

Learn how to use Sentry Logs to measure how often a bug occurs and which users it impacts. In this example, a React Native app with an Express.js backend crashes when the diet value becomes undefined. After identifying the root cause, we use Explore Logs to count how many times users switch their diet to “none,” filter the related log messages, and group results by user type to understand the impact.

View Video

Sentry

Read more about Prioritizing Bugs with Sentry Logs

Kentik in Motion: How AI Transforms Network Chaos to Clarity

Dec 10, 2025 By Kentik In Kentik

Learn how artificial intelligence is transforming network operations through Kentik's AI Advisor platform. Philip Gervasi and Sean McGinley discuss the evolution from traditional network visibility to network intelligence, emphasizing that AI should augment, rather than replace, network engineers. They demonstrate how Kentik's AI Advisor uses natural language interfaces to perform automated root cause analysis, troubleshooting, and cost optimization.

View Video

Kentik

Read more about Kentik in Motion: How AI Transforms Network Chaos to Clarity

Runtime Context for AI Agents with Lightrun MCP

Dec 10, 2025 By Lightrun In Lightrun

Introducing Runtime Context for AI agents: the next evolution in autonomous software development. The Lightrun MCP connects IDEs and AI assistants to real runtime data, giving agents and developers the context they need to write, validate, and debug code with confidence. With Runtime Context, AI agents can: Reliable, AI-accelerated engineering starts here.

View Video

Lightrun

Read more about Runtime Context for AI Agents with Lightrun MCP

Agentic AI by Design: Evolving Our Principles for the Next Chapter of Responsible AI

Dec 10, 2025 By solarwindsinc In SolarWinds

Join SolarWinds CISO Tim Brown and CTO Sai Krishna for the SolarWinds Day Closing Keynote, where they share how SolarWinds is evolving from Secure by Design to AI by Design—a bold next step in building trusted, intelligent, and future-ready IT operations. As organizations adopt AI-driven systems, embedding trust, transparency, and accountability into product development becomes essential. In this forward-looking discussion, Tim and Sai reveal how the AI by Design framework ensures responsible AI adoption while enhancing performance, reliability, and security.

View Video

SolarWinds

Read more about Agentic AI by Design: Evolving Our Principles for the Next Chapter of Responsible AI

Become a 10x investigator with Cribl Notebooks

Dec 10, 2025 By Cribl In Cribl

Cribl Notebooks aims to streamline the investigation process by bringing everything into a single interactive interface. It functions as a virtual war room where teams can collaborate in real time. You can view AI queries and code alongside charts without switching between scattered tabs or workstations. This persistence makes it easier to document the root cause and share the story behind the data.

View Video

Cribl

Read more about Become a 10x investigator with Cribl Notebooks

How Datadog Manages 50,000 Apache Iceberg Tables at Scale

Dec 10, 2025 By Datadog In Datadog

Think managing a few database tables is hard? Try 50,000 production Iceberg tables storing petabytes of data with 8 million scans per day. In this clip, Datadog's platform team reveals the architecture choices behind their managed Iceberg implementation that serves hundreds of internal engineering teams.

View Video

Datadog

Read more about How Datadog Manages 50,000 Apache Iceberg Tables at Scale

Datadog at AWS re:Invent, Bits AI SRE, MCP Server, CloudPrem, and more | This Month in Datadog

Dec 10, 2025 By Datadog In Datadog

Get a closer look at features we announced at AWS re:Invent in the latest episode of This Month in Datadog. Tune in for spotlights of Bits AI SRE, now generally available, and Datadog’s MCP Server, which connects AI agents to our platform by ingesting prompts and mapping them to Datadog resources and data. Plus, we cover how to: This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

View Video

Datadog

Read more about Datadog at AWS re:Invent, Bits AI SRE, MCP Server, CloudPrem, and more | This Month in Datadog

Fixing Performance Issues Fast with Logs & Tracing

Dec 10, 2025 By Sentry In Sentry

Learn how to quickly track down performance bottlenecks using Sentry Logs and Tracing. In this video, we walk through identifying a slow screen, jumping into the connected trace, and pinpointing slow backend steps, database calls, and AI/LLM operations. See how logs, issues, and traces work together to show the full picture of what happened in a single session.

View Video

Sentry

Read more about Fixing Performance Issues Fast with Logs & Tracing

Expose Hidden State Bugs with Sentry Logs

Dec 10, 2025 By Sentry In Sentry

See how Sentry Logs can surface hidden state bugs that stack traces alone can’t explain. In this walkthrough, we debug a React Native app with an Express.js backend where a missing diet value causes a crash. We inspect the issue, pull in the connected logs, and confirm whether the problem comes from an initial render or from real backend data. By combining issues, traces, and logs from the same session, you get the full story—and a faster path to the fix.

View Video

Sentry

Read more about Expose Hidden State Bugs with Sentry Logs

Introducing Workspace: Where DEX Work Happens

Dec 10, 2025 By Pedro Bados In Nexthink

Today marks another milestone for Nexthink as we introduce a powerful evolution of our platform, one that will meaningfully expand how customers derive value and empower many more teams across IT, HR, and the business to use Infinity. Welcome to Workspace: a new destination where the future of DEX and IT work comes together.

Read Post

Nexthink

Read more about Introducing Workspace: Where DEX Work Happens

Building a Stronger Defense with Network Observability and Real-Time Monitoring

Dec 10, 2025 By OpsMatters In OpsMatters

In today's rapidly evolving digital landscape, the importance of network security and performance has never been more pronounced. Businesses are increasingly relying on their network infrastructure to support a wide array of critical applications, services, and user activities. As cyber threats become more sophisticated and network architectures more complex, maintaining visibility into network performance and security is essential. This is where a network observability platform becomes indispensable.

Read Post

OpsMatters

Read more about Building a Stronger Defense with Network Observability and Real-Time Monitoring

FinOps Insights for IT Leaders

Dec 9, 2025 By Kristy Slimmer In Galileo

FinOps insights for IT leaders often focus on cloud spend, but IT leaders know that real cost drivers extend across hybrid environments. Achieving clarity requires more than budget reports. It requires understanding how workloads behave over time, how performance and capacity shift, and where visibility gaps hide operational and financial risk. To support those efforts, we sat down with Tim Conley, creator of Galileo, to explore practical FinOps insights for IT leaders.

Read Post

Galileo

Read more about FinOps Insights for IT Leaders

How to Track Down the Real Cause of Sudden Latency Spikes

Dec 9, 2025 By Anjali Udasi In Last9

Start with distributed tracing to find which service is slow, then use continuous profiling to see why the code is slow, and finally apply high-cardinality analysis to identify which users or conditions trigger the problem. It's 2 AM. Your phone buzzes. Users are reporting timeouts. The metrics dashboard shows p99 latency spiking from 200ms to 4 seconds, but everything looks normal—CPU at 60%, memory stable, no error spikes. A quick pod restart helps briefly, then latency climbs right back up.

Read Post

Last9

Read more about How to Track Down the Real Cause of Sudden Latency Spikes

Elastic's move to free on-demand training

Dec 9, 2025 By Nick Mezhir In Elastic

Students can now learn what they need within the Elastic stack anytime. The Elastic Training team has shifted its on-demand training strategy from paid to free! Yes, you heard that right — complimentary on-demand training is now readily available to everyone. The Elastic Training team is continuously developing and releasing bite-sized training modules designed to align with Elastic solutions and highlight key features.

Read Post

Elastic

Read more about Elastic's move to free on-demand training

Improve Your Observability With This CPU Metric

Dec 9, 2025 By Coroot In Coroot

🐧🐝 Learn what classic CPU metrics are (Load average, Node usage, and Container CPU usage) and why Delay Accounting can provide better, kernel-level insights into your system: https://t.ly/HQrWx

#DevOps #Kubernetes #AI #tech #observability #Linux #eBPF #Sysadmin #Cloud #Monitoring

View Video

Coroot

Read more about Improve Your Observability With This CPU Metric

Bindplane in 12 Minutes: A Complete Overview of the Telemetry Pipeline for OpenTelemetry at Scale

Dec 9, 2025 By Bindplane In ObservIQ

Bindplane is a unified telemetry pipeline that helps teams cut observability spend by 50% or more. In this overview, you will learn how to route telemetry from any source to any destination, manage large fleets of OpenTelemetry Collectors, and gain real visibility into collector health, state, throughput, and routing behavior.

View Video

ObservIQ

Read more about Bindplane in 12 Minutes: A Complete Overview of the Telemetry Pipeline for OpenTelemetry at Scale

How to Check SSL Certificate Expiration Date: Complete Guide to SSL Monitoring

Dec 9, 2025 By Dotcom-Monitor In Dotcom-Monitor

SSL certificates are critical for securing websites, web applications, and APIs. They encrypt data in transit, verify server authenticity, and build user trust. However, SSL certificates have a limited lifespan, typically ranging from 90 days to one year. When a certificate expires, visitors encounter security warnings, some services stop working, and it can affect search engine rankings. Monitoring SSL certificate expiration is essential to maintain secure and uninterrupted online services.

Read Post

Dotcom-Monitor

Read more about How to Check SSL Certificate Expiration Date: Complete Guide to SSL Monitoring

Ultimate Guide to DevOps API Monitoring for Modern SaaS Teams

Dec 9, 2025 By Dotcom-Monitor In Dotcom-Monitor

APIs form the operational backbone of SaaS platforms. They authenticate users, deliver application data, process transactions, and connect multiple services into a cohesive ecosystem. When an API slows down or fails, the impact is immediate: login delays, frozen dashboards, broken customer workflows, and degraded user experience. For DevOps teams, this means monitoring must go far beyond checking status codes.

Read Post

Dotcom-Monitor

Read more about Ultimate Guide to DevOps API Monitoring for Modern SaaS Teams

Cribl Search Pack for AWS WAF

Dec 9, 2025 By Cribl In Cribl

Get visibility into your AWS WAF logs — without shipping data to a SIEM or building dashboards from scratch. In this video, we walk through the Cribl Search Pack for AWS WAF, letting you search and visualize WAF logs directly in Amazon S3 using search-in-place.

View Video

Cribl

Read more about Cribl Search Pack for AWS WAF

Part 2: What If Automation Didn't Just Execute Tasks but Earned Our Trust While It Worked?

Dec 9, 2025 By ScienceLogic In ScienceLogic

Every leap forward in technology begins with a question that feels almost human in its curiosity. In this series, we’re examining those questions, the ones that reveal where intelligence meets intention. If data was the foundation of understanding in our first conversation, automation is where that understanding begins to act.

Read Post

ScienceLogic

Read more about Part 2: What If Automation Didn't Just Execute Tasks but Earned Our Trust While It Worked?

Datadog on Apache Iceberg

Dec 9, 2025 By Datadog In Datadog

Historically, Datadog has relied on technologies like Snowflake and Apache Spark on raw parquet files (lacking consistent table structure) to power internal analytics and data science at scale. As usage grew across product teams, more features depended on data science teams, and our datasets grew to include more telemetry data, these systems became complex to manage and govern both technically and financially. The need for a more flexible and scalable solution led Datadog to adopt Apache Iceberg, an open source table format for data lakes that brings reliability and performance while remaining SQL-friendly.

View Video

Datadog

Read more about Datadog on Apache Iceberg

Configuring the Alerting Plugin in InfluxDB 3

Dec 9, 2025 By Allyson Boate In InfluxData

Monitoring starts with data, but action depends on timely alerts. When an alerting workflow relies on scheduled queries or external checks, engineers miss short windows where values shift and conditions form. The alerting plugin closes that gap by evaluating alert rules inside InfluxDB 3 as new values arrive, enabling faster detection and more responsive monitoring.

Read Post

InfluxData

Read more about Configuring the Alerting Plugin in InfluxDB 3

Bindplane | Notifications

Dec 9, 2025 By Bindplane In ObservIQ

Real-time alerts for your telemetry pipelines are here. In this quick overview, you’ll learn about the new Notifications panel in Bindplane. This update gives you real-time visibility into key changes across your configurations, fleets, and agents so nothing slips through the cracks. The new feature centralizes alerts you’d otherwise miss, making Bindplane easier to operate at scale. Email, Slack, and webhook notifications are also on the way.

View Video

ObservIQ

Read more about Bindplane | Notifications

Bindplane | Fleets

Dec 9, 2025 By Bindplane In ObservIQ

Bindplane is introducing Fleets — a brand-new way to organize, manage, and operate large groups of OpenTelemetry Collectors at scale. In this video, Ryan walks through how Fleets simplify the way you group agents, roll out configuration changes, monitor health, and keep your entire collector fleet up to date.

View Video

ObservIQ

Read more about Bindplane | Fleets

Keep service ownership up to date with Datadog Teams' GitHub integration

Dec 9, 2025 By Roxanne Moslehi In Datadog

Engineering organizations depend on clear team ownership to maintain reliable services and move quickly. But as codebases expand and teams shift, answering basic questions—Who owns this service? Who should be paged in an incident? Are teams meeting operational standards?—becomes harder.

Read Post

Datadog

Read more about Keep service ownership up to date with Datadog Teams' GitHub integration

Web API Sample Endpoints to Practice Monitoring & Testing

Dec 9, 2025 By Dotcom-Monitor In Dotcom-Monitor

APIs rarely fail in isolation. They fail under load, during token refresh, when a dependent service slows down, or when a multi-step workflow breaks halfway through. And yet most engineers still test and monitor APIs using mock endpoints that behave nothing like the real thing.

Read Post

Dotcom-Monitor

Read more about Web API Sample Endpoints to Practice Monitoring & Testing

Monitor One Icinga 2 Cluster From Another

Dec 9, 2025 By Alvar Penning In Icinga

Icinga is designed to be a highly dynamic monitoring software that can monitor your setup, regardless of its architecture. While most setups are hierarchical and fit well into the master, satellites, and agents scheme with different zones, it is sometimes impractical or impossible to create one large Icinga 2 cluster. Imagine that you are responsible for only some hosts within another organization.

Read Post

Icinga

Read more about Monitor One Icinga 2 Cluster From Another

HPE OpsRamp Software Named a Major Player in the IDC MarketScape for Worldwide Observability Platforms 2025

Dec 9, 2025 By Deepak Jannu In OpsRamp

Observability platforms help IT teams continuously monitor service health and performance, driving superior service quality and customer experience. Access to deeper diagnostics and actionable insights from observability tools lets IT operators drive scalability, resilience, and service reliability across complex, distributed environments.

Read Post

OpsRamp

Read more about HPE OpsRamp Software Named a Major Player in the IDC MarketScape for Worldwide Observability Platforms 2025

M-Dashes, the Cookie Monster & DEX: The BIG Reality Bites 2025 Finale

Dec 9, 2025 By Nexthink In Nexthink

It’s our favorite Reality Bites tradition: the end-of-year panel! Tom and Tim bring the whole crew together—Megan, Ariana, Sean, and Dina—for a joyful, honest, and insight-packed reflection on 2025. From global travel and AI breakthroughs to personal milestones, hard-won lessons, and the music that carried us through the year, the team shares what defined a transformative moment for DEX, for Nexthink, and for each of us. Expect candid takes on AI balance, ambition, slop, mediation, vibe-coding, human connection—and a full round of “song of the year” picks from the whole panel. A warm, funny, heartfelt wrap to a huge year.

View Video

Nexthink

Read more about M-Dashes, the Cookie Monster & DEX: The BIG Reality Bites 2025 Finale

Ep 22: re:Invent recap

Dec 9, 2025 By Sumo Logic, Inc. In Sumo Logic

In this episode of Masters of Data, we're breaking down AWS re:Invent 2025 through David's eyes (and probably a few cups of conference coffee). We dive into the massive crowds, killer customer conversations, and product demos that actually worked—because we're all about building real tech, not smoke-and-mirrors clickbait. David geeks out over Mobot, our AI tool that's making workflows smoother (not just another chatbot in disguise), and how attendees couldn't get enough of the live demos. We also throw some shade at the AI-washing epidemic and dig into why practical AI applications in security and observability actually matter.

View Video

Sumo Logic

Read more about Ep 22: re:Invent recap

Is your DR plan just wishful thinking? Prove your resilience with chaos engineering

Dec 9, 2025 By Deepanshu Kalra In Google Operations

Controlled chaos engineering experiments that simulate real-world disasters quantitatively measure the impact of failures on system performance.

Read Post

Google Operations

Read more about Is your DR plan just wishful thinking? Prove your resilience with chaos engineering

Introducing MetrixInsight for XenServer SCOM Management Pack

Dec 8, 2025 By GripMatix In GripMatix

Citrix XenServer is increasingly becoming the strategic hypervisor of choice for organizations running Citrix VAD and DaaS workloads. With XenServer Premium Edition now included in Citrix subscriptions, it offers a more aligned, predictable, and cost-effective platform, without compromising on stability, performance, or capabilities. A critical part of enabling that transition is delivering the right level of monitoring and operational control.

Read Post

GripMatix

Read more about Introducing MetrixInsight for XenServer SCOM Management Pack

Unified network performance monitoring reports for compliance

Dec 8, 2025 By Rama Venkatesan In Site24x7

Compliance audits can be stressful when your performance data and configuration logs live in separate tools. Site24x7 brings everything together in a single view, helping you track every device, configuration, and compliance status in one place. Unified reports make it easy to trace what changed, when it changed, and who changed it—giving you a clear line of sight for every audit and investigation.

Read Post

Site24x7

Read more about Unified network performance monitoring reports for compliance

Why FedRAMP In Process Matters for Federal Customers

Dec 8, 2025 By Cribl In Cribl

Chris Ebley from Blackwood explains why FedRAMP In Process is a major milestone. It gives federal teams confidence that the product can handle sensitive data, meets strict security controls, and comes from a company committed to operating at the maturity level the government expects. This opens new go-to-market opportunities and makes it easier for agencies to move forward with Cribl.

View Video

Cribl

Read more about Why FedRAMP In Process Matters for Federal Customers

Coralogix in G2 Winter 2026: Momentum, Progress, and 192 Badges

Dec 8, 2025 By Coralogix Team In Coralogix

As we wrap up 2025 and slowly come down from the re:Invent high, we’ve got one more reason to keep the celebration going. Coralogix has earned 192 badges in the G2 Winter 2026 reports and secured a position in the Momentum Grid Report for Observability Software. It is a strong finish to the year and a clear reflection of the steady progress the platform has been making.

Read Post

Coralogix

Read more about Coralogix in G2 Winter 2026: Momentum, Progress, and 192 Badges

Why should you demand OpAMP support from your vendor?

Dec 8, 2025 By Chris Cooney In Coralogix

Fleet management is the practice of monitoring and configuring your fleet of agents and collectors. It is the hallmark of an organisation that has realised the great importance of a healthy telemetry pipeline, and has taken steps to ensure that collectors & agents are every bit as robust as the production architecture for which they are responsible.

Read Post

Coralogix

Read more about Why should you demand OpAMP support from your vendor?

Why Cribl Lake Delivers the Best Price Performance for AI Workloads #ai #telemetry

Dec 8, 2025 By Cribl In Cribl

CMO Abby Strong explains how Cribl Lake is built for the real demands of modern AI. You get fast storage for high-performance workloads and an efficient architecture that scales without blowing up your budget. A smarter foundation for the AI era.

View Video

Cribl

Read more about Why Cribl Lake Delivers the Best Price Performance for AI Workloads #ai #telemetry

Seeing Everything: Shedding Light on Shadow IT and AI Usage

Dec 8, 2025 By Teneo In Teneo

I still remember working with a leading insurance provider on an internal review of their IT estate and discovering a team quietly using an unapproved SaaS tool to speed up their reporting. It wasn’t malicious; they were trying to solve a problem faster. But as we stared at the dashboard, I could see the CIO’s mind racing: What data had they uploaded? Was it encrypted? Were they still compliant?

Read Post

Teneo

Read more about Seeing Everything: Shedding Light on Shadow IT and AI Usage

Bindplane in 200 Seconds: Windows Event Logs & Google SecOps

Dec 8, 2025 By Bindplane In ObservIQ

Learn how to configure Bindplane to collect and route Windows Event Logs from a Windows VM into Google SecOps. In this 200-second onboarding walkthrough, Chelsea shows how to build and configure a full SecOps-ready pipeline in just a few minutes. You’ll see how to create a configuration, add the Windows Event Log source, configure the Google SecOps destination, roll out the configuration to an agent running on a Windows VM, and start receiving security telemetry inside SecOps.

View Video

ObservIQ

Read more about Bindplane in 200 Seconds: Windows Event Logs & Google SecOps

Using Traces, Metrics, and Logs All in One Place, as Demonstrated by Pipeline Builder

Dec 8, 2025 By Tyler Helmuth In Honeycomb

When troubleshooting complex software, it’s important to be able to gain insight via its telemetry quickly and precisely. No one wants to waste time switching between tools or worrying about how to interact with different types of data. At Honeycomb, all your data is available in one place, accessible via our fast query engine. But what does that look like in practice?

Read Post

Honeycomb

Read more about Using Traces, Metrics, and Logs All in One Place, as Demonstrated by Pipeline Builder

Bindplane | Filter by Condition

Dec 8, 2025 By Bindplane In ObservIQ

Bindplane Growth Feature Guide: Filter by Condition (How to Reduce Noise & Control Your Telemetry Pipeline). In this walkthrough, Chelsea from the Bindplane Customer Success team shows how to use Filter by Condition, part of Bindplane’s Growth-tier features, to reduce noisy telemetry, improve signal quality, and cut observability costs.

View Video

ObservIQ

Read more about Bindplane | Filter by Condition

Meet Web Vitals Performance Issues

Dec 8, 2025 By Ben Coe In Sentry

We’ve introduced a new type of Performance Issue: Web Vitals Performance Issues. These issues will be opened for the highest-opportunity pages in your application if your Web Vitals metrics drop into our meh or poor thresholds for performance. We’ve built these issues with Seer Issue Fix specifically in mind. Our goal is not just to alert you about low vitals scores; we want to give you actionable steps you can take to improve your scores and, when possible, fix the problem for you.

Read Post

Sentry

Read more about Meet Web Vitals Performance Issues

Faster, Simpler Root Cause Analysis with AI

Dec 8, 2025 By Coroot In Coroot

Incidents can quickly become costly, and digging through overwhelming amounts of telemetry can take hours. AI-Powered Root Cause Analysis automatically identifies the root cause of an incident and suggests fixes in seconds, so your team can get back to development (or, if they’re on call at 3 a.m., back to sleep).

View Video

Coroot

Read more about Faster, Simpler Root Cause Analysis with AI

Bindplane Onboarding | Install Your First OTel Collector & Send Windows Events to Google SecOps

Dec 8, 2025 By Bindplane In ObservIQ

In this 10-minute step-by-step walkthrough, Chelsea from the Bindplane Customer Success team shows you how to install your first Bindplane OpenTelemetry Collector and start sending Windows Event telemetry from a Windows VM directly into Google SecOps.

View Video

ObservIQ

Read more about Bindplane Onboarding | Install Your First OTel Collector & Send Windows Events to Google SecOps

Solve bandwidth issues quickly with NetFlow reports

Dec 8, 2025 By ManageEngine Site24x7 In Site24x7

Gain complete visibility into your bandwidth usage with network traffic monitoring reports in Site24x7. In this video, we walk you through the key reports that turn raw traffic data into actionable insights, helping you troubleshoot issues faster, optimize bandwidth, and strengthen security. With these reports, you’ll always know what’s happening on your network and how to respond before minor issues escalate.

View Video

Site24x7

Read more about Solve bandwidth issues quickly with NetFlow reports

AI-Driven Database Monitoring for Modern IT Teams | Site24x7

Dec 8, 2025 By ManageEngine Site24x7 In Site24x7

Databases power every business, but keeping them fast, reliable, and scalable is a daily challenge for IT teams. Discover how intelligent database monitoring helps you uncover performance bottlenecks, optimize queries, and maintain database health effortlessly. Whether you manage SQL or NoSQL systems, gain actionable insights across your infrastructure before issues affect your applications or users.

View Video

Site24x7

Read more about AI-Driven Database Monitoring for Modern IT Teams | Site24x7

Transaction Check Basics in less than 3 minutes

Dec 6, 2025 By Uptime Website Monitoring In uptime

In this video, we explore the basics of Transaction Checks on Uptime.com, an advanced multi-step monitoring tool for website elements. Learn how to create customized scripts to mimic user actions such as visiting a site, filling out forms, and clicking buttons. We walk through a step-by-step guide on setting up a Transaction Check to monitor a login process, including navigating to a URL, validating HTTP status codes, and using browser developer tools to configure field entries. Discover different monitoring intervals and tips for organizing your checks with tags and location settings.

View Video

uptime

Monitoring

Read more about Transaction Check Basics in less than 3 minutes

Cloudflare was down again: Here's what happened.

Dec 5, 2025 By Andy Libby In StatusGator

On December 5, 2025, the internet faced another major disruption – the second significant Cloudflare-related outage in just a few weeks. A similar widespread incident occurred on November 18, which we covered in detail in our post The internet broke again – StatusGator can help. Today’s outage reinforces how quickly issues within core internet infrastructure can ripple outward and impact thousands of services simultaneously.

Read Post

StatusGator

Read more about Cloudflare was down again: Here's what happened.

What Services Are Not Downdetector Alternatives - And Why StatusGator Actually Is

Dec 5, 2025 By Colin Bartlett In StatusGator

Search for Downdetector alternatives on Google, ask ChatGPT or any AI assistant, and you’ll usually get a list of tools like Datadog, Site24x7, New Relic, Atera, and other monitoring platforms. There’s just one problem: The AI-generated answers continue to lump these monitoring tools together, creating confusion for IT teams and muddying the category. This article exists to set the record straight.

Read Post

StatusGator

Read more about What Services Are Not Downdetector Alternatives - And Why StatusGator Actually Is

Microsoft SCOM 2025 UR1

Dec 5, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

System Center Operations Manager (SCOM) continues to evolve to meet modern infrastructure needs. With SCOM 2025, released in November 2024, Microsoft introduced major updates for security, compatibility, and platform support. Now, Update Rollup 1 (UR1) builds on that foundation with critical fixes and enhancements.

Read Post

NiCE IT Mgmt

Read more about Microsoft SCOM 2025 UR1

Towards a more resilient StatusGator

Dec 5, 2025 By Colin Bartlett In StatusGator

Between October 20 and December 5, 2025, a rapid succession of major outages across multiple cloud providers disrupted large portions of the internet. Each of these events affected StatusGator in different ways. After each incident, we implemented improvements to strengthen our reliability. This post summarizes the impact of each outage, the changes made, and the architectural work now underway to ensure StatusGator remains available during the moments when it is needed most.

Read Post

StatusGator

Read more about Towards a more resilient StatusGator

Which Observability Tool Helps with Visibility Without Overspend

Dec 5, 2025 By Anjali Udasi In Last9

If you’re trying to control observability spend without cutting visibility, the platforms that usually offer the best cost balance at enterprise scale are Last9, Grafana Cloud, Elastic, and Chronosphere — depending on the shape of your telemetry and the level of operational ownership you want.

Read Post

Last9

Read more about Which Observability Tool Helps with Visibility Without Overspend

Rollbar + Zapier AI: Automatically Generate Clear, Actionable Jira Tickets

Dec 5, 2025 By Rollbar In Rollbar

How do you turn raw error payloads into clean, meaningful ticket summaries without touching a line of code? Engineering teams rely on fast, accurate error context to resolve issues efficiently. Rollbar does a great job capturing rich payload data at the moment an error occurs, but getting that data into your issue-tracking workflow can still require manual triage—especially if you want clean, human-readable summaries in Jira.

View Video

Rollbar

Read more about Rollbar + Zapier AI: Automatically Generate Clear, Actionable Jira Tickets

Apache Kafka vs. Apache ActiveMQ: Deciding the Right Open-Source Platform for Your Use Case

Dec 5, 2025 By meshIQ In meshIQ

Learn the key differences between Apache Kafka and Apache ActiveMQ — from messaging models to performance, scalability and use cases — and see how meshIQ improves observability across both platforms.

Read Post

meshIQ

Read more about Apache Kafka vs. Apache ActiveMQ: Deciding the Right Open-Source Platform for Your Use Case

Resilient IBM MQ in Hybrid Cloud: Choosing the Right HA and DR Strategy

Dec 5, 2025 By meshIQ In meshIQ

Learn how to build a resilient IBM MQ architecture for hybrid cloud. This post breaks down HA vs. DR, explains RTO/RPO expectations, explores Native HA and cross-region replication, and shows how meshIQ adds essential visibility and control.

Read Post

meshIQ

Read more about Resilient IBM MQ in Hybrid Cloud: Choosing the Right HA and DR Strategy

Rollbar + Zapier AI: Automatically Generate Clear, Actionable Jira Tickets

Dec 5, 2025 By Rollbar In Rollbar

Read Post

Rollbar

Read more about Rollbar + Zapier AI: Automatically Generate Clear, Actionable Jira Tickets

AI Agents Need Structured Telemetry. Are You Preparing? #telemetry #ai

Dec 5, 2025 By Cribl In Cribl

Clint Sharp breaks down the shift from traditional observability to AI-ready telemetry. Agents need well-formed fields, consistent schemas, and predictable data models. If your environment is full of unstructured logs, agents will give inconsistent answers. The work starts now so your AI future can actually deliver value later.

View Video

Cribl

Read more about AI Agents Need Structured Telemetry. Are You Preparing? #telemetry #ai

Browser Monitoring Software: A Complete Buyer's Guide for Modern Web Applications

Dec 5, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern web applications rely on complex front-end frameworks, APIs, and third-party services to deliver seamless user experiences. Even minor performance issues—slow load times, broken workflows, or browser-specific errors—can lead to lost conversions, frustrated users, and reputational damage. Browser monitoring software provides IT teams, developers, and business stakeholders with visibility into application performance from the end-user perspective.

Read Post

Dotcom-Monitor

Read more about Browser Monitoring Software: A Complete Buyer's Guide for Modern Web Applications

AI Is Growing Your Data Faster Than Your Budget #telemetry #ai

Dec 5, 2025 By Cribl In Cribl

Clint Sharp explains why data is growing at a 30% CAGR while budgets stay flat. Teams are already running infrastructure at 80 to 90% capacity, and AI agents multiply query volume by ten or fifty. What got you to 2025 will not get you to 2035. You need a new approach to handle AI scale without blowing up cost.

View Video

Cribl

Read more about AI Is Growing Your Data Faster Than Your Budget #telemetry #ai

Monitoring Client-Side Routing Frameworks: SPA, CSR & Hybrid

Dec 5, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern web applications have shifted their center of gravity. The page is no longer the system; the runtime is. Frameworks like React, Angular, Vue, Next.js, SvelteKit, Remix, and Nuxt treat HTML as a bootloader, and the real application emerges only after hydration, routing, data fetching, and continual re-rendering. What users experience depends entirely on JavaScript execution, not static markup. Teams usually discover this shift when the UI appears to load but nothing works.

Read Post

Dotcom-Monitor

Read more about Monitoring Client-Side Routing Frameworks: SPA, CSR & Hybrid

Intro to Group Check

Dec 5, 2025 By Uptime Website Monitoring In uptime

Learn how to set up and use the Group Check in Uptime.com to monitor multiple services efficiently.

View Video

uptime

Monitoring

Read more about Intro to Group Check

Making Sense of Complex Data in Observability Tools

Dec 5, 2025 By Marta Barnych, Robert Rochon In Selector

Metrics, analytics, measurements, and parameters – can we truly see these abstractions? Data visualization helps us do just that, bridging the gap between raw information and human comprehension. Visualizing data is like rafting down a river – dynamic, unpredictable, and full of discoveries along the way. In this guide, we’ll explore how to craft visualizations that inform, engage, and inspire. So, grab your paddle and hop aboard!

Read Post

Selector

Read more about Making Sense of Complex Data in Observability Tools

Splunk MCP Server Troubleshooting Tips

Dec 5, 2025 By Splunk In Splunk

In this video, we'll go over four of the most common scenarios you might encounter when trying to set up your Splunk MCP server and provide troubleshooting tips to help you get up and running.

View Video

Splunk

Read more about Splunk MCP Server Troubleshooting Tips

Visualising Sentry analytics with SquaredUp

Dec 5, 2025 By Squared Up In Squared Up

Sentry is a mature observability product with SDKs supporting nearly every major programming language. It has expert knowledge of each coding stack and is therefore capable of offering rich insights with a minimum of initialisation required by the developer. You don’t need to set up OpenTelemetry collectors or wrestle with endpoint configurations; simply drop the SDK initialisation into your application start-up process and telemetry begins flowing into the Sentry backend.

Read Post

Squared Up

Read more about Visualising Sentry analytics with SquaredUp

New Vehicle Monitoring Capabilities In The Works For 2026

Dec 5, 2025 By OpsMatters In OpsMatters

Technology for keeping track of vehicles is advancing and companies are gaining more control and oversight. But the project isn't yet complete. There's still room to improve. In 2026, we expect all sorts of new advancements to take center stage in the business world. These will offer managers new capabilities and allow them to really increase productivity to levels they never imagined.

Read Post

OpsMatters

Read more about New Vehicle Monitoring Capabilities In The Works For 2026

Sponsored Post

IT Ops vs DevOps: Same Goal, Different Mindset

Dec 4, 2025 By Nuno Tomas In isDown

The debate around IT Ops vs DevOps often creates confusion about whether these are competing approaches or complementary ones. While both aim to deliver reliable, efficient technology services, they approach this goal from fundamentally different perspectives. Understanding these differences helps organizations build stronger technology teams and choose the right operational model.

Read Post

isDown

Read more about IT Ops vs DevOps: Same Goal, Different Mindset

AI updates for all Sentry users

Dec 4, 2025 By Lindsay Piper In Sentry

Instead of giving you yet another chatbot, we built AI straight into the parts of Sentry where teams lose time, turning your existing data into instant context — and it’s now available to all Sentry users.

Read Post

Sentry

Read more about AI updates for all Sentry users

Key Metrics Your Browser Monitoring Software Should Track

Dec 4, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern web applications rely on seamless user experiences, fast load times, and reliable performance across every device and region. Browser monitoring tools make these features possible by tracking how real web browsers interact with your site, revealing issues long before users notice them. To ensure your monitoring setup captures everything that matters, here are the five essential metrics every browser monitoring solution must track.

Read Post

Dotcom-Monitor

Read more about Key Metrics Your Browser Monitoring Software Should Track

Why Remote Work Just Works - Hear It From Our Grafanistas

Dec 4, 2025 By Grafana In Grafana

Several Grafanistas talk about their remote work experience at Grafana Labs. Being remote-first enables our team to be based where they feel most productive and to ensure that work and life aren't in competition. And remote-first is *not* remote only. Grafanistas enjoy the opportunity to come together during team offsites or in shared co-working spaces. Connection is important.

View Video

Grafana

Read more about Why Remote Work Just Works - Hear It From Our Grafanistas

Bits AI SRE, our first AI agent, now generally available! #datadog

Dec 4, 2025 By Datadog In Datadog

Bits AI SRE, our first AI agent, is now generally available. Across industries, customers of all sizes are already seeing faster resolution, stronger reliability, and a better on-call experience for their teams.

View Video

Datadog

Read more about Bits AI SRE, our first AI agent, now generally available! #datadog

7 Senior-Level AI Debugging Tools Compared

Dec 4, 2025 By Rollbar In Rollbar

Every dollar spent on engineering is a bet on the future. But look at your engineering team's sprint backlog and you’ll see a non-trivial amount of that capital is spent on repairing the past. For the last ten years, if you asked a VP of Engineering what the solution was, the answer was always the same: better monitoring. Throw more telemetry at the wall. Build a bigger dashboard. Send more alerts at 3 AM. It was the only available tool, so it became the entire thesis.

Read Post

Rollbar

Read more about 7 Senior-Level AI Debugging Tools Compared

Explaining Icinga Director for Practitioners Webinar Recording

Dec 4, 2025 By Icinga In Icinga

Starting from a clean installation, we will guide you through the complete setup process and create a first monitoring configuration together. You will learn how to navigate the Icinga Director interface, discover its main features, and see how automation can simplify your daily work through data imports and synchronization rules.

View Video

Icinga

Monitoring

Read more about Explaining Icinga Director for Practitioners Webinar Recording

OTel Updates: Unroll Processor Now in Collector Contrib

Dec 4, 2025 By Anjali Udasi In Last9

Some log sources bundle multiple events into a single record before shipping them. This is common with VPC flow logs, CloudWatch exports, and certain Windows endpoint collectors. While this batching approach is efficient for transport, it creates challenges when you need to filter, search, or correlate individual events. When a log record contains an array of 47 events, your analytics tool sees one entry instead of 47 distinct records.
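To make the batching problem concrete, here is a minimal Python sketch of the unrolling idea: one record whose body is an array becomes one record per event, each keeping the shared metadata. This illustrates the concept only and is not the Collector processor's implementation or configuration; the record layout (`attributes`, `body`) and the `unroll` function are assumptions made for the example.

```python
from copy import deepcopy
from typing import Any, Dict, List

Record = Dict[str, Any]


def unroll(record: Record) -> List[Record]:
    """Split a record whose body is a list into one record per element.

    Records whose body is not a list are returned unchanged.
    The "body"/"attributes" layout is a hypothetical shape for this sketch.
    """
    body = record.get("body")
    if not isinstance(body, list):
        return [record]

    unrolled = []
    for event in body:
        # Copy every field except the batched body, then attach one event.
        new_record = {k: deepcopy(v) for k, v in record.items() if k != "body"}
        new_record["body"] = event
        unrolled.append(new_record)
    return unrolled


# A batched record carrying 47 events, as in the teaser above.
batched = {
    "attributes": {"source": "vpc-flow-logs"},
    "body": [{"event_id": i} for i in range(47)],
}

records = unroll(batched)
print(len(records))  # one record per event
```

After unrolling, each event can be filtered, searched, or correlated on its own while still carrying the attributes of the original batch.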

Read Post

Last9

Read more about OTel Updates: Unroll Processor Now in Collector Contrib

Understanding How a Log Correlation Engine Enables Real-Time Insights

Dec 4, 2025 By Jeff Darrington In Graylog

Tax season is notoriously most people’s least favorite time of year. For people who complete their own tax returns, the process becomes an agonizing one of looking at small pieces of paper, matching numbers to the lines that ask for information, and comparing various inputs. In essence, doing your taxes makes you a correlation engine. Now, imagine taking this tedious process and applying it to the terabytes of data that your environment generates daily.

Read Post

Graylog

Read more about Understanding How a Log Correlation Engine Enables Real-Time Insights

AI for Observability: Honeycomb Canvas & MCP

Dec 4, 2025 By Honeycomb In Honeycomb

See how Honeycomb uses AI in our built-in assistant, Canvas. Then see how your agent can use Honeycomb with our MCP. Both can get from a vague question to the root cause of a latency spike in a few minutes, and the agent with MCP can even fix it!

View Video

Honeycomb

Read more about AI for Observability: Honeycomb Canvas & MCP

Send OpenTelemetry traces and logs from Cloudflare Workers to Grafana Cloud

Dec 4, 2025 By Ishan Jain In Grafana

Cloudflare Workers is a developer platform for deploying serverless functions, frontends, containers, and databases to a global network, spanning 330+ cities around the world. However, as your application scales, it becomes crucial to have the right observability tools to investigate issues, monitor performance, and get alerts when issues arise. Last month, Cloudflare Workers announced support for exporting OpenTelemetry logs and traces, letting you send this data directly to Grafana Cloud.

Read Post

Grafana

Read more about Send OpenTelemetry traces and logs from Cloudflare Workers to Grafana Cloud

Use Database Monitoring in Splunk Observability Cloud to Identify and Resolve Slow Queries

Dec 4, 2025 By Splunk In Splunk

In this video, I introduce Database Monitoring in Splunk Observability Cloud. I'll demonstrate how to spot and resolve slow queries by leveraging rich metrics and correlating database performance directly with traces in Splunk Observability Cloud APM.

View Video

Splunk

Read more about Use Database Monitoring in Splunk Observability Cloud to Identify and Resolve Slow Queries

A Week of Insight, Connection, and Innovation at Gartner IT Symposium/Xpo in Orlando

Dec 4, 2025 By ScienceLogic In ScienceLogic

Gartner IT Symposium/Xpo is always a standout experience for ScienceLogic, and this year’s event in Orlando was no exception. The event brought together seasoned IT leaders, analysts, and solution providers, creating a dynamic hub for meaningful conversations, hands-on demos, and translating future-driven insights into action. More than being honored to attend, ScienceLogic thrives on engaging with IT leaders on the show floor, in sessions, and throughout the event.

Read Post

ScienceLogic

Read more about A Week of Insight, Connection, and Innovation at Gartner IT Symposium/Xpo in Orlando

Automate infrastructure operations with Datadog Infrastructure Management

Dec 4, 2025 By Jessie Wu In Datadog

Many organizations struggle to track how their cloud infrastructure changes over time. Modern environments span tens of thousands of resources across hundreds of accounts and multiple clouds. Application teams add new services and regions at a rapid pace, increasing the number and variety of resources that need to be managed. These shifts can cause infrastructure configurations to drift from a well-architected state, increasing the risk of service reliability issues and unexpected cloud spend.

Read Post

Datadog

Read more about Automate infrastructure operations with Datadog Infrastructure Management

Sponsored Post

Adding a CDN to a load balancer (for a much faster website)

Dec 3, 2025 By Denny Mate In Raygun

Here at Raygun, we like to go fast. Really fast. That's what we do! When we see something that isn't zooming, we try to figure out how to make it go faster. So today, we're answering a simple (and relevant) question: how do we make our public site, raygun.com, much, much faster? The answer, at first glance, is simple: we build it into a Content Delivery Network (CDN). But what if you have a load balancer serving your website, and you don't want to rebuild everything to serve from a CDN? Well, that's more complicated. Let's start by describing the issue.

Read Post

Raygun

Read more about Adding a CDN to a load balancer (for a much faster website)

Digitate and BMC Unveil a Multi-Product Solution on the AWS Marketplace

Dec 3, 2025 By Digitate In Digitate

Joint solution announced at AWS re:Invent 2025 as an official launch partner.

Read Post

Digitate

Read more about Digitate and BMC Unveil a Multi-Product Solution on the AWS Marketplace

Shopify Cyber Monday outage - December 1, 2025

Dec 3, 2025 By Colin Bartlett In StatusGator

On December 1, 2025, Cyber Monday, the biggest online shopping day of the year, Shopify suffered a widespread outage that left many merchants unable to access their stores or process orders. At a time when every minute of uptime translates directly into revenue, the disruption caused immediate concern across the ecommerce community. StatusGator detected the issue within minutes, sending an Early Warning Signal 10 minutes before Shopify published its official acknowledgement.

Read Post

StatusGator

Read more about Shopify Cyber Monday outage - December 1, 2025

How Browser Monitoring Tools Improve Application Reliability and End-User Experience

Dec 3, 2025 By Dotcom-Monitor In Dotcom-Monitor

Browser monitoring tools, also known as Real User Monitoring (RUM) solutions, enhance application reliability and end-user experience by providing detailed, real-time visibility into how users interact with web applications. These tools track key performance metrics, identify front-end errors, and help development and DevOps teams detect and resolve issues that directly impact users before they escalate.

Read Post

Dotcom-Monitor

Read more about How Browser Monitoring Tools Improve Application Reliability and End-User Experience

Underrated Linux Kernel Feature: Delay Accounting

Dec 3, 2025 By Coroot In Coroot

The traditional metrics you’re probably using to determine CPU time shortages lack the precision of kernel-level insights.

View Video

Coroot

Read more about Underrated Linux Kernel Feature: Delay Accounting

What the Octopus Can Teach Us About AI (w/ Steve Wunker)

Dec 3, 2025 By Nexthink In Nexthink

Tim and Tom sit down with Steve Wunker — Managing Director of New Markets Advisors, author, and early pioneer of the smartphone — to explore the big ideas behind his latest book, AI and the Octopus Organization. Steve breaks down why AI shouldn’t just “bolt onto” old processes, how distributed intelligence reshapes the firm, and what leaders can learn from one of nature’s most adaptable creatures. From organizational plasticity to the changing role of middle managers, Steve offers a pragmatic roadmap for thriving amid rapid AI-driven transformation.

View Video

Nexthink

Read more about What the Octopus Can Teach Us About AI (w/ Steve Wunker)

Shift Happens: How to Make Your ITSM Incidentally Awesome

Dec 3, 2025 By solarwindsinc In SolarWinds

A modern service desk goes far beyond basic ticketing, serving as the central engine for IT operations. This THWACKcamp session from SolarWinds Day reveals how to streamline and standardize ITSM workflows, transforming the service desk into a strategic asset that eliminates administrative headaches. SolarWinds Sr. PMM Lauren Okruch and THWACK MVP Jeremy Mayfield, Director of IT at National Sugar Marketing, explore how modern service desks go beyond ticketing to become the hub of IT operations.

View Video

SolarWinds

Read more about Shift Happens: How to Make Your ITSM Incidentally Awesome

Drowning in Alert Fatigue? How to Regain Control of Your Monitoring

Dec 3, 2025 By Simona Omidkar In Icinga

If you’ve ever muted your phone during a maintenance window, only to miss a real outage an hour later, you’re not alone. Sysadmins on Reddit and beyond often describe feeling like they’re drowning in alerts: so many notifications that the important ones lose their meaning. This is alert fatigue, sometimes called notification fatigue or incident noise, and it’s one of the most common challenges in modern, growing IT operations.

Read Post

Icinga

Read more about Drowning in Alert Fatigue? How to Regain Control of Your Monitoring

Getting Started: OpenTelemetry and Sentry

Dec 3, 2025 By Sentry In Sentry

If you're using OpenTelemetry to collect traces or logs, shipping that data into Sentry is easy with the new OTLP endpoints. We're going live to take you through how you can get started using OpenTelemetry with Sentry.

View Video

Sentry

Read more about Getting Started: OpenTelemetry and Sentry

Cribl and Cloudflare give you full network visibility with real time telemetry

Dec 3, 2025 By Cribl In Cribl

Glenn Block explains how the new Cloudflare source and R2 destination in Cribl Stream let you ingest WAF, DNS, and Zero Trust logs for full visibility and real-time intelligence. Better security, better performance, and lower cost for modern IT and security teams.

View Video

Cribl

Read more about Cribl and Cloudflare give you full network visibility with real time telemetry

The Performance Revolution in JavaScript Tooling

Dec 3, 2025 By Damilola Olatunji In AppSignal

Over the last couple of years, we've witnessed a remarkable shift in the JavaScript ecosystem, as many popular developer tools have been rewritten in systems programming languages like Rust, Go, and Zig. This transition has delivered dramatic performance improvements and other innovations that are reshaping how developers build JavaScript-backed applications.

Read Post

AppSignal

Read more about The Performance Revolution in JavaScript Tooling

5 Network Issues That Affect Remote Offices (Not HQ)

Dec 3, 2025 By Andrii Kernitskyi In Obkio

Your headquarters runs flawlessly. Zero network complaints. But your remote offices? Constant connectivity problems, dropped video calls, and frustrated employees filing help desk tickets you can't solve. Remote offices experience 3x more network issues than headquarters, yet most IT teams have zero visibility into what's actually failing.

Read Post

Obkio

Read more about 5 Network Issues That Affect Remote Offices (Not HQ)

kubectl logs Command Reference and Documentation

Dec 3, 2025 By Alexandr Bandurchin In Uptrace

The kubectl logs command retrieves container logs from Kubernetes pods. It supports real-time log streaming with -f, time-based filtering with --since, viewing previous container instances with --previous, and accessing logs from specific containers in multi-container pods using -c.
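As a rough illustration of how those flags combine, the Python sketch below assembles `kubectl logs` argument lists without running them. The helper `kubectl_logs_args` and the pod and container names are hypothetical; only the flags named above (`-f`, `--since`, `--previous`, `-c`) are taken from the text.

```python
import shlex
from typing import List, Optional


def kubectl_logs_args(
    pod: str,
    container: Optional[str] = None,
    follow: bool = False,
    since: Optional[str] = None,
    previous: bool = False,
) -> List[str]:
    """Build an argument list for `kubectl logs` (hypothetical helper)."""
    args = ["kubectl", "logs", pod]
    if container:
        args += ["-c", container]        # pick one container in a multi-container pod
    if follow:
        args.append("-f")                # stream logs in real time
    if since:
        args.append(f"--since={since}")  # time-based filter, e.g. "1h"
    if previous:
        args.append("--previous")        # logs from the previous container instance
    return args


# Hypothetical pod and container names, for illustration only.
stream_recent = kubectl_logs_args("web-0", container="app", follow=True, since="1h")
after_crash = kubectl_logs_args("web-0", previous=True)

print(shlex.join(stream_recent))
print(shlex.join(after_crash))
```

Building the arguments as a list keeps them safe to hand to `subprocess.run` later, since nothing passes through shell quoting.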

Read Post

Uptrace

Read more about kubectl logs Command Reference and Documentation

What's new in the Grafana Image Renderer: higher-quality results, security enhancements, and more

Dec 3, 2025 By Mariell Hoversholm In Grafana

Whether it’s for an email or that upcoming presentation, many Grafana users like to share their favorite dashboards or panels outside of Grafana itself. The Grafana Image Renderer is a backend service for Grafana that helps you do just that by rendering panels and dashboards as images, such as PNGs and PDFs, via a headless browser. It’s commonly used to support Grafana features like exporting dashboards, generating images for alert notifications, and creating PDF reports.

Read Post

Grafana

Read more about What's new in the Grafana Image Renderer: higher-quality results, security enhancements, and more

You've Found the Waste In Your Network Operations. Now What?

Dec 3, 2025 By Yann Guernion In Broadcom

In a previous blog, we looked at your network operations through the lens of lean principles. We exposed the seven wastes that quietly drain your budget and burn out your teams. This constant cycle of reactive firefighting comes with a steep price. We outlined a concept in quality management known as the Cost of Poor Quality (COPQ), the total financial impact of wasted engineering hours, lost user productivity, and business risk.

Read Post

Broadcom

Read more about You've Found the Waste In Your Network Operations. Now What?

9 Third-Party Risk Monitoring Tools That Actually Cut Vendor Assessment Time

Dec 3, 2025 By OpsMatters In OpsMatters

Nearly one in three cyber breaches now start with a supplier, McKinsey found in 2024. A single vendor review cycle often spans 3 to 5 weeks due to manual evidence chasing, according to Forrester's 2024 State of Third-Party Risk Report. And a May 2025 Gartner brief warns that this "perfect storm" of attacks, supply-chain shocks and new regulations is forcing boards to modernize third-party risk, fast.

Read Post

OpsMatters

Read more about 9 Third-Party Risk Monitoring Tools That Actually Cut Vendor Assessment Time

Using the Downsampling Plugin in InfluxDB 3

Dec 2, 2025 By Allyson Boate In InfluxData

Modern systems generate huge volumes of time series data. Advances in hardware and edge instrumentation enable sensors and applications to capture new values every second—or faster—which makes high-frequency measurement easy and affordable. When applied effectively, this steady flow of data reveals early warning signs, highlights subtle performance shifts, and helps teams understand how systems behave in real-time.

Read Post

InfluxData

Read more about Using the Downsampling Plugin in InfluxDB 3

Part 1: What If Data Wasn't Just the Fuel for AI but the Foundation of Everything It Knows?

Dec 2, 2025 By ScienceLogic In ScienceLogic

Every breakthrough begins with a question. What if we looked beyond today’s tools, buzzwords, and hype and examined the design principles shaping tomorrow’s intelligent enterprises? The What If series explores those inflection points: moments where technology meets human judgment, where automation meets accountability, and where AI begins to resemble something more like understanding than output.

Read Post

ScienceLogic

Read more about Part 1: What If Data Wasn't Just the Fuel for AI but the Foundation of Everything It Knows?

Better Together: Building the Self-Healing Enterprise

Dec 2, 2025 By Christina Kosmowski and Mehdi Daoudi In LogicMonitor

When technology slows, everything does. Guests wait to check in. Travelers queue at kiosks. Shoppers refresh the page, hoping the payment goes through. Every second of downtime costs companies millions and frustrates millions more. LogicMonitor and Catchpoint have been solving that problem from different sides: one focused on the systems and infrastructure that keep businesses running, the other on the experiences and performance that users actually feel.

Read Post

LogicMonitor

Read more about Better Together: Building the Self-Healing Enterprise

Heroku Monitoring Best Practices (2026) | MetricFire

Dec 2, 2025 By Elliot Langston In MetricFire

Looking for reliable Heroku monitoring in 2026? Start with the metrics that matter, pair Heroku’s built-ins with the right add-ons, and add alerts that catch issues before users do.

Read Post

MetricFire

Read more about Heroku Monitoring Best Practices (2026) | MetricFire

Observability in the AI age: Datadog's approach

Dec 2, 2025 By Yanbing Li In Datadog

Ten years ago, Datadog was a single-product company focused on breaking down the silos between dev and ops. As the shift towards the cloud accelerated and organizations transitioned to the new DevOps model, we set out to develop an observability platform that would enable these teams to safely scale faster and answer the essential questions about their services: are they available, secure, compliant, performant, and cost-efficient?

Read Post

Datadog

Read more about Observability in the AI age: Datadog's approach

A New Chapter: LogicMonitor + Catchpoint - A Personal Note from Mehdi

Dec 2, 2025 By Mehdi Daoudi In Catchpoint

In 2008, I was sitting in my garage office with a simple but stubborn idea: the Internet deserved better. End users deserved better. Companies needed a way to truly understand what their customers were experiencing, not just what their servers were reporting. Digital Experience Monitoring wasn’t a category yet. But the need was unmistakable. That idea didn’t come from theory or ambition. It came from lived experiences.

Read Post

Catchpoint

Read more about A New Chapter: LogicMonitor + Catchpoint - A Personal Note from Mehdi

Optimize Kubernetes cluster cost with Datadog Cluster Autoscaler

Dec 2, 2025 By Allie Rittman In Datadog

Running Kubernetes at scale almost always means paying for more compute than you need. To protect reliability, platform and application teams typically overprovision nodes early in development and keep scaling up as they add features and workloads. They are often reluctant to move to smaller or different instance types without a clear picture of how those changes will affect performance or availability. The result is a fleet of underutilized nodes that silently inflate your cloud bill.

Read Post

Datadog

Read more about Optimize Kubernetes cluster cost with Datadog Cluster Autoscaler

Top Browser Monitoring Features Every DevOps Team Should Prioritize in 2026

Dec 2, 2025 By Dotcom-Monitor In Dotcom-Monitor

In 2026, digital performance is more critical than ever. Users expect web applications to load instantly, respond flawlessly, and support complex interactions without delay. For DevOps teams, this means browser monitoring is no longer optional—it’s a foundational capability for ensuring availability, speed, and reliability across modern web experiences.

Read Post

Dotcom-Monitor

Read more about Top Browser Monitoring Features Every DevOps Team Should Prioritize in 2026

Patterns for Deploying OpenTelemetry Collector at Scale

Dec 2, 2025 By Elizabeth Mathew In SigNoz

So, you've embraced OpenTelemetry, and it's been great. Pat, Pat. That single, vendor-neutral pipeline for your traces, metrics, and logs felt like the future. But now, the future is getting bigger. That simple OTel Collector configuration that worked perfectly for a few services is starting to show its limits as you scale. The data volume is climbing, reliability is becoming a concern, and you're wondering if that single collector instance is now a bottleneck waiting to happen.

Read Post

SigNoz

Read more about Patterns for Deploying OpenTelemetry Collector at Scale

Datadog Bits AI SRE: Your new teammate for on-call shifts

Dec 2, 2025 By Datadog In Datadog

Bits AI SRE is an always-on SRE agent built to handle complex troubleshooting and late-night alerts. Developed against thousands of real-world incidents and powered by Datadog’s platform, Bits AI SRE analyzes your entire stack, tests hypotheses, and identifies root causes in minutes. Resolve faster, get back to sleep sooner, and give your on-call team the confidence and capacity they need.

View Video

Datadog

Read more about Datadog Bits AI SRE: Your new teammate for on-call shifts

Optimize Your Oracle Cloud (OCI) Spend with Datadog Cloud Cost Management

Dec 2, 2025 By Datadog In Datadog

Support for Oracle Cloud Infrastructure (OCI) is now live in Datadog Cloud Cost Management. In this short demo, you’ll learn how to: Get granular visibility into OCI cost and usage—by service, compartment, tag, and resource tier. Uncover savings opportunities by combining cost data with observability metrics like CPU, memory, and storage utilization. Set up anomaly monitors and budgets to avoid cost overruns—especially for high-risk workloads like AI and GPU training.

View Video

Datadog

Read more about Optimize Your Oracle Cloud (OCI) Spend with Datadog Cloud Cost Management

Contextual, in-product guidance for every Grafana user: A closer look at Interactive Learning

Dec 2, 2025 By Tom Glenn In Grafana

As developer advocates at Grafana Labs, we’re always looking for new ways to help our users better understand and learn observability. You might remember our previous project that brought learning to life through an adventure-style game, and now we’re really excited to share something else we’ve been working on: Interactive Learning, a new way to get the technical help you need directly in Grafana.

Read Post

Grafana

Read more about Contextual, in-product guidance for every Grafana user: A closer look at Interactive Learning

New Feature: Filter HTTP Pings by Keywords

Dec 2, 2025 By Pēteris Caune In Healthchecks

Healthchecks.io can now classify HTTP pings from clients as start, success, or failure signals not only by URL suffixes (no suffix, /start, /fail, /{exit-status}) but also by looking for specific keywords or phrases in the HTTP request body. The content filtering feature was already available for email pings, and now it has been extended to HTTP pings as well.

Read Post

Healthchecks

Read more about New Feature: Filter HTTP Pings by Keywords

Using SigNoz MCP Server & Claude to find root cause of Alerts

Dec 2, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Using SigNoz MCP Server & Claude to find root cause of Alerts.

View Video

SigNoz

Read more about Using SigNoz MCP Server & Claude to find root cause of Alerts

Effective Site Safety Measures That Reduce Asset Loss And Damage

Dec 2, 2025 By OpsMatters In OpsMatters

When managing construction sites or industrial facilities, implementing effective safety measures is paramount for reducing asset loss and damage. The nature of these work environments makes them vulnerable to various risks ranging from accidents to theft.

Read Post

OpsMatters

Read more about Effective Site Safety Measures That Reduce Asset Loss And Damage

European enterprises prioritise governance in AI deployments, as North America accelerates towards full autonomy

Dec 1, 2025 By Digitate In Digitate

Digitate report reveals differing approaches to AI deployment between Europe and North America, but ROI remains consistent. Europe leading on governance while NA organisations show faster progress towards autonomous operations.

Read Post

Digitate

Read more about European enterprises prioritise governance in AI deployments, as North America accelerates towards full autonomy

November 2025 - Early Warning Signals

Dec 1, 2025 By Colin Bartlett In StatusGator

November brought a steady flow of service disruptions across productivity, finance, developer tools, and major consumer platforms. Two incidents stood out as the month’s most significant: a major Google Workspace outage on November 12 affecting Docs and Sheets globally, and a widespread Cloudflare issue on November 18 that caused cascading failures across multiple services.

Read Post

StatusGator

Read more about November 2025 - Early Warning Signals

Introducing our new service monitor APIs

Dec 1, 2025 By Valeria Kurolapova In StatusGator

We’re pleased to announce new enhancements to the StatusGator API platform that make it easier to automate how you monitor third-party services. The new Service Search, Create Service Monitor, and Update Service Monitor endpoints give developers more control over how monitors are created, labeled, and maintained across projects and environments. These APIs are designed for teams that integrate StatusGator into their deployment processes, internal tooling, or infrastructure automation.

Read Post

StatusGator

Read more about Introducing our new service monitor APIs

New roadmap & feature request hub

Dec 1, 2025 By Valeria Kurolapova In StatusGator

We’re excited to announce that StatusGator has officially moved to a new platform for collecting feature requests, organizing our roadmap, and keeping you updated on what we’re building. This new system makes it easier than ever to share ideas, vote on improvements, and follow the progress of the features that matter most to you.

Read Post

StatusGator

Read more about New roadmap & feature request hub

Incident IQ: Outage announcement bar

Dec 1, 2025 By Valeria Kurolapova In StatusGator

Our Incident IQ integration just got better. Meet the Outage Announcement Bar, a simple way to surface live outage details inside Incident IQ. This new feature makes it even easier for users, support teams, and administrators to stay aware of service disruptions the moment they happen. This update builds on our existing Incident IQ integration, which already syncs outage reports from your StatusGator status page into Incident IQ.

Read Post

StatusGator

Read more about Incident IQ: Outage announcement bar

Why AI Will Push #Telemetry Budgets to the Breaking Point in 2026

Dec 1, 2025 By Cribl In Cribl

Telemetry growth is about to hit a new level in 2026. Nick Heudecker from Cribl walks through our new predictions report and explains why observability costs are set to surge again, with more than a third of enterprises spending at least 15% of their IT budgets on telemetry alone. He also shares how agentic AI adds new risk to the data pipeline, why most AI workloads will struggle to scale, and how platform shifts and market forces will reshape the data landscape.

View Video

Cribl

Read more about Why AI Will Push #Telemetry Budgets to the Breaking Point in 2026

#AI Powered Data Protection Inside Cribl Guard

Dec 1, 2025 By Cribl In Cribl

Cribl Guard uses an always-running AI agent to spot sensitive data as it moves through your environment and recommend the right protections in real time. In this demo, you will see how the agent samples live events, identifies patterns like credentials and credit cards, and turns them into one-click fixes that keep your destinations safe. Faster detection, smarter rule recommendations, and instant mitigation. This is what modern data protection looks like.

View Video

Cribl

Read more about #AI Powered Data Protection Inside Cribl Guard

New agents in the Dojo: Expanded Sumo Logic Dojo AI

Dec 1, 2025 By Margaret Selid In Sumo Logic

Back in September, we unveiled Sumo Logic Dojo AI, our agentic AI platform built to power intelligent security operations and incident response. With that launch, we introduced Mobot, our conversational interface, as well as our first agents designed to help automate routine tasks, streamline investigations, and give security teams the freedom and ability to focus on analyzing the highest value security issues facing their organization. Today, we’re excited to share the latest additions to Dojo AI.

Read Post

Sumo Logic

Read more about New agents in the Dojo: Expanded Sumo Logic Dojo AI

Ep 20: re:Invent FOMO? Dojo AI demo

Dec 1, 2025 By Sumo Logic, Inc. In Sumo Logic

Not heading to re:Invent this week? Don't worry—we've got you covered. In this episode, we welcome Architect Solutions Engineer, Jake Lee, to preview the exciting new Sumo Logic tools we are showcasing in Vegas. Our new SOC analyst agent acts as an AI partner that instantly assesses incident severity and recommends next steps—no more drowning in alerts. The MCP server breaks down barriers by letting you query Sumo Logic from Slack or integrate security insights directly into your IDE.

View Video

Sumo Logic

Read more about Ep 20: re:Invent FOMO? Dojo AI demo

Design as Infrastructure

Dec 1, 2025 By Sol Escalada In Honeycomb

SaaS products that are built for engineers power critical workflows, yet their designs are often afterthoughts. SaaS products often assume that technical audiences will figure out their way through a complex experience, or just forgive them for the paper cuts on the way. A foundational design system can be perceived as a layer of polish rather than an infrastructure investment, especially in the early stages of a startup.

Read Post

Honeycomb

Read more about Design as Infrastructure

Monitoring Azure Metrics to Protect Uptime And Stop Threats Early

Dec 1, 2025 By LogicMonitor In LogicMonitor

This is the fifth blog in our Azure Monitoring series, and we’re focusing on what’s most critical: keeping your environment secure and always available. Performance and cost mean nothing if your services go offline or your data is compromised. In this post, we’ll highlight the Azure metrics that help CloudOps teams detect threats early, build resilience into their stack, and stay ahead of outages before they impact users or compliance. Missed our earlier posts? Catch up.

Read Post

LogicMonitor

Read more about Monitoring Azure Metrics to Protect Uptime And Stop Threats Early

Monitor Everything is an Anti-Pattern!

Dec 1, 2025 By Costa Tsaousis In netdata

Bullshit and nonsense. But let’s take it from the beginning. The industry’s story goes something like this: Then, in the same breath: You see the contradiction already, right? The same industry that tells you “collect less, simplify, trust the experts” is also the industry where: This isn’t an observability strategy. It’s observability by hindsight. Right. Good. Now we’re having fun.

Read Post

netdata

Read more about Monitor Everything is an Anti-Pattern!

Here's What a Network Needs After a Cloud Migration

Dec 1, 2025 By Kevin Dooley In Auvik

By now, most organizations have realized the benefits of moving some, most, or all of their business applications to the cloud. The cloud typically offers better security and performance, at a lower price, than housing resources on-premises. You may have helped them in that migration or you may have been hired after it was complete. Either way, a client with cloud hosting has different network requirements than one whose infrastructure is primarily on-premises.

Read Post

Auvik

Read more about Here's What a Network Needs After a Cloud Migration

Configuring an Internet Connection for a Cloud-Hosted Environment

Dec 1, 2025 By Kevin Dooley In Auvik

Part 2 in our series on Here’s What a Network Needs After a Cloud Migration. Part 1 looked at how to redesign the LAN. When a company’s application infrastructure moves to the cloud, a reliable Internet connection becomes mandatory. Hiccups in Internet service that might have been an inconvenience when apps were in-house now grind the business to a halt. Unfortunately, the Internet link happens to be the single least reliable element in an IT infrastructure.

Read Post

Auvik

Read more about Configuring an Internet Connection for a Cloud-Hosted Environment

Prometheus in the Cloud: Why Managed Solutions Win

Dec 1, 2025 By Lionel Porcheron In Bleemeo

Prometheus has become the gold standard for monitoring cloud-native applications. But running Prometheus at scale comes with significant operational overhead. Let’s explore why a managed Prometheus solution might be the right choice for your team.

Read Post

Bleemeo

Read more about Prometheus in the Cloud: Why Managed Solutions Win

Managing User Access & Authentication in a Cloud-Hosted Environment

Dec 1, 2025 By Kevin Dooley In Auvik

This is the third and final instalment in a series on Here’s What a Network Needs After a Cloud Migration. Part 1 looked at how to redesign the LAN. Part 2 outlined strategies for the Internet connection. One of the things that becomes more important in a cloud-based application environment is managing user access and authentication.

Read Post

Auvik

Read more about Managing User Access & Authentication in a Cloud-Hosted Environment

Stop the Insanity! Quit Doing These 7 Manual Network Management Tasks

Dec 1, 2025 By Alex Hoff In Auvik

Active network infrastructure management is a key element of any managed service offering. Traditionally, network management has involved a lot of tedious manual work, making it expensive and very hard to scale. And that’s why many MSPs have shied away from actively managing the network. But not managing network infrastructure at all is a risk to your business. Your clients likely expect you’re looking after the network whether you’ve promised it or not.

Read Post

Auvik

Read more about Stop the Insanity! Quit Doing These 7 Manual Network Management Tasks

How Browser Monitoring Tools Enhance Application Reliability & User Experience

Dec 1, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern web applications are increasingly complex, with dynamic content, single-page apps (SPAs), APIs, and third-party integrations. For businesses, ensuring application reliability and a seamless end-user experience is critical. Poor performance can lead to customer dissatisfaction, revenue loss, and reputational damage. This is where browser monitoring tools and browser performance monitoring come into play.

Read Post

Dotcom-Monitor

Read more about How Browser Monitoring Tools Enhance Application Reliability & User Experience

How to Fix Cyclic Inheritance Errors in Icinga Director during Object Configuration

Dec 1, 2025 By Ravi Srinivasa In Icinga

Icinga Director is a powerful tool that greatly simplifies the configuration, management, and deployment of monitoring objects in Icinga. It provides a user-friendly interface and automation features that make complex setups easier to maintain. Occasionally, though, you may unintentionally introduce a cyclic inheritance while configuring templates. A typical case occurs when a template imports another template that eventually imports the original one again.
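To make the idea of a cyclic import concrete, here is a minimal sketch of how such a loop could be detected in general. It does not use Icinga Director's API: the `imports` mapping, the template names, and the `find_import_cycle` helper are all hypothetical, and the linked post describes the actual fix inside Director.

```python
def find_import_cycle(imports):
    """Return a list of template names that form a cycle, or None.

    `imports` is a hypothetical mapping of template name -> list of
    templates it imports. This is a plain depth-first search and is not
    how Icinga Director itself reports the error.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(name):
        state[name] = IN_PROGRESS
        path.append(name)
        for parent in imports.get(name, []):
            parent_state = state.get(parent, UNVISITED)
            if parent_state == IN_PROGRESS:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if parent_state == UNVISITED:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[name] = DONE
        return None

    for name in imports:
        if state.get(name, UNVISITED) == UNVISITED:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# Hypothetical templates: web-host eventually imports generic-host again.
templates = {
    "generic-host": ["linux-host"],
    "linux-host": ["web-host"],
    "web-host": ["generic-host"],
}
cycle = find_import_cycle(templates)
```

With the sample mapping, `cycle` lists the chain of templates that leads back to its starting point, which is the kind of loop that has to be broken by removing one of the imports.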

Read Post

Icinga

Read more about How to Fix Cyclic Inheritance Errors in Icinga Director during Object Configuration

9 Monitoring Tools That Deliver AI-Native Anomaly Detection

Dec 1, 2025 By Anjali Udasi In Last9

The observability market has moved beyond manual threshold-setting. Modern platforms use statistical algorithms, machine learning, and causal AI to detect anomalies automatically. Some work immediately after deployment. Others train on your data for better accuracy. Each approach has technical trade-offs worth understanding. This guide compares how nine monitoring solutions handle automated anomaly detection and root cause analysis.

Read Post

Last9

Read more about 9 Monitoring Tools That Deliver AI-Native Anomaly Detection

Grafana Service Center: Simplify Service Reliability in One Place

Dec 1, 2025 By Grafana In Grafana

Grafana Service Center gives engineers and stakeholders a single place to ensure service reliability. In this video, Staff Product Manager Ryan Kehoe walks through how Service Center ties together alerts, SLOs, dashboards, incidents, and metadata for each service. Learn how to centralize reviews, speed up investigations, and improve visibility across your teams—all within Grafana Cloud.

View Video

Grafana

Read more about Grafana Service Center: Simplify Service Reliability in One Place

All Is Calm, All Is Compliant: Staying Audit-Ready Through the Year-End Rush

Dec 1, 2025 By Teneo In Teneo

As the year winds down, I find that most cybersecurity and compliance teams are focused on closing projects, hitting targets, and maybe even planning a well-earned break. But regulators? They don’t take holidays. FCA, PRA, GDPR – they remain vigilant, and so should you. For IT leaders, this season often feels like walking a tightrope: balancing operational demands with the relentless need for compliance.

Read Post

Teneo

Read more about All Is Calm, All Is Compliant: Staying Audit-Ready Through the Year-End Rush

Application monitoring 101: averages lie, percentiles clarify

Dec 1, 2025 By Lance Erickson In Scout

Your app feels slow, users complain… but your dashboard says response time is totally fine! But the real pain is revealed when looking at percentiles. In this post, we’ll break down why the 95th percentile response time metric is useful, plus how to work with it in practice.
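As a rough illustration of the point, the sketch below compares the mean with a nearest-rank 95th percentile. The sample data and the `percentile` helper are made up for this example and are not taken from the linked post.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 90 fast requests, 10 slow ones.
response_times_ms = [100] * 90 + [2000] * 10

mean_ms = statistics.mean(response_times_ms)
p95_ms = percentile(response_times_ms, 95)

print(f"mean: {mean_ms} ms, p95: {p95_ms} ms")
```

In this made-up data set the mean stays at a few hundred milliseconds while the 95th percentile lands on the slow requests, which is the gap between a dashboard average and what the slowest users experience.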

Read Post

Scout

Read more about Application monitoring 101: averages lie, percentiles clarify

Agentic Monitive: The principles

Dec 1, 2025 By Lucian Daniliuc In Monitive

Continuing my adventure to have Monitive run by AI agents, I had some brainstorming sessions with... well... AI. I was impressed by the maturity of the discussions one can have with ChatGPT 5.1 Pro and Claude Code Opus 4.5.

Read Post

Monitive

Read more about Agentic Monitive: The principles

Honeycomb Frontend Observability - See Everything

Dec 1, 2025 By Honeycomb In Honeycomb

In this video we take a tour through Honeycomb's Frontend Observability offerings for Web and Mobile. We see how the launchpads can help spot performance errors, how errors that occur in the frontend can be traced all the way to their cause in other backend services easily with the error investigations feature, and how easy it is to find differences between traces across various devices.

View Video

Honeycomb

Read more about Honeycomb Frontend Observability - See Everything

How To Migrate Away From DogStatsD Using Telegraf

Dec 1, 2025 By Benjamin Pitts In MetricFire

Datadog is a popular monitoring platform, and one of its key components is DogStatsD, which is a customized extension of the original open-source StatsD protocol. DogStatsD adds powerful features like tagging, histograms, and distributions, but it also introduces vendor lock-in. This is because DogStatsD metrics follow a specific wire format that many other monitoring platforms do not natively support.
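To show what the wire-format difference looks like, here is a small sketch that parses a DogStatsD-style metric line and re-emits it as a plain StatsD line with the tags dropped. It is not the Telegraf-based approach from the linked post, and the helper names `parse_dogstatsd` and `to_plain_statsd` are hypothetical.

```python
def parse_dogstatsd(line):
    """Split a metric line of the form name:value|type|@rate|#tag1,tag2."""
    name, sep, rest = line.partition(":")
    fields = rest.split("|")
    if not sep or not name or len(fields) < 2:
        raise ValueError(f"not a metric line: {line!r}")
    metric = {
        "name": name,
        "value": fields[0],
        "type": fields[1],
        "sample_rate": None,  # kept as a string to avoid reformatting it
        "tags": [],
    }
    for extra in fields[2:]:
        if extra.startswith("@"):
            metric["sample_rate"] = extra[1:]
        elif extra.startswith("#"):
            metric["tags"] = extra[1:].split(",")
    return metric


def to_plain_statsd(metric):
    """Re-emit the metric without the tag section."""
    line = f"{metric['name']}:{metric['value']}|{metric['type']}"
    if metric["sample_rate"] is not None:
        line += f"|@{metric['sample_rate']}"
    return line


# Hypothetical counter with two tags.
parsed = parse_dogstatsd("page.views:1|c|@0.5|#env:prod,region:us")
plain = to_plain_statsd(parsed)
```

Dropping the tags loses information, which is why a migration usually maps them onto whatever labeling scheme the target platform supports instead of discarding them.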

Read Post

MetricFire

Read more about How To Migrate Away From DogStatsD Using Telegraf

How to Write a Cover Letter That Actually Helps You Get the Job

Dec 1, 2025 By Kaylie Boogaerts In Checkly

Cover letters are supposed to help you shine, but most of them blur together into the same polite, forgettable paragraphs. The intention is good (“I want them to notice me!”), but the execution… not so much. So, here’s a simple, honest guide to writing a cover letter that actually works, especially if you’re applying to Checkly. Spoiler: shorter is better. And authenticity in this AI era is better than polished perfection.

Read Post

Checkly

Read more about How to Write a Cover Letter That Actually Helps You Get the Job

Improve service reliability and ops culture with Grafana Cloud Service Center

Dec 1, 2025 By Ryan Kehoe In Grafana

Today’s engineering organizations are built around service ownership. Service owners are accountable for keeping their services reliable, performant, and ready to scale. But no service operates in isolation; every team depends on others, and those dependencies form a complex web that can be hard to see, let alone understand. To truly deliver reliable systems, you need visibility not only into how your own service performs, but also how it affects others.

Read Post

Grafana

Read more about Improve service reliability and ops culture with Grafana Cloud Service Center

Monitor Claude Code adoption in your organization with Datadog's AI Agents Console

Dec 1, 2025 By Ali Al-Rady In Datadog

AI coding assistants are quickly becoming a core part of software engineering workflows, helping developers write, refactor, and review code faster. But without effective monitoring, it can be difficult to know whether these tools are performing reliably and proving useful to engineers. As organizations scale their use of tools like Claude Code, key questions emerge.

Read Post

Datadog

Read more about Monitor Claude Code adoption in your organization with Datadog's AI Agents Console

Our latest updates across the VictoriaMetrics Observability ecosystem

Dec 1, 2025 By Denys Holius In VictoriaMetrics

We’re excited to announce a set of updates across the entire VictoriaMetrics open source product suite, including VictoriaMetrics, VictoriaLogs, VictoriaTraces, and the VictoriaMetrics Kubernetes Operator. These improvements bring better performance, stronger security, enhanced metadata visibility, and a smoother experience when running observability at scale.

Read Post

VictoriaMetrics

Read more about Our latest updates across the VictoriaMetrics Observability ecosystem

AI Agent for Business SLA Predictions: Safeguarding Business Continuity with Predictive Intelligence

Dec 1, 2025 By Somdipto Ghosh In Digitate

Modern business functions are based on the promise of a smooth and seamless experience, without the need for downtime or long waits for backend processes to finish. For such digital operations, timely execution of business processes such as financial closings, order fulfilment, and report generation is non-negotiable.

Read Post

Digitate

Read more about AI Agent for Business SLA Predictions: Safeguarding Business Continuity with Predictive Intelligence

Accelerate investigations with AI-powered log parsing

Dec 1, 2025 By Usman Khan In Datadog

When debugging production issues, investigating security incidents, or analyzing network traffic, engineers and analysts need not only to find the right logs but to make sense of all the dense, unstructured data generated by different systems. Logs rarely ship neatly laid out in a way that facilitates filtering, faceting, or graphing for every possible scenario. As a result, teams often find themselves writing regular expressions or custom parsers on the fly, which can be error-prone and time-consuming.

Read Post

Datadog

Read more about Accelerate investigations with AI-powered log parsing
