Monthly Archive

Sponsored Post

Fabrix.ai at Cisco Live 2026 Amsterdam

Feb 28, 2026 By Shailesh Manjrekar In Fabrix

This post highlights the biggest Cisco AI Summit takeaways that came up again and again in Cisco Live conversations, and what they mean for teams operating AI in production. If you are following the broader AgentOps movement and the rise of agentic workflows, Fabrix.ai’s point of view is grounded in a core idea: AI agents create value only when they can be operated safely and consistently. A good starting point is here: Fabrix.ai’s approach to agentic.

Read Post

Fabrix

Sponsored Post

What is a Real-Time Data Lake?

Feb 28, 2026 By David Bunting In ChaosSearch

A data lake is a centralized data repository where structured, semi-structured, and unstructured data from a variety of sources can be stored in their raw format. Data lakes help eliminate data silos by acting as a single landing zone for data from multiple sources. But what's the difference between a traditional data lake and a real-time data lake? Some traditional data lakes use batch processing, which involves processing and analyzing a collection of data that has been stored over a specific timeframe. For example, payroll and billing systems that are handled on a weekly or monthly basis might use batch processing.

Read Post

ChaosSearch

Behind the magic of auto-instrumentation (Grafana OpenTelemetry Community Call)

Feb 28, 2026 By Grafana In Grafana

You add the OpenTelemetry Java agent, restart your app - and like magic, observability appears. But is it really magic? What’s actually enabled by default? What telemetry should you expect to see? What’s missing? And what might you want to tweak, tune, or even turn off?

View Video

Grafana

Better status page analytics

Feb 27, 2026 By Valeria Kurolapova In StatusGator

We’ve made a small improvement to Status Page analytics. Previously, analytics were limited to a fixed view of the last 30 days. While helpful, it didn’t always give you the right lens for the question you were trying to answer. Now, you can choose the time range that fits your needs.

Read Post

StatusGator

How Fabrix.ai Agents Ensure Data Privacy & Security

Feb 27, 2026 By Shailesh Manjrekar In Fabrix

As Agentic AI moves into enterprise environments, IT and security leaders face a critical challenge: how to leverage advanced LLMs without exposing sensitive data, intellectual property, or proprietary configurations to the cloud. You cannot build a self-driving, autonomous IT infrastructure if your security team blocks the deployment, and that’s exactly why the Fabrix.ai platform features an Enterprise-Grade LLM Integration architecture anchored by our built-in Data Security layer.

Read Post

Fabrix

IT Cost Optimization Strategy: Eliminating Guesswork with Observability

Feb 27, 2026 By Andy Wojnarek In Galileo

IT organizations are being asked to reduce costs, manage risk, and maintain performance at the same time. Meanwhile, infrastructure complexity continues to grow, and vendor pricing changes are reshaping budget assumptions. Too often, an IT cost optimization strategy is shaped by incomplete data around sizing, licensing, refresh timing, and platform decisions. That uncertainty leads to overprovisioning, budget surprises, and reactive operations. Observability changes that equation.

Read Post

Galileo

Shopify outage on February 15, 2026

Feb 27, 2026 By Colin Bartlett In StatusGator

On February 15, 2026, Shopify experienced a widespread service disruption that impacted merchants and shoppers around the world. While the provider did not acknowledge the issue until 15:36 UTC, StatusGator’s Early Warning Signals detected unusual activity and alerted customers at 15:00 UTC, just minutes after the first outage reports began coming in. This incident highlights the importance of independent, real-time monitoring.

Read Post

StatusGator

SendGrid Status Monitoring: How to Track Email Delivery Outages

Feb 27, 2026 By Nuno Tomas In isDown

When SendGrid goes down, your transactional emails stop reaching customers. Password resets fail. Order confirmations vanish. Support tickets never arrive. By the time you notice, customers are already complaining. For DevOps and SRE teams, checking SendGrid status shouldn't be a manual process. It shouldn't wait until customers report it either. For a team sending 10,000 transactional emails per day, a 15-minute outage means roughly 100 emails that never arrived.
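The "roughly 100 emails" figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes sends are spread evenly across the day, which the post does not state; the function name `emails_affected` is a hypothetical one chosen for illustration.

```python
# Back-of-the-envelope estimate of emails caught in an outage window.
# Assumption (not from the post): sends are spread evenly over 24 hours.
MINUTES_PER_DAY = 24 * 60


def emails_affected(emails_per_day: int, outage_minutes: float) -> float:
    """Estimate how many emails fall inside an outage of the given length."""
    return emails_per_day / MINUTES_PER_DAY * outage_minutes


estimate = emails_affected(10_000, 15)
print(round(estimate))  # prints 104
```

Real traffic is rarely uniform, so an outage during a peak sending period would likely affect more emails than this estimate suggests.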

Read Post

isDown

This Month in Datadog - February 2026

Feb 27, 2026 By Datadog In Datadog

On the first episode of This Month in Datadog in 2026, Jeremy covers how you can protect agentic AI applications with AI Guard, stay up to date and collaborate during incidents with five Incident Management releases, and ship software with confidence using Feature Flags. Later in the episode, Kevin spotlights Datadog Data Observability, which enables you to detect data quality and pipeline issues early.

Read Post

Datadog

Scaling IBM MQ: The Hidden Taxes on Reliability (and How to Break Free)

Feb 27, 2026 By meshIQ In meshIQ

IBM MQ's reliability comes at a hidden price. Scaling introduces complex licensing costs, operational bottlenecks, and compliance risks. Learn four strategies to transform MQ from a costly burden into an efficient, compliant backbone for modern enterprises.

Read Post

meshIQ

Grafana Campfire - Back to Basics - (Grafana Community Call - Feb 2026)

Feb 27, 2026 By Grafana In Grafana

Grafana Campfire Community Calls are back. We are starting with Back to the Basics. Even if you have heard them many times, some of you may be new to, or not very experienced with, terms such as monitoring, observability, metrics, tracing, and profiling. The good news is that you're not alone, and we've got this! This live call will be a perfect opportunity to build your understanding and ask questions if anything is not clear.

View Video

Grafana

The rise of agentic AI in production: Can observability systems run themselves?

Feb 27, 2026 By Grafana Labs Team In Grafana

Sometimes the biggest shifts in technology aren’t about collecting more data — they’re about who (or what) gets to act on it. In this episode of “Grafana’s Big Tent” podcast, host Tom Wilkie, Grafana Labs CTO, is joined by Spiros Xanthos, Founder & CEO of Resolve AI, Manoj Acharya, VP of Engineering for Observability at Grafana Labs, and Cyril Tovena, Principal Engineer on the Grafana Assistant team, to discuss agentic AI in observability.

Read Post

Grafana

k8s-monitoring-helm Chart Office Hours (February 2026)

Feb 27, 2026 By Grafana In Grafana

In the February edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 3.8 release, the upcoming 4.0 release and its features, and the upcoming deprecation of the 1.x and 2.0 versions.

View Video

Grafana

From RCA to Autonomous Ops: The Future of AI in Observability | Big Tent S3E7

Feb 27, 2026 By Grafana In Grafana

SREs are famously skeptical of AI — so how do you convince them to trust agents in production? In this episode of Grafana’s Big Tent, Tom Wilkie talks with Spiros Xanthos (Resolve AI), Manoj Acharya (Grafana Labs), and Cyril Tovena (Grafana Assistant team) about agent-first observability. They unpack knowledge graphs, LLM reasoning, autonomous debugging, pricing models, and the “Claude Code moment” for observability. Is autonomous production ops closer than we think?

View Video

Grafana

AI Is Changing How We Monitor Infrastructure

Feb 27, 2026 By Shyam Sreevalsan In netdata

The Netdata Cloud MCP Server is now available — giving AI agents and assistants direct access to your Netdata through a single endpoint at app.netdata.cloud/api/v1/mcp.

Read Post

netdata

Your Data is Whispering and Needs a Human to Listen

Feb 27, 2026 By The Graylog Team In Graylog

If you have ever owned, operated, or supported a piece of technology, you have probably built a dashboard. Maybe it started as a quick chart to answer a simple question, then quietly grew into something more important. Dashboards are often created by the people who know the systems best, the ones who can wire together data sources and click all the right buttons. But those same builders are rarely trained in how humans actually interpret data.

Read Post

Graylog

Data Observability, AI Guard, Feature Flags, Ambassador program, and more | This Month in Datadog

Feb 27, 2026 By Datadog In Datadog

See how you can ensure trust across the data life cycle in February’s episode of This Month in Datadog. Join us for a spotlight of Datadog Data Observability, which enables you to detect data quality and pipeline issues early, as well as remediate those issues with end-to-end lineage. Plus, we cover: protecting agentic AI applications from real-time threats with Datadog AI Guard; staying up to date and reducing steps to collaborate with five new Incident Management releases; and releasing software with confidence using Datadog Feature Flags.

View Video

Datadog

Why Evidence-Backed RCA in Edwin AI Starts With Logs

Feb 27, 2026 By Margo Poda In LogicMonitor

A step-by-step look at how Edwin AI uses native LogicMonitor logs, topology, and context to turn root cause analysis from alert-driven inference into evidence-backed investigation. Most root cause analysis today starts with alerts and ends with explanations that sound reasonable but can’t be verified. An alert is fed into a language model, and the output looks like an answer. It often isn’t.

Read Post

LogicMonitor

8 Years of Building Obkio: From Network Monitoring to Observability & Network Diagnostics

Feb 27, 2026 By Alyssa Lamberti In Obkio

In 2016, Obkio was just an idea, but it was an idea born from a real problem. Before writing a single line of code, we conducted a market audit to understand why Network Performance Monitoring solutions weren't more mature. We interviewed banks, manufacturing companies, and service providers, and the answer was unanimous: the NPM tools on the market were too complex, and most businesses simply didn't have the internal resources to dedicate full-time to managing them.

Read Post

Obkio

Block Builder: a new Mimir Component (Mimir Community Call February 2026)

Feb 27, 2026 By Grafana In Grafana

At today’s community call, we will hear from David Grant, one of the engineers who has brought a new component, the Block Builder, into Mimir. Using the Ingest Storage architecture in Mimir 3.0, the Block Builder takes over the block-building responsibility from the Ingester. This feature is experimental in Mimir today, but is rolling out to production inside of Grafana Labs now. This is a great time to introduce the component, discuss the motivation, and show where it fits in the larger architecture.

View Video

Grafana

AI Agents in IT Operations: From Concept to Practical Value

Feb 27, 2026 By Dallon Robinette In Selector

Artificial intelligence has been a defining theme in IT operations for nearly a decade. Early AIOps initiatives focused on predictive analytics and anomaly detection, promising to reduce operational overhead and improve system reliability. While these capabilities delivered incremental value, they often fell short of transforming how operations actually functioned.

Read Post

Selector

The Grafana Labs operating system: Introducing our Guiding Principles

Feb 27, 2026 By Matt Toback In Grafana

Matt Toback is the VP of Culture at Grafana Labs. We published our original company values back in December 2020. We were a young company, growing fast, and fully remote. Our values at the time were aspirational, and painted a picture of the kind of company we wanted to be. Those values did real work and they mattered. You could hear them used in everyday conversations, and they helped get us to where we are today. But growth has a way of revealing gaps.

Read Post

Grafana

The Definitive AWS Outage Report 2025: Reliability Analytics and Cascade Impact

Feb 27, 2026 By Hrishikesh Barua In IncidentHub

Amazon Web Services remains one of the most popular cloud providers, with 200+ services in 39 regions across the world. Like all providers, it has its share of outages. In 2025, IncidentHub detected 38 AWS outages, of which the one on October 20th had the most widespread impact, affecting hundreds of SaaS providers simultaneously. Payments were disrupted, students lost access to classrooms, developer tooling degraded, and some IT teams experienced alerting gaps.

Read Post

IncidentHub

Talk to Your Logs: LLM-Powered Chat UI in DSDL 5.2.3

Feb 27, 2026 By Huaibo Zhao In Splunk

We are excited to announce the release of the Splunk App for Data Science and Deep Learning (DSDL) version 5.2.3. Since 2018, DSDL has served as an innovation hub for custom AI integrations within Splunk. In 2025, the release of DSDL 5.2.0 introduced customizable Large Language Model (LLM) integrations, bringing Retrieval Augmented Generation (RAG) and Agentic AI workflows to Splunk users.

Read Post

Splunk

Top tips: Think it's a recommendation? It might be an ad

Feb 26, 2026 By Alsherin In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we'll be looking at ways we can spot ads disguised as recommendations in today's influencer era. These days, it's getting harder for me to distinguish between an ad and a recommendation.

Read Post

ManageEngine

Powering Security Innovation: Executive Q&A on Splunk Joining AWS Security Hub Extended

Feb 26, 2026 By Kamal Hathi In Splunk

To succeed in the AI era, customers need fast, easy access to security solutions that can harness the power of agentic AI and deliver business outcomes. They need seamless access to their data for faster threat detection, simpler incident response, and reduced risk. They need technology vendors to work together and not in silos.

Read Post

Splunk

Setting up a Domain Blacklist Check

Feb 26, 2026 By Uptime Website Monitoring In uptime

Check if your domain is on spam blacklists daily with Uptime.com's Domain Blacklist Check and get instant alerts for any issues.

View Video

uptime

AI Won't Replace IT Jobs - But This Will

Feb 26, 2026 By solarwindsinc In SolarWinds

AI isn’t the real threat to IT careers. The real risk? Not building the skills that AI can augment. Jon Collins breaks down the “AI augmentation gap.”

View Video

SolarWinds

Millions of Metrics. Zero Clarity.

Feb 26, 2026 By Virtana In Virtana

Millions of metrics. Zero clarity. That’s the reality many IT teams are facing today. As environments grow more complex, telemetry explodes. Millions of records generated every hour. Dozens of specialized tools for network, storage, Kubernetes, cloud, AI workloads. Each tool is good at its domain. But none of them answers the real question: Where should I focus right now? Fragmented visibility creates predictable failure modes.

View Video

Virtana

How to Debug Code You Didn't Write (your AI did)

Feb 26, 2026 By Todd H. Gardner In TrackJS

I was looking at a customer’s error report last week. A TypeError buried three callbacks deep in a checkout flow that made no sense. The code around it was clean, well-structured, and completely wrong about how the Stripe API actually works. Turns out it was vibe-coded. Someone prompted their way through the integration, it passed code review because it looked reasonable, and it worked fine right up until a customer’s card got declined for the first time. That’s the new normal.

Read Post

TrackJS

12 Best SSL Certificate Monitoring Tools in 2026

Feb 26, 2026 By Dotcom-Monitor In Dotcom-Monitor

An expired or misconfigured SSL/TLS certificate doesn’t fail quietly. Users get blocked by browser warnings, conversions drop, and teams scramble to diagnose whether the problem is expiration, a missing intermediate, an SNI/hostname mismatch, or a CDN edge serving an old chain. That’s why SSL certificate monitoring in 2026 is less about “check the expiry date” and more about continuous validation + fast alerting + enough context to fix the issue quickly.

Read Post

Dotcom-Monitor

Enable end-to-end visibility into your Java apps with a single command

Feb 26, 2026 By Sarjeel Yusuf In Datadog

Achieving end-to-end observability for applications is a top priority for organizations today, but instrumenting for both frontend and backend monitoring can be a significant hurdle. What complicates matters is that the SREs and DevOps teams responsible for deploying monitoring tools typically don’t own frontend code or have the context needed to safely modify it.

Read Post

Datadog

The Command Center Shift: Why the Future of Middleware is Unified, Predictive, and Transaction-Centric

Feb 26, 2026 By meshIQ In meshIQ

Middleware is evolving beyond invisible plumbing into a strategic Command Center. The future demands unified management, predictive intelligence, and transaction-centric operations to move from reactive firefighting to operational mastery in 2026.

Read Post

meshIQ

Track and Fix Ruby on Rails errors Using Rollbar

Feb 26, 2026 By Rollbar In Rollbar

Setting up Rollbar with your Ruby on Rails application with custom parameters and people tracking.

View Video

Rollbar

Build a Unified Operational Ecosystem with ServiceNow and Coralogix

Feb 26, 2026 By Jonny Steiner In Coralogix

During high-priority incidents, SRE teams frequently lose critical time switching between monitoring platforms and ticketing systems. Context switching like this forces engineers to manually update incident states by copying and pasting data. The inevitable result is increased risk of information gaps and slower Mean Time to Recovery (MTTR).

Read Post

Coralogix

AI can do what now?! What an ethical hacker says about deepfakes and AI

Feb 26, 2026 By Elastic In Elastic

Real-time camera deepfakes are no longer science fiction. High-fidelity, AI-generated impersonation may be advancing quickly — but that's not the only AI risk financial services companies should be thinking about. In this episode of AI Can Do What Now?!, Lisa Jones-Huff, director of security solutions architecture at Elastic, sits down with ethical hacker Freakyclown (FC) to explore what is technically possible today with AI, where reality still falls short of the hype, and what security teams should be worried about.

View Video

Elastic

AI can do what now?! The real risks of AI in social engineering

Feb 26, 2026 By Elastic In Elastic

What is the most immediate risk financial services companies face today? AI-enabled social engineering is already accelerating real-world attacks. Scale, personalization, speed, and automation are lowering the barrier for attackers while making fraud detection more complex for defenders. In this episode of AI Can Do What Now?!, Lisa Jones-Huff, director of security solutions architecture at Elastic, is joined by ethical hacker Freakyclown (FC) and principal solutions architect Joe Murin to explore what is actually happening right now, beyond the hype.

View Video

Elastic

Tech Talk 2 2 Back to the partition where we are going we do not need global indexes

Feb 26, 2026 By VictoriaMetrics In VictoriaMetrics

Resources for Further Learning.

View Video

VictoriaMetrics

Monitoring

Observability Self-Hosted 2026.1 - Routing Insights

Feb 26, 2026 By solarwindsinc In SolarWinds

SolarWinds Evangelist Chrystal Taylor introduces the new routing insights feature in Observability Self-Hosted 2026.1. This first phase enhancement enriches routing table information with detailed context, including forwarding interface names, VRF data, next hop IPs, and timestamps. The update unifies BGP, OSPF, and EIGRP neighbors in a single dashboard, providing visibility into peer identity, flap counts, health status, and admin states.

View Video

SolarWinds

Monitor OpenRouter Usage with OpenTelemetry and SigNoz

Feb 26, 2026 By SigNoz - Open Source Observability Platform In SigNoz

SigNoz uses distributed tracing to gain visibility into your software stack. If you need any clarification or find something missing, feel free to raise a GitHub issue with the label documentation or reach out to us on the community Slack channel.

View Video

SigNoz

Let's make alerting great again

Feb 26, 2026 By Nikolay Sivko In Coroot

No one has time to watch dashboards all day. Alerts exist to tell us when something goes wrong or is starting to go wrong, so we can act early. In theory, it sounds simple. Define a rule, set a threshold, get notified when it is crossed. In practice, it rarely works that smoothly.

Read Post

Coroot

9 Best Network Monitoring Tools for 2026

Feb 26, 2026 By Greg Collins In WhatsUp Gold

A Rapidly Evolving Network Landscape Demands the Right Monitoring Strategy. Choosing the right network monitoring solution has become a mission‑critical decision for IT teams. In recent years, networks have become increasingly hybrid, cloud‑distributed, and reliant on remote connectivity. For enterprises with complex infrastructures, network monitoring is an essential tool.

Read Post

WhatsUp Gold

Building Web API integrations that scale (5 key lessons)

Feb 26, 2026 By Blog In Squared Up

I've used the Web API plugin with a wide range of APIs, and each one taught me something new. But before diving into building, I learned to pause and ask: What am I actually trying to display? Not what data the API can give me, but what would be useful on a dashboard? That shift in thinking — from ‘fetch everything’ to ‘fetch what matters’ — shapes how I approach every integration.

Read Post

Squared Up

The Accountability Era: Decision Paths That Stand Up to Review

Feb 26, 2026 By ScienceLogic In ScienceLogic

Modern IT environments depend on decisions that can withstand scrutiny. As systems grow more interconnected and outages carry greater cost, organizations must understand not just what actions teams take, but how those actions were formed. Operators need guidance anchored in evidence and aligned with business impact. Operational accountability now extends beyond correctness. Teams must show the information that shaped the decision, the options considered, and the reasoning behind the chosen path.

Read Post

ScienceLogic

Colsubsidio transforms business process monitoring with Elastic Observability

Feb 26, 2026 By Amena Siddiqi In Elastic

Colsubsidio is one of the largest and most representative family compensation funds in Colombia. The organization manages and delivers essential social services to millions of users through a broad network spanning health, education, subsidies, recreation, tourism, credit, housing, pharmacies, retail supply, culture, and labor welfare.

Read Post

Elastic

The 5 best Jira reporting tools for 2026

Feb 26, 2026 By Blog In Squared Up

Jira is the backbone of project management for thousands of agile teams worldwide. But while Jira excels at tracking issues and sprints, its native reporting can leave teams wanting more — especially when it comes to sharing insights, visualizing trends, and integrating data from across the business. That’s where dedicated Jira reporting tools come in. In this guide, we rank the 5 best Jira reporting tools on the market today.

Read Post

Squared Up

The "Now" Problem: Why BESS Operations Demand Last Value Caching

Feb 26, 2026 By Suyash Joshi In InfluxData

Battery Energy Storage Systems (BESS) represent one of the most unforgiving environments for real-time data. Unlike a passive asset, a battery is a complex electrochemical system where safety and revenue are determined by split-second decisions. In this context, “average” latency can become a serious problem. Performance depends entirely on one key question.

Read Post

InfluxData

What is Site24x7 Event Correlation? Causal AI and autonomous IT operations explained

Feb 26, 2026 By ManageEngine Site24x7 In Site24x7

When your distributed system goes down, your team spends days sorting through noise. That is revenue walking out the door. In this video, Jasper Paul breaks down the event correlation engine built to eliminate alert fatigue and accelerate root cause analysis. Most monitoring tools still rely on basic time-window alert grouping: clustering alerts that fire at the same time and calling it correlation. But in a distributed system, outages are never isolated events. And grouping symptoms doesn't find root causes.

View Video

Site24x7

Sponsored Post

SAP Application Performance Monitoring (APM): Beyond Generic Metrics

Feb 25, 2026 By Avantra In Avantra

Your enterprise APM tool shows SAP is using 90% CPU. The dashboard turns red. An alert fires. Now what? You open Dynatrace. You see the Java Virtual Machine metrics for your NetWeaver stack. You see HTTP response times for your Fiori apps. You see a spike in database calls. None of this tells you why VA01 takes 45 seconds to create a sales order. None of this tells you which custom ABAP report is consuming memory. None of this explains the short dump that crashed your pricing routine. This is the gap between generic APM and true SAP application performance monitoring. Your enterprise tools see the symptoms.

Read Post

Avantra

Lightrun Launches Industry's First AI SRE With Live Dynamic Runtime Context

Feb 25, 2026 By Lightrun In Lightrun

Autonomously Remediates Software Issues, Generates Missing Runtime Evidence on Demand, and Validates Hypotheses Against Live Execution from Code to Production.

Read Post

Lightrun

Introducing read-only API tokens

Feb 25, 2026 By Valeria Kurolapova In StatusGator

We’ve made a small but important improvement to API tokens in StatusGator. API tokens now support permission levels. When creating a token, you can choose its permission level, including read-only. That’s it. A simple change that gives you more control and better security.

Read Post

StatusGator

Smarter Alerts, Upgraded Solution Packs, and an Expanded Ecosystem for Hyperconnectivity

Feb 25, 2026 By Tejo Prayaga In Fabrix

At Fabrix.ai, we are constantly pushing the boundaries of what Agentic AI and AIOps can achieve. We are happy to announce the release of Fabrix.ai platform version 8.2, packed with capabilities that make managing your IT environment more intuitive, secure, and perfect.

Read Post

Fabrix

Grafana 12.4 TL;DR - The Final 12.x Release

Feb 25, 2026 By Grafana In Grafana

As the final minor release in the Grafana 12 series, 12.4 builds on our shift toward scalable, as-code workflows and a dramatically improved user experience. From bi-directional Git workflows to smarter dashboard layouts and stronger governance controls, this release is all about helping teams move faster with less friction.

View Video

Grafana

Fixing a production error with the Flare CLI and AI, from discovery to deploy

Feb 25, 2026 By Freek Van der Herten In Oh Dear

Using the Flare CLI and its agent skill to find, fix, and resolve a production error without leaving the terminal. The AI agent looks up the latest error on freek.dev via the Flare CLI, analyzes the stack trace against the local source code, generates a fix, deploys it using bash mode, and marks the error as resolved in Flare.

View Video

Oh Dear

Observability Self-Hosted 2026.1 - Server Configuration Comparisons

Feb 25, 2026 By solarwindsinc In SolarWinds

In this video, SolarWinds Evangelist Chrystal Taylor introduces server configuration comparisons, a new feature in Observability Self-Hosted 2026.1 and Server Configuration Monitor 2026.1. The key highlight is the ability to compare server configurations side by side, enabling users to identify differences in configuration files between nodes or against a defined ideal state. This new functionality aims to help users monitor configuration drift.

View Video

SolarWinds

Read more about Observability Self-Hosted 2026.1 - Server Configuration Comparisons

Incident Report: Exercises, Cleanups, and Evacuations

Feb 25, 2026 By Fred Hebert In Honeycomb

Every year, Honeycomb runs disaster recovery scenarios in multiple environments, including in production. Although each of our instances runs in a single region, on at least three Availability Zones (AZs), we have multiple plans for partial regional failures, and particularly, zonal failures. One of these tests was run on December 5th, and after its successful completion came its cleanup steps.

Read Post

Honeycomb

Read more about Incident Report: Exercises, Cleanups, and Evacuations

Alerting Is a Socio-Technical System

Feb 25, 2026 By James Barnes In StatusCake

In the previous posts, we’ve looked at how alert noise emerges from design decisions, why notification lists fail to create accountability, and why alerts only work when they’re designed around a clear outcome. Taken together, these ideas point to a broader conclusion: alerting is not just a technical system; it’s a socio-technical one. Alerting systems encode assumptions about how people behave, how responsibility is distributed, and how decisions are made under pressure.

Read Post

StatusCake

Read more about Alerting Is a Socio-Technical System

AI performance reviews for your app with the Flare CLI

Feb 25, 2026 By Freek Van der Herten In Oh Dear

The Flare CLI connects to your Flare performance monitoring data and uses AI to turn it into actionable insights, right from your terminal. In this video, you'll see how a single command pulls your real performance data from Flare, then generates a full review: identifying slow endpoints, spotting error trends, and suggesting concrete fixes.

View Video

Oh Dear

Read more about AI performance reviews for your app with the Flare CLI

Best Website Monitoring Tools for Compliance and Security in 2026

Feb 25, 2026 By ChangeTower In ChangeTower

Compliance audits used to be annual fire drills. Teams would scramble for weeks gathering screenshots, pulling logs, and hoping nothing slipped through the cracks. That approach no longer works when regulations like GDPR and HIPAA require continuous documentation and real-time evidence of security controls. Website monitoring tools designed for compliance have evolved to address this reality, automating evidence collection and flagging issues before auditors ever arrive.

Read Post

ChangeTower

Read more about Best Website Monitoring Tools for Compliance and Security in 2026

Claude Code + OpenTelemetry: Per-Session Cost and Token Tracking

Feb 25, 2026 By Adnan Rahic In ObservIQ

I was looking at our Claude Code spend in the Anthropic console the other day. Aggregate cost, aggregate tokens — no breakdown by developer, no breakdown by session. I knew my Hackathon team had been using it heavily on building out new features for the OpenTelemetry Distro Builder. But heavily how? I had no idea. Turns out Claude Code has been emitting OpenTelemetry signals the whole time. Per-session cost, token counts, every tool call it makes on your codebase.

Read Post

ObservIQ

Read more about Claude Code + OpenTelemetry: Per-Session Cost and Token Tracking

Digital Employee Experience Is Now Core to IT - Recognized by Analysts, Reinforced by Customers

Feb 25, 2026 By Paul Gentile In Nexthink

Over the past few years, Digital Employee Experience (DEX) has moved from emerging concept to essential capability for modern IT organizations. The conversation has changed. IT is no longer measured only by system uptime or ticket resolution. Today, success is defined by how technology actually performs for employees — and how consistently organizations can deliver productive, friction-free digital work.

Read Post

Nexthink

Read more about Digital Employee Experience Is Now Core to IT - Recognized by Analysts, Reinforced by Customers

Catch Every Moment in Kubernetes: Splunk's Observability Advantage

Feb 25, 2026 By Splunk In Splunk

Discover why real-time, unsampled observability is critical for Kubernetes environments with Stephane Estevez from Splunk at KubeCon Europe 2026. Learn how Splunk’s unique approach helps you catch every important moment—even when containers vanish in milliseconds. Watch now for expert insights on cloud-native monitoring, observability, and Kubernetes best practices!

View Video

Splunk

Read more about Catch Every Moment in Kubernetes: Splunk's Observability Advantage

VictoriaMetrics February 2026 Ecosystem Updates

Feb 25, 2026 By Pablo Fernandez In VictoriaMetrics

This month, we’re thrilled to see OpenAI using the VictoriaMetrics Stack internally — including VictoriaMetrics, VictoriaLogs, and VictoriaTraces — in their Harness engineering experiment, as shown in their architecture diagram. It’s a great way of combining observability and AI agents.

Read Post

VictoriaMetrics

Read more about VictoriaMetrics February 2026 Ecosystem Updates

Grafana 12.4 release: faster and easier data visualization, observability as code updates, and more

Feb 25, 2026 By Grafana Labs Team In Grafana

As we gear up for Grafana 13, the next major release of the open source data visualization platform that we’ll announce at GrafanaCON this April, our engineering team is still shipping some powerful new features along the way. Case in point: Grafana 12.4 is officially here, and there’s a lot to be excited about. The latest minor release includes a ton of updates that help you build and design dashboards faster than ever, as well as manage and scale those dashboards seamlessly over time.

Read Post

Grafana

Read more about Grafana 12.4 release: faster and easier data visualization, observability as code updates, and more

Public Preview: Dynamic Dashboards - Tabs, Auto Grid Layout, Context-Aware Editing | Grafana 12.4

Feb 25, 2026 By Grafana In Grafana

Now in public preview, Dynamic dashboards includes new features and a revamped user experience that make it even easier to find the exact insights you need, when you need them.

View Video

Grafana

Read more about Public Preview: Dynamic Dashboards - Tabs, Auto Grid Layout, Context-Aware Editing | Grafana 12.4

Git Sync: GitLab, Bitbucket, and universal provider support

Feb 25, 2026 By Grafana In Grafana

In this video, Roberto Jiménez, Staff Software Engineer at Grafana Labs, shares that Git Sync is expanding beyond GitHub to include universal support for different providers. The implementation uses two layers.

View Video

Grafana

Read more about Git Sync: GitLab, Bitbucket, and universal provider support

Cut Costs, Not Visibility. Use S3 for Low-Cost Log Retention and Faster Response.

Feb 25, 2026 By Splunk In Splunk

Why pay for continuous ingestion of data you rarely use? Learn how to maintain a lean data strategy by keeping long-term logs in cheap S3 storage, while retaining the power to "promote" specific slices into Splunk whenever an audit or investigation arises. See how Promote for Amazon S3 gives you low-cost log retention without sacrificing speed in investigations.

View Video

Splunk

Read more about Cut Costs, Not Visibility. Use S3 for Low-Cost Log Retention and Faster Response.

AlphaFold, Office Politics, and Mustafa Suleyman's Two Futures (w/Benedict Lelijveld)

Feb 25, 2026 By Nexthink In Nexthink

In this episode, Benedict Lelijveld joins us to unpack what it feels like to start a career in an era shaped by COVID disruption, hybrid work, and accelerating AI. We dig into his writing on Mustafa Suleyman and the idea of “pessimism aversion”: holding genuine hope for breakthroughs (from personal AI to advances in biology) while staying clear-eyed about risks like misuse, weak regulation, and who really benefits. Benedict also reflects on what early-career professionals lose when work becomes too remote—and why protecting your voice, curiosity, and craft matters more than ever as automation spreads.

View Video

Nexthink

Read more about AlphaFold, Office Politics, and Mustafa Suleyman's Two Futures (w/Benedict Lelijveld)

Case Study - Troubleshooting Storage Failures in a VMware ESXi Infrastructure

Feb 25, 2026 By Karthik G In eG Innovations

IT problems happen even in the best-architected infrastructure due to configuration changes, failures, upgrades, and the like. How quickly and effectively you can detect and resolve such problems dictates how efficient your IT operation is. Today, I’ll cover how eG Enterprise helped us troubleshoot a hardware failure (a storage battery failure) that caused a cascade of failures in a VMware ESXi infrastructure.

Read Post

eG Innovations

Read more about Case Study - Troubleshooting Storage Failures in a VMware ESXi Infrastructure

Microsoft SCOM Tips & Tricks

Feb 24, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

This one is for all the Microsoft SCOM geeks out there — 99 practical tips & tricks to make managing SCOM way easier. The tips compiled here draw from community experts, SCOM-focused blogs, Microsoft’s official documentation, and the hands-on experience at NiCE. You may already know some of them, but having them all organized in one place makes it easy to reference and put them into practice.

Read Post

NiCE IT Mgmt

Read more about Microsoft SCOM Tips & Tricks

Notes from the Field: XenServer falling back to file-based licensing when using LAS

Feb 24, 2026 By GripMatix In GripMatix

Citrix has been transitioning products toward License Access Service (LAS) as the modern licensing method. Unlike traditional file-based licensing, LAS introduces service-based communication between products and the Citrix License Server. As of 15 April 2026, LAS becomes the mandatory licensing method for supported products. Environments still relying on file-based licensing will need to transition before that date.

Read Post

GripMatix

Read more about Notes from the Field: XenServer falling back to file-based licensing when using LAS

API update: Service status page ratings now available

Feb 24, 2026 By Valeria Kurolapova In StatusGator

We’ve added service status page ratings to API v3. You can now access the same letter grades, descriptions, and average acknowledgment delay metrics that appear on StatusGator service pages – directly from the API.

Read Post

StatusGator

Read more about API update: Service status page ratings now available

What Is Predictive Analytics? A Complete Guide for 2026

Feb 24, 2026 By Company In InfluxData

In simple terms, predictive analytics is a form of analytics that tries to predict future events, trends, or behaviors based on historical and present data. You can achieve this goal in different ways, each involving trade-offs between accuracy and cost.

Read Post

InfluxData

Read more about What Is Predictive Analytics? A Complete Guide for 2026

The Evolution of Digital Employee Experience (DEX) | How IT Is Transforming the Workplace

Feb 24, 2026 By Nexthink In Nexthink

Digital Employee Experience (DEX) is transforming how IT teams support employees, improve productivity, and drive business outcomes. In this video, we explore the evolution of DEX—from traditional reactive IT support to proactive, experience-driven operations that empower both employees and organizations.

View Video

Nexthink

Read more about The Evolution of Digital Employee Experience (DEX) | How IT Is Transforming the Workplace

The Grafana Cloud identity blueprint: balancing security and scale

Feb 24, 2026 By Sarah Constant In Grafana

If you've ever rolled out Grafana Cloud to a growing engineering organization, this pattern may sound familiar: Everything feels simple at first. You invite a few teammates, give them access, and dashboards start appearing. Then the team grows. Then the number of stacks grows. Over time, a model that once felt fast and empowering starts to feel risky, difficult to understand, and even harder to undo. This post is about avoiding that moment.

Read Post

Grafana

Read more about The Grafana Cloud identity blueprint: balancing security and scale

Measure and improve mobile app startup performance with Datadog RUM

Feb 24, 2026 By Jessica Manheimer In Datadog

Mobile app users form opinions quickly. A slow or inconsistent startup experience can frustrate them before they reach the first screen, increasing the likelihood that they abandon the app or fail to complete key actions such as signing up or making a purchase. However, app teams often lack reliable signals that explain why startup performance varies, making it difficult to improve the user experience.

Read Post

Datadog

Read more about Measure and improve mobile app startup performance with Datadog RUM

From Alerts to Answers: Introducing Coralogix Cases

Feb 24, 2026 By Ofri Grushka In Coralogix

Modern incident response doesn’t fail due to a lack of alerts firing. It fails because teams are overwhelmed by the sheer volume and the lack of context around them. Today, most observability and monitoring platforms generate a flood of alerts. Each one is triggered independently, even when they are symptoms of the same issue. Engineers are left trying to reconstruct the full picture while jumping between dashboards, Slack messages, and tickets.

Read Post

Coralogix

Read more about From Alerts to Answers: Introducing Coralogix Cases

Reducing Risk When It Matters Most: How Verifiable Guidance Protects Critical Operations

Feb 24, 2026 By ScienceLogic In ScienceLogic

When a major incident strikes, every second becomes a decision point. Service degradations accelerate. Customers feel the impact. Revenue and reputation hang in the balance. In these moments, IT teams do not need abstractions or probabilistic guesses. They need guidance they can validate and decision paths they can explain with confidence long after the incident is resolved. Hybrid environments are too complex for intuition, and the repercussions of an incorrect action are significant.

Read Post

ScienceLogic

Read more about Reducing Risk When It Matters Most: How Verifiable Guidance Protects Critical Operations

Observability Self-Hosted 2026.1 - Additional Cloud Support

Feb 24, 2026 By solarwindsinc In SolarWinds

SolarWinds Evangelist Chrystal Taylor demonstrates the new cloud entity support features in Observability Self-Hosted version 2026.1. The update adds monitoring capabilities for MySQL and PostgreSQL databases on Google Cloud Platform, GCP load balancers, Azure functions, AWS Elastic Kubernetes Service, and AWS Lambda functions. She provides a guided walkthrough of the dashboard interface, showing how users can monitor various metrics including database performance, network traffic, latency, function execution counts, system usage, and costs across different cloud platforms.

View Video

SolarWinds

Read more about Observability Self-Hosted 2026.1 - Additional Cloud Support

Monitoring and Optimizing a Hybrid Cloud Environment | WhatsUp Gold

Feb 24, 2026 By Progress WhatsUp Gold In WhatsUp Gold

This webinar focuses on monitoring and optimizing a hybrid cloud environment. Downtime is an expensive inconvenience. Yet many IT teams still face monitoring blackouts due to rigid licensing models and outdated failover strategies. In this session, we’ll introduce a smarter approach: High Availability by Design. Whether you're scaling operations or modernizing infrastructure, this session will equip you with the tools and insights to build a resilient, future-ready monitoring strategy.

View Video

WhatsUp Gold

Read more about Monitoring and Optimizing a Hybrid Cloud Environment | WhatsUp Gold

Bindplane + VictoriaMetrics: Unified Telemetry for Metrics, Traces, and Logs at Scale

Feb 24, 2026 By Adnan Rahic & Diana Todea In ObservIQ

We’re excited to announce new native Bindplane destinations for the VictoriaMetrics ecosystem. It’s now easier to collect, process, and route OpenTelemetry metrics, traces, and logs at scale. You can directly connect VictoriaMetrics’ high-performance storage engines to Bindplane’s vendor-neutral, OpenTelemetry-native telemetry pipeline.

Read Post

ObservIQ

Read more about Bindplane + VictoriaMetrics: Unified Telemetry for Metrics, Traces, and Logs at Scale

How To Manage Checks With Terraform | Grafana Cloud Synthetic Monitoring

Feb 24, 2026 By Grafana In Grafana

Learn how to create, update, and manage Grafana Cloud Synthetic Monitoring checks with Terraform. In this video, we walk through the steps you need to follow.

View Video

Grafana

Read more about How To Manage Checks With Terraform | Grafana Cloud Synthetic Monitoring

Why "Best Practices" Are Breaking IT Teams

Feb 24, 2026 By solarwindsinc In SolarWinds

“Best practices” fail when teams forget the principles behind them. Agile. ITIL. DevOps. None of them work without people-first leadership.

View Video

SolarWinds

Read more about Why "Best Practices" Are Breaking IT Teams

Reinventing the Incident Responder's Day: Empowering Tier 2 SOC Analysts with Splunk's Agentic SOC Platform

Feb 24, 2026 By Milena Chen In Splunk

The Tier 2 SOC Analyst or the Incident Responder (often hailed as the "Sherlock Holmes of the network") faces an increasingly complex and relentless digital landscape. In a world where analysts are being overwhelmed by alerts, held back by fragmented, manual tooling and inefficient workflows, incident responders are charged with the critical task of identifying, analyzing, and mitigating security threats.

Read Post

Splunk

Read more about Reinventing the Incident Responder's Day: Empowering Tier 2 SOC Analysts with Splunk's Agentic SOC Platform

I let Claude investigate a production incident with Honeybadger's MCP server

Feb 24, 2026 By Honeybadger In Honeybadger

In this demo, Kevin shows how you can use Honeybadger's MCP server with Claude to investigate a production incident — going from a natural language prompt to a complete incident dashboard in minutes. Honeybadger is an application health monitoring platform that helps developers catch errors, track performance, and stay on top of incidents. The MCP server lets AI assistants like Claude query your Honeybadger data directly, so you can investigate issues conversationally without digging through dashboards manually.

View Video

Honeybadger

Read more about I let Claude investigate a production incident with Honeybadger's MCP server

Why Site Performance Metrics Are the Missing Piece in Your Local SEO Strategy

Feb 24, 2026 By OpsMatters In OpsMatters

Most conversations about local SEO start and end with Google Business Profiles, reviews, and citations. And sure, those things matter. But there's a whole layer of the ranking equation that gets ignored by marketing teams because it lives on the ops side of the house. Site performance, server response times, uptime consistency, and how your infrastructure handles traffic spikes during peak local search hours. These aren't just IT concerns anymore. They have a direct line to whether your business shows up when someone searches "plumber near me" at 9 PM on a Tuesday.

Read Post

OpsMatters

Read more about Why Site Performance Metrics Are the Missing Piece in Your Local SEO Strategy

Freshping is retiring-ensure your monitoring remains uninterrupted

Feb 23, 2026 By Srivaralakshmi Ms In ManageEngine

Freshping has announced that it will retire its service on March 6, prompting many organizations to reassess how they maintain uptime visibility. When monitoring stops, it doesn't mean your issues stop too; it’s a period of forced blindness. This sunsetting period exposes a core vulnerability: Digital visibility is only as strong as the platform supporting it.

Read Post

ManageEngine

Read more about Freshping is retiring-ensure your monitoring remains uninterrupted

Sponsored Post

What to Say When Things Break: Outage Notification Templates for Ops Teams

Feb 23, 2026 By StatusGator In StatusGator

This practical guide explains what to say when systems break, offering ready-to-use outage notification templates and best practices to help ops teams communicate clearly during incidents. Learn how effective outage communication can reduce confusion, manage user expectations, and maintain trust during service disruptions.

Read Post

StatusGator

Read more about What to Say When Things Break: Outage Notification Templates for Ops Teams

DNS blocklist monitoring now available to all Oh Dear users

Feb 23, 2026 By Mattias Geniar In Oh Dear

Your domain is on a spam blocklist. Password reset emails aren't arriving, order confirmations land in spam, and customers are complaining that "your site doesn't work." By the time you hear about it, the damage has been building for days. We've shipped DNS blocklist monitoring to catch this early. Oh Dear now checks your domain against 11 major blocklists and notifies you the moment you're listed, with direct links to get removed.

Read Post

Oh Dear

Read more about DNS blocklist monitoring now available to all Oh Dear users

The limits of MCP and how Olly surpasses them

Feb 23, 2026 By Chris Cooney In Coralogix

Model Context Protocol (MCP) servers act as adapter layers between clients and AI-based workloads. Installing MCP into an IDE, such as Cursor, brings a wealth of information directly into the developer's primary tool, minimizing context switching and, especially in the world of observability, bringing telemetry closer to the code. MCP is not without its limits. These limits initially seem trivial, but in time, some of the inherent limitations of a basic MCP implementation become apparent.

Read Post

Coralogix

Read more about The limits of MCP and how Olly surpasses them

How To Create and Manage Secrets | Grafana Cloud Synthetic Monitoring

Feb 23, 2026 By Grafana In Grafana

Learn how to create and use secrets management in Grafana Cloud Synthetic Monitoring. In this video, we walk through the steps you need to create secrets.

View Video

Grafana

Read more about How To Create and Manage Secrets | Grafana Cloud Synthetic Monitoring

What I Learned After Vibe Coding for 1000 Hours

Feb 23, 2026 By Splunk In Splunk

I built a simple web application with an agentic IDE to demonstrate why vibe coding is often just a high-speed technical debt generator. We'll look at how AI agents make architectural assumptions that create brittle foundations for your code.

View Video

Splunk

Read more about What I Learned After Vibe Coding for 1000 Hours

The Benefits of Distributed Network Monitoring for Multi-Site Businesses: Why Hybrid Work Changed Everything

Feb 23, 2026 By Andrii Kernitskyi In Obkio

Most companies rewired how their people work, not once but twice. First for remote, then for RTO (Return to Office). Their network monitoring never caught up. So, what happened? IT teams are managing a network that spans headquarters, branch offices, home setups, and cloud apps with tools that still assume everyone's connecting back to one place. When something breaks (and it will), nobody can pinpoint where. IT takes the blame. Users lose productivity. Leadership loses patience.

Read Post

Obkio

Read more about The Benefits of Distributed Network Monitoring for Multi-Site Businesses: Why Hybrid Work Changed Everything

Using Core Web Vitals in Honeycomb Frontend Telemetry

Feb 23, 2026 By Ken Rimple In Honeycomb

Google's Core Web Vitals (CWVs) measurements have been used by web administrators and SREs to review frontend application performance metrics, and have been factored into Google's page rankings since 2021. They are also used in Google Analytics, which crawls websites and evaluates performance metrics over a period of multiple days, and with various frontends (desktop web, mobile web, etc.) to establish how well a website performs in production.

Read Post

Honeycomb

Read more about Using Core Web Vitals in Honeycomb Frontend Telemetry

IT Leadership Isn't a Tool Problem - It's a People Problem

Feb 23, 2026 By solarwindsinc In SolarWinds

Most IT failures aren’t caused by bad tools. They’re caused by ignoring people. Jon Collins explains why leadership, culture, and trust matter more than Agile, ITIL, or DevOps frameworks.

View Video

SolarWinds

Read more about IT Leadership Isn't a Tool Problem - It's a People Problem

Evaluating our AI Guard application to improve quality and control cost

Feb 23, 2026 By Santiago Mola In Datadog

This article is part of our series on how Datadog’s engineering teams use LLM Observability to build, monitor, and improve AI-powered systems. Organizations are building AI agents that help users automate work, analyze data, and interact with complex systems through natural language. As these agents become more capable, they also become more complex and exposed to risks such as prompt injection, data leaks, and unsafe code execution.

Read Post

Datadog

Read more about Evaluating our AI Guard application to improve quality and control cost

Best Incident Management Software for Engineering Teams (2026)

Feb 23, 2026 By Sahil Khan In Last9

Compare 9 incident management tools: PagerDuty, Opsgenie, Incident.io, Rootly, FireHydrant, BetterStack, Grafana OnCall, Squadcast, and Last9. Features, pricing, and which fits your team.

Read Post

Last9

Read more about Best Incident Management Software for Engineering Teams (2026)

AI Assistant vs Skylar Advisor

Feb 23, 2026 By ScienceLogic In ScienceLogic

What happens when AI understands your entire environment? With Skylar Advisor, you move beyond prompts and responses and get prioritized guidance based on real operational impact. Skylar Advisor identifies what matters most, explains why it matters, and provides clear next steps so even junior IT professionals can operate with confidence.

View Video

ScienceLogic

Read more about AI Assistant vs Skylar Advisor

A 4-Month Bug Fixed in <10 Minutes with Olly

Feb 23, 2026 By Chris Cooney In Coralogix

In today’s highly interconnected systems, the subtle relationships between services are rarely obvious. Modern, complex architectures generate telemetry that functions less as “flashing signs” and more as faint “breadcrumbs” to be followed across a vast network of signals. In 2025, about two-thirds of outages involved third-party systems like cloud platforms and APIs.

Read Post

Coralogix

Read more about A 4-Month Bug Fixed in <10 Minutes with Olly

How Coralogix's Data Pipeline Turns Obscure Data into Clear Business Value

Feb 23, 2026 By Coralogix Team In Coralogix

Observability data arrives as a flood of signals, full of potential, but rarely consistent. Error messages and debug logs can reveal what businesses care about: reliability, customer experience, and revenue. The challenge is turning raw technical events into information the whole organization can act on. Many observability systems store data first and structure it later, forcing teams to rebuild context in dashboards and queries, often duplicating logic across services.

Read Post

Coralogix

Read more about How Coralogix's Data Pipeline Turns Obscure Data into Clear Business Value

Heartbeat behind the metrics | Hemachand on what visibility really means

Feb 23, 2026 By ManageEngine Site24x7 In Site24x7

What happens when observability grows faster than infrastructure? In this episode of Heartbeat Behind the Metrics, Hemachand Munagapati, Product Manager at Site24x7, reflects on over 15 years with the product and how the idea of a single pane of monitoring has shaped everything that followed.

View Video

Site24x7

Read more about Heartbeat behind the metrics | Hemachand on what visibility really means

Icinga Notifications: Improving Alerting and Incident Workflows Webinar

Feb 23, 2026 By Icinga In Icinga

Modern monitoring is not just about alerting, it’s about reducing noise, protecting on-call engineers from burnout, and improving incident MTTR through context-aware workflows. Icinga Notifications helps teams achieve just that with configurable, extensible alert processing built for scale. This webinar was held on February 17, 2026. We dive into the brand-new Icinga Notifications capabilities, a modern approach to alerting and incident workflows tailored for complex, dynamic infrastructures.

View Video

Icinga

Read more about Icinga Notifications: Improving Alerting and Incident Workflows Webinar

Why Nexthink Intelligence Is a Game-Changer for IT Teams

Feb 23, 2026 By Nexthink In Nexthink

Nexthink Intelligence transforms digital employee experience (DEX) for modern enterprises. Learn how IT teams can leverage real-time analytics, proactive insights, and automation to improve user productivity, troubleshoot issues fast, and deliver better workplace tech experiences. Learn more at nexthink.com.

View Video

Nexthink

Read more about Why Nexthink Intelligence Is a Game-Changer for IT Teams

Sponsored Post

Cisco Live'26 - Amsterdam: Aligning with the AI-Driven Future

Feb 22, 2026 By Shailesh Manjrekar In Fabrix

The energy at Cisco Live EMEA in Amsterdam (February 9-13, 2026) was primarily driven by groundbreaking AI announcements, and the event gave Fabrix.ai an opportunity to strengthen our strategic position alongside the Cisco and Splunk ecosystems. The event’s focus on AI, highlighted by the recent Cisco AI Summit, emphasizes a clear market direction in which Fabrix.ai is perfectly poised to accelerate innovation.

Read Post

Fabrix

Read more about Cisco Live'26 - Amsterdam: Aligning with the AI-Driven Future

Database Partitioning: Types, Strategies, and When to Use Each

Feb 22, 2026 By Prathamesh Sonpatki In Last9

How database partitioning works in PostgreSQL and MySQL. Range, list, and hash partitioning with SQL examples and guidance on when to partition vs shard. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Read Post

Last9

Read more about Database Partitioning: Types, Strategies, and When to Use Each

Database Sharding: How It Works and When You Actually Need It

Feb 21, 2026 By Prathamesh Sonpatki In Last9

How database sharding works, common strategies (hash, range, directory), shard key selection, and the operational cost of running a sharded database in production.

Read Post

Last9

Read more about Database Sharding: How It Works and When You Actually Need It

Trello outage on February 19, 2026

Feb 20, 2026 By Andy Libby In StatusGator

On February 19, 2026, Trello users around the world began experiencing issues loading boards and accessing their workspaces. StatusGator received the first outage reports at 14:24 UTC and triggered an Early Warning Signal at 14:28 UTC. Trello did not officially acknowledge the incident until 15:08 UTC, after user reports had already subsided. This incident highlights how real time user reports and Early Warning Signals can identify widespread service degradation before providers confirm a problem.

Read Post

StatusGator

Read more about Trello outage on February 19, 2026

Release v2.9: OTEL Logs, Database Functions, SNMP Functions and more.

Feb 20, 2026 By Netdata In netdata

In this video, we walk through the biggest updates in Netdata v2.9, including: Top Tab Database Functions to analyze slow queries and performance bottlenecks without logging into your database; the SNMP Network Interfaces Function for real-time visibility into network interfaces; the Microsoft SQL Server Collector with richer MSSQL metrics; and OpenTelemetry Logs Ingestion to correlate logs and metrics in one place.

View Video

netdata

Read more about Release v2.9: OTEL Logs, Database Functions, SNMP Functions and more.

Cost Optimization for AI Workloads: From Visibility to Control

Feb 20, 2026 By Teia Jensen In LogicMonitor

ITOps teams can achieve cost management of AI workloads with an observability platform that connects AI usage and performance with cloud spend for clear visibility and predictability. Behind the buzz around artificial intelligence, or AI, many companies are discovering the hidden and compounding costs of AI adoption.

Read Post

LogicMonitor

Read more about Cost Optimization for AI Workloads: From Visibility to Control

How LogicMonitor Delivers AI Cost Optimization

Feb 20, 2026 By Teia Jensen In LogicMonitor

LogicMonitor delivers AI cost optimization by unifying infrastructure telemetry, AI-specific signals, and cloud financial data into a single workflow, so teams can move from visibility to continuous, operationalized cost control. In Cost Optimization for AI Workloads: From Visibility to Control, we explored why AI workloads introduce new layers of cost complexity—from GPU-heavy compute and token-based pricing to distributed infrastructure that obscures true spend.

Read Post

LogicMonitor

Read more about How LogicMonitor Delivers AI Cost Optimization

What feels different about enterprise IT operations today compared to even 3-5 years ago?

Feb 20, 2026 By Virtana In Virtana

Speed isn’t the problem. Speed without shared visibility is. AI compressed release cycles, multiplied dependencies, and pushed accountability to teams who no longer own the full stack. The result? Faster change. Slower resolution. Higher risk. This is why MTTR is moving the wrong way... and why observability has to evolve. Featuring Amit Rathi.

View Video

Virtana

Read more about What feels different about enterprise IT operations today compared to even 3-5 years ago?

The Hidden Operational Risk Financial Institutions Can No Longer Ignore

Feb 20, 2026 By Nexthink In Nexthink

Why digital experience is now a regulatory priority: In regulated industries like financial services, even minor technology friction can quickly become a regulatory risk. Gaps in visibility, slow systems, and inconsistent performance can trigger audit findings, SLA breaches, and increased compliance scrutiny.

Read Post

Nexthink

Read more about The Hidden Operational Risk Financial Institutions Can No Longer Ignore

Identify untested code across every level of your codebase

Feb 20, 2026 By Eric Metaj In Datadog

As organizations scale their services and adopt AI-assisted coding, code changes are landing faster and in greater volume than ever before. While this powerful new practice is accelerating the pace of development, it is also increasing the likelihood that untested code may slip into repositories without detection. What makes this problem even worse is that most teams have no reliable way to know which code is covered by tests.

Read Post

Datadog

Read more about Identify untested code across every level of your codebase

Nexthink Workspace - Where DEX Work Happens

Feb 20, 2026 By Nexthink In Nexthink

Workspace is the new space for managing DEX inside the Infinity platform. It brings signals, analysis, guided actions, personalized answers, and chat history into one clean and intuitive full-screen experience. Workspace turns everyday questions into insight and action so teams can investigate faster and make better decisions without complexity or technical query languages. Its enhanced reasoning engine is fully NQL certified, delivering accurate explanations and deeper context across every investigation.

View Video

Nexthink

Read more about Nexthink Workspace - Where DEX Work Happens

Is Your File Integrity Monitoring Outdated? Kubernetes Needs Runtime FIM

Feb 20, 2026 By Sysdig In Sysdig

If your file integrity monitoring (FIM) still relies on scheduled scans… it was built for static servers, not Kubernetes. In cloud-native environments, traditional FIM creates detection delays, wasted CPU, excessive I/O, and alert noise. And if a malicious process modifies a file and exits before the next scan? You might miss it entirely. In this video, we break down: Modern runtime FIM works differently. Instead of scanning everything on a schedule, it…

View Video

Sysdig

Read more about Is Your File Integrity Monitoring Outdated? Kubernetes Needs Runtime FIM

Database Performance Tuning: A Practical Guide for Production Systems

Feb 20, 2026 By Preeti Dewani In Last9

Tune PostgreSQL and MySQL for production with connection pooling, memory configuration, write path optimization, vacuum management, and lock contention fixes. Technical Product Manager at Last9.

Read Post

Last9

Read more about Database Performance Tuning: A Practical Guide for Production Systems

Debugging with Seer: Getting Started & Full End-to-End Demo

Feb 20, 2026 By Sentry In Sentry

In this workshop series, Cody De Arkland will show you how to debug faster with Seer throughout your workflow. The series starts with an overview of all Seer’s features, and how to set everything up. Subsequent workshops will dive deep into using each feature.

View Video

Sentry

Read more about Debugging with Seer: Getting Started & Full End-to-End Demo

Event Intelligence is Replacing Monitoring - Here's Why That Matters

Feb 20, 2026 By Dallon Robinette In Selector

For more than two decades, monitoring has been the foundation of IT operations. Organizations invested heavily in tools designed to collect metrics, visualize performance, and trigger alerts when thresholds were breached. This model was effective in an era when infrastructure was largely static, workloads were predictable, and system dependencies were relatively easy to trace. That environment no longer exists.

Read Post

Selector

Read more about Event Intelligence is Replacing Monitoring - Here's Why That Matters

Sponsored Post

Forwarding Microsoft SCOM Alerts to the Service Desk

Feb 19, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Modern IT operations rely heavily on monitoring solutions like System Center Operations Manager (SCOM) to detect issues across servers, applications, and services. While SCOM excels at generating alerts, organizations often struggle to ensure these alerts translate into actionable incidents in their IT Service Management (ITSM) platforms. Without proper integration, critical alerts may be missed, tickets may be created manually, and incident resolution can be delayed.

Read Post

NiCE IT Mgmt

Read more about Forwarding Microsoft SCOM Alerts to the Service Desk

Monitor Versa Networks SD-WAN performance in Datadog

Feb 19, 2026 By Angelina Jin In Datadog

Modern enterprises rely on software-defined wide area networks (SD-WANs) to connect branch offices, data centers, and cloud environments. While this flexibility improves resiliency and performance, it also increases operational complexity.

Read Post

Datadog

Read more about Monitor Versa Networks SD-WAN performance in Datadog

Icinga Director v1.11.6 Release

Feb 19, 2026 By Ravi Srinivasa In Icinga

We are happy to announce the release of Icinga Director version 1.11.6. This release addresses several important bug fixes and introduces improvements that enhance the overall stability of Icinga Director.

Read Post

Icinga

Read more about Icinga Director v1.11.6 Release

The Next Era of Observability: Founders' Reflections - Additional Q&A

Feb 19, 2026 By Rox Williams In Honeycomb

What happens when the people who helped define observability take a hard look at AI? That’s what Honeycomb co-founders Christine Yen (CEO) and Charity Majors (CTO) dug into during this webinar, starting with the early days of observability (back when it wasn’t even a category yet).

Read Post

Honeycomb

Read more about The Next Era of Observability: Founders' Reflections - Additional Q&A

Traces Are Not Your Business Logic

Feb 19, 2026 By Mukta Aphale In Last9

Distributed traces track how your system processed a single request — not what your customers did over time. Confusing the two leads to poorly instrumented systems.

Read Post

Last9

Read more about Traces Are Not Your Business Logic

The Current State of Content Negotiation for AI Agents (Feb 2026)

Feb 19, 2026 By Stefan Judis In Checkly

The web was built for humans, but now the agents are taking over. Humans look at a web page and see content rendered by their browser. AI agents see 180,000 tokens of nav bars, footers, and div soup — burning through their context window on junk that makes them slower and stupider. The web needs to evolve, and we as developers are driving the shift. AI agents like Claude Code, Cursor, Codex, and Gemini are how we interact with documentation, CLIs, and products today.

Read Post

Checkly

Read more about The Current State of Content Negotiation for AI Agents (Feb 2026)

SSL/TLS Certificate Lifetimes to Reduce to 47 Days

Feb 19, 2026 By Ramesh Subramaniam In eG Innovations

Last year it was widely reported that the CA/Browser Forum had voted to significantly reduce the lifespan of SSL/TLS certificates over the next 4 years, with a final lifespan of just 47 days starting in 2029. The first reduction takes effect in a few weeks, on March 15, 2026, accelerating the need for organizations to automate their monitoring and renewal processes around certificate expiry.

Read Post

eG Innovations

Read more about SSL/TLS Certificate Lifetimes to Reduce to 47 Days

Make use of guardrail metrics and stop babysitting your releases

Feb 19, 2026 By Anthony Rindone In Datadog

Modern CI/CD pipelines have automated the hard work of building, testing, and deploying our code. But for many teams, that’s where the automation stops. The most critical part of a release, turning a new feature on for real users, is still a stressful, manual process. An engineer cautiously ramps up traffic to 5%, then 10%. The whole team stares at dashboards, trying to see if anything breaks. If something does, they scramble to manually roll back.

Read Post

Datadog

Read more about Make use of guardrail metrics and stop babysitting your releases

Reliability Has Outgrown the Systems Supporting It

Feb 19, 2026 By LogicMonitor In LogicMonitor

Service reliability has outgrown uptime checks and component-level tools, creating friction that slows response, increases toil, and wears teams down. Uptime checks can pass, high availability can be in place, and users still can’t complete basic actions. Pages load slowly, latency spikes, and requests stall — all without a single system flagged as down. Availability measures whether a service is running.

Read Post

LogicMonitor

Read more about Reliability Has Outgrown the Systems Supporting It

Signal-Driven Error Monitoring: Detecting and Debugging Reactive Failures in Angular

Feb 19, 2026 By Sonu Kapoor In AppSignal

Angular's Signal-based reactivity model represents one of the biggest paradigm shifts the framework has seen since Ivy. By replacing the asynchronous push-pull model of RxJS with synchronous, localized updates, Signals make state management both simpler and faster. But this new simplicity hides a subtle danger: when something breaks inside your reactive graph, it often does so silently. A computed value might stop updating. An effect might fire indefinitely.

Read Post

AppSignal

Read more about Signal-Driven Error Monitoring: Detecting and Debugging Reactive Failures in Angular

How to write annotations in Kubernetes with JSON for Datadog Autodiscovery | Datadog Tips & Tricks

Feb 19, 2026 By Datadog In Datadog

Pod annotations in Kubernetes with invalid JSON syntax can prevent Datadog Autodiscovery from detecting integrations, resulting in missing metrics and gaps in monitoring. Watch this video for a step-by-step process to write annotations. Note: This video focuses on Datadog Autodiscovery v2 syntax.

View Video

Datadog

Read more about How to write annotations in Kubernetes with JSON for Datadog Autodiscovery | Datadog Tips & Tricks

React Native SDK 8.0.0 is here

Feb 19, 2026 By Antonis Lilis In Sentry

We just released React Native SDK 8.0.0. Here's what's new and what's changed. It's been a while since the last major version. The last major release, 7.0.0, shipped on September 2, 2025. After 13 minor and 2 patch releases, it's finally time for a new major version to land: 8.0.0. This version is a maintenance and capability major. It should be straightforward to upgrade, but check the migration guide for your setup.

Read Post

Sentry

Read more about React Native SDK 8.0.0 is here

Bindplane Blueprints for Elasticsearch: Production-Ready NGINX Log Pipelines for Kibana

Feb 19, 2026 By Chelsea Wright In ObservIQ

We've just released new and easy-to-use Bindplane blueprints designed specifically for Elasticsearch as a destination. These blueprints empower teams to quickly transform raw events such as those from NGINX access and error logs into clean, structured, and ECS-compliant data optimized for high-performance visualization in Kibana.

Read Post

ObservIQ

Read more about Bindplane Blueprints for Elasticsearch: Production-Ready NGINX Log Pipelines for Kibana

What is OpenTelemetry and Why Do Organizations Use it?

Feb 19, 2026 By Jeff Darrington In Graylog

Mining for information about environments is like trying to find gold. Looking for gold can be sifting through silty waters or blasting through a mine. In some cases, the gold nuggets are so small as to be almost invisible, some things look like gold but aren’t, and others are larger nuggets where the miner strikes it rich. Trying to understand how a distributed system works means sifting through vast amounts of telemetry, looking for patterns.

Read Post

Graylog

Read more about What is OpenTelemetry and Why Do Organizations Use it?

The 2025 Wake-Up Call for Engineering Teams

Feb 19, 2026 By Spencer Bos In logz.io

For years, organizations tried to solve operational pain by collecting more data, adding more dashboards, and consolidating more tools. But 2025 exposed a deeper mismatch. Systems had become more distributed, AI-assisted, and interdependent than ever before, while teams had shrunk and on-call pressure had intensified. This wasn’t a tooling failure. It was an architectural and cognitive one.

Read Post

logz.io

Read more about The 2025 Wake-Up Call for Engineering Teams

Use AI to turn any JSON API into a dashboard in minutes with the Infinity data source plugin and Grafana Assistant

Feb 19, 2026 By Ivana Huckova In Grafana

The internet is full of fascinating data just waiting to be visualized and queried. And with the latest update to Grafana Cloud, you can start doing it in minutes. Through public APIs, you can access information about global earthquake activity, weather forecasts, music catalogs, and millions of other datasets. And then there's all the data that sits inside company APIs, partner services, and internal platforms that power everyday products and operations.

Read Post

Grafana

Read more about Use AI to turn any JSON API into a dashboard in minutes with the Infinity data source plugin and Grafana Assistant

Nexthink AI Drive

Feb 19, 2026 By Nexthink In Nexthink

By centralizing AI visibility, usage, guidance, and measurement into a single vantage point - AI Drive transforms fragmented AI activity into quantifiable business outcomes.

View Video

Nexthink

Read more about Nexthink AI Drive

Who Watches the Vibe Coder?

Feb 19, 2026 By Todd H. Gardner In TrackJS

AI didn’t replace developers. It replaced the part where you were forced to understand what you just shipped. Now you can prompt your way to a feature, skim the diff, and merge something that “seems reasonable.” And then production does what production always does: finds the one weird browser + one slow network + one user flow that turns your “reasonable” code into a bonfire. So who watches the vibe coder?

Read Post

TrackJS

Read more about Who Watches the Vibe Coder?

The 5 Pillars of DEXOps Explained: Turning Digital Experience into Business Impact

Feb 19, 2026 By Megan Brake In Nexthink

Most IT leaders agree on one thing: digital employee experience matters. What is less clear is how to operationalize it in ways that deliver measurable business outcomes. Many organizations invest in tools and dashboards, launch experience initiatives, and even measure sentiment. But without an operational model that connects employee experience to core business objectives, IT teams often stay stuck in reactive support. DEXOps changes that.

Read Post

Nexthink

Read more about The 5 Pillars of DEXOps Explained: Turning Digital Experience into Business Impact

Kiro Can Now Use Lightrun via MCP

Feb 19, 2026 By Lightrun Team In Lightrun

AI code assistants transformed how software is written. They did not transform how it fails. Today, we’re announcing a new MCP integration between Lightrun and Kiro. Kiro now gains live runtime visibility through the Lightrun MCP, grounding AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It helps teams move from specification to production faster by turning intent into working code.

Read Post

Lightrun

Read more about Kiro Can Now Use Lightrun via MCP

SQL Query Optimization: Techniques That Actually Improve Performance

Feb 19, 2026 By Sahil Khan In Last9

Find and fix slow SQL queries using execution plans, missing index detection, N+1 pattern fixes, and pagination strategies for PostgreSQL and MySQL. Product Marketing Manager.

Read Post

Last9

Read more about SQL Query Optimization: Techniques That Actually Improve Performance

How to Make AI-Generated Code Reliable with Runtime Context

Feb 19, 2026 By Lightrun Team In Lightrun

AI coding assistants like Cursor and Claude Code are driving massive productivity gains, yet they have introduced a critical validation gap in the software delivery lifecycle. While these tools excel at generating syntax, they lack visibility into live production environments. This article explains how Runtime Context, the missing nervous system of AI development, secures production by moving from probabilistic guessing to deterministic, live code validation.

Read Post

Lightrun

Read more about How to Make AI-Generated Code Reliable with Runtime Context

Is your IT team up late responding to alerts?

Feb 19, 2026 By ScienceLogic In ScienceLogic

Is your IT team up at 3am responding to incidents that could have been prevented? You need Skylar Advisor. #IT

View Video

ScienceLogic

Read more about Is your IT team up late responding to alerts?

Move to ManageEngine Site24x7 to elevate your website monitoring

Feb 18, 2026 By Bela Susan Thomas In Site24x7

Organizations using entry-level tools face limited visibility, slow issue response, and scalability challenges that increase downtime risks. ManageEngine solves this with its enterprise-grade, AI-powered platform, delivering end-to-end digital experience monitoring in cloud and on-premises versions. Switching isn't just easy: it brings predictive intelligence, global precision, and seamless growth support to your workflows, protecting your revenue while improving your operational excellence.

Read Post

Site24x7

Read more about Move to ManageEngine Site24x7 to elevate your website monitoring

YouTube Outage (Feb 17, 2026). What Happened?

Feb 18, 2026 By Nuno Tomas In isDown

On February 17, 2026, YouTube went down for users worldwide. Starting around 8:00 PM ET, the platform's homepage, Shorts feed, sign-in system, smart TV apps, YouTube Music, and YouTube Kids all stopped working. Over 21,000 reports were logged on IsDown alone. The error message was the same everywhere: "Something went wrong." For consumer users, it was an inconvenience. For businesses that depend on YouTube — content teams, advertisers, media companies, live streamers — it was a blind spot.

Read Post

isDown

Read more about YouTube Outage (Feb 17, 2026). What Happened?

Selector Raises $32 Million to Eliminate Data Silos in Complex Network Operations

Feb 18, 2026 By Selector In Selector

Valuation doubles and annual recurring revenue grows nearly four times, driven by Fortune 1000 adoption of unified observability solutions.

Read Post

Selector

Read more about Selector Raises $32 Million to Eliminate Data Silos in Complex Network Operations

Top 6 Cloud Monitoring Challenges in Hybrid & Multi-Cloud Environments

Feb 18, 2026 By Sofia Burton In LogicMonitor

Hybrid and multi-cloud monitoring breaks down when teams can’t connect signals to customer impact fast enough to act. Hybrid and multi-cloud sound simple: run some workloads in public cloud, keep some on-premises, and connect it all. But in practice, you’re managing dependencies across teams and systems, tools that don’t share context, and incidents that refuse to stay in one place.

Read Post

LogicMonitor

Read more about Top 6 Cloud Monitoring Challenges in Hybrid & Multi-Cloud Environments

Meet Nexthink Spark

Feb 18, 2026 By Nexthink In Nexthink

Spark operates as a highly experienced IT agent, using real-time understanding of the employee’s environment to resolve issues immediately, without tickets or delay. Learn more about Spark here: https://nexthink.com/platform/spark

View Video

Nexthink

Read more about Meet Nexthink Spark

AI Query Assist for SolarWinds SQL Sentry

Feb 18, 2026 By solarwindsinc In SolarWinds

Rewrite inefficient SQL Server queries in seconds—not hours. In this demo, we show you how AI Query Assist in SolarWinds SQL Sentry transforms the way you tune performance. Watch how to take a problematic query from the "Top SQL" view and use generative AI to instantly generate optimized rewrites and uncover missing indexes. What you will see: Instant Optimization: How to automate query rewriting and get plain-language explanations of the logic changes.

View Video

SolarWinds

Read more about AI Query Assist for SolarWinds SQL Sentry

Understanding Namespaces in Icinga 2 DSL

Feb 18, 2026 By Yonas Habteab In Icinga

Last time, we explored the concept of variable scopes in Icinga 2, which help you manage and organize your DSL configurations effectively. As promised, today we’ll dive into another, how shall I say, advanced topic: Namespaces in Icinga 2.

Read Post

Icinga

Read more about Understanding Namespaces in Icinga 2 DSL

Designing Alerts for Action

Feb 18, 2026 By James Barnes In StatusCake

In the first two posts of this series, we explored how alert noise emerges from design decisions, and why notification lists fail to create accountability when responsibility is unclear. There’s a deeper issue underneath both of those problems. Many alerting systems are designed without being clear about the outcome they’re meant to produce. When teams don’t explicitly decide what they want to happen as a result of a signal, they default to the loudest option available.

Read Post

StatusCake

Read more about Designing Alerts for Action

Unlimited Team Sizes for All

Feb 18, 2026 By Pēteris Caune In Healthchecks

Starting from today, Healthchecks.io users on all plans (Hobbyist, Supporter, Business, Business Plus) can invite an unlimited number of users into their projects. Previously, the limits were: 3 team members for Hobbyist and Supporter, 10 team members for Business, and unlimited team members for Business Plus. From now on, it is unlimited for all.

Read Post

Healthchecks

Read more about Unlimited Team Sizes for All

Database Indexing: How It Works, Types, and When to Use It

Feb 18, 2026 By Faiz Shaikh In Last9

How database indexes work, when to use B-tree vs hash indexes, clustered vs non-clustered indexes, and how to tell if your indexes are actually helping.

Read Post

Last9

Read more about Database Indexing: How It Works, Types, and When to Use It

Improve performance and reliability with APM Recommendations

Feb 18, 2026 By Anthony Lagana In Datadog

SREs and application developers rely on telemetry data to understand and improve their systems. As organizations scale and evolve, those systems generate an ever-growing volume of metrics, logs, and traces. But more data alone does not make it easier to improve performance or reliability: Identifying meaningful optimizations still requires careful investigation and analysis.

Read Post

Datadog

Read more about Improve performance and reliability with APM Recommendations

Turn Raw Data into Reliability by Changing Performance Perspectives

Feb 18, 2026 By Jonny Steiner In Coralogix

In a global microservices architecture, technical performance initially presents as a chaotic stream of disconnected telemetry. For a Technical Program Manager (TPM), success depends on the ability to move past these disconnected individual data points to identify stable patterns. If they have services entering critical states, looking at individual logs or traces is inefficient. Protecting system reliability requires an engine that can automate pattern recognition at scale.

Read Post

Coralogix

Read more about Turn Raw Data into Reliability by Changing Performance Perspectives

Introducing: Checkly Agent Skills

Feb 18, 2026 By Stefan Judis In Checkly

AI coding agents are excellent at writing code. Ask Claude Code, Codex, or Cursor to add a feature, and it just works. At Checkly, we were ready for the new agentic world from the start! Monitoring as Code means your entire monitoring setup lives in your repository. API Checks, Browser Checks, alert channels, status pages; everything is defined in code, managed with the Checkly CLI, and version-controlled like any other part of your stack.

Read Post

Checkly

Read more about Introducing: Checkly Agent Skills

DNS Check Overview

Feb 18, 2026 By Uptime Website Monitoring In uptime

Learn how to set up and monitor DNS server checks with Uptime.com, test DNS record types like A, AAAA, MX, and more, and get alerts when DNS resolution issues occur.

View Video

uptime

Read more about DNS Check Overview

API update: You can now delete any monitor type

Feb 17, 2026 By Valeria Kurolapova In StatusGator

We’ve made another improvement to the StatusGator API based directly on customer feedback. Many teams are managing monitors programmatically, and one recurring request was the ability to fully manage monitor cleanup via API. You can now delete all monitor types through the API.

Read Post

StatusGator

Read more about API update: You can now delete any monitor type

How a Singleton Pattern Broke Our Django Logging

Feb 17, 2026 By Quinn Milionis In Scout

With modern tooling and agentic coding assistants, straightforward bugs are almost a relief. If a test can catch it, or a user can reproduce it, chances are you can squash it quickly. The harder category, and the one worth writing about, is the bugs where everything looks correct. Your code runs, no exceptions are thrown, your debug statements confirm the right functions fire at the right times, and yet nothing works.

Read Post

Scout

Read more about How a Singleton Pattern Broke Our Django Logging

How to Monitor and Fix Critical User Experiences

Feb 17, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io
Docs: https://docs.sentry.io

View Video

Sentry

Read more about How to Monitor and Fix Critical User Experiences

Unlocking business resilience with full-stack observability in hybrid IT environments

Feb 17, 2026 By Jonathan Tullett In Elastic

For CIOs and technology leaders across the Gulf Cooperation Council (GCC), full-stack observability is a strategic lever for achieving faster ROI, operational resilience, and digital maturity. By integrating AI-powered insights and automation, IT leaders can streamline operations and align technology outcomes with business goals. Demonstrating ROI within tight timelines is critical, as is leveraging observability to maintain competitive advantage in a rapidly evolving market.

Read Post

Elastic

Read more about Unlocking business resilience with full-stack observability in hybrid IT environments

OpenTelemetry support for .NET 10: A behind-the-scenes look

Feb 17, 2026 By Martin Costello In Grafana

At Grafana Labs, we are fully committed to the open source OpenTelemetry project and are actively engaged with the OTel community. Many Grafanistas spend a large proportion of their time contributing directly to OpenTelemetry upstream projects, helping make observability more powerful, reliable, and accessible for everyone as part of our big tent philosophy.

Read Post

Grafana

Read more about OpenTelemetry support for .NET 10: A behind-the-scenes look

Teaching AI How to Refinery

Feb 17, 2026 By Tyler Helmuth In Honeycomb

At the beginning of February, we released v3.1 of Refinery, our advanced, tail-based sampling solution. The new version comes with more performance enhancements, bug fixes, and a few new pieces of telemetry. In tandem with the 3.1 release, we also released a new tool for our MCP server which helps your AIs understand Refinery, and how Honeycomb handles sampling.

Read Post

Honeycomb

Read more about Teaching AI How to Refinery

The New Standard for Operational Decision-Making: Why Trustworthy Guidance Matters More Than Ever

Feb 17, 2026 By ScienceLogic In ScienceLogic

Modern IT operations sit at the center of revenue, customer experience, and business continuity. Every decision engineers make influences far more than the technical domain, which is why teams need intelligence they can validate, reasoning they can understand, and guidance they can rely on. In an environment shaped by rapid change and expanding dependencies, decisions must be grounded in accuracy and context to avoid unnecessary risk.

Read Post

ScienceLogic

Read more about The New Standard for Operational Decision-Making: Why Trustworthy Guidance Matters More Than Ever

From random chunks to real code - wiring up Next.js source maps in Sentry

Feb 17, 2026 By Sergiy Dybskiy In Sentry

When you ship a Next.js app, the React and TypeScript you write aren’t what your users actually download. Next.js compiles, minifies, splits, and shuffles your code into chunks in ways that are great for performance and terrible for debugging. This post shows you how that pipeline works, how source maps and debug IDs connect it all back to your original code, and how to wire things up so Sentry shows you real file names and line numbers instead of an unreadable stack trace.

Read Post

Sentry

Read more about From random chunks to real code - wiring up Next.js source maps in Sentry

What is the Model Context Protocol (MCP)

Feb 17, 2026 By Jeff Darrington In Graylog

Iron Man’s J.A.R.V.I.S. is the artificial intelligence (AI) that almost every person wants to see: a conversational technology that answers questions like a friend would. The rise of large language models (LLMs) almost seems to give people the friendly robotic sidekick that generations of children grew up dreaming about.

Read Post

Graylog

Read more about What is the Model Context Protocol (MCP)

Already Love Scout APM? We Have Integrated Error Monitoring!

Feb 17, 2026 By Sarah Morgan In Scout

The error monitoring scene has changed a ton over the past few years. We've gone from basic exception tracking to fully integrated platforms that correlate errors with performance metrics and logs. We’ve even got AI-powered debugging! But in the midst of the AI explosion, some things remain unchanged and most teams are still drowning in data with little actionability.

Read Post

Scout

Read more about Already Love Scout APM? We Have Integrated Error Monitoring!

Productivity in the Age of AI - DEXOps 1:1 with Scott Pope

Feb 17, 2026 By Nexthink In Nexthink

In the first of a new rotating expert series, Scott Pope (Nexthink's Director of Value Advisory) joins to explore DEXOps, productivity, and why DEX has firmly entered the boardroom conversation. We talk about how the market has evolved, what AI is really changing, how to communicate value to senior leaders, and the story behind the DEX Productivity Report. Also: Arsenal. Briefly. And yes, Tom still needs to update the show music. Hang in there.

View Video

Nexthink

Read more about Productivity in the Age of AI - DEXOps 1:1 with Scott Pope

OpenTelemetry Production Monitoring: What Breaks, and How to Prevent It

Feb 17, 2026 By Sematext In Sematext

OpenTelemetry almost always works beautifully in staging, demos, and videos. You enable auto-instrumentation, spans appear, metrics flow, the collector starts, and dashboards light up. Everything looks clean and predictable. However, production has a way of humbling even the most carefully prepared setups. When real traffic hits, and it always spikes sooner or later, you start seeing dropped spans.

Read Post

Sematext

Read more about OpenTelemetry Production Monitoring: What Breaks, and How to Prevent It

New Persistent Queue Options - Pebble Storage Extension

Feb 17, 2026 By Bindplane In ObservIQ

Bindplane now supports the Pebble storage extension for the OpenTelemetry Collector persistent queue. Faster recovery. Durable buffering. More control over telemetry under load. Own your pipeline.

View Video

ObservIQ

Read more about New Persistent Queue Options - Pebble Storage Extension

Watch 10K Telemetry Events Drain Instantly #opentelemetry #observability

Feb 17, 2026 By Bindplane In ObservIQ

We loaded 10,000 messages into a persistent queue. Then watched it vanish. This is what the Pebble persistent queue storage extension looks like inside a Bindplane-managed OpenTelemetry pipeline. Resilience under pressure.

View Video

ObservIQ

Read more about Watch 10K Telemetry Events Drain Instantly #opentelemetry #observability

Bindplane | Blueprints for ClickHouse: Optimize Telemetry Before It Hits ClickStack

Feb 17, 2026 By Bindplane In ObservIQ

Chelsea from the Customer Success team walks through the Bindplane Blueprints for ClickHouse guide, showing how to optimize logs, metrics, and traces before they land in ClickStack. ClickHouse is powerful. But raw telemetry at scale gets expensive fast. Bindplane acts as the control plane for your OpenTelemetry infrastructure. Blueprints let you apply production-ready processing logic instantly without YAML sprawl or config drift.

View Video

ObservIQ

Read more about Bindplane | Blueprints for ClickHouse: Optimize Telemetry Before It Hits ClickStack

Microsoft Entra ID secrets and certificates: One of the most preventable causes of enterprise application failures

Feb 16, 2026 By Geoffrin Edwin In Site24x7

All it takes to make critical applications fail, customer portals crash, and internal systems inaccessible is just one expired client secret. Not a sophisticated cyberattack. Not a worldwide cloud service outage. Just a single credential that quietly expired while everyone focused on "more important" things. Is secret expiry that big of a concern? Chances are great that enterprise-scale organizations have at least one expired credential in production right now.

Read Post

Site24x7

Read more about Microsoft Entra ID secrets and certificates: One of the most preventable causes of enterprise application failures

Introducing "Explain Flame Graph": Stop Fighting Fires and Start Explaining Them

Feb 16, 2026 By Jonny Steiner In Coralogix

In a modern observability deployment, it’s simple to get data that helps you understand where your system is failing. However, when we try to understand why, the answer is often buried beneath a mound of stack traces. For many developers, attempting to interpret a flame graph by manually calculating self-time (the resources consumed by the function itself) versus child-frame latency (the time spent waiting on called sub-functions) is both confusing and time-consuming.

Read Post

Coralogix

Read more about Introducing "Explain Flame Graph": Stop Fighting Fires and Start Explaining Them

3 Best Tools to Check DNS Records of Domains

Feb 16, 2026 By Super Monitoring In Super Monitoring

DNS records are instructions that tell the internet how to handle your domain. They store details like your website’s IP address, email servers, and security settings. When someone visits your site or sends you an email, DNS records guide the request to the right server. Without correct DNS records, websites can break, and emails can fail. Many tools let you check DNS records, but not all provide clear, reliable results. Some tools show only basic records, while others provide deep insights.

Read Post

Super Monitoring

Read more about 3 Best Tools to Check DNS Records of Domains

16 new integrations - powered by AIready Low Code Plugins

Feb 16, 2026 By Blog In Squared Up

Today marks a big milestone in our mission to bring more data, more context, and more visibility into a single, unified view. We’re excited to announce 16 brand‑new integrations, extending the range of data sources you can connect with just a few clicks. But the integrations themselves are only half the story.

Read Post

Squared Up

Read more about 16 new integrations - powered by AIready Low Code Plugins

Healthchecks and Cron Jobs on Status Pages

Feb 16, 2026 By Leo Baecker In Hyperping

You can now add healthcheck and cron job monitors directly to your status pages. Until now, status pages only supported HTTP monitors and browser checks. You can now display the status of your background jobs, scheduled tasks, and internal services right next to your existing monitors. Head to your status page settings to add healthchecks to your sections. Questions? Reach out via in-app chat or email us at hello@hyperping.io.

Read Post

Hyperping

Read more about Healthchecks and Cron Jobs on Status Pages

Troubleshooting Microservices with OpenTelemetry Distributed Tracing

Feb 15, 2026 By Sematext In Sematext

Distributed tracing doesn’t just show you what happened. It shows you why things broke. While logs tell you a service returned a 500 error and metrics show latency spiked, only traces reveal the full chain of causation: the upstream timeout that triggered a retry storm, the N+1 query pattern that saturated your connection pool, or the missing cache hit that turned a 50ms call into a 3-second database roundtrip.

Read Post

Sematext

Read more about Troubleshooting Microservices with OpenTelemetry Distributed Tracing

Making our docs AI-friendly: a tale of two caches

Feb 15, 2026 By Mattias Geniar In Oh Dear

Our documentation, FAQ, and blog posts can now be served as clean markdown to AI agents. Send Accept: text/markdown or append .md to the URL, and you get structured content instead of a full HTML page. It worked great in development. Then we deployed, and two separate caching layers broke everything.
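As a rough illustration of the two request styles mentioned above, the sketch below builds (but does not send) both requests with Python's standard library. The URL https://example.com/docs/intro is a placeholder rather than a real Oh Dear endpoint, and the helper name markdown_requests is hypothetical.

```python
from urllib.request import Request


def markdown_requests(page_url: str) -> tuple[Request, Request]:
    """Build the two request styles: an Accept header, or a .md suffix."""
    # Style 1: same URL, ask for markdown via content negotiation.
    by_header = Request(page_url, headers={"Accept": "text/markdown"})
    # Style 2: append .md to the URL (any trailing slash is stripped first).
    by_suffix = Request(page_url.rstrip("/") + ".md")
    return by_header, by_suffix


# Placeholder URL, for illustration only.
by_header, by_suffix = markdown_requests("https://example.com/docs/intro")
print(by_header.get_header("Accept"))
print(by_suffix.full_url)
```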

Read Post

Oh Dear

Read more about Making our docs AI-friendly: a tale of two caches

Amazon Web Services outage - February 10, 2026

Feb 13, 2026 By Andy Libby In StatusGator

On February 10, 2026, Amazon Web Services (AWS) experienced an outage that triggered widespread reports of CloudFront failures and DNS resolution issues. While AWS later acknowledged the incident, StatusGator detected the disruption earlier using Early Warning Signals, giving customers valuable lead time before the provider confirmed anything publicly.

Read Post

StatusGator

Read more about Amazon Web Services outage - February 10, 2026

Improved monitor filters

Feb 13, 2026 By Valeria Kurolapova In StatusGator

We shipped a small improvement to monitor filtering to make it easier to stay organized — especially if you’re managing a board with a lot of monitors.

Read Post

StatusGator

Read more about Improved monitor filters

Network Heroes: The Phantom Malware

Feb 13, 2026 By ITOM In ManageEngine

Join our network heroes for a high-stakes chase inside modern networks, where threats hide in plain sight, move laterally, and trigger no alerts, exposing why visibility matters.

Read Post

ManageEngine

Read more about Network Heroes: The Phantom Malware

Accelerate incident resolution with Applications Manager's AI alert summary

Feb 13, 2026 By Applications Manager In ManageEngine

Leverage AI to understand critical incidents in your IT infrastructure! With Applications Manager's AI-driven alarm summaries, you can understand incident cascades and drill down to the root cause of performance issues. Reduce cognitive labour and unlock actionable intelligence with the latest version.

Read Post

ManageEngine

Read more about Accelerate incident resolution with Applications Manager's AI alert summary

Digital experience monitoring in 2026: Don't let hidden issues drive your customers away!

Feb 13, 2026 By Bela Susan Thomas In Site24x7

Slow loading times, frustrated clicks, and silent crashes can cause users to leave your app in just three seconds.

Read Post

Site24x7

Read more about Digital experience monitoring in 2026: Don't let hidden issues drive your customers away!

Sovereign observability: How UAE data residency powers resilient digital economies

Feb 13, 2026 By Ramkumar Ramaswamy In Site24x7

Cloud observability is a must for IT teams operating in modern digital economies. It allows administrators to see inside complex systems, understand how each component behaves under real conditions, and act before users or regulators feel the impact. In simple terms, observability transforms digital infrastructure from a black box into a transparent, accountable, and resilient system.

Read Post

Site24x7

Read more about Sovereign observability: How UAE data residency powers resilient digital economies

OpenTelemetry Deep Dive: Standards, Tracing, and the Future of Observability | Big Tent S3E6

Feb 13, 2026 By Grafana In Grafana

OpenTelemetry co-founder Ted Young joins Grafana’s Big Tent podcast to explain how observability evolved beyond logs, metrics, and traces. Learn why tracing is just logging with context, how OpenTelemetry became a standard, and what’s next for zero-touch instrumentation and AI-driven observability.

View Video

Grafana

Read more about OpenTelemetry Deep Dive: Standards, Tracing, and the Future of Observability | Big Tent S3E6

Complexity to Clarity: Why Enterprises Are Choosing Progress WhatsUp Gold Over SolarWinds

Feb 13, 2026 By Progress WhatsUp Gold In WhatsUp Gold

A clearer path for enterprise administrators with simpler deployment, unified visibility and predictable costs. If SolarWinds feels too complex, costly or rigid to scale, it might be time to look for a new network monitoring solution. In this session, Progress WhatsUp Gold experts help enterprise admins achieve unified visibility and faster time to value without module or architecture sprawl, using transparent device-based licensing.

View Video

WhatsUp Gold

Read more about Complexity to Clarity: Why Enterprises Are Choosing Progress WhatsUp Gold Over SolarWinds

InfluxDB 3 Core vs. Enterprise

Feb 13, 2026 By InfluxData In InfluxData

In this video, Senior Developer Advocate Cole Bowden walks you through the key similarities and differences between InfluxDB 3 Core and InfluxDB 3 Enterprise. As an open source offering, Core thrives at data collection on the edge and providing real-time insights into fresh data, while Enterprise includes support, compaction for performant historical analysis over wide windows, and better scaling and security for enterprise-scale operations.

View Video

InfluxData

Read more about InfluxDB 3 Core vs. Enterprise

NIS2 and CER Serve a Broader Purpose Than Cybersecurity - The 5 Biggest Risks You Need to Address Now

Feb 13, 2026 By Erik van Veenendaal In eG Innovations

The European directives NIS2 (Network and Information Security Directive 2) and the Critical Entities Resilience (CER) Directive have rapidly sharpened the conversation around digital resilience. While many organizations initially viewed these directives as an extension of their cybersecurity obligations, it is becoming increasingly clear that much more is at stake. These directives require a strategic transformation in how organizations manage risks, processes, and responsibilities.

Read Post

eG Innovations

Read more about NIS2 and CER Serve a Broader Purpose Than Cybersecurity - The 5 Biggest Risks You Need to Address Now

The evolution of OpenTelemetry: A deep dive with co-founder Ted Young

Feb 13, 2026 By Grafana Labs Team In Grafana

Sometimes the biggest challenges in software aren’t about code — they’re about consensus. What do we call things? What do we standardize? And how do you evolve a system that thousands of companies depend on without breaking everything along the way?

Read Post

Grafana

Read more about The evolution of OpenTelemetry: A deep dive with co-founder Ted Young

AI-driven caching strategies and instrumentation

Feb 13, 2026 By Lazar Nikolov In Sentry

The things that separate a minimum viable product (MVP) from a production-ready app are polish, final touches, and the Pareto 'last 20%' of work. Most bugs, edge cases, and performance issues won't show up until after launch, when real users start hammering your application. If you're reading this, you're probably at the 80% mark, ready to tackle the rest.

Read Post

Sentry

Read more about AI-driven caching strategies and instrumentation

AI Is Everywhere, So Why Isn't It Delivering Business Value?

Feb 13, 2026 By Chanté Frazer In Nexthink

Enterprises have never had more access to artificial intelligence and less certainty about what it is delivering. Generative AI tools now sit inside everyday workflows, embedded across productivity software and operational systems employees rely on for critical work. They generate insight at scale, reveal patterns more clearly than before, and offer earlier visibility into potential risk.

Read Post

Nexthink

Read more about AI Is Everywhere, So Why Isn't It Delivering Business Value?

The Fragmentation Tax: What Multi-Tool Incident Response is Really Costing You

Feb 13, 2026 By Dallon Robinette In Selector

Here’s a question that sounds simple but isn’t: When something breaks in your environment, how long does it take your team to agree on what they’re looking at? Not how long it takes to fix it—that’s a different problem. I mean: how long does it take for everyone on the bridge to have the same basic understanding of what’s broken, where it started, and what it’s affecting?

Read Post

Selector

Read more about The Fragmentation Tax: What Multi-Tool Incident Response is Really Costing You

What Is Web Transaction Monitoring?

Feb 13, 2026 By Dotcom-Monitor In Dotcom-Monitor

Quick Answer: Web transaction monitoring is a type of synthetic monitoring that uses scripted browser tests to simulate and validate multi-step user workflows, such as logins or checkouts. It proactively checks application functionality and performance from end-to-end, ensuring critical user journeys work correctly before customers are impacted.

Read Post

Dotcom-Monitor

Read more about What Is Web Transaction Monitoring?

Honeybadger supports SSL certificate expiration monitoring

Feb 13, 2026 By Ben Findley In Honeybadger

When you have a lot of websites, SSL certificate expiration monitoring can be a lot of work, especially without using a certificate authority such as Let's Encrypt. The last thing you want is an outage because a random SSL certificate wasn't set to auto-renew and expired! Honeybadger has your back! That's why we added SSL certificate warnings to our existing uptime monitoring feature.

Read Post

Honeybadger

Read more about Honeybadger supports SSL certificate expiration monitoring

8 Steps Companies Can Take To Strengthen Business Premises Security

Feb 13, 2026 By OpsMatters In OpsMatters

Improving the safety of your business premises is a continuous process. New threats appear every year, and physical vulnerabilities can put your team and your assets at risk. Taking a proactive approach helps you stay ahead of potential intruders.

Read Post

OpsMatters

Read more about 8 Steps Companies Can Take To Strengthen Business Premises Security

Top tips to organize your digital workspace

Feb 12, 2026 By Monideepa Mrinal Roy In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we’re tackling a growing challenge for modern professionals: organizing digital workspaces in an era where files, apps, and notifications constantly compete for attention. As work becomes increasingly cloud-based and collaborative, a cluttered digital environment can slow teams down, create confusion, and impact productivity. The good news?

Read Post

ManageEngine

Read more about Top tips to organize your digital workspace

Syslog Checks: How to find Insights in the Data Flood

Feb 12, 2026 By Geoffrin Edwin In Site24x7

Every SysAdmin knows the feeling. They are swimming in logs, terabytes of them. Every daemon, service, and kernel subsystem religiously writes its activities to syslog. The data exists. The signals are there. Yet, somehow, incidents are still unpredictable. How is this even possible? Here's why this happens: Traditional syslog infrastructure was designed for storage and retrieval, not detection and response.

Read Post

Site24x7

Read more about Syslog Checks: How to find Insights in the Data Flood

Claude outage - February 10, 2026

Feb 12, 2026 By Colin Bartlett In StatusGator

On February 10, 2026, Claude users around the world began reporting service failures affecting chat sessions, API integrations, and Claude Code workflows. The first verified outage report reached StatusGator at 19:33 UTC. StatusGator issued an Early Warning Signal at 20:24 UTC. Claude did not post an official “Investigating” update until 22:11 UTC. This incident clearly demonstrates the gap between real user impact and official status page updates.
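The size of that gap follows directly from the three timestamps quoted above. The short sketch below works through the arithmetic; the helper names are illustrative, and only the times come from the post.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM time on February 10, 2026 as a UTC datetime."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2026, 2, 10, hour, minute, tzinfo=timezone.utc)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


first_report = utc("19:33")     # first verified outage report
early_warning = utc("20:24")    # Early Warning Signal issued
official_update = utc("22:11")  # official "Investigating" update

report_to_warning = minutes_between(first_report, early_warning)
warning_lead = minutes_between(early_warning, official_update)
report_to_official = minutes_between(first_report, official_update)
print(report_to_warning, warning_lead, report_to_official)  # 51 107 158
```

In other words, the Early Warning Signal arrived 51 minutes after the first verified report and 107 minutes before the official update, which itself came 158 minutes after that first report.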

Read Post

StatusGator

Read more about Claude outage - February 10, 2026

Uptrace Errors & Logs Tutorial: Capture Stacktraces, Context, and Traces in One Place

Feb 12, 2026 By Uptrace In Uptrace

Every error tells a story, and Uptrace helps you see the full picture. In this tutorial, you’ll learn how to use Uptrace to capture errors, logs, stacktraces, and request context in a single observability platform. See how errors automatically link to traces, understand exactly what happened, and debug issues faster with rich attributes, user data, and performance impact. Understand not just *what broke*, but *who it affected and why*, and fix problems with confidence using Uptrace.

View Video

Uptrace

Read more about Uptrace Errors & Logs Tutorial: Capture Stacktraces, Context, and Traces in One Place

Uptrace Tutorial: Dashboards, Percentiles, Heatmaps & OpenTelemetry Metrics

Feb 12, 2026 By Uptrace In Uptrace

Learn how to use *Uptrace* to measure what truly matters in your applications using percentiles, heatmaps, and histograms, then turn that data into dashboards that answer questions before they’re even asked. Whether you’re setting up observability for the first time or replacing expensive monitoring tools, this guide shows how Uptrace helps you understand performance, reliability, and user experience, all in one place.

View Video

Uptrace

Read more about Uptrace Tutorial: Dashboards, Percentiles, Heatmaps & OpenTelemetry Metrics

End-to-End Tracing with Uptrace: Follow Any Request Across Your Entire System

Feb 12, 2026 By Uptrace In Uptrace

Stop guessing where requests slow down. With Uptrace, you can follow any request across your entire system and instantly see performance bottlenecks, errors, and latency sources. Build real observability, not just dashboards.

View Video

Uptrace

Read more about End-to-End Tracing with Uptrace: Follow Any Request Across Your Entire System

Uptrace Alerts in 10 Minutes: Metrics, Errors, Slack & Telegram

Feb 12, 2026 By Uptrace In Uptrace

Learn how to monitor application metrics, track errors, and configure real-time alert notifications in Uptrace. This step-by-step tutorial is perfect for developers, DevOps engineers, and teams looking for simple, powerful observability.

View Video

Uptrace

Read more about Uptrace Alerts in 10 Minutes: Metrics, Errors, Slack & Telegram

Using WhatsUp Gold with Antivirus Software

Feb 12, 2026 By Progress WhatsUp Gold In WhatsUp Gold

This video explains how antivirus software can interfere with WhatsUp Gold performance and gives recommended exclusions and settings to ensure both products work together reliably.

View Video

WhatsUp Gold

Read more about Using WhatsUp Gold with Antivirus Software

How to Prepare Your Network for RTO (Return-to-Office Mandates)

Feb 12, 2026 By Andrii Kernitskyi In Obkio

IT teams are being held hostage in the return-to-office debate. They didn't even get a seat at the table. And if you're not at the table, you're on the menu. The job market has cooled dramatically. Canada's unemployment rate hit 7.1% in August 2025, which is the highest since May 2016, excluding pandemic years. Employers noticed. And the RTO mandates started rolling out fast. The debate is heating up. Employees don't want to give up remote work. Executives want people in the office seats.

Read Post

Obkio

Read more about How to Prepare Your Network for RTO (Return-to-Office Mandates)

Splunk Attack Range v5 Demo

Feb 12, 2026 By Splunk In Splunk

The Splunk Attack Range is an open source project that lets security teams spin up instrumented cloud environments, simulate adversary behavior, and use the generated telemetry to build and test detections in Splunk. Whether you are a detection engineer tuning rules, a purple team validating coverage, or a developer automating tests, Attack Range gives you a repeatable, cloud-based lab. This post highlights what Attack Range does, how it works, and how to get started - whether you prefer a web UI, a REST API, or the command line.

View Video

Splunk

Read more about Splunk Attack Range v5 Demo

Will humans be replaced by AI? The truth

Feb 12, 2026 By Elastic In Elastic

Agentic AI doesn’t replace analysts; it augments them. The real value comes from making teams more efficient, not smaller. This is the perspective most people miss. About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

View Video

Elastic

Read more about Will humans be replaced by AI? The truth

Understanding Lighthouse: Speed Index

Feb 12, 2026 By Todd H. Gardner In Request Metrics

You run Lighthouse and it tells you your Speed Index is bad. But the page looks like it loads fine. You see stuff on screen early. So why is Lighthouse acting like your site is a sloth? Speed Index is a “how fast does this page visually fill in” metric. Not “when did the first pixel show up” (that’s FCP) and not “when did the main content show up” (that’s LCP). It’s the whole above-the-fold loading experience, averaged over time.
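To make "averaged over time" concrete: Speed Index is commonly defined as the area above the visual-completeness curve, so the longer the viewport stays incomplete, the higher the score. The sketch below is a simplified illustration of that idea, not Lighthouse's actual implementation; the function name and the sample frames are invented for the example.

```python
def speed_index(frames: list[tuple[float, float]]) -> float:
    """Approximate Speed Index in milliseconds.

    frames: (timestamp_ms, visual_completeness) pairs sorted by time,
    with completeness between 0.0 (blank) and 1.0 (fully painted).
    """
    total = 0.0
    for (t0, done0), (t1, _) in zip(frames, frames[1:]):
        # Each interval contributes its duration times the fraction of
        # the viewport that was still missing during that interval.
        total += (t1 - t0) * (1.0 - done0)
    return total


# Two hypothetical pages that both finish painting at 2000 ms.
early_paint = [(0, 0.0), (500, 0.75), (2000, 1.0)]
late_paint = [(0, 0.0), (1500, 0.25), (2000, 1.0)]
print(speed_index(early_paint))  # 875.0
print(speed_index(late_paint))   # 1875.0
```

Both pages finish at the same moment, but the one that paints most of its content early gets the lower (better) score.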

Read Post

Request Metrics

Read more about Understanding Lighthouse: Speed Index

Dashboarding Azure: SquaredUp vs Grafana

Feb 12, 2026 By Blog In Squared Up

If you’re looking for a dashboarding solution today, chances are you’ve looked at Grafana or SquaredUp, or both. Grafana is a popular open source dashboarding tool with on-prem and cloud variants, while SquaredUp is the SaaS, cloud-based unified dashboarding solution. Both offer a comprehensive list of data sources that they can plug into to build dashboards. As such, they both also offer an integration with Azure - which is the focus of our discussion today.

Read Post

Squared Up

Read more about Dashboarding Azure: SquaredUp vs Grafana

From Legacy Data Historians to a Modern, Open Industrial Data Stack

Feb 12, 2026 By Suyash Joshi In InfluxData

We recently sat down with founder and principal consultant at recultiv8, Coenraad Pretorius, who drew on his years of data engineering experience in the manufacturing and energy sectors to share key industrial IoT insights.

Read Post

InfluxData

Read more about From Legacy Data Historians to a Modern, Open Industrial Data Stack

How to Migrate an Icinga 2 Master in a High Availability Setup

Feb 12, 2026 By Blerim Sheqa In Icinga

Moving an Icinga 2 master to a new machine requires careful preparation, especially in a master-to-master high availability setup. In production environments, such migrations are often part of broader infrastructure changes, platform standardization, or long-term monitoring strategy decisions. This guide walks you through the process step by step, ensuring a smooth migration without service interruption while keeping your monitoring platform stable and consistent across the environment.

Read Post

Icinga

Read more about How to Migrate an Icinga 2 Master in a High Availability Setup

AI observability: The backbone of mission resilience in the public sector

Feb 12, 2026 By Leah McEwen In Elastic

Downtime cost the public sector $193 million last year — and the financial hit is only the beginning. Beyond the numbers, downtime in the public sector can also lead to severe consequences for citizens: interrupted access to critical online services, delayed benefits, and stalled emergency response. When citizens cannot rely on government services, downtime becomes more than an inconvenience; it becomes a matter of trust. More than uptime, resilience is the new success metric for modern government.

Read Post

Elastic

Read more about AI observability: The backbone of mission resilience in the public sector

Troubleshooting & RCA with Olly

Feb 12, 2026 By Lily Waldorf In Coralogix

If troubleshooting still feels harder than it should, check on these two numbers: how many dashboards you have, and how many alerts fire every day. For most teams, it’s hundreds of dashboards and thousands of alerts, a sign of maturity, coverage, and good intentions. On the other hand, we also see that when something actually breaks, that coverage rarely turns into clarity fast enough.

Read Post

Coralogix

Read more about Troubleshooting & RCA with Olly

Monitor Fortinet FortiManager performance in Datadog

Feb 12, 2026 By Abhinav Modugula In Datadog

As enterprises scale, teams often find it harder to identify user-reported issues. Software-defined wide area networks (SD-WANs) can make it easier to add branch offices, but they can also make it more challenging to distinguish connectivity degradation from changes in application behavior. FortiManager provides a centralized control plane for Fortinet Secure SD-WAN and reduces operational complexity.

Read Post

Datadog

Read more about Monitor Fortinet FortiManager performance in Datadog

Sponsored Post

From cloud costs to cloud value: The role of performance analytics in increasing ROI

Feb 11, 2026 By Site24x7 In Site24x7

Many cloud providers offer services that scale with usage. However, unanticipated overutilization of compute instances, serverless functions, or managed databases can quickly drive up costs. Managing these resources effectively is crucial for keeping cloud spending predictable.

Read Post

Site24x7

Read more about From cloud costs to cloud value: The role of performance analytics in increasing ROI

AWS CloudFront Outage (Feb 2026): Timeline, Cascade, and Lessons

Feb 11, 2026 By Nuno Tomas In isDown

At approximately 9:15 PM UTC on February 10, 2026, Amazon CloudFront began returning NXDOMAIN responses for DNS queries against specific distributions. In practical terms: DNS was telling users that services behind those distributions simply didn't exist. The root cause was a DNS resolution failure within CloudFront's infrastructure that quickly spread to eight interconnected AWS services.

Read Post

isDown

Read more about AWS CloudFront Outage (Feb 2026): Timeline, Cascade, and Lessons

Why Monitoring Matters for Modern Hosting Platforms

Feb 11, 2026 By Connor James In AppSignal

With all the discussion in the dev community lately about changes made at Heroku, we wanted to use this moment to talk about PaaS (Platform as a Service) providers and how AppSignal can be a vital tool to ensure you're using your app's hosts for everything from optimal performance to lower usage bills.

Read Post

AppSignal

Read more about Why Monitoring Matters for Modern Hosting Platforms

Improve test coverage across codebases with Datadog Code Coverage

Feb 11, 2026 By Eric Metaj In Datadog

As codebases grow across many different services, it becomes harder to see what test suites actually cover. AI-assisted development and faster release cycles increase the volume of changes landing in repositories, raising the risk that untested code will make it through to production. To maintain a high standard, teams need clear and scalable visibility across repositories, consistent testing standards, and a way to catch blind spots before they reach users.

Read Post

Datadog

Read more about Improve test coverage across codebases with Datadog Code Coverage

Move fast, don't break things: Consistent testing standards at scale

Feb 11, 2026 By Eric Metaj In Datadog

Moving quickly is essential for modern engineering teams, but speed without guardrails can introduce hidden risks in testing. As organizations scale, teams often define and apply coverage standards inconsistently across services and repositories. What qualifies as “acceptable coverage” in one project may be completely different in another. Without automated enforcement, untested code can slip through reviews.

Read Post

Datadog

Read more about Move fast, don't break things: Consistent testing standards at scale

A Step-by-Step Look at how Agentic, Autonomous ITOps Resolves Incidents

Feb 11, 2026 By Margo Poda In LogicMonitor

Agentic, autonomous ITOps improves incident response by carrying context from detection through resolution, reducing noise, delay, and manual coordination. Most IT incidents don’t fail due to missing data. Monitoring systems generate more than enough signals. The problem is that understanding those signals—and deciding what to do with them—happens in fragments. Engineers move between dashboards, logs, tickets, and chat threads, stitching together context by hand.

Read Post

LogicMonitor

Read more about A Step-by-Step Look at how Agentic, Autonomous ITOps Resolves Incidents

The Architecture Shift Powering Network Observability

Feb 11, 2026 By Idan Green In Broadcom

If you work in network operations, you know that the only constant is the increasing complexity of the infrastructure you manage. The days of installing a monolithic software package on a single bare-metal server and letting it hum along for years are largely behind you. The software industry has largely shifted toward cloud-native architectures, microservices, and containerization. While these shifts promise agility and scalability, they also introduce significant operational complexity.

Read Post

Broadcom

Read more about The Architecture Shift Powering Network Observability

OpenTelemetry in Production: Design for Order, High Signal, Low Noise, and Survival

Feb 11, 2026 By Sematext In Sematext

A lot of talk around OpenTelemetry has to do with instrumentation, especially auto-instrumentation, and with OTel being vendor neutral, open, and a de facto standard. But how you use the final output of OTel is what makes the business difference. In other words, how do you use it to make your life as an SRE/DevOps/biz person easier? How do you have to set things up to truly solve production issues faster?

Read Post

Sematext

Read more about OpenTelemetry in Production: Design for Order, High Signal, Low Noise, and Survival

A Notification List Is Not a Team

Feb 11, 2026 By James Barnes In StatusCake

In the previous post, we looked at how alert noise is rarely accidental. It’s usually the result of sensible decisions layered over time, until responsibility becomes diffuse and response slows. One of the most persistent assumptions behind this pattern is simple. If enough people are notified, someone will take responsibility. After more than fourteen years of working with engineering teams of every size and shape, we’ve seen this assumption fail repeatedly.

Read Post

StatusCake

Read more about A Notification List Is Not a Team

Observing and Debugging Next.js apps with Sentry: A Hands-on Session

Feb 11, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io
Docs: https://docs.sentry.io

View Video

Sentry

Monitoring

Read more about Observing and Debugging Next.js apps with Sentry: A Hands-on Session

Happy Birthday to Us: Honeycomb 10 Year Manifesto, Part 1

Feb 11, 2026 By Charity Majors In Honeycomb

Christine and I started Honeycomb in 2016, which means it’s been ten years. Christine, a developer, and I, an operations engineer, were both profoundly unhappy with the state of the art in monitoring and logging tools. The tools we had used at Facebook didn’t spray our signals around to a bunch of siloed-off pillars. They consolidated as much context as possible so we could properly explore it, the way every other non-software engineering team already takes for granted.

Read Post

Honeycomb

Read more about Happy Birthday to Us: Honeycomb 10 Year Manifesto, Part 1

Investigate Issues in Slack: Grafana Cloud Slack App with AI

Feb 11, 2026 By Grafana In Grafana

The Grafana Cloud app for Slack brings observability and incident response closer to where you and your teams already collaborate. Ask questions about system health, alerts, on-call schedules, and Grafana Cloud features; manage incidents and alerts; and collaborate with full context.

View Video

Grafana

Read more about Investigate Issues in Slack: Grafana Cloud Slack App with AI

Grafana's creator on why UI matters | Big Tent S3 E5

Feb 11, 2026 By Grafana In Grafana

From Grafana Labs' Big Tent podcast - season 3, episode 5.

View Video

Grafana

Read more about Grafana's creator on why UI matters | Big Tent S3 E5

88% of Organizations Face Growing IT Complexity. Here's How Leaders Are Responding

Feb 11, 2026 By Renuka Suresh In HEAL Software

CIOs, IT leaders, platform engineering managers, and SRE/DevOps teams running multi-tool monitoring stacks who need faster incident clarity.

Read Post

HEAL Software

Read more about 88% of Organizations Face Growing IT Complexity. Here's How Leaders Are Responding

Releasing Icinga for Windows v1.14.0 - We have been cooking!

Feb 11, 2026 By Christian Stein In Icinga

As Bernd mentioned at last year’s OSMC, the Icinga for Windows team was hard at work on v1.14.0, which was going to be released in December. Well, we are off by a couple of days, but we believe the wait was worth it!

Read Post

Icinga

Read more about Releasing Icinga for Windows v1.14.0 - We have been cooking!

Agent vs Assistant: The key distinction between Olly and the competition

Feb 11, 2026 By Chris Cooney In Coralogix

The market is saturated with agents and assistants, making it difficult to tell them apart. However, the difference between these two approaches is significant. They offer radically distinct levels of impact, reflecting major differences in both their technical complexity and the quality of their inferences. Let’s figure out the distinction.

Read Post

Coralogix

Read more about Agent vs Assistant: The key distinction between Olly and the competition

Sentry acquires XcodeBuildMCP

Feb 11, 2026 By Cameron Cooke In Sentry

Today we're announcing that Sentry has acquired XcodeBuildMCP, an open source MCP server that gives AI agents the ability to build, test, and debug native iOS and macOS apps. XcodeBuildMCP has become a go-to tool for agentic Apple-platform development, with more than 4,000 GitHub stars and an active community. It unlocks the full developer loop: build, run, debug, interact, and verify, allowing users to stay in their preferred agentic development environment.

Read Post

Sentry

Read more about Sentry acquires XcodeBuildMCP

A new perspective on dashboard sprawl

Feb 11, 2026 By Dave Clarke In Squared Up

Dashboards are supposed to answer questions, not create more of them. But investigations don't stop at a single view. The moment you want to understand one specific thing in detail, like a failing VM, a degraded service, or a slow pipeline, dashboards start to break down. You end up either building yet another dashboard or searching through many different ones. SquaredUp's Perspectives changes this.

Read Post

Squared Up

Read more about A new perspective on dashboard sprawl

What Agentic AI Is Really Made Of (Most People Miss This)

Feb 11, 2026 By Elastic In Elastic

Agentic AI isn’t just an LLM. Without the right context, it gives generic answers. This is the component that makes its decisions actually useful. About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

View Video

Elastic

Read more about What Agentic AI Is Really Made Of (Most People Miss This)

How to Create and Manage Incidents in Uptime.com

Feb 11, 2026 By Uptime Website Monitoring In uptime

Learn how to create and manage incidents on your Uptime.com Status Page to keep your subscribers informed about service disruptions and maintenance events in real-time. In this tutorial, we'll cover understanding incident statuses (Investigating, Identified, Monitoring, Resolved, and more), three ways to create a new incident, configuring incident details and timelines, adding updates with Markdown formatting, managing and editing incidents, notifying Status Page subscribers, and using the REST API for incident management.

View Video

uptime

Read more about How to Create and Manage Incidents in Uptime.com

VictoriaMetrics at FOSDEM, Cloud Native Days France, and CfgMgmtCamp Ghent

Feb 11, 2026 By Diana Todea In VictoriaMetrics

Last week, members of the VictoriaMetrics team, including myself, spoke at three very different but equally important community events: FOSDEM in Brussels, Cloud Native Days France in Paris, and CfgMgmtCamp in Ghent. Each event drew a different crowd with its own expectations, making them a good way to see where open source observability stands today and how VictoriaMetrics is adapting to real-world needs. The talks we gave were snapshots of the problems we are actively working on.

Read Post

VictoriaMetrics

Read more about VictoriaMetrics at FOSDEM, Cloud Native Days France, and CfgMgmtCamp Ghent

AI Query Assist for SolarWinds Database Performance Analyzer

Feb 11, 2026 By solarwindsinc In SolarWinds

Is your database slow? Let AI do the heavy lifting. Watch how SolarWinds DPA’s AI Query Assist transforms query tuning from a manual headache into a streamlined process. This demo shows you how to get instant, AI-powered recommendations for your worst-performing queries while maintaining the control to review and verify every fix. It’s not just about finding the problem—it’s about fixing it faster.

View Video

SolarWinds

Read more about AI Query Assist for SolarWinds Database Performance Analyzer

Track and Fix Frontend JS errors Using Rollbar

Feb 11, 2026 By Rollbar In Rollbar

Setting up Rollbar with your JavaScript application with custom parameters, people tracking, and session replay.

View Video

Rollbar

Read more about Track and Fix Frontend JS errors Using Rollbar

How to run checks on internal services with Grafana Cloud Synthetic Monitoring

Feb 11, 2026 By Bukola Ayodele In Grafana

Many critical services run inside private networks, where traditional monitoring tools and practices can’t offer full visibility. This makes it difficult to validate service availability and performance before problems impact your users. Synthetic Monitoring — a Grafana Cloud solution that helps you proactively monitor the performance of your applications and services — addresses this gap with a feature known as private probes.

Read Post

Grafana

Read more about How to run checks on internal services with Grafana Cloud Synthetic Monitoring

What is DEX Ops?

Feb 11, 2026 By Megan Brake In Nexthink

For decades, IT operations have been built around incidents, SLAs, and ticket closure rates. Success has been defined by how quickly tickets are resolved and whether service levels are met. But the modern digital workplace has changed. Employee productivity, digital adoption, collaboration quality, and business performance depend on far more than ticket metrics. A device that “works” but performs poorly still erodes productivity.

Read Post

Nexthink

Read more about What is DEX Ops?

Why distributed observability is straining and what new research reveals

Feb 11, 2026 By OpsMatters In OpsMatters

Distributed systems quietly run much of today's digital world. People expect these systems to work reliably across regions and time zones for everything from money transfers to streaming platforms and AI-driven workloads. As organisations use more microservices, containers, and event-driven architectures, observability has become the main way for teams to understand what is happening in production.

Read Post

OpsMatters

Read more about Why distributed observability is straining and what new research reveals

Landscape Operations Automation beyond SAP Landscape manager

Feb 10, 2026 By Avantra In Avantra

During the summer of 2024, SAP quietly announced the end of the Landscape Manager (LaMa) product. You can find out more from SAP directly, including linked SAP Notes, in the LaMa Discontinued Community Post. Unlike the news for Solution Manager or Focused Run, where the 2027 date signals a transition to extended support options, with LaMa the product is discontinued and extended support options aren’t available. For customers using LaMa, the announcement and timeline are disruptive.

Read Post

Avantra

Read more about Landscape Operations Automation beyond SAP Landscape manager

NiCE Linux Power Management Pack 1.60 Available Now

Feb 10, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

We’re thrilled to introduce NiCE Linux Power Management Pack version 1.60, a major step forward in monitoring Linux environments running on IBM Power Systems (ppc64le) with Microsoft SCOM.

Read Post

NiCE IT Mgmt

Read more about NiCE Linux Power Management Pack 1.60 Available Now

Introducing Skylar Advisor: You Need an Advisor, Not an AI Assistant

Feb 10, 2026 By ScienceLogic In ScienceLogic

Skylar Advisor is a next-generation experience powered by Skylar AI, built to help IT teams focus on what matters right now. In this video, ScienceLogic Chief Product Officer Michael Nappi shares how Skylar Advisor proactively curates and summarizes key signals across monitoring tools, logs, and streaming telemetry into clear advisories your team can act on in seconds.

View Video

ScienceLogic

Read more about Introducing Skylar Advisor: You Need an Advisor, Not an AI Assistant

What Companies Get Wrong About Autonomous IT, And What Actually Moves Them Forward

Feb 10, 2026 By ScienceLogic In ScienceLogic

Many organizations approach Autonomous IT with the assumption that adding more tools, more data, or more automation will eventually produce self-governing operations. This assumption creates the illusion of progress. Complexity does not resolve itself when new systems are layered on top of existing ones. In most environments, each new tool adds another interpretation of the truth, which compounds the cognitive load on teams and forces more reconciliation, not less.

Read Post

ScienceLogic

Read more about What Companies Get Wrong About Autonomous IT, And What Actually Moves Them Forward

How Artificial Intelligence Supercharges IT Operations

Feb 10, 2026 By LogicMonitor In LogicMonitor

This article kicks off a 4-part series on leveraging AIOps to provide a more efficient, cost- and resource-saving, reliable, and agile IT infrastructure.

Read Post

LogicMonitor

Read more about How Artificial Intelligence Supercharges IT Operations

VictoriaLogs in VictoriaMetrics Cloud: Fast, Cost-Effective Log Management is Here

Feb 10, 2026 By Jose Gomez-Selles In VictoriaMetrics

Yes, you got it right: VictoriaLogs is now Generally Available in VictoriaMetrics Cloud! We believe that this is a huge milestone in our journey to deliver what our users are expecting from us: a complete, managed observability solution. If you’ve been following our quarterly updates, you know we’ve been after this launch for a while. In our latest update a few weeks ago we already announced that we were ready and today we’re making it official.

Read Post

VictoriaMetrics

Read more about VictoriaLogs in VictoriaMetrics Cloud: Fast, Cost-Effective Log Management is Here

What Are Cybersecurity Best Practices?

Feb 10, 2026 By Filip Cerny In Flowmon

Numerous best practices help to deliver robust cybersecurity. Adopting these will involve combining many of the technologies already outlined in the previous entry.

Read Post

Flowmon

Read more about What Are Cybersecurity Best Practices?

Exploring Splunk Alternatives [2026]: Deep Dive into Log Analysis

Feb 10, 2026 By Aiswarya S In Atatus

Splunk isn't bad software. It's genuinely powerful. But in 2026, a lot of engineering teams are asking a fair question: are we getting $300K worth of value out of this? More often than not, the answer is no. We went through 15 alternatives - read the docs, tested where we could, and talked to engineers who made the switch. This is what we found.

Read Post

Atatus

Read more about Exploring Splunk Alternatives [2026]: Deep Dive into Log Analysis

Same Work, More Windows: Why AI Isn't Paying Off Yet (w/ Anthony Firmin)

Feb 10, 2026 By Nexthink In Nexthink

In the first episode of a NEW ERA for the DEX Show, Tom (that's right, just Tom) welcomes back AI and digital transformation leader Anthony Firmin to unpack the reality of enterprise AI adoption. Drawing on hard-won, real-world experience, Anthony explores why so many organisations are stuck in the “messy middle” of AI, where usage rises but value doesn’t. The conversation digs into trust, experience debt, shallow versus deep AI, and why “same work, more windows” is an early warning sign leaders ignore at their peril. It’s a grounded, human-centred look at what it really takes to make AI improve work, not just change it.

View Video

Nexthink

Read more about Same Work, More Windows: Why AI Isn't Paying Off Yet (w/ Anthony Firmin)

Build, buy, or open source? Understanding your options with Grafana's AI-powered observability

Feb 10, 2026 By Ksenia Yadav In Grafana

Some questions in engineering never go away. Here’s one that every team eventually confronts: Do we roll up our sleeves and build the tooling ourselves, or do we buy something built for us? It’s a choice that has the power to speed teams up or hold them back. With the rise of AI-powered observability, this familiar software dilemma has re-emerged with higher stakes and faster-moving technology.

Read Post

Grafana

Read more about Build, buy, or open source? Understanding your options with Grafana's AI-powered observability

Size Analysis is generally available in Sentry

Feb 10, 2026 By Max Topolsky In Sentry

Sentry acquired Emerge Tools in May 2025 to bring best-in-class mobile tooling to dev teams. Today, we’re officially bringing Size Analysis - one of their flagship products - to all Sentry users, so you never have to worry about app size again.

Read Post

Sentry

Read more about Size Analysis is generally available in Sentry

SRE Report: AI optimism and the economics of effort

Feb 10, 2026 By Denton Chikura In Catchpoint

For eight years, the survey behind the SRE Report has used a consistent methodology. That consistency allows us to track how reliability work evolves over time, rather than relying on snapshots. One of the most stable questions in the survey asks respondents to estimate how much of their work, on average, is spent on toil. Between 2020 and 2024, responses showed a gradual decline in reported toil.

Read Post

Catchpoint

Read more about SRE Report: AI optimism and the economics of effort

What problem is agentic AI trying to solve?

Feb 10, 2026 By Elastic In Elastic

Agentic AI isn’t limited to security operations. It’s already improving hospitals, financial systems, and service industries by reducing overload and filling skill gaps. Here’s the problem it was actually built to solve.

View Video

Elastic

Read more about What problem is agentic AI trying to solve?

IT Leadership Best Practices: Why People Matter More Than Tools - SolarWinds TechPod 106

Feb 10, 2026 By solarwindsinc In SolarWinds

IT leadership best practices fail when organizations focus on tools instead of people. In this episode of SolarWinds TechPod, hosts Chrystal Taylor and Sean Sebring speak with Jon Collins, Field CTO and VP of Engagement at GigaOm, about what truly drives success in IT leadership—people, culture, and principles. This expert discussion breaks down why frameworks like Agile, ITIL, DevOps, and AI-driven operations succeed or fail based on leadership behaviors, prioritization, and trust—not technology alone.

View Video

SolarWinds

Read more about IT Leadership Best Practices: Why People Matter More Than Tools - SolarWinds TechPod 106

Sponsored Post

How to improve your Crash Free Users score in minutes

Feb 9, 2026 By Arisha Singh In Raygun

If you're reading this blog, you likely already know the importance of quality software. But with the overwhelming number of metrics that can be monitored and improved, development teams are struggling with what metrics they should prioritize to have the most significant impact. The Crash Free Users score in Raygun is a perfect place for development teams who care about software quality to focus their efforts. It tells you what percentage of users didn't encounter a crash or error while using your software and is an ideal north star to gauge the overall quality of your software.

Read Post

Raygun

Read more about How to improve your Crash Free Users score in minutes

How an AI assistant and MCP server deliver real-time cloud cost insights

Feb 9, 2026 By Sinjan Ballav In ManageEngine

Cloud costs don’t grow quietly. They spike, drift, and surprise teams at the worst possible moments, usually when someone finally opens a dashboard. While cloud cost management tools are powerful, getting quick answers often still means navigating multiple views, applying filters, exporting reports, and looping in the right people. But what if cloud cost analysis worked more like a conversation?

Read Post

ManageEngine

Read more about How an AI assistant and MCP server deliver real-time cloud cost insights

January 2026: IsDown Users Saved 9.2 Hours with Early Outage Detection

Feb 9, 2026 By Nuno Tomas In isDown

In January 2026, IsDown's early detection system gave users a cumulative advantage of 9.2 hours across 34 incidents — that's over half a business day of advance warning before vendors officially acknowledged their outages. The largest single detection advantage? A massive 2.2 hours for a SendGrid email delivery issue that left customers in the dark while their emails failed to reach Microsoft inboxes.

Read Post

isDown

Read more about January 2026: IsDown Users Saved 9.2 Hours with Early Outage Detection

Detecting incidents without components

Feb 9, 2026 By Valeria Kurolapova In StatusGator

StatusGator monitors services and their individual components, so you can stay informed about the systems you rely on – and filter down to only the components you care about. Most status pages do a good job of tagging incidents to the affected components. But sometimes providers publish incident updates without marking any components as impacted, even when the incident clearly affects something real.

Read Post

StatusGator

Read more about Detecting incidents without components

Continuous profiling in production: A real-world example to measure benefits and costs

Feb 9, 2026 By Jake Kramer In Grafana

Continuous profiling offers deep visibility into production environments, revealing exactly how applications consume CPU and memory. It’s the go-to observability practice for directly connecting system behavior and performance to specific lines of code. But when teams consider deploying continuous profiling more broadly, a common question comes up: what’s the overhead? Is it safe to run continuous profiling on my production services 24/7, or does the cost outweigh the benefits?

Read Post

Grafana

Read more about Continuous profiling in production: A real-world example to measure benefits and costs

How to Optimize Your Article with Surfer SEO

Feb 9, 2026 By Super Monitoring In Super Monitoring

Writing a good article is not enough anymore. The web contains millions of pages that compete for user attention, and search engines determine which pages appear at the top of search results. Optimization is crucial because it determines which websites succeed in that competition. The goal of our work is to develop content that answers user search queries. Surfer SEO exists specifically to fulfill this requirement.

Read Post

Super Monitoring

Read more about How to Optimize Your Article with Surfer SEO

What is agentic AI? (explained in 60 seconds)

Feb 9, 2026 By Elastic In Elastic

Agentic AI is the next evolution of artificial intelligence. Unlike traditional AI, it can act autonomously and make decisions on its own. Here’s what that actually means, without the hype.

View Video

Elastic

Read more about What is agentic AI? (explained in 60 seconds)

Dashboard organization isn't about folders - it's about visibility

Feb 9, 2026 By Blog In Squared Up

Having well-organized dashboards is just as important as having good dashboards. But dashboard organization shouldn’t just make things easy to find. It should provide structure that supports collaboration and efficient troubleshooting. It has to be more than a basic folder system. This post looks at how classic dashboarding tools handle organization today, where they fall short, and how SquaredUp Workspaces organize for visibility and shared context.

Read Post

Squared Up

Read more about Dashboard organization isn't about folders - it's about visibility

AI NetOps: How AI and Machine Learning Transform Network Operations

Feb 9, 2026 By Kentik In Kentik

AI is changing network operations (NetOps) from static automation into adaptive, data-driven systems that can summarize incidents, retrieve knowledge, and guide remediation with human oversight. In this talk, Phil Gervasi breaks down what “AI for NetOps” really means in practice, including the difference between classical ML and large language models (LLMs), why data pipelines matter more than model tuning, and how patterns like RAG (retrieval augmented generation), text-to-SQL, and agentic workflows turn raw telemetry into decisions.

View Video

Kentik

Read more about AI NetOps: How AI and Machine Learning Transform Network Operations

Heartbeat behind the metrics | Muraleedharan on support, scale, and seeing the product in the wild

Feb 9, 2026 By ManageEngine Site24x7 In Site24x7

What does observability look like when you’re responsible for customers at scale? In this episode of Heartbeat Behind the Metrics, Muraleedharan Sadhasivam, Head of Customer Success, talks about his 15-year journey at ManageEngine and the perspective you only get from being close to customers every day. He shares why custom dashboards matter so much, and why AppLogs is a feature he wishes more users explored to complete the MELT story. From querying logs to turning them into alerts and dashboards, he explains how real insights start when data is brought together.

View Video

Site24x7

Read more about Heartbeat behind the metrics | Muraleedharan on support, scale, and seeing the product in the wild

Track cyber security with Reports in Digital Risk Analyzer

Feb 9, 2026 By ManageEngine Site24x7 In Site24x7

Discover how Site24x7’s Digital Risk Analyzer Reports help you instantly uncover vulnerabilities and assess multi-domain risks. In this quick walkthrough, learn how to view domain health, generate detailed or consolidated reports, schedule automated delivery, and share PDF insights with your team. Perfect for IT admins, DevOps, MSPs, and business leaders who want fast, actionable visibility into their cybersecurity posture.

View Video

Site24x7

Read more about Track cyber security with Reports in Digital Risk Analyzer

How Multispectral Drone Surveys Enhance Monitoring and Operational Intelligence

Feb 9, 2026 By Patrick Maple In OpsMatters

A multispectral drone survey is a powerful form of drone data analytics that captures invisible light data, enabling predictive maintenance and NDVI multispectral mapping with drones. This guide explains how industries use this UAV multispectral inspection service to move from reactive fixes to proactive, data-driven asset management with UAV multispectral data. However, many organizations still struggle to convert large volumes of monitoring data into timely, actionable insight.

Read Post

OpsMatters

Read more about How Multispectral Drone Surveys Enhance Monitoring and Operational Intelligence

Top 10 Port Monitoring Tools of 2025.

Feb 8, 2026 By Diana Bocco In Uptime Robot

Port failures don’t always take a service offline. A port stops accepting connections, times out intermittently, or gets blocked by a firewall change, while everything else looks healthy. When that happens, users feel the break long before uptime checks notice. This article reviews port monitoring tools from an operational point of view. It looks at how they detect closed or slow ports, how alerts behave in noisy environments, and where basic checks fall short during real incidents.

Read Post

Uptime Robot

Read more about Top 10 Port Monitoring Tools of 2025.

Monitoring cache stats using OpenTelemetry Go Metrics

Feb 7, 2026 By Vladimir Mihailenco In Uptrace

This article explains how to use opentelemetry-go Metrics API to collect metrics, for example, go-redis/cache stats.

Read Post

Uptrace

Read more about Monitoring cache stats using OpenTelemetry Go Metrics

Go Context timeouts can be harmful

Feb 7, 2026 By Vladimir Mihailenco In Uptrace

You probably should avoid context.WithTimeout or context.WithDeadline with code that makes network calls. Here is why.

Read Post

Uptrace

Read more about Go Context timeouts can be harmful

Monitor Load Balanced DNS Records with CIDR Ranges

Feb 7, 2026 By Matt Rideout In DNS Check

DNS Check's load balancer monitoring now supports CIDR notation, making it practical to monitor domains served by CDNs and cloud providers that use large IP pools. Instead of listing every possible IP address a provider might return, you can enter CIDR ranges like 104.16.0.0/13 and DNS Check will verify that responses fall within those ranges.
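The range check described above can be sketched in Go with the standard `net/netip` package. This is an illustration of the idea, not DNS Check's implementation: the function names and the sample addresses are hypothetical, and a real monitor would obtain the addresses from a DNS lookup rather than a hard-coded list.

```go
package main

import (
	"fmt"
	"net/netip"
)

// inAnyRange reports whether addr falls inside at least one allowed prefix.
func inAnyRange(addr netip.Addr, allowed []netip.Prefix) bool {
	for _, p := range allowed {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// outsideRanges returns the addresses not covered by any of the CIDR ranges.
func outsideRanges(addrs, cidrs []string) ([]string, error) {
	allowed := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("parse CIDR %q: %w", c, err)
		}
		allowed = append(allowed, p.Masked()) // normalize to the network address
	}

	var bad []string
	for _, a := range addrs {
		addr, err := netip.ParseAddr(a)
		if err != nil {
			return nil, fmt.Errorf("parse address %q: %w", a, err)
		}
		if !inAnyRange(addr, allowed) {
			bad = append(bad, a)
		}
	}
	return bad, nil
}

func main() {
	// Hypothetical DNS answers; 104.16.0.0/13 spans 104.16.0.0 to 104.23.255.255.
	answers := []string{"104.16.132.229", "104.23.255.1", "104.24.0.1"}
	bad, err := outsideRanges(answers, []string{"104.16.0.0/13"})
	if err != nil {
		panic(err)
	}
	fmt.Println(bad) // prints [104.24.0.1]
}
```

Listing one prefix instead of every address in it is what keeps the configuration manageable for large provider pools: a /13 covers 524,288 IPv4 addresses.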

Read Post

DNS Check

Read more about Monitor Load Balanced DNS Records with CIDR Ranges

Firewall check: How long until you know your Firewall has been down?

Feb 6, 2026 By Geoffrin Edwin In Site24x7

Windows Firewall is enabled by default, right? How sure are you? Even if you are 99.999% sure, this is how you have a possible vulnerability on your hands. There are numerous cases where someone disables Windows Firewall temporarily to troubleshoot a connectivity issue. The problem gets resolved. The firewall stays disabled—for months. Nobody notices until the security team investigates why sensitive data is suddenly appearing on dark web marketplaces.

Read Post

Site24x7

Read more about Firewall check: How long until you know your Firewall has been down?

Cloud Provider Status Report - January 2026

Feb 6, 2026 By Nuno Tomas In isDown

This report analyzes cloud provider status data for January 2026, covering 12 major cloud platforms: AWS, Azure DevOps, DigitalOcean, Fly.io, Heroku, Linode, Microsoft Azure, Netlify, Railway, Render, and Vercel. The data includes official incident reports from each provider's status page and early detection capabilities from IsDown's monitoring system.

Read Post

isDown

Read more about Cloud Provider Status Report - January 2026

Key Takeaways From the 2025 Gartner Market Guide for Event Intelligence Solutions

Feb 6, 2026 By Dallon Robinette In Selector

The 2025 Gartner Market Guide for Event Intelligence Solutions reflects a shift in how IT operations leaders evaluate AI-driven technologies. As AI hype gives way to more practical evaluation, we are seeing a natural departure from broad promises about AI capabilities toward clearly defined use cases and outcomes.

Read Post

Selector

Read more about Key Takeaways From the 2025 Gartner Market Guide for Event Intelligence Solutions

Event Intelligence Solutions Part Three: Best Practices for Successful Adoption

Feb 6, 2026 By david.arrowsmith In Interlink

As Event Intelligence Solutions (EIS) move from early adoption to operational necessity, many enterprises are realizing that success depends on more than selecting the right technology. For Banking and Financial Services organizations, effective adoption requires a clear strategy, disciplined execution, and strong alignment with business priorities, regulatory demands and, not least, customer expectations.

Read Post

Interlink

Read more about Event Intelligence Solutions Part Three: Best Practices for Successful Adoption

Agentic NetOps: How to beat the cloud monsters at their own game

Feb 6, 2026 By Avi Freedman In Kentik

The secret to hyperscaler success isn’t magic. Kentik Co-founder and CEO Avi Freedman explains how organizations can adopt the same operating principles and empower network teams to drive results that far exceed their headcount.

Read Post

Kentik

Read more about Agentic NetOps: How to beat the cloud monsters at their own game

How we built Grafana Assistant - a conversation about AI development for observability

Feb 6, 2026 By Grafana In Grafana

This conversation with Grafana Labs engineers, Mat Ryer, Cyril Tovena and Sven Großmann, dives deep into the engineering behind Grafana Assistant, exploring how agentic AI is transforming the observability landscape. From hackathon origins to sophisticated backend agents, the team shares candid lessons on building, scaling, and refining AI tools for engineers.

View Video

Grafana

Read more about How we built Grafana Assistant - a conversation about AI development for observability

VirtualMetric DataStream + Google SecOps Integration: Pre-Ingest UDM Normalization at Scale

Feb 6, 2026 By VirtualMetric In VirtualMetric

Google SecOps (formerly Chronicle) is widely used for large-scale security analytics, long-term telemetry retention, and detection across diverse environments. Its Unified Data Model (UDM) enables correlation across sources and supports analytics that operate over long time horizons. To take full advantage of these capabilities, security data must arrive in a consistent and well-structured UDM format. In practice, this is rarely the case.

Read Post

VirtualMetric

Read more about VirtualMetric DataStream + Google SecOps Integration: Pre-Ingest UDM Normalization at Scale

Why Residential ISP ICMP Blocking Makes Remote Worker Monitoring Impossible (And What to Do About It)

Feb 6, 2026 By Alyssa Lamberti In Obkio

Your company’s help desk receives fifteen "my connection is slow" tickets from remote employees in a single morning. Your network monitoring dashboard shows everything green: VPN concentrators running smoothly, bandwidth usage normal, no alerts. Yet employees can't get their work done. You try to ping their home routers. Nothing. Attempt a traceroute to diagnose the path. It dies at the ISP edge. Check your SNMP queries. They never make it past the residential gateway.

Read Post

Obkio

Read more about Why Residential ISP ICMP Blocking Makes Remote Worker Monitoring Impossible (And What to Do About It)

TraceExporter for VS Code

Feb 6, 2026 By Percepio In Percepio

Percepio TraceExporter for VS Code makes it easy to export Percepio TraceRecorder snapshots during your debug session and open them directly in Percepio Tracealyzer. This is applicable for embedded systems based on Zephyr, FreeRTOS, SafeRTOS, Cesium, ThreadX or PX5, or using TraceRecorder’s “Bare Metal” option. The extension is currently provided in a Beta version as a downloadable .vsix file.

Read Post

Percepio

Read more about TraceExporter for VS Code

Instrumenting Code Using Prism and the Ruby Abstract Syntax Tree

Feb 6, 2026 By Jack Rothrock In Scout

A repository for this article can be found here. When most developers think about request tracing, they picture instrumentation hooks inside familiar libraries. This allows us to track familiar metrics we see in application performance monitoring (APM) tools such as the duration of an HTTP call or how long a database query takes. But what if you could go deeper and instrument your own Ruby code automatically, without sprinkling timing calls everywhere?

Read Post

Scout

Read more about Instrumenting Code Using Prism and the Ruby Abstract Syntax Tree

Chrysalis Backdoor: What You Need to Know - and How Progress Flowmon Threat Briefing Helps You Stay Ahead

Feb 6, 2026 By Martin Škoda In Flowmon

A newly analyzed threat, Chrysalis, is a sophisticated backdoor attributed to the Chinese APT group Lotus Blossom. The malware employs advanced evasion techniques including heavy obfuscation, API hashing, dynamic DNS resolution, custom encryption and stealthy C2 communication disguised as legitimate traffic.

Read Post

Flowmon

Read more about Chrysalis Backdoor: What You Need to Know - and How Progress Flowmon Threat Briefing Helps You Stay Ahead

How eBPF Improves Open Source Observability

Feb 6, 2026 By Coroot In Coroot

Try it open source on your system. Learn how tools can make gathering and making sense of observability data instant and painless with co-founder Peter Zaitsev.

View Video

Coroot

Read more about How eBPF Improves Open Source Observability

Monitor Lustre with Datadog

Feb 6, 2026 By Michael Cronk In Datadog

High-performance computing (HPC) clusters rely on fast, reliable shared storage so that expensive CPU cores and GPUs aren’t left idle waiting for data. When bandwidth or I/O bottlenecks emerge, your workloads can slow down as your cluster spends more time on blocked reads, metadata lookups, and sync operations than actually computing.

Read Post

Datadog

Read more about Monitor Lustre with Datadog

Redefining partner-first: SolarWinds kickstarts new partner programme at February summit

Feb 5, 2026 By SolarWinds In SolarWinds

Programme enhancements will include upgrades to partner benefits, new enablement opportunities, and new discounts.

Read Post

SolarWinds

Read more about Redefining partner-first: SolarWinds kickstarts new partner programme at February summit

How to Automate Alerts for Critical Directory Changes with Site24x7 Server Monitoring

Feb 5, 2026 By Geoffrin Edwin In Site24x7

It takes just one misconfigured deployment script to silently dump TBs of debug logs into a production server's /var/log directory. By the time anyone notices, the disk will be at 98% capacity, and multiple microservices will have already crashed. Incidents like these usually take hours to remediate and cost the team an entire sprint's worth of goodwill with stakeholders. This should never happen.

Read Post

Site24x7

Read more about How to Automate Alerts for Critical Directory Changes with Site24x7 Server Monitoring

Skylar Advisor: Proactive Guidance for Modern Operations

Feb 5, 2026 By ScienceLogic In ScienceLogic

Meet Skylar Advisor, bringing trusted and verifiable guidance to IT operations by connecting real time observability with your data and knowledge. Built AI native, it helps teams cut through alert floods, understand what matters most and why, and take the next best steps with confidence. Every recommendation is evidence backed and traceable to the exact data and sources used, so guidance is clear, explainable, and defensible when the stakes are high.

View Video

ScienceLogic

Read more about Skylar Advisor: Proactive Guidance for Modern Operations

Building an AI GitHub Maintainer

Feb 5, 2026 By Sentry In Sentry

In this stream, we're going to be building an application together from scratch using ShadCN, AI SDK, and Sentry. Today's goals.

View Video

Sentry

Read more about Building an AI GitHub Maintainer

Heartbeat behind the metrics | Raghavan on building Site24x7

Feb 5, 2026 By ManageEngine Site24x7 In Site24x7

How do you build an observability platform that keeps up with constant change? In this episode of Heartbeat Behind the Metrics, Srinivasa Raghavan Santhanam, Director of Product Management at Site24x7, reflects on more than 15 years with the product and what he sees as its quiet strengths. He talks about GenAI as a hidden gem inside Site24x7, and you'll hear a standout customer story where a large Indian enterprise replaced 12 different tools with Site24x7, consolidating everything into a single platform. For him, that moment confirmed the platform’s ability to solve multiple problems at scale.

View Video

Site24x7

Monitoring

Read more about Heartbeat behind the metrics | Raghavan on building Site24x7

How to Use Pandas Time Index: A Tutorial with Examples

Feb 5, 2026 By Company In InfluxData

Time series data is everywhere in modern analytics, from stock prices and sensor readings to web traffic and financial transactions. When working with temporal data in Python, pandas provides powerful tools for handling time-based indexing through its DatetimeIndex functionality. This tutorial will guide you through creating, manipulating, and extracting insights from pandas time indexes with practical examples.

Read Post

InfluxData

Read more about How to Use Pandas Time Index: A Tutorial with Examples

What you missed at OTel Unplugged 2026 in 8 minutes!

Feb 5, 2026 By Bindplane In ObservIQ

OTel Unplugged 2026 was different by design. Held alongside FOSDEM in Brussels, this was an unconference built by the OpenTelemetry community, for the community. No sales pitches. No product demos. Just honest conversations about what’s working, what’s broken, and where OTel needs to go next. In this recap, you’ll hear short interviews and reflections from engineers, maintainers, and practitioners on.

View Video

ObservIQ

Read more about What you missed at OTel Unplugged 2026 in 8 minutes!

What Is Alert Noise Reduction? Techniques & Tools

Feb 5, 2026 By Arpit Sharma In Motadata

Modern IT environments are noisy. The sheer volume of telemetry data generated every second by microservices, hybrid clouds, and containerized applications is extraordinary. For IT Operations, NOC teams, and Site Reliability Engineers (SREs), this data is crucial, but only if it can be acted upon. When it can't, everything becomes background noise.

Read Post

Motadata

Read more about What Is Alert Noise Reduction? Techniques & Tools

Sync Your Users Into Icinga Notifications: Introducing the Contacts/Groups API

Feb 5, 2026 By Jan Schuppik In Icinga

If you’ve ever onboarded a teammate at 4:57 PM on a Friday (or offboarded one at 4:58 PM…), you know the pain: keeping notification contacts and groups up to date is work. With the Icinga Notifications REST API, you can automate that and avoid drift.

Read Post

Icinga

Read more about Sync Your Users Into Icinga Notifications: Introducing the Contacts/Groups API

What is Cybersecurity?

Feb 5, 2026 By Filip Cerny In Flowmon

Cybersecurity refers to the processes and technology used to protect information technology networks, data, people, servers, endpoint devices and other IT-related systems from cyberattacks. The need for this protection has never been greater. All organizations (in both private and public sectors) now exist in a threat landscape that allows attacks against their IT infrastructure.

Read Post

Flowmon

Read more about What is Cybersecurity?

How Network Operations Teams Use InfluxDB to Solve Network Monitoring Gaps

Feb 5, 2026 By Mike Devy In InfluxData

Organizations are starting to question whether the value they get from traditional Network Monitoring Systems (NMS) justifies the budget they’ve locked into them.

Read Post

InfluxData

Read more about How Network Operations Teams Use InfluxDB to Solve Network Monitoring Gaps

How Honeycomb Supercharges OpenTelemetry for AI

Feb 5, 2026 By Fahim Zaman In Honeycomb

It has become common knowledge that the nature of software development has changed as AI-code generation and agent-based features gain adoption. In perhaps a more subtle shift, the fundamentals of software instrumentation are changing too. As OpenTelemetry becomes the standard instrumentation layer across enterprises, with thousands of developers (many from Honeycomb) actively contributing to it, the nature of the telemetry data captured itself is evolving to meet the growing demand for rich context.

Read Post

Honeycomb

Read more about How Honeycomb Supercharges OpenTelemetry for AI

The E-Commerce Critical Path Checklist

Feb 5, 2026 By Pingdom In SolarWinds

It’s your site’s huge, annual sale weekend, and your online store’s checkout process went down for 10 minutes. At your conversion rate, that’s $10,000 in lost sales. Thankfully, it came back up after only 10 minutes, but the real issue is that you only found out from customer complaints on social media. You spent months on email marketing and other campaigns driving traffic to this sale, and now those efforts are turning into customer frustration instead of revenue.

Read Post

SolarWinds

Read more about The E-Commerce Critical Path Checklist

Kiro Can Now Reason With Lightrun's Live Runtime Context

Feb 5, 2026 By Gideon Freud In Lightrun

AI code generation is fast. Making it reliable requires runtime context. Today, Kiro gains live runtime visibility with the Lightrun MCP. This grounds AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It moves from specification to production with speed and structure, helping teams turn intent into working code. But until now, like every AI coding assistant, Kiro had a major blind spot.

Read Post

Lightrun

Read more about Kiro Can Now Reason With Lightrun's Live Runtime Context

Understanding Lighthouse: First Meaningful Paint

Feb 5, 2026 By Todd H. Gardner In Request Metrics

You’re reading an old performance article, and it keeps talking about “First Meaningful Paint.” You search for how to improve it, but every tool gives you different advice. Some don’t mention it at all. What’s going on? Here’s the short answer: First Meaningful Paint is dead. Google deprecated it in Lighthouse 6.0 back in 2020 and removed it completely in Lighthouse 13. If you’re still trying to optimize for FMP, you’re chasing a ghost.

Read Post

Request Metrics

Read more about Understanding Lighthouse: First Meaningful Paint

Connecting Your Browser JavaScript App to Rollbar

Feb 5, 2026 By Rollbar In Rollbar

Setting up Rollbar with your JavaScript application in under a minute, with Session Replay included.

View Video

Rollbar

Read more about Connecting Your Browser JavaScript App to Rollbar

The Human-Centric Stack: Why Logs Are the Great Equalizer in the Age of AI

Feb 5, 2026 By Rachel Revoy In SolarWinds

In 2026, we are seeing incredible feats of engineering with agentic AI, impacting metrics and distributed traces that map thousands of microservices. Our systems have never been more intelligent and complex. However, as our observability becomes more intelligent, fewer employees know how to manage and troubleshoot complex systems. These employees, who often bear the brunt of an error’s impact, may need to rely on specialists to interpret the system.

Read Post

SolarWinds

Read more about The Human-Centric Stack: Why Logs Are the Great Equalizer in the Age of AI

Custom Dashboard Creation: Step-by-Step Tutorial

Feb 5, 2026 By MetricFire Team In MetricFire

Creating a custom dashboard is the best way to monitor metrics that matter most to your systems. Tools like MetricFire make this process straightforward by combining hosted Grafana and Graphite, eliminating the need for self-hosted solutions. Here's how you can build dashboards tailored to your needs.

Read Post

MetricFire

Read more about Custom Dashboard Creation: Step-by-Step Tutorial

Grafana dashboards as code: How to manage your dashboards with Git

Feb 5, 2026 By Roberto Jiménez Sánchez In Grafana

Note: This blog post was originally published in May 2025 and was updated in February 2026 to reflect that Git Sync is now available in public preview in Grafana Cloud. As your Grafana instance scales, so does the challenge of maintaining dashboards. Managing dozens, or even hundreds, of dashboards through the UI alone can quickly become overwhelming. Tracking changes gets murky, dashboards multiply, and consistency suffers.

Read Post

Grafana

Read more about Grafana dashboards as code: How to manage your dashboards with Git

Add skills to agents: Use Assistant playbooks for faster answers, investigations

Feb 5, 2026 By Mat Ryer In Grafana

Grafana Assistant is the most general-purpose tool we’ve delivered since dashboards. People use our Grafana Cloud LLM to understand unfamiliar areas of their stacks, generate dashboards and beautiful visualizations out of thin air, build queries, and support investigations.

Read Post

Grafana

Read more about Add skills to agents: Use Assistant playbooks for faster answers, investigations

How to Set Up Effective Alert Thresholds in Graphite

Feb 5, 2026 By MetricFire Team In MetricFire

Setting up alert thresholds in Graphite transforms raw monitoring data into actionable notifications, helping you address system issues before they escalate. Here's what you need to know.

Read Post

MetricFire

Read more about How to Set Up Effective Alert Thresholds in Graphite

Beyond a Billion Spans: Using Highlights for High-Speed Root Cause Analysis at Scale

Feb 5, 2026 By Jonny Steiner In Coralogix

In late 2025, we introduced Trace Highlight Comparison. This capability was designed to solve the problem of having too many spans, which causes technical and financial challenges when identifying performance patterns within high-volume telemetry streams. The goal is to avoid massive indexing costs and eliminate the ingestion latency associated with indexing every record. However, knowing these trends is only half the battle.

Read Post

Coralogix

Read more about Beyond a Billion Spans: Using Highlights for High-Speed Root Cause Analysis at Scale

Top 9 Observability Tools for AI-Assisted Development & Deployment

Feb 5, 2026 By OpsMatters In OpsMatters

AI-assisted development is rapidly becoming the default way software is built. Code generation, AI copilots, agentic pull requests, and automated refactoring are now embedded directly into engineering workflows. While this shift dramatically increases delivery speed, it also introduces a new operational reality: production systems are changing faster than humans can fully reason about them. This is where observability becomes mission-critical.

Read Post

OpsMatters

Read more about Top 9 Observability Tools for AI-Assisted Development & Deployment

January 2026 product updates

Feb 4, 2026 By Valeria Kurolapova In StatusGator

January brought a packed set of updates to StatusGator – from better ways to organize and analyze your monitors, to new API capabilities and expanded security controls. Here’s a quick recap of everything that rolled out.

Read Post

StatusGator

Read more about January 2026 product updates

Sponsored Post

Why Every MSP Needs Centralized SaaS Monitoring

Feb 4, 2026 By Nuno Tomas In isDown

Your monitoring stack catches server failures, network issues, and application crashes. But what happens when Microsoft Teams goes down across half your client base at 3 AM? Your on-call tech gets bombarded with alerts that all trace back to one root cause they can't see. This is the MSP blind spot: third-party SaaS dependencies that sit outside your monitoring perimeter but directly impact your SLAs.

Read Post

isDown

Read more about Why Every MSP Needs Centralized SaaS Monitoring

What's New at Cribl 4.16: On release days, we wear teal.

Feb 4, 2026 By Cribl In Cribl

On release days, we wear teal, y'all! Check out the fun and exciting new features from Cribl releases on a monthly (:fingers-crossed:) basis. Here's what's new in Cribl 4.16.

View Video

Cribl

Read more about What's New at Cribl 4.16: On release days, we wear teal.

Observability trends for 2026 (Part 2): GenAI and OpenTelemetry reshape the landscape

Feb 4, 2026 By David Hope In Elastic

Over the course of my 20 years as a developer, SRE, and now observability product leader, software has typically progressed at a good pace. But now, the emergence of two transformative technologies is fundamentally reshaping enterprise observability: generative AI (GenAI) and OpenTelemetry (OTel). We surveyed over 500 IT decision-makers for a new report: The Landscape of Observability in 2026: Balancing Cost and Innovation.

Read Post

Elastic

Read more about Observability trends for 2026 (Part 2): GenAI and OpenTelemetry reshape the landscape

Monitoring Checklist for Cloud Infrastructure

Feb 4, 2026 By MetricFire Team In MetricFire

Cloud monitoring is essential for tracking performance, security, and costs in dynamic environments. With 94% of enterprises using cloud services and 81% adopting multi-cloud setups, maintaining control is critical. Here's what you need to know.

Read Post

MetricFire

Read more about Monitoring Checklist for Cloud Infrastructure

AI Agent Governance: How to Keep Agentic ITOps Workflows Safe

Feb 4, 2026 By Margo Poda In LogicMonitor

The future of ITOps automation is better control over what AI agents can see, share, and do. AI automation in ITOps is expected to resolve incidents, reduce operational load, and operate with limited human involvement. Those outcomes depend on systems that can take action, not just surface insight. Agentic AI enables that shift. AI agents can correlate signals across tools, update tickets, trigger remediation, and coordinate workflows without waiting for instruction.

Read Post

LogicMonitor

Read more about AI Agent Governance: How to Keep Agentic ITOps Workflows Safe

5 Common DevOps Monitoring Challenges and Solutions

Feb 4, 2026 By MetricFire Team In MetricFire

Modern DevOps faces tough monitoring challenges due to distributed systems, containers, and microservices. Key issues include fragmented visibility, alert fatigue, tool overload, pipeline blind spots, and cloud cost inefficiencies.

Read Post

MetricFire

Read more about 5 Common DevOps Monitoring Challenges and Solutions

OnlineOrNot updates from January 2026

Feb 4, 2026 By Max Rozen In OnlineOrNot

Hopefully this will be one of the last major "behind-the-scenes" updates for a while, because OnlineOrNot's frontend now runs on a React framework that's easy to deploy across multiple providers, and is fully off GraphQL, being powered by its own REST API.

Read Post

OnlineOrNot

Read more about OnlineOrNot updates from January 2026

ISO 27K Without the Bloat: An Open Source Approach

Feb 4, 2026 By Tony Ramos In ObservIQ

It’s often framed as an enterprise-only exercise: long timelines, expensive tooling, consultants everywhere, and a lot of compliance work that exists mainly to survive an audit. As a ~40-person, engineering-driven SaaS company, we needed the same level of trust and rigor as much larger organizations — but we weren’t willing to accept shelfware, parallel compliance infrastructure, or controls that only exist on paper. We also didn’t stop at ISO 27001.

Read Post

ObservIQ

Read more about ISO 27K Without the Bloat: An Open Source Approach

Make faster, better product decisions with Datadog Product Analytics

Feb 4, 2026 By Milene Darnis In Datadog

Product managers (PMs) need to make fast, confident decisions about what to build, fix, and improve based on user behavior within their application. But in practice, collecting the user insights they require is rarely straightforward. Recent updates to Datadog Product Analytics address this challenge. Product Analytics adds structure to autocaptured data and makes analysis easier to interpret, reuse, and share, helping PMs move from questions to answers without relying on SQL or engineering.

Read Post

Datadog

Read more about Make faster, better product decisions with Datadog Product Analytics

Surface and remediate runtime posture issues with Workload Protection Findings

Feb 4, 2026 By Danila Ivanov In Datadog

Threat detection and runtime posture monitoring are related but different jobs. Security teams already rely on Datadog Workload Protection to detect threats in real time across hosts and containers. But the actions that lead to those detections (file manipulation, process execution, network calls, or kernel activity) can be indicative of compromise or simply of risky behavior—like running compilers in production containers.

Read Post

Datadog

Read more about Surface and remediate runtime posture issues with Workload Protection Findings

Alert Noise Isn't an Accident - It's a Design Decision

Feb 4, 2026 By James Barnes In StatusCake

In a previous post, The Incident Checklist: Reducing Cognitive Load When It Matters Most, we explored how incidents stop being purely technical problems and become human ones. These are moments where decision-making under pressure and cognitive load matter more than perfect root cause analysis. When systems don’t support people clearly in those moments, teams compensate. They add process. They add people. They add noise. Alerting is one of the most visible places where this shows up.

Read Post

StatusCake

Read more about Alert Noise Isn't an Accident - It's a Design Decision

The Grok-to-AI Evolution: Why Modern SREs Are Moving Beyond Manual Parsing

Feb 4, 2026 By Mezmo In Mezmo

Grok structures logs. Context engineering connects systems. AI explains behavior. For years, Grok patterns have been the workhorse of the SRE world. Built on regular expressions, Grok helps teams extract structure from unstructured logs. As we explored in "Do You Grok It?", Grok is the key to turning messy log lines into usable fields. It's why our Grok Pattern Reference remains one of our most-visited resources — SREs are hungry for structure.

Read Post

Mezmo

Read more about The Grok-to-AI Evolution: Why Modern SREs Are Moving Beyond Manual Parsing

Tech Talk | Splunk MCP & Agentic AI: Machine Data Without Limits

Feb 4, 2026 By Splunk In Splunk

In this session, we’ll show how MCP empowers autonomous AI agents to retrieve, process, and share live data anywhere it’s needed — breaking barriers and accelerating insight across your organization.

View Video

Splunk

Read more about Tech Talk | Splunk MCP & Agentic AI: Machine Data Without Limits

Observing agentic AI workflows with Grafana Cloud, OpenTelemetry, and the OpenAI Agents SDK

Feb 4, 2026 By Adam Quan In Grafana

As agentic AI applications are used more broadly in production, they introduce new operational models, combining multi-step reasoning, tool execution, and autonomous decision-making into a single workflow. SRE teams need visibility into how these agents behave, where they fail, and how they perform over time.

Read Post

Grafana

Read more about Observing agentic AI workflows with Grafana Cloud, OpenTelemetry, and the OpenAI Agents SDK

Monitoring Sprawl: Why IT Teams Still Can't Get Actionable Insight Fast

Feb 4, 2026 By LogicMonitor In LogicMonitor

IT teams collect extensive monitoring data but struggle to turn it into fast, confident decisions during incidents. Most IT leaders aren’t worried about whether their environments are monitored—they’re worried about whether their teams can make sense of what they’re seeing quickly enough to actually resolve issues. When something breaks, the problem usually isn’t finding data. Dashboards show activity, alerts indicate changes, and logs capture events across the entire stack.

Read Post

LogicMonitor

Read more about Monitoring Sprawl: Why IT Teams Still Can't Get Actionable Insight Fast

How to Enhance Service Management for Small Firms

Feb 4, 2026 By OpsMatters In OpsMatters

Small firms juggle many tasks at once. They serve clients while managing budgets and staff. Most owners spend their days putting out fires instead of building better systems. Poor service management drains resources fast. Client requests get lost in email threads. Team members use different tools for the same tasks. Bills slip through the cracks. These problems cost money that small businesses can't afford to lose.

Read Post

OpsMatters

Read more about How to Enhance Service Management for Small Firms

AI Systems Status Report - January 2026

Feb 3, 2026 By Nuno Tomas In isDown

This report presents status data for nine major AI systems during January 2026: Anthropic, Cohere, DeepSeek, Google Gemini, Groq Cloud, OpenAI, Perplexity, Replicate, and xAI. The data includes official incidents reported on provider status pages and detection information from IsDown's monitoring system.

Read Post

isDown

Read more about AI Systems Status Report - January 2026

January 2026 Early Warning Signals

Feb 3, 2026 By Andy Libby In StatusGator

January 2026 saw a wave of high-impact service disruptions across social platforms, telecom providers, developer tools, education services, and streaming apps. In several cases, StatusGator detected problems minutes or even hours before providers publicly acknowledged them, and in many cases, providers never acknowledged them at all. Unfortunately, many providers still do not have public status pages, leaving users with little visibility into what is happening during an outage.

Read Post

StatusGator

Read more about January 2026 Early Warning Signals

OpenTelemetry Instrumentation Best Practices for Microservices Observability

Feb 3, 2026 By Sematext In Sematext

OpenTelemetry instrumentation is the foundation of modern microservices observability, but getting it right in production requires more than just enabling auto-instrumentation. This guide covers production-tested OpenTelemetry best practices that help engineering teams achieve reliable distributed tracing, control observability costs, and extract maximum value from their telemetry data.

Read Post

Sematext

Read more about OpenTelemetry Instrumentation Best Practices for Microservices Observability

30+ Top Observability Tools to Monitor Websites and Applications [2026 Updated]

Feb 3, 2026 By Janani In Atatus

By incorporating observability tools into your stack, you can better understand how your complex infrastructure operates, reduce downtime, and empower developers to identify and fix problems quickly. However, it now takes considerably more work, time, and money to build the best observability tools for your infrastructure and applications. According to a Splunk survey, over half of the firms polled employ eight or more observability tools.

Read Post

Atatus

Read more about 30+ Top Observability Tools to Monitor Websites and Applications [2026 Updated]

Protect agentic AI applications with Datadog AI Guard

Feb 3, 2026 By Océane Bordeau In Datadog

Organizations are increasingly using agentic AI applications powered by large language models (LLMs) to automate analysis, decision-making, and operational workflows. As these AI agents take on more responsibility, they gain access to internal tools and services and can interact with them in unintended ways.

Read Post

Datadog

Read more about Protect agentic AI applications with Datadog AI Guard

How to optimize JavaScript code with CSS

Feb 3, 2026 By Addie Beach In Datadog

When to use JavaScript or CSS in frontend projects is a matter of continued debate among many frontend developers. JavaScript is often the default choice for frontend development, as it offers a robust collection of libraries custom-made for creating advanced UI features, such as data-based visualizations or complex animations. But JavaScript also comes with tradeoffs, particularly when it comes to performance, accessibility, and code complexity.

Read Post

Datadog

Read more about How to optimize JavaScript code with CSS

Trace Google Pub/Sub workloads in Cloud Run with Datadog

Feb 3, 2026 By Nina Rei In Datadog

Event-driven systems are great at decoupling services, but they also make incidents harder to untangle. A single user request can turn into dozens (or thousands) of messages, multiple consumers, retries, and delayed acknowledgments. If your tracing only tells you that a message was sent or received, you still have to guess which upstream request produced the message, whether a batch publish fanned out cleanly, and where queue time is accumulating.

Read Post

Datadog

Read more about Trace Google Pub/Sub workloads in Cloud Run with Datadog

Exponential Smoothing: A Guide to Getting Started

Feb 3, 2026 By Community In InfluxData

Exponential smoothing is a time series forecasting method that uses an exponentially weighted average of past observations to predict future values. In other words, it assigns greater weight to recent observations than to older ones, allowing the forecast to adapt to changing data trends. In this post, we’ll look at the basics of exponential smoothing, including how it works, its types, and how to implement it in Python.
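As a rough illustration of the weighting idea described in this excerpt, here is a minimal sketch of simple (single) exponential smoothing in plain Python. It is not taken from the linked post: the function names, the choice to seed the first smoothed value with the first observation, and the sample data are all assumptions made for this example.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed series for `values`.

    Hypothetical helper for illustration only. `alpha` is the smoothing
    factor: values near 1 weight recent observations heavily, values
    near 0 change slowly.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        raise ValueError("values must not be empty")

    # Assumption: seed the smoothed series with the first observation.
    smoothed = [values[0]]
    for observation in values[1:]:
        # New level = weighted blend of the latest observation and the
        # previous level, so older points decay by (1 - alpha) per step.
        smoothed.append(alpha * observation + (1 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(values, alpha):
    """One-step-ahead forecast: the last smoothed level."""
    return simple_exponential_smoothing(values, alpha)[-1]


if __name__ == "__main__":
    readings = [10, 12, 14]  # made-up sample data
    print(simple_exponential_smoothing(readings, alpha=0.5))
    print(forecast_next(readings, alpha=0.5))
```

With the smoothing factor at 0.5, each older observation counts half as much as the one after it. Variants that also model trend or seasonality (often called Holt and Holt-Winters) build on the same recurrence.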

Read Post

InfluxData

Read more about Exponential Smoothing: A Guide to Getting Started

How to Implement Distributed Tracing in Microservices with OpenTelemetry Auto-Instrumentation

Feb 3, 2026 By Sematext In Sematext

This guide shows you how to implement OpenTelemetry’s auto-instrumentation for complete distributed tracing across your microservices, from initial setup through production optimization and troubleshooting.

Read Post

Sematext

Read more about How to Implement Distributed Tracing in Microservices with OpenTelemetry Auto-Instrumentation

Every CIO is asking the same question: Am I Next?

Feb 3, 2026 By Virtana In Virtana

Every CIO is asking the same question: Am I next? We’ve seen it across cloud providers, carriers, and global platforms—organizations with enormous scale and investment still experience public, business-impacting outages. The risk isn’t lack of effort. It’s the growing gap between AI-driven complexity and the ability to see, understand, and resolve issues fast enough to protect availability commitments.

View Video

Virtana

Read more about Every CIO is asking the same question: Am I Next?

Skylar Advisor: Proactive Guidance for Modern Operations

Feb 3, 2026 By ScienceLogic In ScienceLogic

View Video

ScienceLogic

Read more about Skylar Advisor: Proactive Guidance for Modern Operations

How Prometheus Remote Write v2 can help cut network egress costs by as much as 50%

Feb 3, 2026 By Sam DeHaan In Grafana

Back in 2021, Grafana Labs CTO Tom Wilkie (then VP of Products) spoke at PromCon about the need for improvements in Prometheus' remote write capabilities. “We use between 10 and 2 bytes per sample to send via remote write, and Prometheus only uses 1 or 2 bytes per sample on the local disk so there’s big, big room for improvement,” Wilkie said at the time.

Read Post

Grafana

Read more about How Prometheus Remote Write v2 can help cut network egress costs by as much as 50%

Grafana Assistant: Why you can trust our agent - and yourself - in an era of AI hallucinations

Feb 3, 2026 By Amelie Sutsakhan In Grafana

Let’s be real: AI can hallucinate. And in observability, that feels risky. No one wants an assistant that sends your SREs chasing ghosts. At best, that burns expensive engineering time. At worst, it slows incident response in production and pushes teams toward the wrong remediation path. So here’s the big question: What makes Grafana Assistant different, and why should you trust it? Let’s start by acknowledging the fear. AI hallucinations are a real issue.

Read Post

Grafana

Read more about Grafana Assistant: Why you can trust our agent - and yourself - in an era of AI hallucinations

Modernizing Middleware for the AI Era

Feb 3, 2026 By meshIQ In meshIQ

As AI adoption accelerates, middleware complexity intensifies. In this discussion with meshIQ CEO Navdeep Sidhu, discover why governance—not speed—has become the defining factor for enterprise success, and why fragmented middleware environments can no longer be ignored in the AI era.

Read Post

meshIQ

Read more about Modernizing Middleware for the AI Era

Elastic 9.3: Chat with your data, build custom AI agents, automate everything

Feb 3, 2026 By Dan Courcy In Elastic

Today, we are pleased to announce the general availability of Elastic 9.3 as the latest version of the Elasticsearch Platform — the world’s most popular open source platform for transforming both structured and unstructured data into trusted answers and outcomes. In addition to including new features that help developers with context engineering and agent building, Elastic 9.3 introduces a broad set of new capabilities to Elastic Search & AI, Elastic Observability, and Elastic Security.

Read Post

Elastic

Read more about Elastic 9.3: Chat with your data, build custom AI agents, automate everything

What's new in VictoriaMetrics Anomaly Detection (2025)

Feb 3, 2026 By Fred Navruzov In VictoriaMetrics

It’s been a while since the last update in the “What’s New” series, so I will try to keep it short yet informative. Stay tuned for upcoming content on anomaly detection.

Read Post

VictoriaMetrics

Read more about What's new in VictoriaMetrics Anomaly Detection (2025)

You Need an Advisor. Not an AI Assistant.

Feb 3, 2026 By ScienceLogic In ScienceLogic

Complex environments don’t fail because teams lack data. They fail when teams can’t trust what the data is telling them. There are too many signals, too little time, and too much risk riding on every decision. That’s the reality Skylar Advisor is built for: delivering guidance teams can verify, so they can act faster without gambling on opaque, black-box answers.

Read Post

ScienceLogic

Read more about You Need an Advisor. Not an AI Assistant.

Tool Consolidation Is Dead. Long Live Agentic AI.

Feb 3, 2026 By Asaf Yigal In logz.io

It’s 2026, and developers have more tools at their disposal than at any point in the industry’s history: CI/CD platforms are richer; observability stacks are deeper; security, data, and AI tooling have exploded into crowded, competitive ecosystems. And yet, delivery is still slow, incidents are still noisy, workflows are still brittle. The problem is no longer tool scarcity or feature depth. It’s integration debt.

Read Post

logz.io

Read more about Tool Consolidation Is Dead. Long Live Agentic AI.

Size Analysis In 90 Seconds

Feb 3, 2026 By Sentry In Sentry

Size Analysis gives you the tools to monitor and reduce the size of your mobile apps. Get specific recommendations on how to make your mobile apps smaller. Upload builds from CI to spot regressions early, understand what's inside each bundle, and keep release artifacts lean.

View Video

Sentry

Monitoring

Read more about Size Analysis In 90 Seconds

Are We Letting AI Think for Us? | SolarWinds TechPod #105

Feb 3, 2026 By solarwindsinc In SolarWinds

We’re more dependent on technology than ever—and AI is changing how we make decisions. But what happens when the systems fail? Or when bad actors decide to “pull the plug”? This clip dives into a scary but necessary question: Are we losing our ability to critically think and problem-solve by relying too much on AI? Is AI leveling the playing field—or quietly taking over human decision-making? A must-watch conversation about innovation, outages, AI risk, and why having a backup plan matters more than ever.

View Video

SolarWinds

Read more about Are We Letting AI Think for Us? | SolarWinds TechPod #105

How does Coralogix go beyond basic migration?

Feb 3, 2026 By Chris Cooney In Coralogix

When a team, division or organization is assessing a new vendor, there are some basic questions that must be answered. At Coralogix, we look at migrations in a different way. It isn’t about transporting the current state of play into a new vendor, often called a “lift and shift”. These are the basics. There is a whole new level of onboarding and support that doesn’t just replicate value across platforms – it expands it.

Read Post

Coralogix

Read more about How does Coralogix go beyond basic migration?

Andy Wojnarek Appointed Chief Technology Officer

Feb 2, 2026 By Kristy Slimmer In Galileo

ATS Group and Galileo are pleased to announce the appointment of Andy Wojnarek as Chief Technology Officer. Andy’s appointment reflects the evolution of a technical leadership role he has developed over more than 16 years with the company, grounded in hands-on expertise, cross-functional influence, and a sustained focus on solving complex infrastructure and observability challenges for clients.

Read Post

Galileo

Read more about Andy Wojnarek Appointed Chief Technology Officer

How One ISP Flipped Transit-to-CDN from 10% to 90% (Midco's Peering Playbook)

Feb 2, 2026 By Kentik In Kentik

"When we started, maybe ten percent of our traffic went to a CDN. Now it's ninety percent on the CDNs or local caches and ten percent goes to transit. We've really flipped the whole script." — John Lubeck, Midco.

View Video

Kentik

Read more about How One ISP Flipped Transit-to-CDN from 10% to 90% (Midco's Peering Playbook)

Observability vs Monitoring: Getting a Full Picture of the Environment

Feb 2, 2026 By Jeff Darrington In Graylog

Driving down the highway, you usually glance intermittently at your speedometer to ensure that you stay within the speed limit, or whatever window above the speed limit you’re willing to drive. While monitoring your speed mitigates the risk of a ticket, you still need to look out for various threats on the road, like cars going through stop signs. By observing your surroundings, you take in real-time information that can help prevent a crash.

Read Post

Graylog

Read more about Observability vs Monitoring: Getting a Full Picture of the Environment

"Not Having Kentik Is Unacceptable": 5 Service Providers on Kentik

Feb 2, 2026 By Kentik In Kentik

Five customers explain why Kentik is essential for understanding traffic, controlling network cost, and planning for growth. Hear from Sorin Esanu (Race Communications), Michael Leclaire (MetroNet), John Lubeck (Midco), Wallace Lee (Imperva), and Everett Sinclair (Conway Corporation) on how they use Kentik to see “the bits on the wire,” dig deeper than traditional reporting tools, and turn network data into better customer experiences.

View Video

Kentik

Read more about "Not Having Kentik Is Unacceptable": 5 Service Providers on Kentik

3 Service Providers on Kentik AI Advisor: Faster Answers, Faster Fixes

Feb 2, 2026 By Kentik In Kentik

Three service provider customers share how Kentik AI Advisor helps them move faster, troubleshoot smarter, and put network data in more hands across the team. Hear from Everett Sinclair (Conway Corporation), Michael Leclaire (MetroNet), and John Lubeck (Midco) on why they chose Kentik to unify flow analytics, baselining, and anomaly detection in one platform, and how Kentik AI features make it easier to explore, explain, and act on what’s happening in their networks.

View Video

Kentik

Read more about 3 Service Providers on Kentik AI Advisor: Faster Answers, Faster Fixes

Introducing WHOIS History & Monitoring and Phishing Sentinel: Complete Brand Protection for Your DNS Infrastructure

Feb 2, 2026 By DNS Spy In DNS Spy

DNS Spy now offers complete brand protection with WHOIS History & Monitoring and Phishing Sentinel—automatically tracking domain registration changes and detecting phishing variants before they become security incidents.

Read Post

DNS Spy

Read more about Introducing WHOIS History & Monitoring and Phishing Sentinel: Complete Brand Protection for Your DNS Infrastructure

How to Build AI-Powered Search with Elasticsearch [2 Min Live Demo]

Feb 2, 2026 By Elastic In Elastic

In this demo, we show how Elasticsearch enables production‑ready GenAI and AI‑powered search applications, from indexing and embedding your data to grounding large language models with RAG. You’ll see how developers can go quickly from raw data to a fully functional GenAI search experience.

View Video

Elastic

Read more about How to Build AI-Powered Search with Elasticsearch [2 Min Live Demo]

Watching everything is watching nothing: Sampling strategy for Sentry

Feb 2, 2026 By Kyle Tryon In Sentry

In a high-traffic production environment, telemetry is your most direct link to the user experience. Every Span, Trace, Log, and Replay sent to Sentry gives you high-fidelity visibility into what is actually happening in production. But to extract the most value out of that visibility, you have to know how to filter signal from noise.

Read Post

Sentry

Read more about Watching everything is watching nothing: Sampling strategy for Sentry

Agentic Observability - The Top Observability Trends in 2026

Feb 2, 2026 By Splunk In Splunk

Learn how autonomous agents are using real-time observability telemetry data to diagnose, fix and verify their own work.

View Video

Splunk

Read more about Agentic Observability - The Top Observability Trends in 2026

How Okta keeps 99.99 percent uptime with #datadog

Feb 2, 2026 By Datadog In Datadog

How do you maintain 99.99 percent uptime across thousands of Kubernetes hosts and multiple cloud providers? Okta engineers explain why observability is critical to keeping authentication and authorization services running at scale. Watch how Okta uses Datadog to bring metrics, logs, and traces into a single view, speed up root cause analysis, and reduce time to mitigation while controlling costs.

View Video

Datadog

Read more about How Okta keeps 99.99 percent uptime with #datadog

10 Benefits of Remote Network Monitoring (RMON)

Feb 2, 2026 By Andrii Kernitskyi In Obkio

The rise of hybrid work has fundamentally changed where IT problems occur. Five years ago, most network issues happened in your data center or office network (infrastructure you could access, control, and troubleshoot directly). Today, the majority of critical issues occur in home offices, coffee shops, and remote locations where you have zero infrastructure access and limited visibility.

Read Post

Obkio

Read more about 10 Benefits of Remote Network Monitoring (RMON)

Top 10 SSL Monitoring Tools.

Feb 2, 2026 By Laura Clayton In Uptime Robot

SSL failures don’t usually break a site all at once. A certificate expires, a chain changes, or a browser update tightens rules, and users start seeing warnings before teams notice. By the time alerts fire, trust has already taken a hit. This post reviews SSL monitoring tools from an operational standpoint: how they detect upcoming expirations, validate certificate chains, and surface issues across environments and domains.

Read Post

Uptime Robot

Read more about Top 10 SSL Monitoring Tools.

How to Find and Fix SEO Errors on Your Website with Ahrefs

Feb 2, 2026 By Super Monitoring In Super Monitoring

Ahrefs Site Audit helps you look at your website the way a search engine does. The tool crawls all major website pages to examine how they link to each other, how they load, and how they display, then produces a comprehensive SEO report that shows which issues require immediate resolution.

Read Post

Super Monitoring

Read more about How to Find and Fix SEO Errors on Your Website with Ahrefs

Operations | Monitoring | ITSM | DevOps | Cloud