The Benefits of Distributed Network Monitoring for Multi-Site Businesses: Why Hybrid Work Changed Everything

Most companies rewired how their people work, not once but twice. First for remote, then for RTO (Return to Office). Their network monitoring never caught up. So, what happened? IT teams are managing a network that spans headquarters, branch offices, home setups, and cloud apps with tools that still assume everyone's connecting back to one place. When something breaks (and it will), nobody can pinpoint where. IT takes the blame. Users lose productivity. Leadership loses patience.

Evaluating our AI Guard application to improve quality and control cost

This article is part of our series on how Datadog’s engineering teams use LLM Observability to build, monitor, and improve AI-powered systems. Organizations are building AI agents that help users automate work, analyze data, and interact with complex systems through natural language. As these agents become more capable, they also become more complex and exposed to risks such as prompt injection, data leaks, and unsafe code execution.

Using Core Web Vitals in Honeycomb Frontend Telemetry

Google's Core Web Vitals (CWVs) measurements have been used by web administrators and SREs to review frontend application performance, and have been factored into Google's page rankings since 2021. They are also used in Google Analytics, which crawls websites and evaluates performance metrics over multiple days and across various frontends (desktop web, mobile web, etc.) to establish how well a website performs in production.

The limits of MCP and how Olly surpasses them

Model Context Protocol (MCP) servers act as adapter layers between clients and AI-based workloads. Installing MCP into an IDE, such as Cursor, brings a wealth of information directly into the developer's primary tool, minimizing context switching and, especially in the world of observability, bringing telemetry closer to the code. MCP is not without its limits. These limits seem trivial at first, but over time some of the limitations inherent in a basic MCP implementation become apparent.

A 4-Month Bug Fixed in <10 Minutes with Olly

In today’s highly interconnected systems, the subtle relationships between services are rarely obvious. Modern, complex architectures generate telemetry that functions less as “flashing signs” and more as faint “breadcrumbs” to be followed across a vast network of signals. In 2025, about two-thirds of outages involved third-party systems like cloud platforms and APIs.

DNS blocklist monitoring now available to all Oh Dear users

Your domain is on a spam blocklist. Password reset emails aren't arriving, order confirmations land in spam, and customers are complaining that "your site doesn't work." By the time you hear about it, the damage has been building for days. We've shipped DNS blocklist monitoring to catch this early. Oh Dear now checks your domain against 11 major blocklists and notifies you the moment you're listed, with direct links to get removed.
Sponsored Post

Cisco Live'26 - Amsterdam: Aligning with the AI-Driven Future

The energy at Cisco Live EMEA in Amsterdam (February 9-13, 2026) was primarily driven by groundbreaking AI announcements, and the event gave Fabrix.ai an opportunity to strengthen its strategic position alongside the Cisco and Splunk ecosystems. The event's focus on AI, highlighted by the recent Cisco AI Summit, underscores a clear market direction in which Fabrix.ai is well positioned to accelerate innovation.

Database Partitioning: Types, Strategies, and When to Use Each

How database partitioning works in PostgreSQL and MySQL. Range, list, and hash partitioning with SQL examples and guidance on when to partition vs shard. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
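As a rough illustration of the three partitioning types named above, the sketch below routes a row to a partition by range, list, or hash. It is not taken from the article: the partition names, year boundaries, region mapping, and the CRC32 hash are all hypothetical stand-ins, and PostgreSQL and MySQL each use their own internal hash functions.

```python
import zlib
from bisect import bisect_right

# Hypothetical layout, invented for illustration only.
RANGE_BOUNDS = [2024, 2025, 2026]  # each partition's exclusive upper bound (year)
RANGE_NAMES = ["orders_pre_2024", "orders_2024", "orders_2025", "orders_2026_plus"]

LIST_MAP = {"DE": "orders_eu", "FR": "orders_eu", "US": "orders_na", "CA": "orders_na"}

HASH_PARTITIONS = 4


def range_partition(year: int) -> str:
    """Range partitioning: pick the partition whose interval contains the key."""
    return RANGE_NAMES[bisect_right(RANGE_BOUNDS, year)]


def list_partition(region: str) -> str:
    """List partitioning: each partition owns an explicit set of key values."""
    return LIST_MAP.get(region, "orders_default")


def hash_partition(customer_id: str) -> int:
    """Hash partitioning: spread keys evenly using hash(key) modulo N.

    CRC32 is a stand-in here because it is stable across runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(customer_id.encode("utf-8")) % HASH_PARTITIONS


if __name__ == "__main__":
    print(range_partition(2025))
    print(list_partition("FR"))
    print(hash_partition("customer-42"))
```

Range and list partitioning keep related rows together, which helps queries that filter on the partition key. Hash partitioning trades that locality for an even spread.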

Database Sharding: How It Works and When You Actually Need It

How database sharding works, common strategies (hash, range, directory), shard key selection, and the operational cost of running a sharded database in production.

Trello outage on February 19, 2026

On February 19, 2026, Trello users around the world began experiencing issues loading boards and accessing their workspaces. StatusGator received the first outage reports at 14:24 UTC and triggered an Early Warning Signal at 14:28 UTC. Trello did not officially acknowledge the incident until 15:08 UTC, after user reports had already subsided. This incident highlights how real-time user reports and Early Warning Signals can identify widespread service degradation before providers confirm a problem.
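The gaps between the timestamps quoted above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the three times come from the summary, while the helper names are invented for illustration.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Build a UTC timestamp on the incident date from an 'HH:MM' string."""
    hour, minute = hhmm.split(":")
    return datetime(2026, 2, 19, int(hour), int(minute), tzinfo=timezone.utc)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


# Timestamps as reported in the incident summary.
first_report = utc("14:24")
early_warning = utc("14:28")
official_ack = utc("15:08")

report_to_warning = minutes_between(first_report, early_warning)
warning_lead_time = minutes_between(early_warning, official_ack)
report_to_ack = minutes_between(first_report, official_ack)

if __name__ == "__main__":
    print(f"First report to Early Warning Signal: {report_to_warning} min")
    print(f"Early Warning Signal to official acknowledgment: {warning_lead_time} min")
    print(f"First report to official acknowledgment: {report_to_ack} min")
```

On these figures, the Early Warning Signal followed the first reports by 4 minutes and preceded the official acknowledgment by 40 minutes.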