Latest News

Honeycomb Canvas: The Multiplayer Workspace for the Agentic Era

May 20, 2026 By Kale Bogdanovs In Honeycomb

Last week, we launched a major update to Canvas, our investigation workspace. The new Canvas has evolved from an AI co-pilot you chat with to a place where your whole team, human and agent, can work the same problem on the same surface. Auto-investigations begin the moment a trigger, SLO, or anomaly fires. Custom skills encode your team's runbooks so every agent investigates with your team's expertise built in.

Introducing Atatus Sensitive Data Classifier

May 20, 2026 By Mohana Ayeswariya J In Atatus

Your logs know too much. Every debug statement, every traced request, every APM span risks capturing something it shouldn't. A customer email. A JWT token. A credit card number. An API key that was never meant to leave your payment service. It doesn't look like a breach. There's no alert. Your observability platform just quietly accumulates sensitive data: indexed, replicated, and accessible to every engineer with log query access.

How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability

May 20, 2026 By Datadog In Datadog

Without experiment infrastructure to help you test your LLM applications, every research session starts with the same questions: What have we tried previously? What were the numbers? Which prompt version produced that result? Why did we discard that approach? The answers live in scattered notes, terminal history, and half-remembered conversations. Each handoff between sessions loses context. In practice, iteration can slow down as teams get bogged down in testing and analysis.

How to audit and clean up monitors effectively

May 20, 2026 By Capucine Marteau In Datadog

Alert fatigue and blind spots develop together. Monitoring stacks that generate noise while missing critical issues may have incomplete coverage or poorly configured alerts. As these stacks grow reactively, without structured coverage assessment, both issues worsen. Teams often add monitors when something breaks and tune thresholds when alerts become unbearable, but rarely audit their overall setup to see if it works.

Microsoft Fabric outage disrupted analytics workloads on May 18, 2026

May 19, 2026 By Andy Libby In StatusGator

On May 18, 2026, organizations using Microsoft Fabric experienced a multi-hour outage that disrupted analytics workloads, reporting systems, and access to platform services across several regions. StatusGator detected the developing incident at 14:00 UTC using Early Warning Signals, 37 minutes before Microsoft officially acknowledged the outage at 14:37 UTC.

The $600 billion wake-up call: New Splunk research reveals downtime is a systemic business crisis

May 19, 2026 By Splunk In Splunk

$600 billion annual impact: Aggregate downtime costs for the Global 2000 have soared 50% in two years. $15,000 per minute: The average cost of downtime for organisations, highlighting the immediate financial impact of service disruptions. 3.4% stock price drop: The average decline in shareholder value following a single downtime incident.

The New Compliance Crisis: AI Is Outrunning Its Controls

May 19, 2026 By ScienceLogic In ScienceLogic

Enterprises have spent decades refining compliance frameworks around workflows that were linear, predictable, and well-documented. These frameworks were built for systems that executed actions deterministically and for human operators who made decisions slowly enough for oversight to keep up. In that environment, compliance could function as a retrospective discipline because the evidence required to validate behavior generally existed in complete, stable form.

12 IT Infrastructure Best Practices Every IT Leader Should Follow

May 19, 2026 By Jagdish Sajnani In Motadata

Why do IT infrastructure issues continue to slow down teams even when tools keep improving? In most IT environments, the challenge is not a single failure. It is a set of ongoing operational gaps that are easy to overlook but difficult to control over time. A few of the common challenges include: In 2026, IT environments are more distributed and faster-changing than before. Hybrid infrastructure, cloud adoption, and strict compliance requirements make consistency harder to maintain.

Why SRE agents need orchestration, not just more tools

May 19, 2026 By Mezmo In Mezmo

Single agents are a useful starting point for SRE workflows. They are not where the architecture should end. The first version is simple enough: connect an LLM to a few tools, give it a system prompt, and point it at your infrastructure. It can summarize an alert, pull logs, answer questions, and draft a useful next step. Then the workflow gets real. You add GitHub for runbooks, Kubernetes for cluster state, PagerDuty for incident context, Prometheus for metrics, and Mezmo for telemetry.

Agent Timeline: The Flight Recorder for Your AI Agents

May 19, 2026 By Dan Juengst In Honeycomb

Last week, we introduced Agent Timeline, a powerful new observability experience purpose-built for debugging AI agent workflows in production. Agent Timeline uniquely connects AI-layer visibility to full-stack observability by organizing telemetry around an agentic conversation. A conversation contains one or more agent executions, each of which may contain LLM calls, tool invocations, handoffs, retries, human escalations, and downstream system calls.
