Monthly Archive

Automation Observability: See It, Fix It, Skip the Firefighting

Sep 30, 2025 By Derek Pascarella In Resolve

IT leaders know the drill. An alert storm rolls in and the tickets pile up. Your team scrambles to piece together root causes before service degradation kicks in. But the firefighting rages on, even when you have enough dashboards, monitoring, and alerts to light up a Christmas tree. Enterprise leaders need to quit burning budget on shiny dashboards that look good in the boardroom but do nothing to stop outages in the real world.

Read Post

Paving the way for a new era: Mezmo's Active Telemetry

Sep 30, 2025 By Mezmo In Mezmo

The world of software development has fundamentally changed. We've moved from monthly releases to continuous delivery measured in minutes, and the rise of AI means velocity is no longer just a goal—it's a requirement for survival. But this relentless speed has exposed a critical flaw in how we approach observability. The industry relies on a "store first, ask questions later" model where you collect every log, metric, and trace, and then hope to find the root cause when something breaks.

Read Post

OpenMetrics vs OpenTelemetry - A guide on understanding these two specifications

Sep 28, 2025 By Bhupesh Varshney In SigNoz

OpenMetrics and OpenTelemetry are popular standards for instrumenting cloud-native applications. Both projects are part of the Cloud Native Computing Foundation (CNCF) and aim to simplify how we generate, collect, and monitor services in a modern cloud-native distributed application environment. Let's look at how both standards aim to help solve the observability conundrum.

Read Post

How to Become an SRE Engineer

Sep 27, 2025 By Alexandr Bandurchin In Uptrace

Site Reliability Engineering has emerged as one of the most sought-after careers in tech, combining software engineering expertise with operational excellence. SRE engineers ensure that critical systems remain reliable, scalable, and performant while enabling rapid feature development. With the global SRE job market projected to grow by over 25% in 2025, skilled professionals in this field command competitive salaries and enjoy diverse career opportunities across industries.

Read Post

Grafana Labs Co-founder Woods: Market maturity, OpenTelemetry, and AI are reshaping observability

Sep 26, 2025 By Mikhail Kho In Grafana

As organizations navigate increasingly complex tech environments, unified observability practices have become essential. That was one of the main takeaways from Grafana Labs Co-founder Anthony Woods’ recent appearance on “Tech Keys by Mercari India,” a podcast hosted by Vaibhav Khurana, Head of Platform Engineering at Mercari India.

Read Post

CriblCon sneak peek with AlphaSoc

Sep 25, 2025 By Cribl In Cribl

The countdown to CriblCon is on, and we’re giving you an exclusive first look at the expert insights, innovative solutions, and success stories you’ll see on the big stage. Join us as we chat with Chris McNab, Founder of AlphaSOC, a security startup that processes network telemetry to uncover infected hosts, emerging threats, and targeted attacks.

View Video

Monitor your data pipelines with Airflow lineage

Sep 25, 2025 By Thomas Sobolik In Datadog

In complex data pipelines with dozens of jobs and intermediary datasets, it can be difficult to effectively monitor how data travels and changes through various steps. When tracking issues in these pipelines, you need visibility into upstream components where the root cause may originate, as well as downstream datasets and consumers of data that may be experiencing further impacts.

Read Post

Introducing the BigPanda observability and monitoring tool rationalization framework

Sep 25, 2025 By BigPanda In BigPanda

When enterprises run dozens of monitoring and observability tools, performance gaps almost always emerge. By applying the BigPanda Observability Scorecard, our customers consistently see their tool portfolio fall into three groups. In some cases, removing bottom-tier tools can reduce portfolio complexity by double digits while cutting operational noise by as much as 35-40%. This simplification reduces costs while creating a leaner, more reliable monitoring environment that strengthens service availability and operational efficiency.

View Video

How to analyze observability and monitoring tools for actionability

Sep 25, 2025 By BigPanda In BigPanda

Choosing the right observability tools is critical to ensure your teams get actionable insights. In this video, we explore how to evaluate observability platforms based on their ability to detect anomalies, link causes, and trigger effective responses.

View Video

Qovery Observe is Here: Your Deployments, Your Data, Your Visibility

Sep 25, 2025 By Julien Dan In Qovery

Monitor your deployments with Qovery Observe: real-time metrics, logs, and events, directly integrated with your AWS applications and containers.

Read Post

Monitor and optimize your systems with Uptrace

Sep 24, 2025 By Uptrace In Uptrace

Uptrace is your single source of truth for monitoring, understanding, and optimizing complex distributed systems. Proven in production for over five years and trusted by more than a thousand installations worldwide, it lets you see your system like never before. What makes the difference is that Uptrace is pure OpenTelemetry, built natively from day one. This isn't a translation layer—it's a direct connection that eliminates friction and ensures zero vendor lock-in. Your homepage serves as your command center, providing complete visibility across your stack at a glance.

View Video

How to Push Prometheus Metrics to Splunk Observability Cloud with the OpenTelemetry Collector

Sep 24, 2025 By Splunk In Splunk

In this video, you’ll learn how to scrape Prometheus endpoints with the OpenTelemetry Collector’s Prometheus receiver and send metrics to Splunk Observability Cloud. We’ll walk through configuring three common data sources (a Python Flask app, node_exporter for host metrics, and the NGINX Prometheus exporter), show how to enrich metrics with resource attributes, and build simple charts in Splunk Observability Cloud. You’ll see how centralized scraping and consistent tagging make it easy to manage and visualize Prometheus metrics in Splunk Observability Cloud.

View Video

MCP Design Principles

Sep 24, 2025 By Honeycomb In Honeycomb

You can give AI agents everywhere fingers & eyes into your tool or service, by implementing an MCP (Model Context Protocol) server. It’s a great idea! It’s also a new kind of design and engineering. Jessica describes how it’s different from implementing an API or a GUI, and why it’s more exciting than either.

View Video

Kubernetes Observability: Your Q&A Guide to Calico Whisker

Sep 24, 2025 By Reza Ramezanpour In Tigera

Getting the most out of Whisker requires understanding its inner workings and this guide is designed to help you master this exciting tool with support from the Calico community. We’ve compiled the most frequently asked questions from our community Slack, support conversations, and CalicoCon sessions. This Q&A covers everything from initial installation tips and version requirements to advanced topics like filtering flow logs and integrating with Goldmane, the powerful API that underpins Whisker.

Read Post

How to Responsibly and Effectively Contribute to Open Source Using AI

Sep 24, 2025 By Tyler Helmuth In Honeycomb

With the influx of AI tooling, it’s never been easier to contribute to open source communities. These tools are capable of gathering context quickly, “understanding” repositories faster than ever before. They provide instant summaries about repositories that, previously, would have meant reading lines and lines of code. They can fix bugs in programming languages you don’t know, and ultimately allow more contributors to get involved, which (almost) every open source project wants.

Read Post

Memory stall: the agony before OOM

Sep 23, 2025 By Nikolay Sivko In Coroot

When we set a memory limit for a container, the expectation is simple: if the app leaks memory, the OOM killer steps in, the container dies, Kubernetes restarts it, done. But reality is messier. As a container gets close to its memory limit, allocations don’t just fail instantly. They get slower. The kernel tries to reclaim memory inside the cgroup, and that takes time. Instead of being killed right away, your app just crawls.

Read Post

Your Next Observability RFP is All Wrong. Why AI Changes Everything

Sep 23, 2025 By Asaf Yigal In logz.io

AI-first observability addresses two of the most pressing troubleshooting challenges: complex IT environments and AI-generated code. But understanding how to implement AI in a way that brings ROI requires cutting through the hype and maintaining realistic expectations while keeping a forward-thinking vision. In this blog post, we share practical tips for including AI in your next observability RFP. The article is based on a webinar held with Logz.io founders, CEO Tomer Levy and CTO Asaf Yigal.

Read Post

Integrating JMX and OpenTelemetry

Sep 22, 2025 By Alex Boten In Honeycomb

The OpenTelemetry community and the contributors to the Java Special Interest Group (SIG) have spent a great deal of time integrating core Java technologies into the project. An integration that is particularly useful is Java Management Extensions (JMX). It has been around since J2SE 5, and has been mature for some time. Many of the most widely used Java applications have adopted it over time and support this extension.

Read Post

The one where we talk about Cribl Guard

Sep 22, 2025 By Cribl In Cribl

Manual hunts for sensitive data are slow, error-prone, and expensive. Cribl Guard combines advanced AI with a human-in-the-loop control point to spot sensitive data, such as credit card, passport, and Social Security numbers, as it flows through Cribl Stream. Whether you’re fully cloud or hybrid, Cribl Guard puts you firmly in control of every piece of sensitive information that crosses your pipes.

View Video

Instrumenting the Node.js event loop with eBPF

Sep 19, 2025 By Nikolay Sivko In Coroot

Recently, I was testing Coroot’s AI Root Cause Analysis on failure scenarios from the OpenTelemetry demo. One of them, loadgeneratorFloodHomepage, simulates a flood of excessive requests. As expected, it caused a latency degradation across the stack. Coroot’s RCA highlighted how the latency cascaded through all dependent services. At the same time, we noticed a moderate increase in CPU usage for the frontend service and the node itself.

Read Post

Your Next Observability RFP Is All Wrong: Why AI Changes Everything

Sep 18, 2025 By Logz.io In logz.io

Watch how AI is reshaping observability for the years ahead. In this fireside chat, Logz.io founders Tomer Levy and Asaf Yigal reveal how the most innovative AI-first companies are breaking free from dashboards, avoiding common RFP mistakes, and building future-ready stacks. Watch and learn how autonomous AI eliminates noise, slashes costs, and gives engineering teams back their velocity.

View Video

LLM app Observability: OpenTelemetry as a standard

Sep 18, 2025 By SigNoz - Open Source Observability Platform In SigNoz

LLM observability is broken. There are too many new libraries floating around, but they don't accurately follow the OpenTelemetry conventions. OTel isn’t perfect for LLMs yet, but extending a proven standard beats inventing another one. Why not use the same standard (OTel) that works so well for the rest of the apps, and just build on top of it? This is what I was ranting about with Pranav Raj S, co-founder at Chatwoot, and we thought there must be other folks facing similar issues.

View Video

OpenTelemetry Observability: An In-Depth Look at Features and Best Practices

Sep 18, 2025 By Rotem Froimovici In logz.io

OpenTelemetry (OTel) is a unified framework of APIs, SDKs, and tools for collecting, processing, and exporting telemetry data (logs, metrics, and traces) across applications and infrastructure. OTel is especially valuable in today’s cloud-native world, where applications run on microservices, Kubernetes, and distributed systems.

Read Post

Observability Day San Francisco: The Future of AI and Observability Is Bright

Sep 18, 2025 By Ken Rimple In Honeycomb

AI and observability are no longer separate conversations—they’re deeply intertwined. Across keynotes, panels, and demos, speakers at Honeycomb's Observability Day San Francisco unpacked what that means for engineering teams today: faster insights, smarter tools, and new challenges to solve.

Read Post

Monitor and optimize your systems with Uptrace

Sep 18, 2025 By Uptrace In Uptrace

View Video

From Monitoring to Meaning: Why Service Observability Platforms Are Essential for Modern Enterprises

Sep 17, 2025 By david.arrowsmith In Interlink

At Interlink, we believe the future of IT Operations (ITOps) is about Service Observability, incident prevention and automated remediation.

Read Post

What does the EU Data Act mean for Observability?

Sep 17, 2025 By Chris Cooney In Coralogix

The EU Data Act entered into force on January 11th, 2024, and most of its provisions apply from September 12th, 2025. The EU Data Act is designed to give individuals and businesses more control over the data they generate, ensuring fair access, use, and sharing across sectors. For any data-generating platform that intends to operate in the European Union, this new legislation matters.

Read Post

Observability and IT Monitoring Governance: Establishing Order (Part 3 of 4)

Sep 16, 2025 By Ravishu Arora In Broadcom

In our previous posts, we explored why robust IT monitoring governance is no longer a luxury but a strategic imperative. We highlighted how a disciplined framework prevents blind spots, reduces risk, and ensures the reliability and scalability of your critical business applications. But how do you translate these principles into practical, actionable governance within your IT environment?

Read Post

Observability and IT Monitoring Governance (Part 4 of 4)

Sep 16, 2025 By Steve Danseglio In Broadcom

Following parts one, two, and three of this blog series, this post offers a short, real-world example that sheds light on why strong monitoring governance is a must-have.

Read Post

Unlock Real-Time AWS Observability With Streaming Ingestion in DX Operational Observability

Sep 16, 2025 By Ashish Aggarwal In Broadcom

In fast-paced cloud environments, traditional monitoring methods often fall short. This leaves teams with latency and data gaps. It’s time to gain near real-time visibility into your AWS telemetry, enabling faster incident response and deeper insights. With its new streaming ingestion capabilities, DX Operational Observability (DX O2) is revolutionizing cloud monitoring—enabling teams to leverage AWS CloudWatch Metric Streams and Amazon Kinesis Data Firehose.

Read Post

Calico Whisker vs. Traditional Observability: Why Context Matters in Kubernetes Networking

Sep 16, 2025 By Reza Ramezanpour In Tigera

Are you tired of digging through cryptic logs to understand your Kubernetes network? In today’s fast-paced cloud environments, clear, real-time visibility isn’t a luxury, it’s a necessity. Traditional logging and metrics often fall short, leaving you without the context needed to troubleshoot effectively. That’s precisely what Calico Whisker’s recent launch (with Calico v3.30) aims to solve. This tool provides clarity where logs alone fall short.

Read Post

Bridging the Gap: Integrating Logs, Metrics, and Flow for Observability

Sep 16, 2025 By VictoriaMetrics In VictoriaMetrics

In this video, we discuss handling both old and new systems in IT environments. From legacy SNMP setups to modern telemetry, most organizations juggle multiple data sources, which can make observability feel overwhelming. We explore how to combine logs, metrics, and flow data into one system that provides actionable insights. You’ll see practical examples of simplifying scattered tools and making sense of complex, disparate information. Understanding how these different types of data work together is key to getting observability right.

View Video

Smoother, smarter observability with the updated Site24x7 iOS 26

Sep 15, 2025 By Ramkumar Ramaswamy In ManageEngine

Enjoy improved control, clarity, and communication using the Site24x7 app on iOS 26. This update blends Apple's dynamic liquid glass design language with fast, secure, on-device AI summaries that help you observe your IT stack instantly and act decisively, from anywhere.

Read Post

Background Job Observability Beyond the Queue

Sep 15, 2025 By Anjali Udasi In Last9

Background jobs handle the critical work that happens outside the request path: processing payments, sending emails, generating reports, syncing data. They keep applications running smoothly, but the signals they produce look different from API endpoints. Most teams start with queue metrics—how many jobs are waiting and how quickly they complete. These metrics provide the foundation, but job health extends beyond throughput.

Read Post

Honeycomb Observability Day San Francisco

Sep 15, 2025 By Honeycomb In Honeycomb

Did you miss Honeycomb's Observability Day San Francisco? Here are some highlights of the day.

View Video

LangChain Observability: Monitoring Guide for Production Apps

Sep 15, 2025 By Alexandr Bandurchin In Uptrace

LangChain applications fail differently than traditional web apps. A single user request can trigger 15+ LLM calls, cost $5 in tokens, and fail silently without throwing errors. One team discovered a $12,000 OpenAI bill caused by a recursive chain with no monitoring. This guide shows how to implement observability for LangChain applications, giving you complete visibility into performance, costs, and errors before they impact your users or budget.

Read Post

Introducing Cost Meter - Proactive Observability Cost Control with Per-Hour Granularity

Sep 12, 2025 By Anushka Karmakar In SigNoz

The irony isn't lost on us - observability platforms are built to be proactive about system health, yet when it comes to managing observability costs themselves, teams are forced to be reactive. Today, that changes with Cost Meter, now live in our platform. Cost Meter transforms observability spend management from a monthly billing surprise into a proactive, data-driven process with hourly aggregated metrics that give you complete visibility into your telemetry ingestion patterns.

Read Post

Why Open Source Is Important: Accessibility to Innovation for Everyone

Sep 12, 2025 By Coroot In Coroot

Valkey OSS Developer Advocate Roberto Luna Rojas shares why open source matters to him - and the world.

View Video

What is Service Catalog Observability and How Does It Work?

Sep 12, 2025 By Faiz Shaikh In Last9

A service catalog gives teams a shared view of their systems—what services exist, who owns them, how dependencies are structured, and the SLAs that guide expectations. It’s an important part of development infrastructure because it helps everyone speak the same language about services. Service catalog observability builds on that foundation.

Read Post

APM vs Observability: Observing beyond APM

Sep 11, 2025 By Leon Adato In Catchpoint

In my previous post I made a bold, sweeping statement that APM is not - in the most specific sense - a subset of observability. I stand by that because words matter and - like many "monitoring engineers" (IT folks who make monitoring and observability their specialty) - I, too, bear scars from the flame-wars on Twitter back in the 2020s where we fought internecine battles over the proper definition of (and number of pillars in) “observability”.

Read Post

Introducing Honeycomb Intelligence Canvas

Sep 11, 2025 By Honeycomb In Honeycomb

Canvas is an AI-guided workspace inside Honeycomb that combines an AI assistant with an interactive notebook for visualizing query results and traces. You can ask a natural language question about your data and Canvas will immediately start exploring your traces, through multiple queries and other tools, to find the right next steps. Instead of having to write each query yourself, Canvas automatically proposes relational queries, comparisons, and visualizations that explain why an SLO fired or what changed after a deploy.

View Video

Pastries with SREs: Limitless observability and uncompromised donuts

Sep 11, 2025 By Elastic In Elastic

In this episode of Pastries with SREs, we dig into Limitless Observability with a sweet side of unified observability strategy. If you're tired of siloed tools, fractured data, and swivel-chair investigations, this one’s for you. We explore: Why are silos still the norm in modern observability? What’s the true cost of inefficiencies across logs, metrics, and traces? How can SREs, IT operations, and dev teams shift to a no-compromise, unified observability model?

View Video

Meet Canvas: Your AI-guided Workspace Within Honeycomb

Sep 11, 2025 By Morgante Pell In Honeycomb

Modern systems are wonderfully capable, but relentlessly complex. Debugging across microservices, frontends, and cloud edges often means switching between five or more tools, trying to stitch together “what changed” and “why it broke.” Honeycomb’s wide events model has proven to be a superpower for taming that complexity, by allowing you to easily observe and query end-to-end traces without worrying about how much granular data you attach to your events.

Read Post

Full-Stack Observability with VictoriaMetrics in the OTel Demo

Sep 10, 2025 By Diana Todea In VictoriaMetrics

The OpenTelemetry Astronomy Shop is a widely used demonstration environment designed to illustrate the concepts and practical implementation of observability in distributed systems. Built as a microservice-based e-commerce application, the demo provides developers with a near real-world environment where they can explore how telemetry data—metrics, logs, and traces—can be collected, processed, and visualized.

Read Post

Introducing Anomaly Detection: Your Early Warning System for Service Health

Sep 10, 2025 By Matt Ransford In Honeycomb

Modern engineering teams face a persistent challenge: knowing when something goes wrong before their customers do. With microservices architectures sprawling across dozens or hundreds of services, creating comprehensive alerting becomes an overwhelming task. You're left playing whack-a-mole with manual alert configurations, often missing critical issues or drowning in false positives.

Read Post

Visually identify observability gaps with Cloudcraft in Datadog

Sep 10, 2025 By Jace Harker In Datadog

Modern cloud environments are highly complex and dynamic, with critical services relying on large numbers of ephemeral resources. Ensuring observability coverage across this landscape is essential for troubleshooting, maintaining reliability, optimizing performance, and enforcing security standards. But as environments grow more elaborate and their ownership more dispersed, tracking observability coverage becomes increasingly challenging.

Read Post

Visualize Logs Alongside Metrics: Complete Observability Elasticsearch Performance

Sep 10, 2025 By Benjamin Pitts In MetricFire

Elasticsearch is a distributed search and analytics engine that powers everything from log management platforms to e-commerce search bars. It excels at indexing and retrieving large volumes of data quickly, but like any complex system it can slow down under heavy load or inefficient queries.

Read Post

Introducing Honeycomb Intelligence Anomaly Detection

Sep 10, 2025 By Honeycomb In Honeycomb

Modern teams face a persistent challenge: knowing when something goes wrong before their customers do. With architectures sprawling across dozens or hundreds of services, creating comprehensive alerting becomes an overwhelming task. You're left playing whack-a-mole with manual alert configurations, often missing critical issues or drowning in false positives. Today, we're excited to announce our solution to this challenge: Anomaly Detection (currently in alpha), Honeycomb's proactive approach to understanding and acting on service health.

View Video

Observability and Monitoring Governance (Part 1 of 4)

Sep 9, 2025 By Steve Danseglio In Broadcom

In contrast to the many flavors of governance used for IT, such as data governance, audit and compliance, and governance and security, IT monitoring governance lacks a definition in many organizations. This is true even as teams have decades of experience monitoring the health, performance, and availability of applications, infrastructures, networks, and user experience. Good monitoring governance “just sort of happens—naturally, organically.” Not exactly!

Read Post

Observability and Monitoring Governance (Part 2 of 4)

Sep 9, 2025 By Steve Danseglio In Broadcom

“How did we fail to monitor xyz prior to this incident?” “We should monitor everything.” “Are we vetting applications prior to deployment, including security apps that may adversely affect application performance and responsiveness?”

Read Post

Introducing Honeycomb Intelligence MCP Server - Now GA!

Sep 9, 2025 By Honeycomb In Honeycomb

In the months since we launched our public beta, we’ve been hard at work making Honeycomb MCP more useful and capable for agents and human operators alike. Our goal with this project has been, from the start, to allow AI to engage in the same kind of investigatory loops that we guide users towards. Many of the new features are designed expressly with this in mind, the most exciting of which is BubbleUp, now available in MCP.

View Video

Creating Calculated Fields with Honeycomb AI

Sep 9, 2025 By Honeycomb In Honeycomb

Did you know you can define a calculated field in your Honeycomb queries? You can, and with the power of Honeycomb AI you can ask it to write the calculated field definition for you. Find out how in this short video.

View Video

Honeycomb MCP Is Now In GA With Support for BubbleUp, Heatmaps, and Histograms

Sep 9, 2025 By Austin Parker In Honeycomb

If you’ve been following my public journey with LLMs this year, it probably won’t surprise you to learn that this blog post is an announcement about the general availability of Honeycomb’s hosted MCP server. I want to share a few updates about what’s new in the GA release, discuss some interesting learnings from building it, and share examples of how we’re using MCP internally. First: if you're still in the dark about MCP and AI agents, go read the earlier blogs I linked.

Read Post

Observability Journey Panel - Dell x TekStream

Sep 8, 2025 By Grafana In Grafana

Join Dell Technologies, TekStream Solutions, and Grafana Labs for a candid panel on scaling observability. Learn how enterprise teams scale observability, balance centralized vs. decentralized models, and accelerate adoption. The panel explores challenges with culture, governance, tool sprawl, and how AI is reshaping monitoring and incident response.

View Video

Software-Defined Healthcare: Modernizing Through DevOps, Observability & AIOps

Sep 8, 2025 By OpsMatters In OpsMatters

Healthcare delivery is undergoing a transformation unlike any other. Digital systems now shape how physicians deliver care, how practices are managed, and how patients experience the health system. From cloud-native platforms to intelligent automation, the shift toward software-defined healthcare is revolutionizing clinical operations. At the heart of this change are three critical enablers: DevOps, Observability, and AIOps. Together, they form the backbone of a modern healthcare IT environment, driving resilience, agility, and patient-centered outcomes.

Read Post

OpsMatters

Read more about Software-Defined Healthcare: Modernizing Through DevOps, Observability & AIOps

How Teams Are Using AI to Tackle Observability Challenges (2025 Survey Insights) | Grafana Labs

Sep 5, 2025 By Grafana In Grafana

In Grafana’s 3rd annual Observability Survey, over 1,000 engineers and leaders shared their challenges — tool sprawl, complexity, rising costs, and nonstop alerts — and their hopes for how AI can help.

View Video

Grafana

Read more about How Teams Are Using AI to Tackle Observability Challenges (2025 Survey Insights) | Grafana Labs

SvelteKit observability just got 10x better, and we're here for it

Sep 4, 2025 By Lukas Stracke In Sentry

The Svelte Team recently announced full observability and tracing support for SvelteKit! This is great news for SvelteKit and Sentry users, since Sentry is already compatible with the new feature! In addition, this is even greater news for the JavaScript ecosystem as a whole because SvelteKit just became the first ESM-based meta-framework to support instrumentation and tracing out of the box.

Read Post

Sentry

Read more about SvelteKit observability just got 10x better, and we're here for it

Sharpening My React Hooks Knowledge With ChatGPT

Sep 4, 2025 By Kat Telles In Honeycomb

I’m a product engineer at Honeycomb. While my work spans the stack, I’m currently focused on deepening my frontend expertise. To support this, I’ve been using ChatGPT as a study assistant. It’s helped me break down complex topics with clear explanations, real-world examples, and—critically—interactive practice. The most effective formats I’ve found.

Read Post

Honeycomb

Read more about Sharpening My React Hooks Knowledge With ChatGPT

Problems with Cloud Workload Observability

Sep 4, 2025 By Pepperdata In Pepperdata

#shorts #cloudcomputing #aws #costoptimization #kubernetes

View Video

Pepperdata

Read more about Problems with Cloud Workload Observability

Grafana Cloud: Beyond "Just" Observability

Sep 4, 2025 By Grafana In Grafana

From AWS to Zendesk, and across all teams, Ove at CLAAS explains a variety of approaches that go beyond traditional app monitoring. By focusing on the data itself, you can deliver deeper insights and gain a fuller understanding of your entire system.

View Video

Grafana

Read more about Grafana Cloud: Beyond "Just" Observability

The Fourth Pillar of Observability

Sep 3, 2025 By Lily Waldorf In Coralogix

Your application is only as reliable as the infrastructure it runs on. Most commonly, that means Kubernetes is doing the job by managing fleets of containers, scaling services on demand, and keeping workloads distributed across nodes. Traditional dashboards weren’t built to scale with this reality. They give you snapshots of raw metrics. They don’t scale to multi-cluster environments. They don’t map relationships between resources.

Read Post

Coralogix

Read more about The Fourth Pillar of Observability

Bridging the Gap: Legacy Systems and Modern Observability

Sep 3, 2025 By Datadog In Datadog

Technology moves quickly, and while the spotlight has shifted to dynamic, cloud-based systems, many organizations have legacy applications and infrastructure that they must maintain. In this fireside chat, Datadog’s Matt Moore (Principal Observability Strategist) will host James Flores (Enterprise Systems Engineer) at Australian Community Media to discuss their journey of modernization and bridging legacy systems with the cloud using a bit of ingenuity and observability.

View Video

Datadog

Read more about Bridging the Gap: Legacy Systems and Modern Observability

Bringing Observability to Claude Code: OpenTelemetry in Action

Sep 3, 2025 By Goutham Karthi In SigNoz

AI coding assistants like Claude Code are becoming core parts of modern development workflows. But as with any powerful tool, the question quickly arises: how do we measure and monitor its usage? Without proper visibility, it’s hard to understand adoption, performance, and the real value Claude brings to engineering teams. For leaders and platform engineers, that lack of observability can mean flying blind when it comes to understanding ROI, productivity gains, or system reliability.

Read Post

SigNoz

Read more about Bringing Observability to Claude Code: OpenTelemetry in Action

Actionable insights into the end-user experience: an overview of Grafana Cloud Frontend Observability dashboards

Sep 2, 2025 By Bukola Ayodele In Grafana

One of the biggest challenges in frontend development is identifying when and why users encounter performance issues, whether it’s slow page loads, JavaScript errors, or failed HTTP requests. With Grafana Cloud Frontend Observability — a hosted service for real user monitoring (RUM) — you get immediate, clear, and actionable insights into the end-user experience of your web applications.

Read Post

Grafana

Read more about Actionable insights into the end-user experience: an overview of Grafana Cloud Frontend Observability dashboards

What is Vector Search? (Ft. Symphonic Metal)

Sep 2, 2025 By Coroot In Coroot

Valkey OSS Developer Advocate Roberto Luna Rojas explains what a vector is and how Valkey (and vector) search works, and gives us an example using an interesting combination of music tastes.

View Video

Coroot

Read more about What is Vector Search? (Ft. Symphonic Metal)
