DX Operational Observability: Troubleshoot WebHook Notification Channels with WebHook Data Collector

The power of AIOps and Observability relies on the ability to ingest, normalize, and correlate the large volumes and huge variety of data available to IT operations teams. With its support for both Broadcom and third-party data, DX Operational Observability (DX O2) gives these teams unmatched observability and insights. With so much data coming into DX O2, monitoring operators need to be notified when important events occur; without notifications, important alerts may be overlooked.

Troubleshoot microservice-based apps faster with Splunk Observability Cloud

When something goes wrong with your microservice-based apps, Splunk Observability Cloud offers a unified Observability platform to make debugging processes easier and faster. By using features like the Service Map to identify the cause of the error and Related Logs in Log Observer to pinpoint its location, you can get back up and running quickly, limiting the impact to your bottom line and keeping your customers happy.

Enabling Design System Observability Using Honeycomb

At Honeycomb, we’re actively growing our design system, Lattice, to ensure accessibility, optimize performance, and establish consistent design patterns across our product. One metric we use to measure Lattice is the adoption of components across the product. Adoption is about understanding how, where, and why they’re being used.

Simplifying public sector observability with OpenTelemetry and Elastic

Public sector organizations today face unique challenges in maintaining and optimizing their IT infrastructure and prioritizing efficiency and interoperability. With a mix of modern cloud and legacy systems, ensuring consistent performance, reliability, and security is paramount. To effectively observe across these environments, government agencies need observability tools that are open, flexible, and scalable. OpenTelemetry (OTel) is fast becoming a pivotal part of that flexible toolset.

Coroot v1.9: Kubernetes-Native Database Monitoring Made Easy

From day one, we built Coroot to work beyond just Kubernetes. Many teams still run databases and other stateful services on dedicated VMs or bare-metal servers. But that’s starting to change. More and more teams no longer see Kubernetes as a platform just for stateless apps. Powerful Kubernetes operators now handle day-2 operations like failover, backups, and disaster recovery—making it easier than ever to run databases on Kubernetes. And the number of teams choosing this path keeps growing.

The Future of Dynamic Observability with Sumo Logic -- Customer Brown Bag -- March 27th, 2025

Join us as Sr. Dir. Technical Marketer, Adam White, and Sr. Product Marketing Manager, Hadijah Creary, go beyond the usual technical deep dive—focusing on the mindset, industry trends, and thought leadership shaping modern observability and the future of dynamic observability with Sumo Logic.

License to observe: Why observability solutions need agents

Note: The original version of this blog post was published on ;login: on February 24, 2025. When architecting the flow of observability data such as logs, metrics, traces or profiles, you’ve likely noticed that most solutions ask you to deploy an agent or collector. Understandably, you might be hesitant to deploy yet another application just so you can get your data into your storage system of choice.

Better CloudWatch Metrics in Honeycomb with the OpenTelemetry Collector

CloudWatch metrics can be a very useful source of information for a number of AWS services that don’t produce telemetry as well as instrumented code. There are also a number of useful metrics for non-web-request based functions, like metrics on concurrent database requests. We use them at Honeycomb to get statistics on load balancers and RDS instances. The Amazon Data Firehose is able to export directly to Honeycomb as well, which makes getting the data into Honeycomb straightforward.

Container Observability: Optimizing Every Layer with Innovative New Capabilities for Kubernetes & Windows

Managing containerized workloads and Windows environments requires more than just basic monitoring—it demands deep observability to prevent performance bottlenecks, optimize costs, and accelerate troubleshooting. Virtana’s latest Container Observability enhancements provide IT teams with greater control, visibility, and analytics across Kubernetes and Windows-based workloads.

Infrastructure Observability: Optimizing Every Layer with Innovative New Capabilities

Modern IT environments are complex, spanning on-premises, cloud, and hybrid infrastructures. Without deep observability at every layer, performance bottlenecks, inefficiencies, and troubleshooting challenges can drain resources and impact business outcomes. Virtana’s latest Infrastructure Observability enhancements are designed to eliminate blind spots, automate performance tuning, and simplify IT operations.

Understanding observability metrics: Types, golden signals, and best practices

Observability metrics provide insights into the performance, behavior, and health of applications, systems, and infrastructure. They underpin observability, the practice of understanding a system’s internal state by examining its data. As organizations continue to collect more and more data, metrics are a key telemetry signal for observability.

Observability Pipeline: An Easy-to-Follow Guide for Engineers

You've got systems spitting out more logs, metrics, and traces than you can handle. Your monitoring costs are through the roof. And somehow, when something breaks at 3 AM, you still can't find the exact data you need. Sound familiar? Welcome to the observability pipeline conversation—no jargon, no fluff.

Zero Code Instrumentation: The Missing Link in Observability

Have you ever struggled with systems that fail to tell you what went wrong? The kind where you’re digging through logs at 2 AM while alerts keep piling up. In DevOps, clear visibility into your applications isn’t a luxury—it’s essential. This is where instrumentation without code changes can help. It simplifies observability, reducing the manual effort needed to track down issues. If you haven’t explored it yet, you might be making troubleshooting harder than it needs to be.

The state of observability in 2025: a deep dive on our third annual Observability Survey

Across companies of all shapes and sizes, observability practices are maturing and getting attention at the highest levels. At the same time, cost and complexity continue to hinder efforts as teams look to emerging tools to help simplify their processes in hopes of better outcomes. With so much in flux, we went into our third annual Observability Survey hoping to get a window into the ways the community is approaching observability and where it wants it to go next.

Internet Connectivity Plays a Critical Role: Make it a Part of Your Observability Picture

In today’s digital age, businesses and customers alike are increasingly reliant on internet connectivity for day-to-day operations, communications, and transactions. Now more than ever, organizations depend on ISPs and cloud providers to deliver critical applications and services, making uninterrupted connectivity essential for success.

Tiered Observability: How To Prioritize and Mature Observability Investments

You may be surprised that delivering observability is a journey and isn’t about observing everything at once — it’s about driving outcomes like proactive detection, faster troubleshooting, and aligning with business priorities. If you’ve followed this series, you’ve already taken steps to.

State of Observability in Communications and Media

We surveyed ITOps and engineering professionals worldwide to learn how communications and media organizations build leading observability practices. In our webinar, “The State of Observability in Communications and Media,” we explore three priorities for today’s organizations — and what it takes to claim your spot on the observability leaderboard. Join us to discuss the implications of insights including.

How state, local, and education organizations can manage logs flexibly and efficiently using Datadog Observability Pipelines

State, local, and education (SLED) organizations need their logs to provide clear, structured insights into system performance, user behavior, and security risks. But often, the picture becomes scattered and chaotic instead, with critical log data buried in noise and gaps that make logs difficult to interpret.

Introducing Coralogix's AI Center: Real-time AI Observability

Traditional observability wasn't built for AI. The reason? AI operates in shades of grey, where outcomes are non-deterministic. That's why we built the AI Center, bringing real-time AI observability to thousands of enterprises worldwide. As part of our AI Center, we built an evaluation engine, designed to oversee and detect specific issues that are most common when building AI agents. Teams can choose the evaluators they want to oversee each agent and receive live alerts and reports on specific quality, security and compliance issues.

Modernizing Data Centers for AI: Bridging Observability, Cost Control, and Intelligent Automation

Attend our webinar on April 3 to see our latest innovations live. IT operations are more complex than ever, with modern data centers spanning on-premises, containers, multi-cloud environments, and AI-powered infrastructure. The rapid expansion of data sources has created an overwhelming volume of information, making manual monitoring across multiple tools impractical. Visibility gaps slow down troubleshooting and delay critical decisions, impacting business performance.

Observability Reimagined: How AI is Transforming Monitoring

Observability needs to evolve. With AI reshaping IT monitoring, how can businesses leverage predictive analysis, AI-driven monitoring, and auto-remediation workflows to create more resilient infrastructures? At Civo Navigate San Francisco 2025, Jemiah Sius, New Relic, explores how AI is transforming observability, shifting from reactive responses to proactive, intelligent solutions.

Lightrun Named to Fast Company's Annual List of the World's Most Innovative Companies of 2025

(March 18, 2025) — Lightrun is proud to have been named to Fast Company’s prestigious list of the World’s Most Innovative Companies of 2025. This year’s list shines a spotlight on businesses that are shaping industry and culture through their innovations to set new standards and achieve remarkable milestones in all sectors of the economy. Alongside the World’s 50 Most Innovative Companies, Fast Company recognizes 609 organizations across 58 sectors and regions.

Why observability is crucial for your Kubernetes deployments: A fireside chat with ManageEngine and DevOps Toolkit

Kubernetes is at the heart of modern cloud-native applications, but achieving effective observability is no easy feat. Managing workloads, ensuring performance efficiency, and keeping costs under control demand the right strategies and tools. If you’re grappling with Kubernetes complexity, struggling with monitoring blind spots, or seeking to optimize your deployments, we have the perfect event for you.

So, What's the Difference Between Observability and Monitoring?

Observability and monitoring are not about gathering different data; they share the same data but differ in purpose. Monitoring is focused on notification based on predefined questions, whether that’s through dashboards people watch or push-based alerts to notification systems like SMS or purpose-built platforms like PagerDuty.
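
As a rough illustration of "notification based on predefined questions", here is a minimal Python sketch. The rule names, metric keys, and thresholds are hypothetical and do not come from the post; a real setup would hand the fired rules to a notification system rather than print them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A predefined question asked of every metrics sample (hypothetical shape)."""
    name: str
    question: Callable[[Dict[str, float]], bool]


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> List[str]:
    """Return the names of the rules whose question is answered 'yes' for this sample."""
    return [rule.name for rule in rules if rule.question(sample)]


# Assumed thresholds, chosen only for illustration.
RULES = [
    Rule("high_error_rate", lambda m: m["error_rate"] > 0.05),
    Rule("slow_p99", lambda m: m["p99_latency_ms"] > 800),
]

if __name__ == "__main__":
    fired = evaluate(RULES, {"error_rate": 0.08, "p99_latency_ms": 420.0})
    print(fired)  # rules that would trigger a notification
```

The point of the sketch is that every question is written down in advance: a failure mode nobody anticipated has no rule, so nothing fires for it.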

Full-Stack Observability: What It Is [Minus the Fluff]

You've heard the term thrown around in meetups and Slack channels, but what exactly is full-stack observability? Simply put, you can see, understand, and quickly act on everything happening across your entire tech stack—from frontend user interactions to backend services, cloud infrastructure, and third-party integrations. Full-stack observability isn't just another tech buzzword. It's the difference between being blindsided by outages and catching issues before your users tweet about them.

Modernizing Government IT: Observability, Security & Cost Optimization with Datadog

Government IT leaders face the monumental challenge of modernizing aging systems, migrating to the cloud, and enhancing citizen services—all while ensuring security, compliance, and cost efficiency. Siloed tools and limited visibility create roadblocks to achieving these goals. Datadog’s FedRAMP-authorized platform provides full-stack observability, AI-powered security, and cloud cost optimization, helping agencies simplify complexity, strengthen Zero Trust security, and maximize IT budgets.

Effortless observability for Django applications

Observability is critical for web operations to ensure that the application is working as expected and to identify any potential issues. However, setting up observability has traditionally been challenging because it can take hours to set up all the infrastructure, instrument your code and enable observability in production. But now there is a better way using native support for Django in Charmcraft and Rockcraft which has observability built in and ready to go!

Flowing with Your Code: How Lightrun's Dynamic Traces Help Debug Complex Application Flows

Debugging software, whether during development or incident investigation, often begins with a manual and error-prone process. Developers typically scatter logs and snapshots across the codebase, allowing them to trigger multiple times. They then inspect the outputs and sift through the results to identify those relevant to the issue under investigation. Developers tend to group results that stem from the same user request or transaction.

What Is AI Autonomous Debugging? A Deep Dive into the Future of Software Troubleshooting

In the fast-paced world of software development, debugging remains one of the most time-consuming and complex tasks for engineers. Modern observability tools that use logs, metrics, and traces help developers gain insights into system behavior, but they still require manual effort to identify and fix issues.

Enhancing Observability with the OTEL Framework and Virtana

In today’s rapidly evolving technological landscape, observability has become essential for supporting robust, efficient systems. According to Gartner’s report “Preparing for the Future of Observability” from September 2024, OpenTelemetry (OTEL) is emerging as the standard framework for collecting telemetry data across different application pipelines.

Generating Calculated Fields From Natural Language

If you’ve been using Honeycomb for a bit, you know that Calculated Fields (otherwise known as derived columns) are a powerful way to transform your events to a format that’s easier to query and understand. However, they use a lisp-esque language that can be difficult to read and a pain to write. If you dislike making Calculated Fields and want something a little easier, here’s a generative AI prompt that can generate them from natural language.

The Need for Full-Stack Observability

In a recent survey, it was discovered that 57% of software developers’ time is spent in meetings resolving performance problems rather than innovating software solutions. The culprit? A lack of full-stack observability. Without the right tools, IT teams are left playing a high-stakes game of “Guess That Outage” – leading to delayed response to critical incidents and excessive time spent in intense meetings focused on these incidents and their root cause.

SolarWinds Observability Self-Hosted | 2025.1 GA Release Features Demo

This webcast shows off the latest features included in the 2025.1 GA Release of SolarWinds Observability Self-Hosted. Product experts Erik Eff and Chad Every discuss the importance of total cost of ownership and customer feedback in driving product development, highlighting key areas such as hybrid IT visibility and AI-driven solutions. The demo section showcases improvements in cloud monitoring, device support, and user experience, including a new NOC dashboard with dark theme.

Getting MTTR to zero: the failed promise of observability

There’s an old cliche about sales and jobs to be done - no one wants to buy a drill, they need a hole… actually, they want a home with pictures on the wall. To get to that beautifully designed home, they will buy a drill, make holes for brackets that can support their various artwork and family photos, and progress toward their dream home experience. Similarly, no one wants to buy observability software. They want their mean time to resolve (MTTR) issues to be zero.
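
For readers who want the arithmetic behind MTTR, a minimal Python sketch follows. It assumes MTTR is the plain average of (resolved minus opened) over a set of incidents; the incident timestamps are made up for illustration, and teams differ on where they start and stop the clock.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (opened, resolved)


def mean_time_to_resolve(incidents: List[Incident]) -> timedelta:
    """Average time from an incident opening to its resolution."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents: one resolved in 30 minutes, one in 90 minutes.
INCIDENTS = [
    (datetime(2025, 3, 1, 10, 0), datetime(2025, 3, 1, 10, 30)),
    (datetime(2025, 3, 2, 14, 0), datetime(2025, 3, 2, 15, 30)),
]

if __name__ == "__main__":
    print(mean_time_to_resolve(INCIDENTS))  # 1:00:00
```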

Does AI Help Write Better Software, or Just... More Code?

As software teams race to integrate AI into their development workflows, we need to ask ourselves: are AI-powered tools actually making software better? The latest research from DORA confirms what many engineers have long suspected, and what we at Honeycomb have said for a long time: AI tools don’t magically lead to better software. In fact, without careful implementation, AI can introduce a whole slew of challenges, including decreased productivity and unreliable code.
Sponsored Post

Using observability tools for security monitoring and incident detection

Most security teams overlook a goldmine of data sitting right in their applications - crash reports and Real User Monitoring (RUM) telemetry. While engineers typically use these tools for performance tracking, they can reveal security incidents that might otherwise go unnoticed. Let's explore some practical ways to turn your observability data into a powerful security monitoring system.

Challenges in Kubernetes monitoring and how to overcome them

Kubernetes has revolutionized how organizations deploy, scale, and manage containerized applications, offering unprecedented efficiency and flexibility. However, the very characteristics that make Kubernetes so powerful—its dynamic, distributed, and ephemeral nature—also create significant challenges for monitoring. Without robust monitoring capabilities, organizations struggle to identify and resolve performance bottlenecks, optimize resource utilization, and maintain security.

How I Code With LLMs These Days

I first started using AI coding assistants in early 2021, with an invite code from a friend who worked on the original GitHub Copilot team. Back then, the workflow was just single-line tab completion, but you could also guide code generation with comments and it’d try its best to implement what you want. Fast forward to 2025. There’s now a wide range of coding assistants that are packed with features.

How to monitor your Shopify store with Grafana Cloud Frontend Observability

Shopify is a fantastic tool for organizations who want to sell products, but don’t want to build or maintain an e-commerce platform themselves. Even some of the largest brands that have built their own e-commerce platforms in the past have seen the value of using Shopify to accelerate their business. As your Shopify site scales and grows, however, you may need more insight into the performance of your store.

Unlocking the Value of Network Observability

Today, a strong network forms the backbone of business success, making network visibility crucial. As modern networks continue their rapid evolution, it's essential to have an observability solution that is robust, resilient, and scalable. Teams need a solution that helps them enhance network performance and improve user experiences. They need a solution that enables them to confidently face current and future network operations challenges. Network Observability by Broadcom is that solution.

Is your #observability always one step behind?

Guess what: It is designed to be like that! And the only way for you to get ahead of your operational challenges is to think differently. With Netdata, you get high-fidelity, ultra-detailed insights with unmatched granularity and cardinality and instant root cause analysis. See your infrastructure like never before! Get X-Ray Vision for your infrastructure!

AI: Where in the Loop Should Humans Go?

AI is everywhere, and its impressive claims are leading to rapid adoption. At this stage, I’d qualify it as charismatic technology—something that under-delivers on what it promises, but promises so much that the industry still leverages it because we believe it will eventually deliver on these claims. This is a known pattern.