
Sponsored Post

Innovating Security with Managed Detection & Response (MDR) and ChaosSearch

Managed Detection and Response (MDR) services occupy an important niche in the cybersecurity industry, supporting SMBs and enterprise organizations with managed security monitoring and threat detection, proactive threat hunting, and incident response capabilities. In this week's blog, we're taking a closer look at the role of MDRs in cybersecurity, the biggest challenges they face, and how integrating ChaosSearch is helping MDRs manage complexity, reduce data retention costs, and enable long-term security analytics use cases that are critical for customer success.

Paving the way for a new era: Mezmo's Active Telemetry

The world of software development has fundamentally changed. We've moved from monthly releases to continuous delivery measured in minutes, and the rise of AI means velocity is no longer just a goal—it's a requirement for survival. But this relentless speed has exposed a critical flaw in how we approach observability. The industry relies on a "store first, ask questions later" model where you collect every log, metric, and trace, and then hope to find the root cause when something breaks.

13 Best Windows Monitoring Tools in 2025

It’s 2 AM, and your phone buzzes with an urgent alert—your primary server application is down, and users are flooding the support channels with complaints. As you dive into the logs, the cause is elusive, buried somewhere in the sea of system events. Is it a rogue service eating up memory? A failing disk? Or a network bottleneck? Without powerful Windows monitoring tools, you’re left troubleshooting in the dark.

Best Web Transaction Monitoring Tools in 2025

Websites are no longer static pages. They’re dynamic, transaction-heavy ecosystems where every click, form submission, and login matters. Whether you’re in e-commerce, SaaS, or finance, transaction failures can lead to revenue loss, frustrated customers, and even damage to your brand. That’s where web transaction monitoring tools come in — a critical component to make sure every interaction goes smoothly.

The telemetry time bomb - and what to do about it

Telemetry data is growing at an average of 29% a year — doubling costs every 18 months. That’s putting pressure on ITOps budgets, observability platforms, SecOps teams, and SIEM deployments alike. In this post, we’ll explore how unchecked data volumes, siloed tools, and aging architectures are creating a telemetry cost crunch that limits visibility, slows both troubleshooting and threat detection, and impacts business outcomes.

How GenAI Is Empowering Elastic Workforce

With over 10,000 questions answered and a 99% satisfaction rate in just 90 days, ElasticGPT, our internal generative AI assistant built on Elastic’s Search AI Platform, is transforming how our teams find information, make decisions, and complete day-to-day tasks. Matt Minetola, CIO, explains how ElasticGPT helps employees access company knowledge faster using natural language queries. Learn how we’re using retrieval-augmented generation (RAG) and a secure, scalable architecture to deliver trusted, real-time AI experiences across the organization.

CriblCon sneak peek with AlphaSOC

The countdown to CriblCon is on, and we’re giving you an exclusive first look at the expert insights, innovative solutions, and success stories you’ll see on the big stage. Join us as we chat with Chris McNab, Founder of AlphaSOC, a security startup that processes network telemetry to uncover infected hosts, emerging threats, and targeted attacks.

Audit log streaming for real-time security visibility in your CI/CD pipeline

Security and compliance teams face a critical challenge: by the time they discover suspicious activity in their development pipeline, it’s often too late to prevent damage. Manual audit log requests create bottlenecks that delay incident response, and gaps in visibility leave organizations vulnerable to insider threats and compliance violations. If your team struggles with any of these issues, you need a systematic approach to real-time audit monitoring.

Soft navigations: The future of seamless browsing

In the ever-evolving world of web standards, a new experimental feature is quietly reshaping how browsers perceive navigation: Soft Navigations. While still in the early stages, this concept has the potential to redefine user experience metrics, improve performance monitoring, and better align browsers with the behavior of modern web applications. Let’s dive into what soft navigations are, why they’re important, and how you can start exploring them today.

Securing the Future: Responsible AI on AWS with Sumo Logic -- Customer Brown Bag -- Sept 25th, 2025

This session with Moumita Saha, Sr. Security Partner SA – WW Consulting Partners, AWS, and Adam White, Sr. Dir. Technical Marketer at Sumo Logic explores how AWS and Sumo Logic partner to deliver practical strategies for securing generative AI applications, ensuring they remain safe, compliant, and trustworthy.

How to Push Prometheus Metrics to Splunk Observability Cloud with the OpenTelemetry Collector

In this video, you’ll learn how to scrape Prometheus endpoints with the OpenTelemetry Collector’s Prometheus receiver and send metrics to Splunk Observability Cloud. We’ll walk through configuring three common data sources (a Python Flask app, node_exporter for host metrics, and the NGINX Prometheus exporter), show how to enrich metrics with resource attributes, and build simple charts in Splunk Observability Cloud. You’ll see how centralized scraping and consistent tagging make it easy to manage and visualize Prometheus metrics in Splunk Observability Cloud.
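
To make the scrape-and-enrich idea concrete, here is a minimal Python sketch of what happens conceptually: parse samples in the Prometheus text exposition format, then stamp every sample with the same resource attributes so data from different exporters is tagged consistently. This is not the Collector's Prometheus receiver, which is configured in the Collector and handles far more of the format. The function names and the `service.name` value `flask-app` are illustrative assumptions.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp]
# Escaped quotes, braces inside label values, and exemplars are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def enrich(samples, resource_attrs):
    """Add the same resource attributes to every sample; existing labels win on conflict."""
    return [(name, {**resource_attrs, **labels}, value) for name, labels, value in samples]


if __name__ == "__main__":
    scraped = """# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
node_load1 0.42
"""
    for sample in enrich(parse_exposition(scraped), {"service.name": "flask-app"}):
        print(sample)
```

Centralizing the enrichment step in one place, rather than in each exporter, is what keeps tags consistent across the Flask app, node_exporter, and NGINX exporter sources.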

Zooplus Found Faster Root Cause Detection with Elastic Observability

Zooplus Platform Engineering Lead Aram Hakobayan shares how Elastic Observability helps manage 3,000+ microservices and 15,000+ logs/sec across their AWS cloud. Learn how Elastic powers their French market, centralizes monitoring, simplifies root cause analysis, and avoids costly vendor migration. Ideal for DevOps, SREs, and cloud architects scaling fast.

Your Next Observability RFP is All Wrong. Why AI Changes Everything

AI-first observability addresses two of the most pressing troubleshooting challenges: complex IT environments and AI-generated code. But understanding how to implement AI in a way that delivers ROI requires cutting through the hype and maintaining realistic expectations while keeping a forward-thinking vision. In this blog post, we share practical tips for including AI in your next observability RFP. The article is based on a webinar with Logz.io founders CEO Tomer Levy and CTO Asaf Yigal.

The one where we talk about Cribl Guard

Manual hunts for sensitive data are slow, error-prone, and expensive. Cribl Guard combines advanced AI with a human-in-the-loop control point to spot sensitive data, such as credit card, passport, and Social Security numbers, as it flows through Cribl Stream. Whether you’re fully cloud or hybrid, Cribl Guard puts you firmly in control of every piece of sensitive information that crosses your pipes.
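
Cribl Guard itself pairs AI detection with human review, and its internals are not described here. For contrast, the sketch below shows a classic rule-based technique for one category of sensitive data, credit card numbers: a regex finds 13 to 19 digit candidates, and the Luhn checksum filters out most random digit runs before masking. The function names are hypothetical, and this is a generic illustration, not Cribl Guard's implementation.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return candidate card numbers in text that pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(match.group())
    return hits


def redact(text):
    """Mask all but the last four digits of each detected card number."""
    for hit in find_card_numbers(text):
        digits = re.sub(r"[ -]", "", hit)
        text = text.replace(hit, "*" * (len(digits) - 4) + digits[-4:])
    return text


if __name__ == "__main__":
    print(redact("payment ok card=4111 1111 1111 1111 user=42"))
```

The checksum step is what keeps false positives (order IDs, timestamps) manageable; rules like this still miss context that an AI-plus-human approach is meant to catch.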

OpenTelemetry Logs - A Complete Introduction & Implementation

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) incubating project aimed at standardizing the way we instrument applications to generate telemetry data (logs, metrics, and traces). It provides a vendor-agnostic observability framework: a set of tools, APIs, and SDKs for instrumenting applications.
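
As a rough illustration of what an OpenTelemetry log record carries, the sketch below converts a Python stdlib `logging` record into a simplified subset of the OTel logs data model (timestamp, severity number and text, body, attributes, resource, and optional trace and span IDs). It is not the official OpenTelemetry Python SDK, which ships its own logging integration; the `OtelLogRecord` and `to_otel` names and the attribute keys are assumptions for illustration. The severity values use the first number of each range the data model defines.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Stdlib levels mapped to the first SeverityNumber of each OTel range
# (DEBUG 5-8, INFO 9-12, WARN 13-16, ERROR 17-20, FATAL 21-24).
SEVERITY_BY_LEVEL = {
    logging.DEBUG: (5, "DEBUG"),
    logging.INFO: (9, "INFO"),
    logging.WARNING: (13, "WARN"),
    logging.ERROR: (17, "ERROR"),
    logging.CRITICAL: (21, "FATAL"),
}


@dataclass
class OtelLogRecord:
    """A simplified subset of the fields defined by the OTel logs data model."""
    timestamp_ns: int
    severity_number: int
    severity_text: str
    body: Any
    attributes: Dict[str, Any] = field(default_factory=dict)
    resource: Dict[str, Any] = field(default_factory=dict)
    trace_id: Optional[str] = None  # set when the log is emitted inside a span
    span_id: Optional[str] = None


def to_otel(record: logging.LogRecord, resource: Dict[str, Any]) -> OtelLogRecord:
    """Convert a stdlib logging record into the simplified OTel shape."""
    # Unmapped custom levels fall back to SeverityNumber 0 (unspecified).
    number, text = SEVERITY_BY_LEVEL.get(record.levelno, (0, record.levelname))
    return OtelLogRecord(
        timestamp_ns=int(record.created * 1e9),
        severity_number=number,
        severity_text=text,
        body=record.getMessage(),
        # Illustrative attribute names; check the current semantic conventions.
        attributes={"code.function": record.funcName, "code.lineno": record.lineno},
        resource=resource,
    )


if __name__ == "__main__":
    rec = logging.LogRecord("checkout", logging.WARNING, "app.py", 42,
                            "retrying order %s", ("A-17",), None, func="submit")
    print(to_otel(rec, {"service.name": "checkout"}))
```

Keeping `resource` separate from per-record `attributes` mirrors the data model: resource describes the emitting service once, while attributes describe the individual event.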

Elastic Cloud Serverless on Google Cloud doubles region availability

We’re pleased to announce the availability of Elastic Cloud Serverless on Google Cloud in three new regions. This doubles the number of available regions on Google Cloud and dramatically increases serverless deployment options in the US. Elastic Cloud Serverless provides the fastest way to start and scale observability, security, and search solutions without managing infrastructure.

How GenAI is Shaping Elastic Customer Support

Discover how GenAI has made Elastic’s customer support more efficient. Built on Elastic’s Search AI Platform, the Support Assistant delivers self-service, in-product customer support and frees up capacity within our support function. Julie Rudd, VP of Support at Elastic, shares how it speeds up issue resolution by combining generative AI with Elastic’s deep knowledge base. Hear directly from a support engineer how the Support Assistant streamlines case resolution and helps engineers and customers find answers faster.

OpenTelemetry Observability: An In-Depth Look at Features and Best Practices

OpenTelemetry (OTel) is a unified framework of APIs, SDKs, and tools for collecting, processing, and exporting telemetry data (logs, metrics, and traces) across applications and infrastructure. OTel is especially valuable in today’s cloud-native world, where applications run as microservices on Kubernetes and other distributed systems.

Your Next Observability RFP Is All Wrong: Why AI Changes Everything

Watch how AI is reshaping observability for the years ahead. In this fireside chat, Logz.io founders Tomer Levy and Asaf Yigal reveal how the most innovative AI-first companies are breaking free from dashboards, avoiding common RFP mistakes, and building future-ready stacks. Learn how autonomous AI eliminates noise, slashes costs, and gives engineering teams back their velocity.

Cribl.Cloud Government Is a New Era of Secure Cloud Telemetry for Federal Agencies

As a Co-founder and CPO at Cribl, I'm genuinely stoked that our new federal suite, Cribl.Cloud Government, has achieved an “In Process” designation under the Federal Risk and Authorization Management Program (FedRAMP). This isn’t any old milestone. We’re bringing all of Cribl’s kickass capabilities to government agencies, even those that require the strictest compliance and security standards. Because, who doesn’t love a good set of rules?

Cribl.Cloud Goes to Washington: Cribl.Cloud Government FedRAMP Authority to Operate Milestone

Way back in 2009, when I was serving as a second lieutenant in the U.S. Army, I worked in a network operations center for a deployed Army unit. Our mission was to provide network connectivity across central and northern Iraq. Our observability tools were incredibly limited. We had a network map that would turn nodes and network links red, yellow, and green when they were up or down. We had to write down in a physical logbook any status changes and what we did about them.

BYOS with Cribl Lake: Data ownership meets flexibility

Today, more than ever, organizations face a difficult balancing act: how to keep sensitive data fully under their control while still making it accessible and usable so teams can unlock the value and insights they need. Industries such as financial services, healthcare, and government agencies often must comply with strict regulations that require data to remain in environments they directly own and manage.

What does the EU Data Act mean for Observability?

The EU Data Act entered into force on January 11th, 2024, and most of its provisions apply from September 12th, 2025. The EU Data Act is designed to give individuals and businesses more control over the data they generate, ensuring fair access, use, and sharing across sectors. For any data-generating platform that intends to operate in the European Union, this new legislation matters.

Logstash Alternative: Why Security Teams Are Choosing Modern Data Pipelines

Logstash has been a workhorse in data processing pipelines for years, but it was not designed with today’s security operations in mind. Security teams now deal with massive telemetry volumes, rising SIEM costs, and diverse log formats that require constant normalization. In this environment, Logstash shows its age: manual configuration, outdated parsing, and scalability bottlenecks introduce fragility instead of efficiency.

Bridging the Gap: Integrating Logs, Metrics, and Flow for Observability

In this video, we discuss handling both old and new systems in IT environments. From legacy SNMP setups to modern telemetry, most organizations juggle multiple data sources, which can make observability feel overwhelming. We explore how to combine logs, metrics, and flow data into one system that provides actionable insights. You’ll see practical examples of simplifying scattered tools and making sense of complex, disparate information. Understanding how these different types of data work together is key to getting observability right.

Pastries with SREs: OTel me where the cronuts are

In this episode of Pastries with SREs, we tackle a hotly debated observability topic: do you need a single pane of glass, or is OpenTelemetry a better strategy? Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

Logs & Lattes: Episode 1 - Smart Logging Without the Price Trap

How much value are you really getting from your logs, and what are you giving up to stay on budget? In this episode of Logs and Lattes, host Palmer Wallace sits down with Seth Goldhammer, VP of Product Management at Graylog, for a candid conversation about the hidden cost of traditional SIEM pricing. Seth explains how ingest-based and resource-heavy licensing models pressure security teams into tough tradeoffs, such as dropping logs, tuning down detections, or limiting retention just to avoid budget overages.

Pastries with SREs: Limitless observability and uncompromised donuts

In this episode of Pastries with SREs, we dig into Limitless Observability with a sweet side of unified observability strategy. If you're tired of siloed tools, fractured data, and swivel-chair investigations, this one’s for you. We explore: Why are silos still the norm in modern observability? What’s the true cost of inefficiencies across logs, metrics, and traces? How can SREs, IT operations, and dev teams shift to a no-compromise, unified observability model?

Logs vs. Metrics: Why You Need Both for Observability

Picture this: Your dashboards are calm. CPU load is steady. Error rates are low. Everything looks fine. That is, until the alarms go off. Now what? Metrics tell you something’s wrong, but not what, where, or why. They reveal symptoms, not root causes, and in high-stakes environments, that’s only half the story. Say your API response times spike. Metrics raise the flag, but they don’t tell you if it’s a code deployment, a database hang, or a traffic surge.

Visualize Logs Alongside Metrics: Complete Observability for Elasticsearch Performance

Elasticsearch is a distributed search and analytics engine that powers everything from log management platforms to e-commerce search bars. It excels at indexing and retrieving large volumes of data quickly, but like any complex system it can slow down under heavy load or inefficient queries.

Monitor Cloud-Native & Hybrid Apps and Business Transactions With Observability Cloud APM

As organizations modernize, most applications don’t fit neatly into one category—they span both traditional three-tier architectures and cloud-native microservices. To monitor these hybrid environments effectively, teams need APM tools that can seamlessly connect the two worlds.

Instrumentation Your Way: Introducing a Combined Splunk AppDynamics Agent

In 2025, microservices are everywhere and Kubernetes is the de facto standard for operating cloud native applications. But not all apps are built in microservices architectures. For most enterprises, hybrid environments are the reality, with their business run on a mix of three-tier and cloud native applications.

Custom OpenTelemetry Collectors: Build, Run, and Manage at Scale

I tried to remember the last time I read an actual tutorial that didn’t include a bunch of em dashes (—), semicolons, regular dashes, and an unnervingly large number of phrases like “XYZ-thing Alert” and “Exciting News!”. Well, hold on to your suspenders, folks, here we go again. Part 2 is up, and it’s a controversial one.

The Answer to SRE Agent Failures: Context Engineering

AI agents for SREs were supposed to slash mean time to resolution and eliminate alert fatigue. Instead, most teams got expensive, unreliable tools that burn through tokens without delivering insights. But what if the problem isn't the AI models themselves? Recent benchmarking reveals the real bottleneck: context engineering. When we tested our context engineering approach against conventional methods, the results were dramatic. The full post includes our benchmark results and the complete comparison.

Cribl to the rescue for SIEM migrations

Your security teams face escalating data volumes, vendor changes, and cost pressures when they migrate between SIEM platforms. Cribl simplifies these migrations by giving you flexible data routing, reducing storage costs, and accelerating time-to-value. To see it in practice, let’s look at how a global customer used Cribl Stream to migrate CrowdStrike FDR logs from Splunk to Microsoft Sentinel efficiently and cost-effectively.

Introducing Event iQ: Smarter Event Correlation in Splunk IT Service Intelligence (ITSI)

Every day, IT teams are flooded with alerts—thousands of messages about performance issues, service outages, or suspicious activity. With so many notifications, it’s easy to get overwhelmed, miss critical problems, or waste time chasing false alarms. Correlating related alerts into groups can help reduce the noise and make sense of everything, but setting up those correlations takes time, experience, and a lot of both system and historic knowledge.
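
Event iQ’s actual correlation logic isn’t described in this post. As a generic illustration of what "correlating related alerts into groups" means, the sketch below groups alerts for the same service into one episode whenever each alert arrives within a fixed gap of the previous one. The 300-second gap, the grouping key (service name), and the `Alert` and `correlate` names are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    ts: float      # epoch seconds
    service: str
    message: str


def correlate(alerts: List[Alert], window_s: float = 300.0) -> List[List[Alert]]:
    """Group alerts for the same service when each arrives within window_s
    of the previous alert in its group (a simple gap-based rule)."""
    groups: List[List[Alert]] = []
    open_group: Dict[str, List[Alert]] = {}  # latest episode per service
    for alert in sorted(alerts, key=lambda a: a.ts):
        group = open_group.get(alert.service)
        if group is not None and alert.ts - group[-1].ts <= window_s:
            group.append(alert)  # extend the current episode
        else:
            group = [alert]      # start a new episode for this service
            groups.append(group)
            open_group[alert.service] = group
    return groups


if __name__ == "__main__":
    sample = [
        Alert(0, "db", "high latency"),
        Alert(60, "web", "5xx spike"),
        Alert(120, "db", "replica lag"),
        Alert(1000, "db", "high latency"),
    ]
    for episode in correlate(sample):
        print([a.message for a in episode])
```

A fixed rule like this is easy to reason about but needs hand tuning of the key and window; that tuning burden is the "time, experience, and knowledge" cost the paragraph above describes.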

Monitor the Health, Performance, and Security of Your AI Application Stack with AI Agent and AI Infrastructure Monitoring

At this year’s .conf25, we introduced an exciting new chapter in observability at Splunk, one that is unified, AI-powered, and agentic, to ensure ITOps and engineering teams are digitally resilient in the AI era.

Powering AI Innovation with Splunk: Meet the Cisco Data Fabric

If you are leading technology innovation in your organization, you know the relentless surge of machine data is rewriting the rules of the digital enterprise. The upside? Incredible opportunities for AI-driven transformation. The challenge? Unprecedented complexity. Today’s leaders are under enormous pressure to unify, analyze, and act on a deluge of data streams across multiple environments.

What Are Buckets in Elasticsearch? (Explained in 60 Seconds)

Overwhelmed by raw data? In this short video, we demonstrate how Elasticsearch utilizes buckets to group and organize data by time, value, region, or any other shared trait. Whether you're tracking error codes or hourly sales trends, buckets and nested aggregations help turn chaos into clarity. Additionally, discover how time-based bucketing enables you to spot patterns and zoom in on valuable insights quickly.
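
In Elasticsearch, this kind of grouping is expressed with bucket aggregations such as `terms` and `date_histogram`, which can be nested. The sketch below shows a request body of that shape, using hypothetical field names `error.code` and `@timestamp`, and then a small local Python function that performs the same "group by code, then count per hour" logic on a list of events so the idea is easy to see. The local function is an illustration of nested bucketing, not how Elasticsearch computes aggregations.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Aggregation request shape: bucket by error code, then by hour inside each code.
# Field names are hypothetical; "size": 0 asks for aggregations only, no hits.
QUERY = {
    "size": 0,
    "aggs": {
        "by_code": {
            "terms": {"field": "error.code"},
            "aggs": {
                "per_hour": {"date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}}
            },
        }
    },
}


def bucket(events):
    """Group (code, epoch_seconds) events by code, then count them per UTC hour."""
    result = defaultdict(Counter)
    for code, ts in events:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0)
        result[code][hour.isoformat()] += 1
    return {code: dict(counts) for code, counts in result.items()}


if __name__ == "__main__":
    print(bucket([("500", 0), ("500", 1800), ("500", 3600), ("404", 10)]))
```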

Empowering an MCP server with a telemetry pipeline

This blog was authored by Jason Bloomberg, Managing Director, Intellyx BV. Observability depends upon telemetry: the data streaming from various applications, services, and systems that indicate their internal state in real time. Various tools consume such telemetry to enable both operational and cybersecurity tasks.

How to Transform Telemetry Data with the OpenTelemetry Transformation Language

This demonstration shows how to use the OpenTelemetry Transformation Language (OTTL) to transform, filter, and enrich telemetry in the OpenTelemetry Collector without changing application code. We walk through a sample Python application and OpenTelemetry configuration file, generate real traffic, and then analyze the results in Splunk Observability Cloud.

What Are Vector Embeddings? (Explained in 2 Minutes)

In under 2 minutes, we explain what vector embeddings are, how they work, and how to use them in real-world applications like text expansion. We'll also show how Elasticsearch supports vector search with two powerful models: E5, open-source text embedding models designed for multilingual search, and ELSER, a sparse embeddings model from Elastic.
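
To show what "similar vectors" means in practice, the sketch below scores toy documents against a query two ways: cosine similarity for dense embeddings (the kind of output E5 models produce) and a dot product over shared tokens for sparse token-to-weight vectors (the general shape of ELSER output). The 3-dimensional vectors and token weights are made-up numbers, not real model output, and the scoring functions are generic math rather than Elasticsearch's internal implementation.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_score(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Dot product of two sparse token -> weight vectors (only shared tokens count)."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


# Toy 3-dimensional "embeddings" with made-up values; real models emit hundreds of dimensions.
DOCS = {
    "reset my password": [0.9, 0.1, 0.0],
    "forgot login credentials": [0.8, 0.2, 0.1],
    "quarterly sales report": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    query = [0.85, 0.15, 0.05]  # pretend embedding of a query like "can't sign in"
    for text in sorted(DOCS, key=lambda d: cosine(query, DOCS[d]), reverse=True):
        print(f"{cosine(query, DOCS[text]):.3f}  {text}")
```

The dense case finds "forgot login credentials" close to the query even though it shares no words with a query like "can't sign in", which is the point of semantic search; the sparse case stays interpretable because every score traces back to named tokens.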

What is Infrastructure Monitoring? How it Works, Key Metrics & Use Cases

Infrastructure monitoring is the process of continuously collecting, analyzing, and visualizing data from an organization’s IT infrastructure. With infrastructure monitoring, DevOps teams can maintain system health, meet SLAs, reduce downtime, and detect and resolve issues proactively, ensuring optimal performance, availability, and reliability. It typically covers a range of key components, including the network.

Transform your public sector organization with embedded GenAI from Elastic on AWS

Elastic is proud to be featured in the new AWS Generative AI Content Hub for public sector, a destination showcasing the most impactful ways agencies can securely adopt and scale generative AI (GenAI).

The Fourth Pillar of Observability

Your application is only as reliable as the infrastructure it runs on. Most commonly, that means Kubernetes, which manages fleets of containers, scales services on demand, and keeps workloads distributed across nodes. Traditional dashboards weren’t built for this reality. They give you snapshots of raw metrics. They don’t scale to multi-cluster environments. They don’t map relationships between resources.

Logs are Generally Available (Still logs, just finally useful)

When we started building Logs in Sentry, we had one goal: make them useful for real debugging, not just another store for high-volume text. That meant making them "trace connected" from day one, so they are tightly linked to the actions and performance in your application, right where developers already go to investigate errors, performance, and latency issues. Now, Logs is out of beta and generally available to everyone.

Visualize Logs Alongside Metrics: Complete Observability for Slow MongoDB Operations

MongoDB’s strengths, a flexible schema and fast iteration, can also hide costly queries until they surface as user-facing latency, replica lag, or spiky CPU. A handful of slow operations can impact the cache, starve other workloads, and cascade into timeouts across services. Monitoring slow queries gives you an early warning system for index gaps and query-plan regressions introduced by code deploys, schema changes, or shifting data shapes.
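
MongoDB 4.4 and later write structured JSON log lines, and operations slower than the `slowms` threshold (100 ms by default) are logged with the message "Slow query". The sketch below summarizes those entries per namespace and counts collection scans, a common sign of an index gap. It assumes the `msg`, `attr.ns`, `attr.durationMillis`, and `attr.planSummary` fields; treat these as assumptions to verify against your server version, and the function name as hypothetical.

```python
import json
from collections import defaultdict


def slow_ops_by_namespace(lines, threshold_ms=100):
    """Summarize 'Slow query' entries from MongoDB JSON log lines per namespace."""
    summary = defaultdict(lambda: {"count": 0, "max_ms": 0, "collscans": 0})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        ms = attr.get("durationMillis", 0)
        if ms < threshold_ms:
            continue
        stats = summary[attr.get("ns", "unknown")]
        stats["count"] += 1
        stats["max_ms"] = max(stats["max_ms"], ms)
        if attr.get("planSummary") == "COLLSCAN":  # full collection scan
            stats["collscans"] += 1
    return dict(summary)


if __name__ == "__main__":
    sample = [
        '{"msg":"Slow query","attr":{"ns":"shop.orders","durationMillis":250,"planSummary":"COLLSCAN"}}',
        '{"msg":"Slow query","attr":{"ns":"shop.orders","durationMillis":120,"planSummary":"IXSCAN { status: 1 }"}}',
    ]
    print(slow_ops_by_namespace(sample))
```

Feeding counts like these into a metrics pipeline is one way to plot slow operations alongside CPU and replica-lag metrics.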

The Debugging Bottleneck: A Manual Log-Sifting Expedition

Imagine a developer at a fast-growing company. A customer support agent reports a critical issue: a user's recent order is stuck in a "pending" state. The agent provides a customer ID and a request ID. The developer's typical process is a familiar, painful dance, and it is slow, tedious, and prone to human error. Mean time to resolution (MTTR) is measured in hours, not minutes, and it's a huge drain on engineering resources.
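
The core of that manual hunt is "find every log line for this request ID, across services, in time order". The sketch below automates just that step, assuming every service emits JSON logs with a shared `request_id` field and an ISO 8601 `ts` field in UTC (so string order matches time order). The field names and the `trace_request` function are hypothetical.

```python
import json
from typing import Iterable, List


def trace_request(lines: Iterable[str], request_id: str) -> List[dict]:
    """Collect every JSON log entry tagged with request_id, oldest first."""
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore unstructured lines
        if entry.get("request_id") == request_id:
            matches.append(entry)
    # Assumes uniform ISO 8601 UTC timestamps, which sort correctly as strings.
    return sorted(matches, key=lambda e: e.get("ts", ""))


if __name__ == "__main__":
    logs = [
        '{"ts":"2025-01-01T10:00:02Z","service":"payments","request_id":"r-1","msg":"card declined"}',
        '{"ts":"2025-01-01T10:00:01Z","service":"orders","request_id":"r-1","msg":"order created"}',
        '{"ts":"2025-01-01T10:00:01Z","service":"orders","request_id":"r-2","msg":"order created"}',
    ]
    for entry in trace_request(logs, "r-1"):
        print(entry["ts"], entry["service"], entry["msg"])
```

This only works when every service propagates the same request ID, which is why consistent context propagation matters more than the search itself.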