Operations | Monitoring | ITSM | DevOps | Cloud

Sponsored Post

Data-Led Growth: How FinTechs Win with App Event Analytics

In the rapidly shifting world of financial technology (FinTech), acquiring and retaining new customers to achieve long-term business growth requires a proactive approach to user experience and application performance optimization. As FinTech companies compete against rivals to grow a user base and revolutionize how consumers manage their finances, they increasingly depend on data-driven insights to optimize their mobile applications and deliver exceptional user experiences. This is where application event analytics comes into play.

Your telemetry, your apps. Inside apps on the Cribl Platform

You already use Cribl to tame your telemetry data. Now you can turn that data into apps your teams actually want to use. In this video, we walk through how to create apps in the Cribl Platform and show how real apps solve real problems: guided troubleshooting for noisy incidents, opinionated security views, and exec-friendly ROI dashboards. You’ll see how apps sit on top of Cribl Stream, Edge, Search, and Lake, so you reuse the data and logic you already have instead of building custom tools from scratch.

How to Install and Configure an OpenTelemetry Collector

Originally published June 2024. Updated May 2026. A lot has changed since the first version of this guide. In May 2026, OpenTelemetry officially graduated within the CNCF, the highest maturity level a project can achieve. All three core signals (metrics, logs, and traces) are now stable across every major language SDK. Collector adoption has never been higher, and the ecosystem around it, particularly OpAMP for remote management, has matured significantly. This update walks through three things.

Federated Search | From Silos to Insight | Splunk Cloud with Apache Iceberg REST and AWS S3

This walk-through shows how Splunk Cloud can search AWS S3 data through an Apache Iceberg REST catalog backed by Nessie. Learn how Iceberg table metadata, S3 storage, and Splunk Federated Search work together so analysts can query historical security data where it lives without reingesting it into Splunk.

Hybrid Cloud Monitoring Explained: On-Prem + Cloud + Kubernetes in One View

Understand what hybrid cloud monitoring is and why it’s critical for managing modern distributed IT environments. Hybrid cloud monitoring helps organizations unify visibility across on-prem infrastructure, public cloud platforms, virtual machines, containers, and Kubernetes clusters in a single monitoring platform. In this video, learn how fragmented monitoring tools create operational blind spots and slow down incident response across hybrid environments.

Builder in the loop: Eric Lake on making AURA smarter after every incident

Builder in the Loop is a Mezmo interview series focused on the engineers, product leaders, and operators shaping AURA, an open-source, MCP-native agent harness for production operations. The goal is to get past the polished product layer and talk through the decisions that matter when AI starts interacting with real systems. Key questions include: What should agents be allowed to do? How do they get better over time? Where should humans stay in the loop?

Search Azure Blob data in-place with BYOS for Cribl Lake

See how Bring Your Own Storage (BYOS) in Cribl Lake allows teams to connect directly to Azure Blob Storage and instantly search data in place — without moving, duplicating, or rehydrating telemetry. In this demo, Cribl Product Manager Risk Salsa walks through setup, dataset creation, and how to run fast investigations across your Azure-hosted data using Cribl Search.

Explore for Spans: One View with Infinite Depth

It’s 20 minutes into a P0 incident, and you have already switched between four different tools, re-authenticated twice, and translated queries across three incompatible syntax languages. The root cause you are searching for. Well, that is still out there somewhere. The reality of investigative latency is that most engineering teams face navigation problems, not data problems. During high-pressure incidents, teams lose cognitive momentum due to context switching between disconnected telemetry silos.

New Explore: Faster answers, less friction, and a better way to investigate your data

There is a moment every engineer knows too well. Something is wrong in production. You have an alert, a vague symptom, and pressure to find the one signal that explains what changed. You open your logs and traces, and you immediately hit the same two problems: the dataset is huge, and the path from “I see something odd” to “I understand why” is full of tiny, exhausting steps. Meet new Explore, our redesigned investigation experience for logs, traces, and spans.

Your Microsoft Azure storage, our data lake power: The best of both worlds

The wait is over for Azure-first organizations. Cribl just launched Cribl Lake Bring Your Own Storage (BYOS) for Microsoft Azure, giving you full data lake power without moving a single byte of telemetry out of your environment. Join us to see how you can finally get the flexibility of a modern data lake while keeping your data in Azure.

Meet the new Mobot: Your log analysis partner

Every single day, the Sumo Logic Platform analyzes more than four exabytes of log data. The good news? The answers to your application performance, infrastructure health, and security incidents are hidden in those logs. The challenge? Historically, uncovering those answers required query language fluency. That’s why we built Mobot, our conversational interface that connects users to advanced AI capabilities using natural language.

How to Overcome Government Payment Fraud with Speed and Scale

Government payment fraud is a fast-growing risk for public sector organisations in Australia and globally. From welfare and healthcare payments to business grants and disaster relief, increasingly sophisticated organised criminal networks and other actors exploit complex, high-volume government programs to unlawfully access public funds. The impact is significant—billions lost, program integrity undermined, and essential resources diverted.

Inside the Anthropic + Claude Code Hype at AWS Summit London: Live Laugh Logs ep. 2

Are companies blowing through their entire 2026 AI budget in a matter of months? Welcome to Episode 2 of Live Laugh Logs, the podcast from Annie, Lewis, and Andre from the Coralogix Developer Relations team, where we get together and recap everything going on in our worlds!

What is Log Management? The IT Team's Guide to Taming Log Data

Understand what log management is and why it’s essential for troubleshooting, security, and observability across modern IT environments. Log management helps organizations collect, centralize, parse, and analyze logs from servers, applications, cloud platforms, containers, and network devices in one searchable platform. Learn how centralized log monitoring reduces mean time to resolution (MTTR), eliminates siloed troubleshooting, and helps IT teams detect anomalies faster using AI-powered analytics.

Using AI to Instrument Applications with OpenTelemetry

OpenTelemetry is one of the best things that’s happened to observability in the last decade. It’s open. It has SDKs for every language that matters. It’s vendor neutral. The OTel community has been doing the hard work of standardizing how applications emit telemetry, so that you, the engineer, don’t have to learn five different agent formats to monitor five different services.

Certificate Audit logs are live

Certificate automation does a lot of work on your behalf. Agents running on your servers, talking to certificate authorities, deploying certs to your infrastructure. At some point someone (your CISO, your auditor, or your own brain at 3am) is going to ask: what exactly happened, and when? Today we’re shipping audit logs. Every action taken in CertKit is now recorded: logins, invitations, certificates added, issued, renewed, revoked, and deployed. Agent registrations, approvals, and config changes.

Unlock telemetry value with a well-planned data lake

Your SIEM only holds a slice of your telemetry. Your data lake holds the rest. We'll show you how to use that to your advantage for investigations, threat hunting, and reporting. Why your data lake beats your SIEM for investigations – Your SIEM keeps a short window of expensive, filtered data. Your data lake keeps everything. When something goes wrong, that difference matters more than you think Threat hunting without the handcuffs – Hunting across months of data in a SIEM is painful and costly. We'll show you how a well-planned lake makes broad, deep searches practical and affordable.

The $600 billion wake-up call: New Splunk research reveals downtime is a systemic business crisis

600 billion annual impact: Aggregate downtime costs for the Global 2000 have soared 50% in two years. $15,000 per minute: The average cost of downtime for organisations, highlighting the immediate financial impact of service disruptions. 3.4% stock price drop: The average decline in shareholder value following a single downtime incident.

Multiple API Keys Are Here - More Keys, Better Control, Stronger Security

Today we're rolling out a major upgrade to API Keys in Bindplane. You can now create up to 25 API keys per project, give each one a description, set an expiration date, and delete keys you no longer need. Under the hood, every key is now hashed with Argon2, the modern standard for credential storage. If you've been working around the old single-key limit by sharing one key across CI jobs, scripts, and teammates, this release is for you.

Why SRE agents need orchestration, not just more tools

Single agents are a useful starting point for SRE workflows. They are not where the architecture should end. The first version is simple enough: connect an LLM to a few tools, give it a system prompt, and point it at your infrastructure. It can summarize an alert, pull logs, answer questions, and draft a useful next step. Then the workflow gets real. You add GitHub for runbooks, Kubernetes for cluster state, PagerDuty for incident context, Prometheus for metrics, and Mezmo for telemetry.

What's New in Graylog V7.1 Webinar

What to Expect? Graylog 7.1 is built for lean security and IT operations teams who need real outcomes, not more tools, more add-ons, or more manual work. This 30-minute deep dive session covers what's new and what it means for your team. What you'll learn: See Graylog 7.1 in action: detection, triage, and documentation without compromise.

Cribl Notebook templates in Cribl Search

Investigations are time-sensitive, and analysts shouldn’t waste time recreating the same workflows or rewriting familiar queries. Whether troubleshooting infrastructure, investigating suspicious IPs, or analyzing host activity, teams often rely on duplicating old processes and copying query snippets — a slow, inconsistent approach that’s hard to scale.

Action trails: The missing link between AI and human trust

When people talk about trusting AI, they usually focus on the interface. It summarizes and uses confident language with a level of clarity that feels reliable. But that’s all window dressing. None of it builds trust. Trust doesn’t come from what the AI says. A verifiable record of what the AI did makes it trustworthy.

When your agents hallucinate at 2 am, it is not a model problem

The first time an AI assistant suggests "restart the service" during a live incident and nobody on the bridge can tell whether that suggestion came from a current runbook, a stale wiki page, or thin air, you stop caring about model benchmarks. You start caring about what the agent actually knew, where that knowledge came from, and whether you can trust the chain of reasoning behind it.

3 things you need to know about headless observability

If you're building agents trying to figure out the best way to actually make them successful in production, you're going to want to know about headless observability. Headless observability means an agent can access information about the health of your system through a CLI instead of clicking around dashboards. It's the data layer that going to unlock serious autonomy and allow you to scale with agentic workloads.

One Collector, Two Teams: How Bindplane Bridges Security and Observability with OpenTelemetry

Observability engineers will spend weeks tuning instrumentation. Security engineers? They want a collector installed and logs flowing — yesterday. And that's actually the magic of OpenTelemetry + Bindplane: from day one you're routing firewall logs, endpoint data, server logs straight into your SIEM with zero instrumentation lift. One toolchain. Two teams. No silos. Filmed at Google Cloud Next '26 — Las Vegas bindplane.com#OpenTelemetry.

From Phishing to SQL Injection: How Breaches Actually Happen

Critical vulnerabilities are critical because they're easy to exploit — but most breaches don't even need them. Tony explains why phishing remains the dominant attack vector, why strong instrumentation matters for forensics (tracing an API call through a database to see exactly what was leaked), and how observability data becomes security data when something goes wrong. The system is harder to breach than the human. And that's the whole game.

What Is an Incident Commander? Role, Skills, and Best Practices

The fastest incident response teams treat coordination as a craft. Someone owns the call, drives the decisions, and keeps everyone moving in the same direction while the team puts the system back together. That person is the incident commander (IC), and getting the role right is what separates your 15-minute fix from a four-hour war room where nobody’s sure who’s making the call.

What Is Log Monitoring? Pipeline, Pitfalls, and Practices for 2026

Catching a cascading failure in the first 90 seconds is one of the better feelings in production engineering, and it almost always comes back to your log monitoring pipeline doing its job upstream of the alert. The teams that land there consistently treat log monitoring as a real-time detection layer in its own right, and the choices you make in that pipeline shape how every incident plays out for years.

Builder in the loop: Henry Andrews on building AURA like production software

An interview series with the people building Mezmo’s open-source agentic harness for production operations. Builder in the loop is a Mezmo interview series focused on the engineers, product leaders, and operators shaping AURA, our open-source, MCP-native agentic harness for production operations. The goal is to get past the polished product layer and talk through the decisions that matter when AI starts interacting with real systems. What should agents be allowed to do?

What Is APM? A Guide to Application Performance Monitoring

A well-instrumented service tells your on-call engineer which deploy broke checkout, which span ate the latency budget, and which line to revert before the support queue fills up. Getting there depends on how cleanly your application performance monitoring layer turns telemetry into answers. The sections ahead walk through how APM works, the metrics and components worth tracking, the cloud-native challenges at scale, and how to evaluate APM tooling against your real workload.

Contributing Distributed Partition Ownership to the Azure Event Hub Receiver

If you're running OpenTelemetry collectors against Azure Event Hubs, distributed partition ownership and checkpointing just got significantly better. Your fleet now self-organizes. Failover is automatic. Restarts don't lose data. Here's how we got here.

OpenTelemetry Fleet Management: Scalable Control

OpenTelemetry has turned observability pipelines into production infrastructure, but managing them at scale often creates a massive operational burden. In this demo, we show how Coralogix Fleet Management acts as the central control plane for your OTel ecosystem, providing the governance and orchestration required for modern DevOps. Stop the "manual marathon" of PRs and Helm upgrades. Move toward a safer, more predictable operating model where telemetry is consistent, audited, and scalable.

Turn Noisy Logs Into Structured Data with Uptrace Grouping Rules

Here are 3 YouTube title options plus a description optimized for technical/dev audiences: Same log pattern. Hundreds of useless groups. In this video, we show how to use Uptrace Grouping Rules to automatically turn noisy logs into structured, searchable data — without changing application code. You'll learn how to: Examples covered: Perfect for:#OpenTelemetry users, backend engineers, SREs, and anyone dealing with noisy logs.

The Best Kubernetes Monitoring Tools of 2026

Effective Kubernetes monitoring in 2026 is critical due to increased cluster scale and microservices complexity, demanding a shift toward unified observability (logs, metrics, and traces). The core focus is leveraging AI-driven features to automate anomaly detection, correlate diverse data, and significantly reduce Mean Time to Recovery (MTTR).

AURA in Practice: Mezmo's SRE bot, demo walkthrough

A walkthrough of the Slack-based SRE bot Mezmo's engineering team built on AURA, the open-source agent harness, running against Mezmo's own production tooling. Adrian Furlong shows the bot answering questions in a DM with tool calls visible inline, then in a shared channel where it reads the conversation before responding. He opens a fresh PagerDuty incident on camera. The webhook fires AURA, and within seconds, the agent posts a triage note back on the incident and a structured analysis in the dedicated incident channel.

Managing OpenTelemetry at Scale: Why OTel Pipelines Need a Control Plane

OpenTelemetry made telemetry possible everywhere – turning observability pipelines into distributed production infrastructure. Distributed infrastructure requires a control plane for inventory, governance, and safe change. At 500 collectors across hybrid environments, operational overhead becomes a production risk. The moment telemetry pipelines become a distributed infrastructure, they inherit the operational problems of one.

From noise to knowledge: How GenAI is revolutionizing log management and analytics

Focusing on GenAI and logs for IT efficiency Efficiency is everything for managing today’s digital systems. Technology is constantly transforming and expanding operations are driving an explosion in data. Consequently, data ingest and storage costs have soared. But it’s not just storage data costs that keeps teams behind.The challenge of managing all that observability data forces IT teams to choose between efficiency and the bottom line.

Federated Search | From Silos to Insight | AWS S3 Schema Discovery with Splunk-Managed Tables

This walk-through shows how Splunk's crawler, available through the Data Management app, can discover schema and partition keys for S3 backed datasets and create Splunk managed catalog tables. Once the data is mapped, analysts can search AWS S3 data through Splunk and bring it into broader security, observability, and operational workflows.

The Journey to Production AI: Five Steps for SRE and Platform Teams

In a recent webinar, The Journey to Production AI, Andre Elizondo walked through what separates a working agent demo from an agent worth trusting on a 2 a.m. page. Live polls during the session put numbers behind a pattern most platform teams already feel. ‍ ‍ Most teams are early. The ones who are further along did not get there by shipping a flashier demo. They got there by treating production AI as a platform problem.

Operational Intelligence and the Hidden Structure in System Logs

Most IT teams do not suffer from a lack of data. They suffer from the amount of effort required to make sense of it. Every network device, application, cloud service, and infrastructure component generates a constant stream of machine output. Logs capture state changes, failures, retries, warnings, and thousands of other small signals about how systems behave. The problem is that raw logs are hard to use at operational speed.

How one partnership powers search for over 2 million WP Engine users

How do you make search faster, smarter, and more scalable? During our recent webinar, I sat down with Luke Patterson, senior product manager at WP Engine, and Delphin Barankanira, independent software vendor partner engineering lead and data & AI specialist at Google Cloud, to answer that question. We dug into the mechanics behind WP Engine’s ability to deliver near-instant updates to over 2 million users.

Faster OpenTelemetry Migrations from Splunk to SecOps with Bindplane

Many security teams are looking to move off Splunk, whether to reduce licensing costs, consolidate their SIEM, or take advantage of Google SecOps' built-in threat intelligence and YARA-L detection capabilities. But migrations aren’t easy, and no one wants to run blind while they evaluate and move to a new platform. With OpenTelemetry and Bindplane, you can easily make the switch to SecOps without impacting your existing stack.

Eliminate noisy log lines with Adaptive Logs drop rules

Most platform and observability teams have logs they know are noise. These could be throwaway health check logs, forgotten DEBUG logs, or verbose INFO logs from little used services that only serve to inflate your bill. Regardless of what they contain and why they're there in the first place, the hard part is getting rid of them. Centralized teams want to easily and quickly prevent these logs from being ingested, without having to work with toilsome infrastructure change management to do so.

Elasticsearch 9.4 powers the next phase of the Elastic AI Ecosystem: Dell AI Data Platform with NVIDIA

AI is moving fast. Enterprise adoption needs to move with purpose. Over the past year, one thing has become clear: Organizations are not looking for more AI hype. They are looking for a path to production — one that connects infrastructure, data, and intelligence in a way that delivers real business value. That is exactly what the Elastic AI Ecosystem is built to do. At Elastic, we believe AI is only as powerful as the data foundation behind it. Great models matter.

The cost of knowledge

In the world of observability, “cardinality” has become a heavy word. It is a ghost used to justify skyrocketing bills or degraded query performance. When cardinality rises, the advice is almost always the same: reduce it. Drop your labels, or reduce the dimensions. It is usually framed as “optimization.” Every label you add to a metric is a dimension of knowledge. Each one gives you a way to slice, compare, and explain the chaos of production.

Ep 41: The cost of not thinking: Who's responsible when AI agents get it wrong?

In this episode of Masters of Data, we get into the messier side of AI adoption, tackling questions like who actually owns the output when AI gets it wrong, and whether chasing efficiency is making us forget what it means to be human in the first place. We discuss tech CEOs proudly announcing they no longer think for themselves and debate whether AI is quietly eroding our critical thinking skills. We make the case that purpose-built, narrow AI is genuinely exciting, but that no efficiency gain is worth losing the human touch that makes work, connection, and creativity meaningful.

Observability vs Monitoring: What's the Real Difference in 2026?

Understand the real difference between observability and monitoring — and why modern IT teams in 2026 need both. Monitoring tells you something is broken; observability explains why. See real examples, faster troubleshooting workflows, and how Motadata ObserveOps unifies both in one platform. Don’t forget to like, share, and subscribe for more IT insights.

Introducing the Coralogix CLI: Headless Observability for Every Agent

This article is a high-level overview of the Coralogix CLI. For a deeper look at how it works in practice, read the full technical deep dive here. Agent-driven investigation sounds simple: read the alert, query the data, return the cause. In reality, most agents either overload their context window with raw logs or guess at queries and return incorrect results.

Taming Log Noise With the OpenTelemetry Collector's Drain Processor

Do you receive 50 million log lines per day and struggle to see what actually matters? Health checks, heartbeat pings, connection pool messages—they all drown out the errors and anomalies you're trying to find. Most teams deal with this by writing filter rules to drop the noisy patterns. But those rules are manual, per-pattern, and brittle. A new deployment changes a log format and the filter misses it. A new service starts logging a chatty startup sequence nobody thought to exclude.

May the Logs Be With You: Graylog 7.1 Is Here

A long time ago, in a SOC far, far away…analysts were drowning in alerts, chasing context across fragmented screens, and watching real threats slip past detection gaps. Today, the Rebellion fights back. This isn’t a release built around a single marquee feature. It’s the result of our team listening to you on the front lines with an ear for removing the friction that makes your jobs harder than they need to be.

Federated Search | From Silos to Insight | Unified Datasets in AWS S3 with Ingest Processor

Are storage costs and data silos slowing down your investigations? In this video, we dive into the Unified Dataset Experience to show you how to search data where it lives. Learn how to use the Splunk Ingest Processor to route high volume logs directly to AWS S3 while maintaining instant visibility via Federated Search. No more re-hydrating data, just fast cost-effective insights.

How the Coralogix CLI Adds Production Intelligence to Any Agent for Any Use Case

The new interface into production telemetry is a tool call, made from whichever agent runtime the operator happens to be using at that moment. A finance lead in Claude Code, a product manager in Cursor, an engineer in Codex. Three different jobs, three different agents, three different reasoning loops. The thing they have in common is the data layer underneath.

Real-Time Database Monitoring: Solving Database Latency with Zero-Code eBPF Tracing

In high-throughput database environments, a latency spike is rarely a simple story. Modern data layers are distributed, stateful, and constantly changing as shards move, nodes rebalance, caches warm, queries evolve, and connections churn. In practice, spikes usually come from one of three places: For many SRE and Platform teams, the real challenge is disconnected tooling. As one engineering lead recently shared during a technical workshop: “It’s all disconnected.