By Mezmo
Single agents are a useful starting point for SRE workflows. They are not where the architecture should end. The first version is simple enough: connect an LLM to a few tools, give it a system prompt, and point it at your infrastructure. It can summarize an alert, pull logs, answer questions, and draft a useful next step. Then the workflow gets real. You add GitHub for runbooks, Kubernetes for cluster state, PagerDuty for incident context, Prometheus for metrics, and Mezmo for telemetry.
By Mezmo
The first time an AI assistant suggests "restart the service" during a live incident and nobody on the bridge can tell whether that suggestion came from a current runbook, a stale wiki page, or thin air, you stop caring about model benchmarks. You start caring about what the agent actually knew, where that knowledge came from, and whether you can trust the chain of reasoning behind it.
By Mezmo
Builder in the Loop is a Mezmo interview series with the engineers, product leaders, and operators shaping AURA, our open-source, MCP-native agentic harness for production operations. The goal is to get past the polished product layer and talk through the decisions that matter when AI starts interacting with real systems. What should agents be allowed to do?
By Mezmo
In a recent webinar, The Journey to Production AI, Andre Elizondo walked through what separates a working agent demo from an agent worth trusting on a 2 a.m. page. Live polls during the session put numbers behind a pattern most platform teams already feel. Most teams are early. The ones who are further along did not get there by shipping a flashier demo. They got there by treating production AI as a platform problem.
By Mezmo
Runbooks are rarely missing because teams don't value them. They're usually missing because incident response, follow-up, and platform work compete for the same limited time. By the time an issue is resolved, the knowledge is fresh, but the window to document it is already closing. That gap creates familiar failure modes: over-reliance on senior engineers, slower handoffs, and less confidence for whoever is on call next.
By Mezmo
How platform and SRE teams are using Mezmo's open-core agent framework — with any LLM, any tools, any observability backend.
By Henry Andrews
Over the last year, I’ve talked to dozens of SRE teams about AI. The excitement is real, but conversations hit a wall when we get to production reality. How does an agent manage complex context without losing the plot? How does it avoid hallucinating relationships between signals? Who owns the orchestration logic that ties it all together? We realized the bottleneck wasn’t model intelligence. It was the lack of a reliable logic layer between the data and the model.
By Mezmo
Grok structures logs. Context engineering connects systems. AI explains behavior. For years, Grok patterns have been the workhorse of the SRE world. Built on regular expressions, Grok helps teams extract structure from unstructured logs. As we explored in "Do You Grok It?", Grok is the key to turning messy log lines into usable fields. It's why our Grok Pattern Reference remains one of our most-visited resources — SREs are hungry for structure.
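To make the idea concrete, here is a minimal Python sketch of what a Grok pattern does under the hood: each `%{PATTERN:field}` token expands into a named regular-expression group, and a match yields the extracted fields. The pattern names and simplified regexes below are illustrative assumptions, not the definitions shipped in Grok libraries or in our Grok Pattern Reference.

```python
import re
from typing import Optional

# Illustrative subset of Grok-style base patterns (assumed, simplified).
# Real Grok libraries ship many more, with stricter regexes
# (e.g. IP octets limited to 0-255).
BASE_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "URIPATH": r"/[^\s?]*",
}

# Matches %{NAME} or %{NAME:field}.
GROK_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(pattern: str) -> "re.Pattern[str]":
    """Expand Grok tokens into regex groups and compile the result."""
    def expand(m: re.Match) -> str:
        regex = BASE_PATTERNS[m.group(1)]
        field = m.group(2)
        # Tokens with a field name become captured fields; unnamed ones just match.
        return f"(?P<{field}>{regex})" if field else f"(?:{regex})"

    return re.compile(GROK_TOKEN.sub(expand, pattern))


def parse(pattern: str, line: str) -> Optional[dict]:
    """Return extracted fields, or None if the whole line doesn't match."""
    match = compile_grok(pattern).fullmatch(line)
    return match.groupdict() if match else None


fields = parse(
    "%{IP:client} %{WORD:method} %{URIPATH:path} %{NUMBER:status}",
    "10.0.0.7 GET /api/orders 503",
)
print(fields)
# {'client': '10.0.0.7', 'method': 'GET', 'path': '/api/orders', 'status': '503'}
```

Because Grok is regex underneath, a line that doesn't fit the pattern yields no fields at all rather than a partial result, which is why testing patterns against real log samples matters.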
By Mezmo
As budgets reset for 2026, engineering leaders are making a resolution: no more vendor lock-in. Here’s how to keep that promise by building on the technical foundations of data reliability and simplified collection. It’s January 2026, and if you’re like most engineering leaders, you’re staring at your observability vendor contracts with a mix of frustration and resignation.
By Mezmo
A note from Lauren Nagel, Mezmo's VP of Product: At Mezmo, we believe the best observability tools aren't just built for users; they're built with them. Since the launch of Mezmo's AI SRE agent, we've been listening to and learning from our customers. Their feedback and insights have been invaluable in helping our teams refine and enhance the experience. Today, we're excited to share our latest release, packed with improvements and powerful new capabilities that make our AI SRE even faster and more intuitive.
By Mezmo
A walkthrough of the Slack-based SRE bot Mezmo's engineering team built on AURA, the open-source agent harness, running against Mezmo's own production tooling. Adrian Furlong shows the bot answering questions in a DM with tool calls visible inline, then in a shared channel where it reads the conversation before responding. He opens a fresh PagerDuty incident on camera. The webhook triggers AURA, and within seconds the agent posts a triage note on the incident and a structured analysis in the dedicated incident channel.
By Mezmo
See how Mezmo LiveTail helps teams move from passive log search to active, real-time investigation. In this demo, you'll watch live telemetry stream across services and environments, identify emerging issues as they happen, and use real-time context to troubleshoot faster before signals are delayed, buried, or lost in the noise. LiveTail is part of Mezmo's Active Telemetry platform — built for platform engineers and SREs who need immediate visibility into what's happening across their stack right now, not after the fact.
By Mezmo
AI-powered root cause analysis only works when the data going into the model is clean, relevant, and structured. In this demo, we show how Mezmo's Active Telemetry approach helps engineers and SREs move from noisy application errors to immediate clarity. Using a restaurant ordering application running in Kubernetes, we trigger a database connection pool exhaustion issue and walk through two ways to investigate it with Mezmo.
By Mezmo
This video shows how Mezmo's AI Assistant turns noisy telemetry into clear answers when errors spike. By preprocessing data and surfacing only the most relevant patterns, Mezmo quickly identifies issues like database connection failures or resource shortages and delivers actionable recommendations. Watch how AI-powered root cause analysis helps teams troubleshoot faster and with confidence. Mezmo's AI Assistant is built for platform engineers and SREs who need fast, reliable root cause analysis across high-volume telemetry pipelines — without manually sifting through noise.
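One common way to preprocess noisy telemetry is log templating: mask the variable parts of each line (IPs, hex IDs, numbers) so lines that differ only in those values collapse into one pattern, then rank patterns by frequency. The Python sketch below illustrates that general technique; the masking rules are assumptions, and this is not a description of how Mezmo's AI Assistant actually preprocesses data.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed masking rules, applied in order. IPs go first so their dotted
# digits aren't split apart by the generic number rule.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?(?:ms|s)?\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens so similar lines share one template."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def top_patterns(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count templates and return the n most frequent."""
    return Counter(to_template(line) for line in lines).most_common(n)


logs = [
    "ERROR db timeout after 5000ms host=10.0.0.7",
    "ERROR db timeout after 4870ms host=10.0.0.9",
    "WARN pool exhausted active=20 max=20",
    "ERROR db timeout after 5012ms host=10.0.0.7",
    "INFO order 1042 placed",
]
for template, count in top_patterns(logs, n=2):
    print(count, template)
```

Five raw lines reduce to three templates, and the database timeout surfaces first because all three of its occurrences share one pattern. That reduced, ranked view is the kind of input that is easier for a model (or a human) to reason about than the raw stream.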
By Mezmo
Watch AURA autonomously respond to a production incident in real time—from building its reasoning context and querying PagerDuty and ClickHouse, to triggering a human-in-the-loop approval with the on-call SRE, to removing the stuck pod and validating remediation. Every behavior is defined in a simple config. AURA is Mezmo's AI-powered incident response agent built for platform engineers and SREs managing high-volume telemetry pipelines.
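The pattern shown here, letting an agent query systems freely while gating destructive actions behind human approval, can be sketched as a small policy check. AURA's actual config schema isn't reproduced here; the `POLICY` table, the tool names, and the `approve` callback below are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy table (NOT AURA's config schema): maps each tool the
# agent can call to how much autonomy it gets.
POLICY = {
    "get_incident": "auto",            # read-only: run without asking
    "query_logs": "auto",              # read-only: run without asking
    "delete_pod": "require_approval",  # destructive: needs on-call sign-off
}


@dataclass
class Decision:
    tool: str
    allowed: bool
    reason: str


def authorize(tool: str, approve: Callable[[str], bool]) -> Decision:
    """Decide whether the agent may run `tool`, asking a human if required."""
    mode = POLICY.get(tool, "deny")  # tools missing from the policy are denied
    if mode == "auto":
        return Decision(tool, True, "allowed by policy")
    if mode == "require_approval":
        # In a real deployment, `approve` might post a prompt to the on-call
        # engineer and wait for an answer; here it is any callable.
        if approve(tool):
            return Decision(tool, True, "approved by on-call")
        return Decision(tool, False, "rejected by on-call")
    return Decision(tool, False, "tool not in policy")


deny_all = lambda tool: False
print(authorize("query_logs", deny_all))
print(authorize("delete_pod", deny_all))
```

Denying unknown tools by default means a newly added tool cannot run unattended until someone explicitly classifies it in the policy.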
By Mezmo
Many engineering teams rely on Elasticsearch for search and analytics, but as data volumes grow, so do the challenges of scale, cost, and performance. At Mezmo, we faced this reality head-on, recognizing the need for a more efficient and scalable solution to support our multi-cluster, multi-petabyte telemetry data backend. After extensive evaluation, we made the leap to Quickwit, an open-source, cloud-native search engine for logs. But making such a fundamental architectural shift without disrupting customers was no small feat.
By Mezmo
Managing telemetry data efficiently is a constant balancing act—how do you maximize visibility while controlling costs? In this webinar, we’ll show you how Mezmo’s telemetry pipeline helps you make smarter decisions about your data.
By Mezmo
Are you looking to enhance your observability and gain deeper insights into your systems? Curious about how a telemetry pipeline can revolutionize your monitoring and troubleshooting capabilities while keeping costs low? Join Mezmo’s Bill Balnave (Vice President of Technical Services) for an insightful webinar unpacking the key concepts of telemetry pipelines and highlighting their significance in modern software development and operations. Discover how a telemetry pipeline enables you to collect, profile, transform, and analyze crucial telemetry data from your applications and infrastructure.
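The four stages named above (collect, profile, transform, analyze) can be sketched as plain Python functions over a stream of events. The `service level message` line format, the choice to drop `DEBUG` events, and all function names are assumptions for illustration; a production telemetry pipeline would also handle malformed lines, batching, and routing to multiple destinations.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]


def collect(raw_lines: Iterable[str]) -> Iterator[Event]:
    """Parse raw lines; assumes a 'service level message' format."""
    for line in raw_lines:
        service, level, message = line.split(" ", 2)
        yield {"service": service, "level": level, "message": message}


def profile(events: Iterable[Event]) -> Counter:
    """Measure volume per (service, level) to see where data comes from."""
    return Counter((e["service"], e["level"]) for e in events)


def transform(events: Iterable[Event], drop_levels=frozenset({"DEBUG"})) -> Iterator[Event]:
    """Drop low-value events before they reach storage."""
    return (e for e in events if e["level"] not in drop_levels)


def analyze(events: Iterable[Event]) -> Counter:
    """Count errors per service in whatever survives the pipeline."""
    return Counter(e["service"] for e in events if e["level"] == "ERROR")


raw = [
    "checkout DEBUG cart loaded",
    "checkout ERROR payment timeout",
    "orders INFO order placed",
    "orders DEBUG cache hit",
    "checkout ERROR payment timeout",
]
events = list(collect(raw))
print(profile(events))
kept = list(transform(events))
print(len(events), "->", len(kept), "events")
print(analyze(kept))
```

Profiling before transforming is the point of the ordering: you decide what to drop based on measured volume per source, and the analysis stage still sees every error because only `DEBUG` events were filtered.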
By Mezmo
Watch our discussion on the 2024 DORA Accelerate State of DevOps report, where we dive into insights impacting software delivery, organizational strategy, and AI adoption in DevOps. We’ll review key findings and highlight practical steps for leaders to optimize development and delivery performance. Whether your organization is embracing AI, building internal platforms, or addressing burnout and resilience, this webinar will provide actionable takeaways for adapting to today’s evolving DevOps landscape.
By Mezmo
In today's digital-first, cloud-native world, effective log management is crucial. It enhances software quality, operational efficiency, and the customer experience. However, with the rise of distributed and microservices-based architectures, organizations now generate petabytes of log data daily, making analysis and storage increasingly challenging.
By Mezmo
Logging in the age of DevOps has become harder and more critical than ever because it is key to maintaining visibility and security in today's fast-moving, highly dynamic environments. With these needs and challenges in mind, Mezmo has prepared this eBook to offer guidance on how best to approach the log management challenges that teams face today.
By Mezmo
A growing number of log management solutions available on the market today are offered as cloud-only services. Although cloud logging has its benefits, many organizations have requirements that can only be fulfilled with self-hosted/on-premises log management systems.
By Mezmo
Here's a complete guide covering all core components to help you choose the best log management system for your organization. From scalability, deployment, compliance, and cost, to on-prem or cloud logging, we identify the key questions to ask as you evaluate log management and analysis providers.
By Mezmo
Despite its extensive feature set and open-source roots, the ELK Stack is not free after all, as many organizations are beginning to realize. A free license still comes with hidden costs in hardware and engineering time that add to the total cost of ownership (TCO). Here, we uncover the true cost of running the Elastic Stack on your own versus using a hosted log management service.
Log Management Modernized. Instantly collect, centralize, and analyze logs in real time from any platform, at any volume.
Why Mezmo?
- Powerful Logging at Scale: Get powerful log aggregation, auto-parsing, log monitoring, blazing fast search, custom alerts, graphs, visualization, and a real-time log analyzer in one suite of tools. We handle hundreds of thousands of log events per second and 20+ terabytes per customer per day, and we boast the fastest live tail in the industry. Whether you run 1 or 100,000 containers, we scale with you.
- Easy, Instant Setup: Mezmo's SaaS log management platform sets up in under two minutes. Instantly collect logs from AWS, Docker, Heroku, Elastic, and more, with the flexibility to deploy anywhere: cloud, multi-cloud, or self-hosted. Logging in Kubernetes? Logs start flowing in just two kubectl commands. Whether you send logs via syslog, a code library, or an agent, we have hundreds of custom integrations.
- Affordable: Mezmo’s simple, pay-per-GB pricing model eliminates contracts, paywalls, and fixed data buckets. Try our free plan, or pay only for the data you use, with no overage charges or data limits. Our user-friendly, frustration-free interface lets your team get started with no special training, saving even more time and money.
- Secure & Compliant: Our military-grade encryption keeps your logs secure in transit and in storage. We offer SOC2, PCI, and HIPAA-compliant logging. To comply with GDPR for our EU/Swiss customers, we are Privacy Shield certified. The privacy and security of your log data are always our top priority, and we are ready to sign Business Associate Agreements.
Blazing fast, centralized log management that's intuitive, affordable, and scalable.