Surface and Confirm Buggy Patterns in Your Logs Without Slow Search

Debugging with logs in distributed systems can be a pain. It’s tough to search raw data for a pattern, correlate potential causes with other logs, and check trace and metrics data for confirmation. Is finding one pattern enough? What if there are other problems? Who knows how many colliding factors are relevant? At Honeycomb, we’re flipping the script on the log search problem. Hear our resident experts Michael Wilde (a former Splunk Ninja) and Andy Dufour discuss how Honeycomb customers have evolved their log analysis process to achieve fast pattern detection, skipping the grep-and-search loop entirely.

Ask Miss O11y: Is There a Beginner's Guide On How to Add Observability to Your Applications?

I want to make my microservices more observable. Currently, I only have logs. I’ll add metrics soon, but I’m not sure whether there is a set path to follow. Is there a beginner's guide to observability, or a set of best practices, like "you need X kinds of metrics"? I just want to know what possibilities are out there. I am very new to this space.

How Do We Cultivate the End User Community Within Cloud-Native Projects?

The open source community talks a lot about the problem of aligning incentives. If you’re not familiar with the discourse, most of the conversation so far has centered on the most classic model of open source: the solo, unpaid developer who maintains a tiny but essential library that’s holding up half the internet. For example, Denis Pushkarev, the solo maintainer of the popular JavaScript library core-js, announced that he can’t continue without better compensation.

Platform Engineering Is the Future of Ops

Ops and DevOps roles as we know them are on their way to extinction; the future is platform engineering. While DevOps engineers typically focus on the application layer, platform engineers focus on the underlying infrastructure layer. Without a solid, reliable platform, deploying and maintaining software applications effectively is difficult, and the result can be downtime, poor performance, and security vulnerabilities. Platform engineering lets software applications and services run effectively and efficiently, and it has a direct impact on the user experience and the success of the entire organization.

How We Define SRE Work, as a Team

Last year, I wrote How We Define SRE Work. That article described how I came up with the charter for the SRE team, which we bootstrapped around the same time. It’s been a while. The SRE team is now four engineers and a manager. We are involved in work across the organization and across many spheres: we are embedded in teams, and we handle training, vendor management, capacity planning, cluster updates, tooling, and so on.

Deploys Are the WRONG Way to Change User Experience

I'm no stranger to ranting about deploys. But there's one thing I haven't sufficiently ranted about yet: deploying software is a terrible, horrible, no good, very bad way to change user-facing code. It sucks even if you have excellent, fast, fully automated deploys (which most of you do not). Relying on deploys to change user experience is a problem because it fundamentally confuses two very different actions: deploys and releases.

How Coveo Reduced User Latency and Mean Time to Resolution with Honeycomb Observability

When you’re just getting started with observability, a proof of concept (POC) can be exactly what you need to see the positive impact of this shift right away. Coveo, an intelligent search platform that uses AI to personalize customer interactions, used a successful POC to jumpstart its Honeycomb observability journey, which has grown to cover more than 10,000 machine learning models in production at any one time. Wondering how Coveo got there? So were we.

Caring for Complex Systems: We Can Do This

When we work at it, we professionals are pretty good at analysis. We can break down a simple system, look at its parts and their relations, and master it. Given enough time and teammates, we can analyze a very complicated system and fix it when it breaks. But complex systems don’t yield to analysis. We have to add another skill: sense-making. Complex systems have parts that learn and change, with relations that vary with state and history. They respond to and influence their environment.