
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How to collect Prometheus metrics and store them anywhere (with Sensu!)

As my co-founder Caleb Hailey likes to say, collecting monitoring and observability data is essentially a solved problem. The only remaining challenges are related to getting that data where you want it to go. When dealing with different formats — say, collecting Prometheus metrics and storing them in Elasticsearch — this can be a non-trivial problem. Put simply, it’s like trying to put a square peg into a round hole.
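To make the format mismatch concrete, here is a minimal Python sketch that turns one line of the Prometheus text exposition format into a flat dictionary that could be serialized as a JSON document for a store such as Elasticsearch. It is not how Sensu does it. The function name `parse_sample` and the output field names (`name`, `labels`, `value`, `timestamp_ms`) are assumptions made for illustration, and the regular expressions only cover simple sample lines.

```python
import json
import re

# Matches: metric_name{label="value",...} value [timestamp]
# Simplified: assumes label values do not contain a closing brace.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one exposition-format line into a dict (hypothetical schema).

    Returns None for blank lines and for comment lines (# HELP, # TYPE).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    name, raw_labels, value, timestamp = match.groups()
    doc = {
        "name": name,
        "labels": dict(LABEL_RE.findall(raw_labels or "")),
        "value": float(value),
    }
    if timestamp is not None:
        doc["timestamp_ms"] = int(timestamp)
    return doc


if __name__ == "__main__":
    sample = 'http_requests_total{method="post",code="200"} 1027 1395066363000'
    print(json.dumps(parse_sample(sample)))
```

A production pipeline would more likely rely on an existing parser and handle histograms, escaping, and type metadata, which this sketch ignores.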

All in on APM

It’s been just over six months since Splunk disrupted the Application Performance Monitoring (APM) market with the launch of SignalFx Microservices APM, combining the technologies of SignalFx and Omnition. We have pushed ourselves harder and continued to invest in creating more value for our customers by making it easier for them to ingest ALL data and providing ever more powerful analytics on top of that data.

Splunk > Clara-fication: Job Inspector

Do you SPL? Well, if you do, you probably either already know about the job inspector, or you’re about to. Either way, you probably don’t know enough. Don’t worry though, that’s all about to change. There are a few different aspects of the job inspector that everyone should be familiar with. These include the execution costs, the search job properties, and the search.log. I’m going to walk us through these areas, and some others, and their importance.

StatsD: What Is It and How To Monitor It

StatsD is among the most popular monitoring solutions used to instrument code with custom metrics. It has become very popular over the last few years and has emerged as the industry standard for open source inside-the-app monitoring. It has a host of advantageous features that make it well suited to application performance measurement.
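As a rough illustration of what instrumenting code with custom metrics looks like, the Python sketch below builds StatsD-style metric lines (`name:value|type`, with an optional `|@sample_rate`) and sends them over UDP. The helper names `format_metric` and `send_metric`, the metric names, and the `127.0.0.1:8125` address are assumptions for the example. Most applications would use an existing StatsD client library rather than hand-rolled code like this.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD line, e.g. 'page.views:1|c'.

    metric_type is a StatsD type code such as 'c' (counter),
    'g' (gauge), or 'ms' (timer).
    """
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line as a UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metrics: count a page view and record a request time.
    send_metric(format_metric("page.views", 1, "c"))
    send_metric(format_metric("request.time", 320, "ms", sample_rate=0.1))
```

Because the transport is UDP, the application does not block or fail if no StatsD daemon is listening, which is part of why this style of instrumentation is cheap to add.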

Honeycomb Learn Ep 1 Instrument Better for a Happy Debugging Team

Nathan LeClaire, Sales Engineer @honeycombio, knows first-hand that the key to instrumenting code is to start with baby steps. With Honeycomb, a little instrumentation gives vast insights as soon as you ingest your data. With Honeycomb Beelines, we take the heavy lifting out of instrumenting. Listen to see Honeycomb in action, hear best practices, and learn how fast and painless instrumentation can be.

Honeycomb Learn Ep. 2: De-stress Debugging -Triggers, Feature Flags, & Fast Query

This episode in our Honeycomb Learn series looks at how to cut stress levels when debugging issues in production. Start with a hypothesis, run fast queries, and then navigate to the code where the problem lies. Be proactive and set triggers to let you know if something needs attention. When engineering is about to ship a new release, set a feature flag to watch how production behaves in real time. Curtail performance issues and reduce customer impact with the right tools to better understand production systems, right now.

Honeycomb Learn Ep. 3: See The Trace? Discover Errors, Latency & More across Distributed Systems

Distributed systems bring complexity for developer and ops teams. When incidents occur in production, expected or unexpected, you want to pinpoint which part of the service is causing problems. Distributed tracing illuminates distributed systems, making your logs easier to navigate. Quickly identify where there are errors or latency in your code or service, even within third-party services you use. Instrumentation is the key to the best tracing experience possible.

Honeycomb Learn Ep. 4: Bubble-Up to Spot Outliers in Production

The power of Honeycomb lies in the way you analyze production data using different interactive views. See what's happening across many dimensions (fields) in your system with BubbleUp. Pick the timeframe, break down by any field, such as customer name or ID, then filter by a specific dataset or by where errors occur. The query results are a heatmap that highlights events over the baseline, over time. Use BubbleUp to select outliers on the heatmap and drill down to all related fields in that data. It will help you understand which part of the code is misbehaving.