The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

The Hidden Cost of Sampling in Observability

Today’s software is incredibly complicated and creates tons of data. Metrics, logs, and traces are generated constantly by hundreds of services for even simple applications. Every transaction can generate on the order of kilobytes of metadata about the transaction — and multiplying that to account for even a small amount of concurrency can create a few megabytes a second (or ~300GB/day) of data that needs to be captured and analyzed for later use.
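
The back-of-the-envelope figures above can be checked with a short sketch. The per-transaction size (4 KB) and the transaction rate (900 per second) are illustrative assumptions, not numbers from the article; the function names are hypothetical as well.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_gb(bytes_per_transaction, transactions_per_second):
    """Return the telemetry volume in decimal gigabytes per day."""
    bytes_per_second = bytes_per_transaction * transactions_per_second
    return bytes_per_second * SECONDS_PER_DAY / 1e9


def rate_mb_per_second(gb_per_day):
    """Return the sustained rate in decimal megabytes per second."""
    return gb_per_day * 1e9 / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    # Assumed workload: 4 KB of metadata per transaction, 900 transactions/s.
    print(f"{daily_volume_gb(4000, 900):.2f} GB/day")
    # Working backwards from the article's ~300GB/day figure.
    print(f"{rate_mb_per_second(300):.2f} MB/s")
```

Under these assumptions the workload produces 3.6 MB/s, and 300 GB/day corresponds to roughly 3.5 MB/s, which matches the "few megabytes a second" estimate.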

What is Network Congestion? Common Causes and How to Fix Them

There are few areas of networking so problematic, and at the same time so fixable, as network congestion. Understanding the common causes of network congestion can help you detect them, fix them, and keep them from cropping up again. End-users generally experience network congestion as a “network slow down”, or as response times on their computers not being up to par.

Why you should focus on page speed & stop using pop-ups

Have you ever wondered why your bounce rate is always over 70%, yet never quite figured out the cause? Your content reads great, you’ve got top-notch videos of your products, and you’ve even got a testimonial from Microsoft saying how good your company is! Well, all of these things seem to have little impact on visitors to your website if you have a) constant pop-ups or b) slow page loading speed (and if you have both, I’d disable Google Analytics now…).

Speed up your dashboard workflow with dynamic template variable syntax

Template variables enable you to use tags to filter your Datadog dashboards to the hosts, containers, or services you need for faster troubleshooting. However, in some cases it may be difficult to aggregate all of the data you need with a standard set of template variables without creating a complicated, difficult-to-manage set of variables. For example, you may use tag values that are a subset of another tag.

No-code Lambda Monitoring

Auto-instrumenting Lambda Monitoring didn’t originate through a focus group or business plan. It started as a hackathon project in which our growth team used CloudWatch to build a prototype that could instrument Lambda functions with Sentry. We did this by using a CloudFormation stack to automatically create resources in a customer environment while streaming CloudWatch Logs to Sentry through Kinesis Firehose.

How shuffle sharding in Cortex leads to better scalability and more isolation for Prometheus

For many years, it has been possible to scale Cortex clusters to hundreds of replicas. The relatively simple Dynamo-style replication relies on quorum consistency for reads and writes. As a result, the failure of more than one replica can lead to an outage for all tenants. Shuffle sharding solves that issue by automatically picking a random “replica set” for each tenant, allowing you to isolate tenants and reduce the chance of an outage.
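
The idea of picking a per-tenant “replica set” can be illustrated with a minimal sketch. This is not Cortex's actual implementation; it assumes a deterministic pseudo-random choice seeded by a hash of the tenant ID, and the function name, replica names, and shard size are all hypothetical.

```python
import hashlib
import random


def shuffle_shard(tenant_id, replicas, shard_size):
    """Pick a stable pseudo-random subset of replicas for one tenant.

    Assumption: seeding the generator from a hash of the tenant ID makes
    the same tenant always map to the same replica set.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(replicas, shard_size))


if __name__ == "__main__":
    # Hypothetical cluster of 12 replicas, 3 replicas per tenant.
    cluster = [f"replica-{i}" for i in range(12)]
    for tenant in ("tenant-a", "tenant-b", "tenant-c"):
        print(tenant, shuffle_shard(tenant, cluster, 3))
```

Because each tenant touches only its own small subset, two failed replicas affect only the tenants whose shards contain both of them, rather than every tenant in the cluster.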

Observability: It's the User Experience, Stupid!

Observability, which originated from control theory, measures how well you can understand a system’s internal states from its external outputs. Observability uses instrumentation to provide insights that aid monitoring. In DevOps, observability is achieved through a set of monitoring solutions. The shift to using one vendor platform to do so, versus multiple solutions, makes sense as...

Dashboard Server: Working with the Elasticsearch Tile

I’ll come clean and admit it – this part of the series will be a bit interesting given the fact that I know very little about Elasticsearch. So really, this is an honest test of the question – “can I still build something good with Dashboard Server even if I only have nominal knowledge of the tool where the data is sourced from?”