OpenTelemetry Is Not "Three Pillars"

OpenTelemetry is a big, big project. It’s so big, in fact, that it can be hard to know what part you’re talking about when you’re talking about it! One particular critique I’ve seen going around recently, though, is about how OpenTelemetry is just ‘three pillars’ all over again. Reader, this could not be further from the truth, and I want to spend some time on why.

How to Implement OpenTelemetry in Next.js

OpenTelemetry is an open-source observability framework designed to instrument, generate, collect, and export telemetry data, including traces, metrics, and logs. It is vendor-agnostic, so developers can send data to backends such as Last9, Prometheus, Datadog, or Jaeger without lock-in. For Next.js applications, OpenTelemetry is particularly useful because of the framework's hybrid rendering approach, which mixes server-side and client-side rendering.

How to Build Observability into Chaos Engineering

If you've ever deployed a distributed system at scale, you know things break—often in ways you never expected. That’s where Chaos Engineering comes in. But running chaos experiments without robust observability is like debugging blindfolded. This guide will walk you through how observability empowers Chaos Engineering, ensuring that your experiments yield meaningful insights instead of just causing chaos for chaos’ sake.

Deploying Prometheus with Docker Compose: A Step-by-Step Guide

Prometheus is one of the most popular open-source monitoring and alerting tools. If you're running containerized applications, setting up Prometheus with Docker Compose can make your monitoring stack easier to deploy and manage. This guide will walk you through everything you need to get Prometheus up and running with Docker Compose, from installation to configuration and setting up basic alerts.

Understanding Reverse DNS Lookup

On the information superhighway, an IP address is a series of numbers that gives the location of a digital resource, much like a street address for a building. However, when all you know is the street address, you have no idea what the building itself looks like. If you're a visual person, you might enter that address into Google Maps to pull up a picture of the building, so you have a landmark to look for on the drive.
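To make the "address to name" direction concrete, here is a minimal sketch of how the name queried by a reverse lookup is formed. It assumes Python's standard `ipaddress` module; the helper name `reverse_lookup_name` and the sample addresses (taken from the documentation ranges) are illustrative and not from the article. It only builds the name and does not contact a DNS server.

```python
import ipaddress


def reverse_lookup_name(address: str) -> str:
    """Build the DNS name that a reverse (PTR) lookup queries for an IP address.

    Hypothetical helper for illustration: it formats the name only and
    performs no network lookup.
    """
    return ipaddress.ip_address(address).reverse_pointer


# IPv4: the octets are reversed and placed under in-addr.arpa.
print(reverse_lookup_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa

# IPv6: the address is expanded to hex digits, reversed, and placed under ip6.arpa.
print(reverse_lookup_name("2001:db8::1"))
```

A resolver then asks for the PTR record at that name, which is where the hostname for the address would be published, if one exists.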

Fix slow mobile apps before your users uninstall with Mobile Vitals

Mobile devs know the struggle. Small regressions can cause big issues in production, and fixing them isn't as easy as pushing a quick patch. Unlike with a web app, shipping fixes for mobile apps means navigating app store approvals and often hopping on calls with customers to debug, because mobile issues can be so hard to reproduce. Catching these issues before the 1-star reviews roll in is crucial. Luckily, Sentry just made it easier than ever.

Instrument Google Cloud Run applications with the new Datadog Agent sidecar

Google Cloud Run is a fully managed service that allows you to deploy, manage, and scale workloads on serverless containers. Because Cloud Run abstracts away infrastructure management and runs on complex, distributed backends, it can be difficult to troubleshoot. Datadog’s integrations with Google Cloud and Google Cloud Run address that challenge by collecting and visualizing key metrics and logs.

Grafana Loki 101: How to ingest logs with Alloy or the OpenTelemetry Collector

Logs play a critical role in observability, but they come with their own challenges. Grafana Loki, our horizontally scalable, highly available, multi-tenant log aggregation system, addresses these challenges head-on, giving you an open-source tool that's both cost-effective and easy to operate.

Free network monitoring: Full network visibility without the cost

Investing in a network monitoring tool should mean complete visibility and faster troubleshooting. But what happens when an unexpected outage occurs and your expensive tool misses the warning signs? The result: hours of downtime, frustrated employees, and lost business productivity. Many organizations face this challenge, realizing that even premium monitoring solutions can leave critical gaps. The good news? You don’t have to break the bank to monitor your network effectively.

Optimize MTTD with the right check frequency

Checkly enables engineers to automate the monitoring of their production services. Using the automation framework Playwright, you can run an end-to-end test on a regular cadence to make sure every feature is working for your users. But once you've got your check set up, whether with Playwright scripting, a Terraform template, or an OpenAPI spec, the question becomes how often you should run it. Should you be checking every few minutes, or every hour?
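As a rough way to reason about that trade-off, the sketch below estimates how long a failure can go unnoticed at a given check interval. It is a simplification and not Checkly's own model: it assumes a failure begins at a uniformly random moment between two runs, that the next run catches it, and it ignores the time the check takes to execute and the time an alert takes to arrive. The function name `detection_delay_minutes` is hypothetical.

```python
def detection_delay_minutes(check_interval_minutes: float) -> tuple[float, float]:
    """Return (average, worst-case) minutes before a scheduled check sees a failure.

    Assumption: the failure starts at a uniformly random point between two
    runs and the first run after it detects the problem.
    """
    if check_interval_minutes <= 0:
        raise ValueError("check interval must be positive")
    # On average the failure lands halfway through the interval; at worst it
    # starts right after a run and waits a full interval for the next one.
    return check_interval_minutes / 2, check_interval_minutes


for interval in (1, 5, 60):
    average, worst = detection_delay_minutes(interval)
    print(f"every {interval} min: average {average:.1f} min, worst case {worst} min")
```

Under these assumptions, an hourly check leaves an average of 30 minutes before detection, while a five-minute check brings that down to 2.5 minutes at the cost of twelve times as many runs.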