Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

The Android Developer's Journey into Hardware Observability

In this article, I walk through what the growth of internal observability tooling for an AOSP device might look like, and the variety of pitfalls one might encounter when scaling from single devices to tens and then thousands of Android devices in the field. It draws on my experience talking to AOSP developers and teams, and on my own work as an Android app developer building on AOSP hardware.

Agentless monitoring for cloud VMs: Simplify scaling and observability

Managing cloud infrastructure is challenging enough without the added burden of deploying and maintaining monitoring agents. What if there were a simpler, more efficient way to monitor your virtual machines (VMs)? In the first part of this series, we looked at the drawbacks of agent-based monitoring (link) and presented a better solution: agentless monitoring, an efficient approach to observability that eliminates the need to install and manage software agents on each monitored device.
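As a quick illustration of the agentless idea, the sketch below pulls CPU utilization for a VM straight from the cloud provider's metrics API, with nothing installed on the machine itself. It uses AWS CloudWatch via boto3 purely as one example; the region, instance ID, and time window are placeholders, and credentials are assumed to be configured already.

```python
# Minimal agentless check: read VM metrics from the cloud provider's API,
# with no agent running on the instance. Instance ID and region are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=start,
    EndTime=end,
    Period=300,                 # 5-minute buckets
    Statistics=["Average"],
)

# Print the last hour of average CPU utilization, oldest first.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2), "%")
```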

OpenTelemetry Metrics Explained: A Guide for Engineers

OpenTelemetry (often abbreviated as OTel) is the gold-standard observability framework, allowing users to collect, process, and export telemetry data from their systems. The framework is organized into distinct signals, each covering a different aspect of observability. Among these signals, OpenTelemetry metrics are crucial in helping engineers understand their systems.
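For a concrete taste of the metrics signal, here is a minimal sketch using the OpenTelemetry Python SDK: it wires a console exporter into a meter provider, creates a counter, and records one measurement. The instrument and attribute names are illustrative, not taken from the article.

```python
# Minimal OpenTelemetry metrics setup: a counter exported to the console periodically.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# Flush accumulated metrics to stdout every 5 seconds.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=5000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("example.meter")
request_counter = meter.create_counter(
    "http.server.requests",
    unit="1",
    description="Number of HTTP requests handled",
)

# Record one request, tagged with attributes for later aggregation.
request_counter.add(1, {"http.route": "/checkout", "http.status_code": 200})
```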

How to Build Observability into Chaos Engineering

If you've ever deployed a distributed system at scale, you know things break—often in ways you never expected. That’s where Chaos Engineering comes in. But running chaos experiments without robust observability is like debugging blindfolded. This guide will walk you through how observability empowers Chaos Engineering, ensuring that your experiments yield meaningful insights instead of just causing chaos for chaos’ sake.
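To make the pairing concrete, here is a hypothetical sketch of an observability-gated chaos experiment: it samples an error-rate SLI before injecting a fault, then aborts and rolls back if the SLI degrades past a threshold. The inject_fault, rollback_fault, and query_error_rate helpers are placeholders for whatever chaos tool and metrics backend you actually use; nothing here comes from the guide itself.

```python
# Hypothetical observability-gated chaos experiment.
# inject_fault(), rollback_fault(), and query_error_rate() stand in for your
# chaos tooling (e.g. a fault-injection API) and your metrics backend query.
import time

ERROR_RATE_ABORT_THRESHOLD = 0.05  # abort if more than 5% of requests fail


def run_experiment(inject_fault, rollback_fault, query_error_rate, duration_s=300):
    baseline = query_error_rate()
    print(f"baseline error rate: {baseline:.2%}")

    inject_fault()
    try:
        deadline = time.time() + duration_s
        while time.time() < deadline:
            current = query_error_rate()
            if current > ERROR_RATE_ABORT_THRESHOLD:
                print(f"aborting: error rate {current:.2%} breached the threshold")
                return False
            time.sleep(15)  # poll the SLI every 15 seconds
        print("experiment completed within the error budget")
        return True
    finally:
        rollback_fault()  # always remove the fault, even on abort
```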

OpenTelemetry Is Not "Three Pillars"

OpenTelemetry is a big, big project. It’s so big, in fact, that it can be hard to know what part you’re talking about when you’re talking about it! One particular critique I’ve seen going around recently, though, is about how OpenTelemetry is just ‘three pillars’ all over again. Reader, this could not be further from the truth, and I want to spend some time on why.

Increase control and reduce noise in your AWS logs using Datadog Observability Pipelines

Today’s SRE and security operations center (SOC) teams often find themselves overwhelmed by the sheer volume and variety of logs generated by critical AWS services such as VPC Flow Logs, AWS WAF, and Amazon CloudFront. While these logs can be valuable for detecting and investigating security threats, as well as troubleshooting issues in your environment, managing them at scale can be challenging and costly.

Integrating OpenTelemetry with Grafana for Better Observability

Modern application observability is essential for ensuring system performance, diagnosing issues, and optimizing user experiences. OpenTelemetry (OTel) and Grafana serve as two key components in achieving end-to-end visibility. While OpenTelemetry focuses on instrumenting applications to collect telemetry data, Grafana specializes in visualizing that data, making it actionable and insightful.
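As one possible wiring, the sketch below instruments a span with the OpenTelemetry Python SDK and ships it over OTLP to a collector endpoint (localhost:4317 is an assumption); from there a trace backend such as Grafana Tempo can store it and Grafana can visualize it. The service and span names are illustrative.

```python
# Send a trace over OTLP to a local collector; a Grafana-compatible backend
# (e.g. Tempo) can then store and visualize it. Endpoint and names are assumptions.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.tracer")
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)  # attribute appears in the trace view
```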

Enhance Network Performance Management With Next-Gen AIOps: Configuring Integration of DX Spectrum With DX Operational Observability

To unlock the power of observability and the advanced analytics of AIOps, teams need to collect high-quality monitoring data, establish connections and correlations between data sources, and understand context with the help of robust, up-to-date topology maps. Because modern networks often span on-premises, cloud, and hybrid infrastructures, monitoring their performance and troubleshooting issues can be difficult. These complex infrastructures often lead to observability gaps for network teams.

Slicing Up (and Iterating on) SLOs

One of the main pieces of advice about Service Level Objectives (SLOs) is that they should focus on the user experience. Invariably, this leads to people further down the stack asking, “But how do I make my work fit the users?”—to which the answer is to redefine what we mean by “user.” In the end, a user is anyone who uses whatever it is you’re measuring.
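To ground that framing, here is a minimal sketch of the arithmetic behind an SLO, whoever the "user" turns out to be: an availability SLI computed from good versus total events, and the error budget remaining against a target. The event counts and the 99.9% target are made up for illustration.

```python
# Toy SLO arithmetic: availability SLI and remaining error budget.
# The event counts and the 99.9% target are illustrative only.
slo_target = 0.999          # 99.9% of requests should succeed
total_events = 1_250_000    # e.g. requests served this window
good_events = 1_248_900     # requests that met the success criteria

sli = good_events / total_events                  # observed availability
error_budget = (1 - slo_target) * total_events    # failures the SLO allows
failures = total_events - good_events
budget_remaining = 1 - failures / error_budget    # fraction of budget left

print(f"SLI: {sli:.4%}")                          # 99.9120%
print(f"allowed failures: {error_budget:.0f}, actual: {failures}")
print(f"error budget remaining: {budget_remaining:.1%}")  # 12.0%
```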