How a Global Banking Leader Tackled Memory Overload with HEAL Software

In the financial sector, where system reliability directly impacts customer trust and revenue, even minor IT inefficiencies can spiral into costly crises. For one of the world’s largest banks—supporting 25 million customers, 2,000 branches, and 3,000 ATMs—a hidden challenge threatened its reputation: unpredictable memory consumption in critical applications.

The importance of error budgets for SREs and how to monitor them

Digital-first customers who are always on the go expect a seamless experience. But let’s face it—100% uptime is a myth. Trying to achieve it can drain resources and stifle innovation. This is where error budgets come in. They help site reliability engineers (SREs) find the sweet spot between delivering reliability and development velocity. With error budgets, teams can focus on building a robust system without burning out over perfection.
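To make the idea concrete, here is a minimal sketch of the arithmetic behind an error budget. It assumes an availability SLO expressed as a fraction (for example 0.999) and a rolling window measured in days; the function names and the 30-day default are illustrative assumptions, not taken from the article.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    # The budget is the share of the window the SLO allows to be "bad".
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    left = budget_remaining(0.999, downtime_minutes=10.8)
    print(f"budget: {budget:.1f} min, remaining: {left:.0%}")
```

A team might alert when the remaining fraction falls below an agreed threshold and slow feature releases until the budget recovers; the threshold itself is a policy choice rather than something the calculation dictates.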

Finding Your Way: Using Metrics to Explore Organizational Architecture

Imagine being the new developer in a bustling tech company. Everyone is rushing to meet deadlines, and no one has time to explain the tangled web of services, databases, and messaging systems that make up the organization’s architecture. You search high and low for documentation, but the few diagrams you find are outdated or incomplete. Feeling lost? This is where metrics can come to the rescue.

Pod Exec in K8s: Advanced Exec Scenarios and Best Practices

Remember using SSH to access servers? It was the go-to method for troubleshooting or making changes to a system. But in the world of containers, SSH doesn't quite fit. Kubernetes and containers work differently; they're dynamic and spun up and down frequently. That’s where kubectl exec comes in. It lets you run commands inside a pod directly, without needing to rely on SSH or worry about the pod being ephemeral. It’s simple and fits the nature of modern, containerized environments.

OpenMetrics vs OpenTelemetry: A Detailed Comparison

When it comes to monitoring and observability, two of the most discussed standards are OpenMetrics and OpenTelemetry. While both are designed to collect and transmit metrics, they have distinct goals, use cases, and communities driving their development. In this guide, we'll break down what each of these projects is, how they compare, and how they fit into your monitoring stack.

Kubernetes Pods vs Nodes: What Sets Them Apart

Kubernetes has revolutionized how we manage containerized applications, bringing scalability, reliability, and flexibility to the forefront. Two fundamental components of Kubernetes are Pods and Nodes, and understanding their differences is crucial for anyone working with Kubernetes clusters. While most people are familiar with these terms, a deeper dive into the specifics can help you optimize your Kubernetes setup and avoid common pitfalls.

This Month in Datadog - January 2025

On the January episode of This Month in Datadog, join Jeremy Garcia (VP of Technical Community and Open Source) and Daljeet Sandu (Product Manager) for a bonus video that spotlights Datadog On-Call, which is now generally available. Also featured is a roundup of new features that Datadog recently announced. This Month in Datadog is a monthly update of the company’s latest features, product announcements, and more. Subscribe to our YouTube channel to get notifications about future episodes.

How to integrate performance testing and continuous profiling for deeper application insights

A key goal of performance testing is to ensure your applications perform well under various levels of load. While critical, these tests are often conducted with minimal insight into why a system performs a certain way during testing. Metrics, logs, and traces may tell part of the story, but can miss the deeper details. This is where continuous profiling comes in.

What Are Network Monitoring Agents & How to Deploy & Configure Them

In this article, we’ll walk through our video on Network Monitoring Agents in Obkio’s Network Performance Monitoring App. Monitoring Agents (software, hardware, or virtual appliances) are deployed in key network locations to monitor performance between all network sites. The video also shows you how to create new Monitoring Agents and how to modify or delete Agents you already have in your account.

From writing code to running a company of 300+ employees

Today we break down another exciting edition of Founders and Friends, the podcast we’ve created to hold conversations with visionary leaders shaping the tech industry. This conversation features Paul Stovell, co-founder and CEO of Octopus Deploy, and of course, JD Trask, co-founder and CEO of Raygun. Together, they explore the realities of running software businesses, from the evolving nature of agile practices to scaling software teams efficiently.