The importance of error budgets for SREs and how to monitor them

Digital-first customers who are always on the go expect a seamless experience. But let’s face it: 100% uptime is a myth. Trying to achieve it can drain resources and stifle innovation. This is where error budgets come in. They help site reliability engineers (SREs) find the sweet spot between reliability and development velocity. With error budgets, teams can focus on building a robust system without burning out over perfection.

Finding Your Way: Using Metrics to Explore Organizational Architecture

Imagine being the new developer in a bustling tech company. Everyone is rushing to meet deadlines, and no one has time to explain the tangled web of services, databases, and messaging systems that make up the organization’s architecture. You search high and low for documentation, but the few diagrams you find are outdated or incomplete. Feeling lost? This is where metrics can come to the rescue.

Managing External-DNS & cert-manager with Komodor

Recently we’ve explored the evolving role of Kubernetes as a full ecosystem, rather than just a platform, diving into the power and complexity of add-ons. These tools, as highlighted previously, are key to augmenting Kubernetes’ core capabilities: as their name implies, they add on essential capabilities that Kubernetes itself does not support directly.

Pod Exec in K8s: Advanced Exec Scenarios and Best Practices

Remember using SSH to access servers? It was the go-to method for troubleshooting or making changes to a system. But in the world of containers, SSH doesn't quite fit. Kubernetes and containers work differently; they're dynamic and spun up and down frequently. That’s where kubectl exec comes in. It lets you run commands inside a pod directly, without needing to rely on SSH or worry about the pod being ephemeral. It’s simple and fits the nature of modern, containerized environments.

OpenMetrics vs OpenTelemetry: A Detailed Comparison

When it comes to monitoring and observability, two of the most discussed standards are OpenMetrics and OpenTelemetry. While both are designed to collect and transmit metrics, they have distinct goals, use cases, and communities driving their development. In this guide, we'll break down what each of these projects is, how they compare, and how they fit into your monitoring stack.

Kubernetes Pods vs Nodes: What Sets Them Apart

Kubernetes has revolutionized how we manage containerized applications, bringing scalability, reliability, and flexibility to the forefront. Two fundamental components of Kubernetes are Pods and Nodes, and understanding their differences is crucial for anyone working with Kubernetes clusters. While most people are familiar with these terms, a deeper dive into the specifics can help you optimize your Kubernetes setup and avoid common pitfalls.

This Month in Datadog - January 2025

On the January episode of This Month in Datadog, join Jeremy Garcia (VP of Technical Community and Open Source) and Daljeet Sandu (Product Manager) for a bonus video that spotlights Datadog On-Call, which is now generally available. Also featured is a roundup of new features that Datadog recently announced. This Month in Datadog is a monthly update of the company’s latest features, product announcements, and more. Subscribe to our YouTube channel to get notifications about future episodes.

What's new with Microsoft Azure for 2025

Microsoft Azure remains the second-largest cloud service provider, with 24% of the global market share, but boasts the most availability zones, spanning 60+ regions worldwide. Over the past 12 months, the platform has seen major advancements across AI and infrastructure, and we share some of the highlights in this blog.

Cyber Security Risk Management: Frameworks and Best Practices

Since 2020, cyber threats have become a silent epidemic for enterprises and customers alike. Sounds dramatic? Think again: in 2023, cyberattacks hit enterprises every 39 seconds and burnt through $4.99 million per hit, making security not just an IT checklist item but a critical enterprise-wide priority. Fast forward to 2025, and the message is clear: adapt or lose out to your competitors.

A Smarter Type of "Intelligent Automation"? As New Collaborative Robots Start Rolling Out in 3PL Warehouses, People Should Find It Easier to Get Orders Out the Door

Even if you were watching the robots and the pickers on the screen as I was talking with Kyle and Alex, you might be confused about “what’s new” about this AMR system design or the software it’s using to improve pick efficiency and throughput in their warehouses. The robots look like typical AMRs and the carts look like lots of other carts, right? Well, they are, and they aren’t.