3 ways to drive software delivery success with Datadog DORA Metrics

Delivering software quickly and reliably is the main focus of modern DevOps. But to improve your delivery performance, you need to understand it, and that starts with measurement. Teams primarily measure performance in this area by using DORA metrics: deployment frequency, change lead time, change failure rate, and time to restore service. These metrics help teams understand trends in their software delivery practices in quantifiable terms that they can track and improve over time.

The Datadog Agent: Why it's essential for monitoring your infrastructure and applications with Datadog

If you’re a Datadog customer, you’re likely using our platform to gain visibility into your infrastructure and applications and to troubleshoot using logs, metrics, and traces when issues arise. To support these efforts, you’ll want access to the most granular telemetry signals and intuitive workflows that streamline your investigation.

AWS Lambda's INIT billing update: What's changing and why it matters for your cloud costs

Starting on Aug. 1, 2025, AWS will bill for the initialization (INIT) phase of Lambda functions, bringing a key change to how you are charged for serverless workloads. This billing update will impact functions using managed runtimes with ZIP archive packaging, which previously excluded the INIT phase from the billed duration. For teams that rely heavily on AWS Lambda, this is a small but significant change. The INIT phase, while short, could introduce costs that were previously invisible.

Stop Guessing, Start Measuring: Optimizing Rancher Continuous Delivery With Fleet Benchmarks

Rancher Continuous Delivery (known as Fleet) can be used in a workflow to deploy applications to many clusters. With its GitOps support, it enables downstream clusters to pull updates from a Git repository. We know of users who monitor several hundred Git repositories and deploy to a thousand clusters. To make this scale possible, several intermediate steps are necessary. First, the application is converted into separate bundles, which are then targeted at clusters.

Understanding Your App's Health With Core Mobile Vitals

Mobile apps are a little different from services run on servers. You build your mobile app, you ship it off to the world, and then it gets run by the end user on their own machine. If your app is running poorly on some percentage of users’ devices, you may never know. That’s where observability comes in. There are certain important metrics that every mobile app has in common.

CI/CD Observability Powered by OpenTelemetry

Modern engineering teams spend a lot of time and resources setting up monitoring for their production systems: tracking uptime, catching errors, and responding to incidents before customers ever notice. But what about the journey before code reaches production? For most teams, observing the CI/CD pipeline is either an afterthought or completely overlooked. While we recognize its importance, do we truly understand how well our CI/CD process is functioning?

Monitoring your MCP Server in Production (with Sentry)

So you're building an MCP server for your project or service, to allow AI chatbots and agents to interact with it? Great! You've decided to build it using Cloudflare Workers, have written the code, shipped it, and the first users are getting on board: you're officially running it in production. That's when problems start. I'm not here to dissuade you from shooting your shot, but let's make sure you've got your bases covered in production when something inevitably goes wrong.

Linux Security Logs: Complete Guide for DevOps and SysAdmins

Security logs are the quiet sentinels of your Linux systems, recording critical information that can mean the difference between detecting an intrusion and discovering a breach months too late. For most DevOps professionals and system administrators, these logs contain valuable insights that often go untapped. While they're essential for compliance, their real value lies in providing visibility into your system's security posture and operational health.

Using DCIM to Drive Down Data Center Energy Costs

Data centers are energy-intensive, and with the surge in AI-driven workloads, their global energy consumption is projected to more than double by 2030, potentially surpassing the current electricity consumption of Japan. For most data center operators, energy is one of their largest recurring expenses. As demand for data center capacity continues to grow and energy prices fluctuate, energy efficiency is no longer just a sustainability goal; it's a core business concern.