
Four Shades of Progressive Delivery

Progressive Delivery strategies like Blue/Green deployments, canary releases, feature flag rollouts, and feature delivery platforms help teams release safely, limit risk, and accelerate learning. Each approach builds toward sustainable, high-velocity software delivery by minimizing downtime and maximizing feedback. Combining these methods enables faster innovation with greater confidence and control. Last week we walked The Path To Progressive Delivery. This week, we go deeper.

Our New CLI: How and Why We Made It

We are happy to announce our latest project at MetricFire: a brand-new CLI tool! Get ready to start monitoring your systems in one step, with no need to modify any configuration files manually. Just run a terminal command, follow the prompts, and forward your system metrics to Hosted Graphite in minutes. In this article, we’ll share an overview of the Hosted Graphite CLI, why we’re making it, and how we’re making it.

CI/CD at scale: A performance analysis of CircleCI vs GitHub Actions

When evaluating CI/CD platforms, it can be easy to view them as commodities — interchangeable tools that accomplish the same basic tasks. But as development teams scale, small differences in platform performance can compound, significantly affecting development velocity and resource utilization. To better understand these differences, we conducted a head-to-head comparison between CircleCI and GitHub Actions, focusing specifically on performance at enterprise scale.

Why and How You Should Use Your Learning & Visiting Budget

When I joined Checkly as Junior People Operations Manager, one of the benefits that immediately stood out to me was the Learning & Visiting budget. I found myself wondering—how is this budget actually being used across the company? At the start of the year, many of our team members plan how they’ll use their learning budget—whether to enhance professional skills or pursue self-driven projects. With flexible guidelines, we encourage them to invest in what matters most.

TCP Checks Now Available in Checkly

Checkly has always helped you monitor your APIs and web services, ensuring they stay fast, reliable, and available. But application reliability doesn’t stop there—databases, message queues, and mail servers all play a crucial role in your infrastructure. To provide full application reliability, we’re expanding into network monitoring with TCP checks. Now, you can monitor critical non-HTTP services directly in Checkly—without adding extra tools to your stack.
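To make the idea of a TCP check concrete, here is a minimal sketch in Python of what such a probe does at its core: attempt a connection to a host and port, and record whether it succeeded and how long it took. This is not Checkly's implementation or API; the function name `tcp_check` and its parameters are hypothetical, chosen only for illustration.

```python
import socket
import time
from typing import Tuple


def tcp_check(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, float]:
    """Attempt one TCP connection and return (reachable, elapsed_seconds).

    Hypothetical helper for illustration; not part of any monitoring product.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        reachable = False
    return reachable, time.monotonic() - start
```

For example, calling `tcp_check` with the address and port of a database or mail server you operate returns whether the port accepted a connection within the timeout. A hosted monitoring service would typically run this kind of probe on a schedule from several locations and alert on failures or slow responses.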

Improve gaming app performance with Unity support in Datadog RUM

As mobile gaming evolves, players have higher expectations for seamless experiences, real-time interactions, and cross-platform accessibility. Whether you’re developing games for iOS, Android, or another mobile operating system, maintaining and optimizing the performance of your game is critical for player retention. For instance, if a mobile game becomes laggy or begins to drop frames during gameplay, players will grow frustrated and abandon the game altogether.

Getting started with Azure cost dashboards

As an Azure admin, you need to keep an eye on how much it costs to run your workloads in the cloud. You also want visibility into any deployed resources that are not contributing to the business but are accumulating cost over time. Using a dedicated Azure plugin, SquaredUp dashboards will help you understand your Azure costs across services, resources, locations and apps – so you can keep tabs on how much you're spending and identify opportunities to save.

Everything You Need to Know About OpenTelemetry Agents

If you’re reading this, chances are you’re already familiar with OpenTelemetry (OTel)—the open-source standard for collecting observability data. But what about OpenTelemetry agents? How do they work, and why do they matter? This guide unpacks everything you need to know about OTel agents—where they fit in your stack, how to set them up, and common pitfalls to watch out for. Let’s get into it.

How to Effectively Monitor Nginx and Prevent Downtime

Nginx is widely known for its high performance and reliability. However, just like any software running in production, it requires continuous monitoring to ensure smooth operation. Issues such as high latency, unexpected crashes, or overwhelming traffic spikes can lead to performance degradation or even complete outages. Therefore, implementing a robust monitoring strategy is crucial to maintaining the health and stability of your Nginx deployment.

Troubleshooting Kubernetes deployment failures

Do you feel like you're solving a puzzle when deploying applications in Kubernetes? You are not alone. When something goes wrong during a deployment, it is crucial to diagnose the issue methodically and get things back on track. This guide walks you through practical steps for troubleshooting deployment failures efficiently.