Part 2: Building a Production-Grade Traffic Capture, Transform and Replay System

When developers try to build realistic mocks and automated tests from production network traffic, the real challenge isn’t just capturing the traffic; it’s manipulating the data. Raw traffic is a chaotic sea of patterns, dynamic tokens, environment-specific secrets, and tangled dependencies that seem impossible to untangle by hand. Over my two decades of building these systems, I have learned that solving this problem requires more than brute-force parsing or ad hoc scripts.

The Load Testing Start Guide! #speedscale #stresstest #loadtesting #mocking #startup

Are you ready to get serious about load and stress testing, but don't know where to start? This guide highlights the trap most serious engineers fall into: trying to build a custom DIY testing environment. The traditional path means signing your team up for maintaining load drivers, test case frameworks, ephemeral environments, and endless custom mocks, a massive drain on time and resources. There's a better, cheaper, and faster solution: Traffic Replay.

Playwright Check Suites Are Now GA - But What Does That Mean For You?

Only a few companies successfully invest in actively monitoring real user flows in production. I’ve been puzzled by the state of the art for many years, because I’m an anxious developer who always needs to know that production is “all right”. How can it be okay for all of us to wait for error logs, thrown exceptions, or customer complaints to learn about production issues?

Settle Your QA Debt Before the Bugs Start Breaking Kneecaps

In Part One, we discussed how QA debt builds silently over time — causing slower releases, late-night firefights, and unpredictable test cycles. The next step is understanding how much debt you have and where it hides. This post goes deeper into measuring QA debt — what to track, how to collect data, and how to use those insights to create a sustainable plan for improvement.

Now in the API: History, Custom Monitors, and Subscribers

Last month, we introduced the StatusGator API v3, a complete overhaul of our API designed to give developers more flexibility, an improved data model, and deeper integration options for monitoring the status of hundreds of services. Today, we’re excited to share three major additions to v3: the Board History API, Custom Monitors API, and Status Page Subscribers API.

Stop Debugging Blindly! How Traffic Capture Can Help Your Code #speedscale #trafficcapture #ai

Is AI "slop" or new code pushing tons of bugs into production? You can't test everything forever. Learn how traffic capture is the most efficient way to understand how your code is actually running in the real world. By grabbing data from sidecars, packet captures, or logs, you get the context you need to prevent bugs and improve performance.

Why do you only use Playwright for pre-release testing and not for production monitoring, too?

We've been running Playwright in production for years. Today, we, at Checkly, are going all in with Playwright Check Suites. Playwright Check Suites is our latest step towards uniting testing and monitoring into a single workflow. It's our biggest advancement yet! Here's why this matters: We're not adapting Playwright anymore. We're running it natively in production with full `playwright.config` support, complete custom dependency control, and support for every tag, spec, or configuration.

Introducing The Next Phase Of Synthetic Monitoring: Playwright Check Suites

We've been running Playwright in production since the beginning. Today, we're going all in. When we first launched Browser Checks with Playwright support, we proved something critical: the most popular test automation framework since Selenium isn't just for testing—it's the foundation of modern production monitoring. But that was just the beginning. Today, we're announcing Playwright Check Suites—our bet on the future of monitoring and the most significant evolution in Checkly's history.

Mitmproxy vs Proxymock: Replaying Traffic for Realistic API Testing

Replaying traffic is a core tool in your toolbox when you need to reproduce a tricky bug or validate how your app behaves. Traffic replay is especially valuable for testing complex software applications that rely on APIs and microservices, where integration and functionality must be thoroughly validated.

Part 1: Building a Production-Grade Traffic Capture and Replay System

A few years ago I was on call during the Super Bowl. At the time I was working for an observability vendor, and one of our customers had an outage caused by a surge in user traffic. But our monitoring system didn’t have enough data to show what went wrong, and I sat on a call for two hours, painfully listening to them spin up more servers and try to catch up with the user load.

Debugging Without a Net: The Pain of Reproducing Production Issues

Every engineer has been there — a late-night page, a broken feature in production, and no clear way to reproduce it. The logs are vague. The metrics look normal. Your local environment works fine. Yet something somewhere is failing for real users. So begins the detective work — debugging a live system with almost no tools, no perfect test data, and no clone of production.

Ingest OTLP metrics directly into Datadog with the new OTLP Metrics API

Many organizations rely on OpenTelemetry (OTel) to standardize observability across distributed systems. These organizations are at varying stages of adoption and are implementing OTel in complex environments with diverse configurations. To support this range of use cases, Datadog offers many ways to use OpenTelemetry with Datadog.

Your "Technical Debt" is a LIE! Meet QA Debt.

The REAL reason your system WILL FAIL. We all talk about technical debt, but QA Debt is the silent killer costing companies millions. It's the accumulation of skipped regression checks, outdated test suites, and ignored production data. The result? Unpredictable, catastrophic outages that can sink your business (and your career!). Learn how to identify and pay down your QA Debt before it's too late. It's not about testing more; it's about testing SMARTER.

From Datadog to Checkly in minutes

Looking to cut your Datadog bill and modernize your monitoring workflow? In this session, Dan Giordano and Giovanni Rago show how to migrate your Datadog synthetic monitors to Checkly in minutes, unlocking Playwright, Monitoring-as-Code, and AI-powered automation. Timestamps: Intro (Why Migrate from Datadog): Dan introduces the session, what will be covered, and who it’s for.

MySQL Mocking with Speedscale's Proxymock: A Complete Guide

Testing database-driven applications is notoriously painful. If your app depends on MySQL, you’ve probably spent hours setting up local databases, running migrations, loading data, and then cleaning everything up just to rerun your tests. This repetitive cycle slows development, breaks pipelines, and introduces inconsistency between local and production environments.

Building LLM agents to validate LangGraph tool use and structured API responses

Transitioning LLM agents from intriguing prototypes to reliable, production-grade solutions introduces a unique and significant challenge: the inherent stochasticity of LLMs. Unlike conventional software, where inputs predictably yield precise outputs, an LLM’s response can exhibit variability even when presented with identical prompts. To ensure the dependability of your LLM agent, you will need a rigorous validation strategy.

QA Debt: The Silent Risk That Can Take Down Your Business

In engineering, we talk a lot about technical debt — the shortcuts and compromises made in code that pile up over time. But there’s another kind of debt that’s just as dangerous and far more invisible: QA debt. QA debt is what happens when testing isn’t given the same attention as features, architecture, or performance. It’s the accumulation of missed edge cases, outdated test suites, incomplete automation, or skipped regression checks.

Testing AI Code in CI/CD Made Simple for Developers

Generative AI can produce code faster than humans, and developers feel more productive with it integrated into their IDEs. That productivity is only real if CI/CD tests are solid and automated. When AI-generated code is not appropriately tested, you may encounter production issues that you haven’t seen before. According to the State of Software Delivery 2025 report, 67% of developers spend more time debugging and resolving security vulnerabilities in code generated by AI.

What is API-First Networking?

When you build your network with APIs at its core, you give your business a competitive edge. Here’s how to do it. Application Programming Interfaces (APIs) have become ubiquitous in modern networks for good reason. As companies use more service providers, endpoints, and software platforms, APIs help them get the most utility from their data with the least effort.