
Latest News

Testing the reliability of your fulfillment center

Fulfillment pipelines for order management in e-commerce have many intricate moving parts. Sales orders, fulfillment, negotiation, shipment, and receipt are closely interconnected, yet each requires different actions. You also need messaging around order statuses, conditions, actions, rules, and inventory, to name just a few of the important parts of these complex systems.

What's the reliability of your checkout process?

One reason companies practice Chaos Engineering is to prevent expensive outages, in retail or anywhere else. This blog post walks through a common retail outage in which the checkout process fails, then covers how to use Chaos Engineering to keep that outage from happening in the first place. Let’s dive in. Maybe you’ve been there.

Building more reliable financial systems with Chaos Engineering

The financial services industry has built up larger capital buffers to keep market shocks from causing another economic collapse. In addition to these financial controls, many banks and personal trading platforms have begun building resiliency against information technology shocks. Despite these new precautions, we’re still seeing outages today that prevent customers from depositing and withdrawing their money, completing transactions, and executing trades during key events.

Performance tuning MongoDB with Chaos Engineering

You’ve pored over the MongoDB documentation, crafted highly polished and well-tuned queries, and confidently deployed your new code to production. Everything ran great at first, but once CPU or RAM usage hit a certain point, your queries suddenly slowed to a crawl. What happened, and how can you prepare for situations like this in the future? This is an unfortunate but common scenario with databases like MongoDB.

Announcing Status Checks to Ensure Safe Chaos Engineering Scenarios

One of the most important aspects of any Chaos Engineering program is knowing that every experiment is being run safely. And one of the simplest ways to ensure safe experiments is by having safeguards that prevent running chaos experiments on a system that is unhealthy or has an incident in progress. Today, Gremlin is excited to announce Status Checks, which run before you kick off a Chaos Engineering Scenario in order to verify your system is in a steady state.
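
The idea of gating an experiment on a health check can be sketched in a few lines of Python. This is a minimal illustration, not Gremlin's API: the names `StatusCheck`, `failed_checks`, and `run_scenario` are hypothetical, and each probe is assumed to be any callable that returns True when the system looks healthy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StatusCheck:
    """A named health probe (hypothetical type, for illustration only)."""
    name: str
    probe: Callable[[], bool]  # returns True when the system looks healthy


def failed_checks(checks: List[StatusCheck]) -> List[str]:
    """Return the names of failing checks; a probe that raises counts as failed."""
    failed = []
    for check in checks:
        try:
            healthy = check.probe()
        except Exception:
            healthy = False
        if not healthy:
            failed.append(check.name)
    return failed


def run_scenario(checks: List[StatusCheck], experiment: Callable[[], None]) -> bool:
    """Run `experiment` only if every check passes. Returns True if it ran."""
    failed = failed_checks(checks)
    if failed:
        print("Scenario halted; failing checks: " + ", ".join(failed))
        return False
    experiment()
    return True
```

In this sketch a probe that raises an exception is treated the same as an unhealthy result, so an unreachable monitoring endpoint blocks the experiment rather than letting it proceed.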

Chaos Engineering and Windows: Mitigating common Windows failure scenarios

Microsoft Windows is a popular operating system for many enterprise applications, such as Microsoft SQL Server clusters and Microsoft Exchange Servers. About 30% of the world’s web application hosting systems are running Windows, making it an important part of every enterprise’s plans to prevent outages and enhance reliability.

Achieving AWS DevOps Competency Status (and What it Means for You)

Chaos Engineering was conceived as a direct response to the complexity and nondeterministic nature of cloud-based applications. Thoughtful fault injection closes the gap between traditional testing methodologies and modern approaches to software engineering like microservices, continuous delivery, and DevOps.