
Latest News

How to Break Stuff with Chaos Engineering and Chaos Mesh

In 2011, a Netflix engineering team introduced the concept of chaos engineering with its release of Chaos Monkey. Chaos Monkey began as an in-house tool for orchestrating fault injection, one that Netflix eventually made open source. However, Chaos Monkey's reliance on Spinnaker, another Netflix engineering project, imposes some limitations.
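Chaos Mesh sidesteps that dependency: it runs on any Kubernetes cluster and is driven entirely by custom resources. As a minimal, purely illustrative sketch (the experiment name, namespace, and app label are placeholders, not taken from the article, and it assumes Chaos Mesh and the kubernetes Python client are installed), a pod-kill experiment can be submitted like this:

```python
# Illustrative only: submit a Chaos Mesh PodChaos experiment that kills one
# pod matching a label selector. Names, namespace, and label are placeholders.
from kubernetes import client, config

pod_kill = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {"name": "demo-pod-kill", "namespace": "default"},
    "spec": {
        "action": "pod-kill",   # terminate the selected pod(s)
        "mode": "one",          # affect a single matching pod
        "selector": {
            "namespaces": ["default"],
            "labelSelectors": {"app": "demo-app"},
        },
    },
}

config.load_kube_config()  # use the local kubeconfig
client.CustomObjectsApi().create_namespaced_custom_object(
    group="chaos-mesh.org",
    version="v1alpha1",
    namespace="default",
    plural="podchaos",
    body=pod_kill,
)
```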

Four tests to measure and improve reliability: what matters and how it works

Legendary race car driver Carroll Smith once said, "until we have established reliability, there is no sense at all in wasting time trying to make the thing go faster." Even though he was referring to cars, the same goes for technology: no amount of code optimization or new features can replace stable systems. Unfortunately, much like race cars, it's hard to know that a system is unreliable until it blows a tire, the brakes stop working, or the steering wheel comes off the column.

What is a "service" in a microservices architecture?

The past ten years marked a significant change in how software teams build and deploy applications. We moved away from bulky, slow, monolithic applications toward lightweight, scalable, distributed service-based applications. Meanwhile, tools like Docker, Kubernetes, and other container platforms helped accelerate this process. Despite this rapid growth, a fundamental question remains: what exactly is a service, and how does it fit into a microservices architecture?

What are the four Golden Signals?

When it comes to building reliable and scalable software, few organizations have as much authority and expertise as Google. Their book Site Reliability Engineering, first published in 2016, details the practices Google used to maintain reliability as it scaled. But when you have over a million servers running thousands of services across more than twenty data centers, how do you monitor them in a consistent, logical, and relevant way?
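The book's answer is the four golden signals: latency, traffic, errors, and saturation. As a minimal sketch of the idea (the record fields, window size, and CPU gauge below are illustrative assumptions, not drawn from the article), a service's signals for one time window might be summarized like this:

```python
# Illustrative sketch: derive the four golden signals from a window of
# request records. Field names and the saturation gauge are assumptions.
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float
    status: int

def golden_signals(requests: list[Request], window_s: float, cpu_util: float) -> dict:
    """Summarize latency, traffic, errors, and saturation for one window."""
    latencies = sorted(r.latency_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    p99 = latencies[int(0.99 * (len(latencies) - 1))] if latencies else 0.0
    return {
        "latency_p99_ms": p99,                         # latency: how long requests take
        "traffic_rps": len(requests) / window_s,       # traffic: demand on the service
        "error_rate": errors / max(len(requests), 1),  # errors: share of failed requests
        "saturation": cpu_util,                        # saturation: how "full" the service is
    }

# Example: three requests observed over a 60-second window, CPU at 72%.
sample = [Request(120.0, 200), Request(340.0, 200), Request(95.0, 503)]
print(golden_signals(sample, window_s=60.0, cpu_util=0.72))
```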

What is Chaos Engineering? A Guide on Its History, Key Principles, and Benefits

Many organizations invest in high availability and disaster recovery for their key applications. Too many of these organizations, however, skip the most important part of the process: regularly testing failover. Whether out of fear of downtime or of dreaded DNS problems, development teams are frequently hesitant to test what they've built in the real world.

Why SREs Need to Embrace Chaos Engineering

Reliability and chaos might seem like opposite ideas. But, as Netflix learned in 2010, introducing a bit of chaos—and carefully measuring the results of that chaos—can be a great recipe for reliability. Although most software is created in a tightly controlled environment and carefully tested before release, the production environment is harsher and much less controlled.

How to define and measure the reliability of a service

More and more teams are moving away from monolithic applications and towards microservice-based architectures. As part of this transition, development teams are taking more direct ownership over their applications, including their deployment and operation in production. A major challenge these teams face isn't in getting their code into production (we have containers to thank for that), but in making sure their services are reliable.
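One common way to make "reliable" concrete, offered here as a general pattern rather than the article's own definition, is a service-level indicator such as the fraction of successful requests, measured against a target. The function names and figures below are illustrative assumptions:

```python
# Illustrative sketch: an availability SLI (good events / total events) compared
# against an SLO target. The request counts and 99.9% target are made up.
def availability_sli(successful: int, total: int) -> float:
    """Fraction of requests that succeeded over the measurement window."""
    return successful / total if total else 1.0

def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget left, where 1.0 means none has been consumed."""
    allowed_failure = 1.0 - slo_target
    actual_failure = 1.0 - sli
    return 1.0 - (actual_failure / allowed_failure) if allowed_failure else 0.0

sli = availability_sli(successful=999_214, total=1_000_000)
print(f"SLI: {sli:.4%}")
print(f"Error budget left vs a 99.9% SLO: {error_budget_remaining(sli, 0.999):.1%}")
```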

How Gremlin's reliability score works

In order to make reliability improvements tangible, there needs to be a way to quantify and track the reliability of systems and services in a meaningful way. This "reliability score" should indicate at a glance how likely a service is to withstand real-world causes of failure without having to wait for an incident to happen first. Gremlin's upcoming feature allows you to do just that.
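The teaser does not publish the scoring formula, so the sketch below is not Gremlin's algorithm; it only illustrates the general idea of rolling the pass/fail results of reliability tests into a single number. The test names are hypothetical:

```python
# Purely illustrative, NOT Gremlin's scoring algorithm: one generic way to
# roll pass/fail reliability test results into a single 0-100 score.
def reliability_score(results: dict[str, bool]) -> int:
    """Percentage of reliability tests the service currently passes."""
    if not results:
        return 0
    return round(100 * sum(results.values()) / len(results))

# Hypothetical test outcomes for one service.
print(reliability_score({
    "survives_dependency_outage": True,
    "scales_under_cpu_pressure": True,
    "tolerates_added_latency": False,
    "recovers_from_host_failure": True,
}))  # -> 75
```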

Chaos Engineering Tools: Build vs Buy

Chaos Engineering, where engineers intentionally inject failure to test the reliability of their systems, is becoming a regular practice for companies that value uptime and availability. As cloud-based systems have grown more complex, Chaos Engineering has become a critical part of the software testing and release process, used to uncover unexpected dependencies, fix problems before they become 3 a.m. outages, and bake reliability into every feature.