Chaos Engineering

Getting started with IO attacks

Nov 4, 2021 By Andre Newman In Gremlin

Storage devices remain one of the most significant bottlenecks in modern systems. CPU and RAM speeds seem to increase exponentially year over year. Solid-state drives (SSDs) and NVMe drives have brought large improvements in IO performance, but moving data to and from persistent storage is still orders of magnitude slower than moving it to and from memory. In scalable cloud applications, this slowness can have a major impact on performance, latency, and the user experience.

Podcast: Break Things on Purpose | Gustavo Franco, Senior Engineering Manager at VMware

Nov 3, 2021 By Jason Yee In Gremlin

In this episode, Jason is joined by Gustavo Franco, Senior Engineering Manager at VMware, to chat about chaos in Gustavo's early days. Gustavo reflects on everything from Google's early disaster recovery practices to the contemporary SRE movement.

What is a GameDay?

Nov 2, 2021 By Gremlin In Gremlin

What are GameDays in the tech industry? Jason Yee from Gremlin explains!
Visit https://gremlin.com/gamedays for more information and resources.

Panel: Improving Monitoring & Reliability with Chaos Engineering - Dash 2021 (Datadog, Gremlin, Pismo)

Oct 28, 2021 By Datadog In Datadog

Monitoring and observability are critical for knowing how your systems are behaving, but how do you create the feedback loops that let you shift from reacting to incidents to proactively preventing them? In this roundtable discussion, Mauricio Galdieri, Software Architect at Pismo.io, and Kolton Andrus, CEO and co-founder of Gremlin, join Tay Nishimura, Site Reliability Engineer on the Chaos Engineering team at Datadog, to chat about monitoring, Chaos Engineering, and how to use them together to build more reliable systems.

Announcing the Gremlin Chaos Engineering Professional Certificate Program

Oct 26, 2021 By Alex Drag In Gremlin

There's a reason thousands of engineers, testers, and other reliability specialists signed up for Gremlin's first certificate program, Gremlin Certified Chaos Engineering Practitioner (GCCEP): Chaos Engineering is in high demand, and the market is looking for professionals who know how to wield it well.

Podcast: Break Things on Purpose | Leonardo Murillo, Principal Partner Solutions Architect at Weaveworks

Oct 19, 2021 By Jason Yee In Gremlin

Sit down with Ana and Jason for this week's show with Leonardo (Leo) Murillo, principal partner solutions architect at Weaveworks and a former DJ, who joins us from Costa Rica. Leo shares his take on GitOps, recommends plenty of excellent resources to check out, and shares his thoughts on automating reliability. He also explains how to account for the "DJ variable" and "party parameters," alongside some fun anecdotes about DevOps.

PagerDuty Partner Twitch Stream with Gremlin

Oct 12, 2021 By PagerDuty In PagerDuty

Chaos Engineering can help you improve your incident response workflows. Don't wait for a real incident to flex your response muscles. Create real-world scenarios with Gremlin and practice your response with PagerDuty.

Getting started with Disk attacks

Oct 7, 2021 By Andre Newman In Gremlin

Persistent storage is one of the more difficult aspects of managing distributed systems. When we attach a storage device to a host, whether it's flash storage, network attached storage (NAS), or old-fashioned spinning disks, we generally don't give it much thought until we start running distributed applications or need to increase capacity. But a lot more can go wrong with storage than we tend to expect, and these failures can have unexpected consequences for our systems, services, and applications.

Podcast: Break Things on Purpose | Maxim Fateev and Samar Abbas, creators of Temporal

Oct 5, 2021 By Jason Yee In Gremlin

Join Jason for another round of "Build Things on Purpose." This time Jason is joined by Maxim Fateev and Samar Abbas, co-founders of Temporal, to talk about the software and solutions they are developing for orchestrating microservices. Maxim and Samar discuss their past collaborations on various projects, including the Cadence project, which laid the foundation for their continuing work at Temporal.

Getting started with Memory attacks

Sep 22, 2021 By Andre Newman In Gremlin

Memory (or RAM, short for random-access memory) is a critical computing resource that stores temporary data on a system. Memory is a finite resource, and the amount of memory available determines the number and complexity of processes that can run on the system. Running out of RAM can cause significant problems such as system-wide lockups, terminated processes, and increased disk activity. Understanding how and when these issues can happen is vital to creating stable and resilient systems.
