Latest Posts

Failure Flags helps build testable, reliable software without touching infrastructure

Dec 11, 2023 By Ryan Detwiller In Gremlin

Building provably reliable systems means building testable systems. Testing for failure conditions is the only way to reliably root out issues before they impact customers. However, most current Chaos Engineering and resilience testing is focused on the underlying infrastructure. This helps identify potentially catastrophic failures, but misses the more frequent failures that still significantly impact customer experience.

Introducing Custom Reliability Test Suites, Scoring and Dashboards

Nov 16, 2023 By Ryan Detwiller In Gremlin

Last year, we released Reliability Management, a combination of pre-built reliability tests and scoring to give you a consistent way to define, test, and measure progress toward reliability standards across your organization. Today, we fulfill the next stage of that promise with the release of Custom Reliability Test Suites, Custom Scoring, and Dashboards.

Treat reliability risks like security vulnerabilities by scanning and testing for them

Nov 13, 2023 By Gavin Cahill In Gremlin

Finding, prioritizing, and mitigating security vulnerabilities is an essential part of running software. We all recognize that vulnerabilities exist and that new ones are introduced regularly, so we check for and remediate them on a regular basis. Even if the code passed every security check before being deployed, you still run regular security tests to make sure everything’s secure.

How to fix and prevent ImagePullBackOff events in Kubernetes

Oct 24, 2023 By Andre Newman In Gremlin

You'll often hear the term "containers" used to refer to the entire landscape of self-contained software packages: this includes tools like Docker and Kubernetes, platforms like Amazon Elastic Container Service (ECS), and even the process of building these packages. But there's an even more important layer that often gets overlooked, and that's container images.

How to fix and prevent CrashLoopBackOff events in Kubernetes

Oct 18, 2023 By Andre Newman In Gremlin

It's one of the most dreaded words among Kubernetes users. Regardless of your software engineering skill or seniority level, chances are you've seen it at least once. There are a quarter of a million articles on the subject, and countless developer hours have been spent troubleshooting and fixing it. We're talking, of course, about CrashLoopBackOff.

Gremlin for DORA compliance: how financial services firms build digital resilience and prove it

Oct 17, 2023 By Ryan Detwiller In Gremlin

The Digital Operational Resilience Act (DORA) is set to significantly impact the financial sector. Coming into full effect in 2025, this EU regulation will set new standards for information and communications technology (ICT) risk management. In this landscape, how can financial firms ensure they’re not only compliant, but also operationally resilient?

Ensuring consistent Kubernetes container versions

Oct 10, 2023 By Andre Newman In Gremlin

One of Kubernetes' killer features is its ability to seamlessly update applications no matter how large your deployment is. Did a developer make a code change, and now you need to update a thousand running containers? Just run kubectl apply -f manifest.yaml and watch as Kubernetes replaces each outdated pod with the new version.

How to detect and prevent memory leaks in Kubernetes applications

Oct 5, 2023 By Andre Newman In Gremlin

In our last blog, we talked about the importance of setting memory requests when deploying applications to Kubernetes. We explained how memory requests let you specify how much memory (RAM for short) Kubernetes should reserve for a pod before deploying it. However, this only helps your pod get deployed. What happens when your pod is running and gradually consumes more RAM over time?

Release Roundup Sept 2023: Measurably improve reliability

Oct 2, 2023 By Ryan Detwiller In Gremlin

It’s been another busy few months here at Gremlin. Overall, our team has been working on feature improvements to enable teams to measurably improve the reliability of their systems, whether that’s through broadening platform support so you can run Gremlin in more places, making it easier than ever to identify reliability risks, or improving reporting so you can manage reliability programs effectively at enterprise scale. Here’s a summary of what’s new.

Five mindset shifts for effective reliability programs

Sep 28, 2023 By Gavin Cahill In Gremlin

When people think about reliability, it’s easy to focus on incident response and moving fast to fix outages. This reactive approach to reliability can very quickly lead to burnout as you bounce from incident to incident. But that’s not the only way to think about reliability.
