Finding, prioritizing, and mitigating security vulnerabilities is an essential part of running software. We’ve all recognized that vulnerabilities exist and that new ones are introduced on a regular basis, so we make sure that we check for and remediate them on a regular basis. Even if the code passed all the security checks before being deployed, you still perform regular security tests to make sure everything’s secure.
You'll often hear the term "containers" used to refer to the entire landscape of self-contained software packages: this includes tools like Docker and Kubernetes, platforms like Amazon Elastic Container Service (ECS), and even the process of building these packages. But there's an even more important layer that often gets overlooked, and that's container images.
It's one of the most dreaded words among Kubernetes users. Regardless of your software engineering skill or seniority level, chances are you've seen it at least once. There are a quarter of a million articles on the subject, and countless developer hours have been spent troubleshooting and fixing it. We're talking, of course, about CrashLoopBackOff.
The Digital Operational Resilience Act (DORA) is set to significantly impact the financial sector. Coming into full effect in 2025, this EU regulation will set new standards for information and communications technology (ICT) risk management. In this landscape, how can financial firms ensure they’re not only compliant, but also operationally resilient?
One of Kubernetes' killer features is its ability to seamlessly update applications no matter how large your deployment is. Did a developer make a code change, and now you need to update a thousand running containers? Just run kubectl apply -f manifest.yaml and watch as Kubernetes replaces each outdated pod with the new version.
In our last blog, we talked about the importance of setting memory requests when deploying applications to Kubernetes. We explained how memory requests lets you specify how much memory (RAM for short) Kubernetes should reserve for a pod before deploying it. However, this only helps your pod get deployed. What happens when your pod is running and gradually consumes more RAM over time?
It’s been another busy few months here at Gremlin. Overall, our team has been working on feature improvements to enable teams to measurably improve the reliability of their systems, whether that’s through broadening platform support so you can run Gremlin in more places, making it easier than ever to identify reliability risks, or improving reporting so you can manage reliability programs effectively at enterprise scale. Here’s a summary of what’s new.
When people think about reliability, it’s easy to focus on incident response and moving fast to fix outages. This reactive approach to reliability can very quickly lead to burnout as you bounce from incident to incident. But that’s not the only way to think about reliability.