The latest news and information on DevOps, CI/CD, automation, and related technologies.

Engineers should have a desire to find bugs: Bill Kennedy - The Reliability Podcast

The Reliability podcast speaks with engineers who have worked on large, complex systems and gleans what they have learned. Which best practices should one absorb? Which lessons are non-negotiable for getting better at the craft? What will ‘engineering’ look like with the advent of AI? We answer these questions and more by tracing the personal journeys of engineers who have built stellar careers decoding the innumerable intricacies of software engineering.

Enterprise Chaos Engineering Certification Prep Session

Demonstrate your reliability expertise, increase your visibility, and advance your career with a Gremlin Enterprise Chaos Engineering certification. Chaos Engineering continues to grow in popularity and is rapidly becoming a job requirement for Engineering teams focused on reliability. In this webinar, Sr. Reliability Specialist Andre Newman goes over the mindset shifts, best practices, and key information you need to prep for your certification.

The only industry not licensed to do their job - Engineering: Bill Kennedy - The Reliability Podcast

The new principles of incident alerting: it's time to evolve

Software engineering keeps changing: new technologies emerge, best practices evolve, and how we build and run software shifts with them. When it comes to incident alerting, however, it often feels like we're stuck in the past.

Digital transformation trends in oil and gas

The oil and gas sector, responsible in part for two industrial revolutions in the last 300 years, has been something of a laggard when it comes to adopting new technologies. Cloud and edge adoption in oil and gas lags behind other industries, mainly due to long-standing concerns around security and data privacy.

Double Down on Your Backups

In August, ransomware hit yet another company. Unfortunately, this time the victim was a regional cloud provider in Europe, so we can call it a “critical hit.” What we know so far: a virtual server was compromised and used as a jump host, and from there the attacker started to encrypt all volumes in the same domain. Whether through pure luck or thorough reconnaissance, the same server was then migrated to a different data center and continued its unplanned job from there.

Release Roundup Sept 2023: Measurably improve reliability

It’s been another busy few months here at Gremlin. Overall, our team has been working on feature improvements to enable teams to measurably improve the reliability of their systems, whether that’s through broadening platform support so you can run Gremlin in more places, making it easier than ever to identify reliability risks, or improving reporting so you can manage reliability programs effectively at enterprise scale. Here’s a summary of what’s new.

What is IPAM (IP address management)

The ability to manage IP addresses within a network is crucial for effective network management, especially as networks become more complex and have to handle more demanding loads. Assigning hundreds or even thousands of IP addresses to devices that may be highly distributed or disparate is no simple task. Once devices leave the network, their IP addresses may need to be released, and there’s always the risk of IP address conflicts.