Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

From Deploy to Commit: Building the Ultimate Development Pipeline - A Comprehensive Guide

‘Manual deployment is (should be) a sin.’ Well, calling manual deployment a sin may sound strong, but consider this: building the ultimate development pipeline demands a focus on automation. Although the selection of a deployment method depends on the specific needs and requirements of a project or environment, can you really deny the power of automated deployment? There's a better way.

How Squadcast's Snooze Incidents Promotes Focussed On Call Shifts

Dealing with a flood of incidents, each with varying degrees of urgency, can be a daily struggle for Incident Response teams. Suppose a low-priority alert pings while you're tackling a critical incident. This pulls your focus away from the urgent issue. This constant alert bombardment can: How do engineers ensure that high-severity issues take precedence? Don't they want to avoid being bothered or bombarded with notifications while addressing critical matters? They sure do.

5 Easy Ways to Reduce Work-Related Stress for SRE Professionals

It's completely normal to feel a little overwhelmed and stressed out at work these days. Technology has collaboration moving at the speed of light, and time away from screens is at an all-time low, blurring the lines between work and personal time. Plus, it's hard to ignore the multitude of tech outages that have been making headlines lately, leaving teams anxiously on edge. When you are a professional with on-call cycles, the potential of outages adds another level of complexity to the mix.

The Role of APM in DevOps and SRE Practices

As the software development world becomes faster, enterprises must adapt to customer demands by increasing their application’s deployment frequency. They often rely on DevOps and Site Reliability Engineering (SRE) methodologies to achieve this. These approaches ensure high system availability amidst frequent deployments and prioritize delivering a seamless user experience.

How do you build resilient systems to manage the IPL with 30+ million concurrent users?

The Indian Premier League is a unique sporting event for a dozen reasons. But for engineers in India, it’s one of a kind. Very few companies can boast of managing 30+ million concurrent users. Every year, this number grows. Last year, we witnessed ~60 million concurrent users. And things get bigger and larger every year.

Navigating the Evolving Landscape: A Deep Dive into REST API Versioning Strategies

In the ever-evolving landscape of APIs, ensuring seamless interactions and managing changes becomes crucial. While innovation and adaptability are essential, maintaining backward compatibility is equally important to avoid disruption for existing users. This is where REST API versioning comes into play. Versioning allows you to introduce new features or changes to your API in a controlled manner, while simultaneously keeping older versions running smoothly.

Balancing Innovation and Reliability: A Guide for SRE Teams

In today's rapidly evolving technological landscape, striking a balance between innovation and reliability is a constant challenge for Site Reliability Engineering (SRE) teams. On one hand, businesses and customers crave the constant stream of new features and functionalities that fuel progress. On the other hand, ensuring system stability, minimal downtime, and optimal performance remains paramount for user experience and business continuity.

Best Practices For Building A Resilient On-Call Framework

Whether a business is small scale, medium-sized, or a large enterprise, downtime issues can affect any organization as no business is exempt from experiencing downtime. However, the swifter the acknowledgment of an issue, the quicker the response, resulting in a reduced impact on business. An effective On-Call framework not only aids in prompt issue resolution but also plays a vital role in minimizing the overall downtime impact on business operations.