April 2024

Automation Triumphs: Real-World DevOps Automation Implementations

Apr 30, 2024 By Chitra Bisht In Squadcast

Remember the pre-automation days in DevOps? Endless server configurations, manual deployments that took hours (or days!), and a constant feeling of being buried in repetitive tasks. Yeah, those were the times... Thankfully, those days are fading fast. The magic of automation has swept through the DevOps landscape, transforming tedious workflows into streamlined processes.

Reinventing Deployments: From Docker to Dagger -- Incidentally Reliable with Solomon Hykes

Apr 30, 2024 By Zenduty In Zenduty

Catch Solomon Hykes (co-founder of Docker and Dagger) sharing stories from the early days of Docker: the rollercoaster journey to 20 million active developers worldwide, the heavy crown of a tech leader, and his vision to revolutionize CI/CD with Dagger today. Exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty.

Elevating Engineering Excellence: The Imperative of Site Reliability for Every Engineer

Apr 29, 2024 By Vishal Padghan In Squadcast

In the ever-evolving landscape of technology, engineers are the architects of the digital world. Their expertise shapes the platforms, applications, and services that define our daily interactions with technology. Yet in the pursuit of innovation and functionality, one crucial aspect often takes a back seat: site reliability. Site reliability engineering (SRE) has emerged as a critical discipline in software development and operations.

Back to the Future: The R-C-A of alerting

Apr 29, 2024 By Aditya Godbole In Last9

Dissecting the R-C-A of alerting: Reliability, Correlations, and Actionability.

Insights of an Observability Advocate: The Challenges and Rewards

Apr 28, 2024 By Anjali Udasi In Zenduty

At a recent SRE Meetup in Bangalore, we had the pleasure of meeting Akshay Deshpande, who manages a Performance/Observability Engineering team at Smarsh. During our conversation, he discussed his passion for observability and his constant drive to improve the field. Smarsh helps companies gain valuable insights from their communication data, enabling them to proactively identify potential regulatory and reputational risks before they escalate.

Comparing the Top 5 On-Call Management Software Solutions in 2024

Apr 27, 2024 By Chitra Bisht In Squadcast

SRE and DevOps teams are the backbone of system uptime and reliability. But managing On-Call schedules, alerts, and communication during incidents can quickly turn resolution efforts into a path to burnout. This blog explores the top On-Call management tools in 2024, designed to streamline Incident Response and keep your team ready for action.

A Day in the Life of a DevOps Engineer

Apr 26, 2024 By Chitra Bisht In Squadcast

Let me tell you, the life of a DevOps engineer is anything but boring. It's a constant pull between automation, collaboration, and troubleshooting, all with a healthy dose of caffeine thrown in for good measure. One day you might be scripting a deployment pipeline, the next you’re diving into server logs to diagnose a critical error. It's a role that demands versatility, a problem-solving mindset, and a learner’s excitement.

Igniting Innovation: The Power of Empowered Engineers

Apr 25, 2024 By Lee Atchison In Blameless

In the fast-paced world of technology, innovation is not just a buzzword—it's a necessity. As organizations strive to stay ahead of the curve and deliver cutting-edge solutions, they must foster a culture that empowers engineers to drive change and lead transformative projects. Throughout my career, I have witnessed firsthand the impact that empowered engineers can have on an organization, and I believe that unlocking their potential is key to achieving long-term success.

Beyond SLAs: Rethinking Service Level Objectives in Incident Response

Apr 24, 2024 By Vishal Padghan In Squadcast

In the context of IT service management, Service Level Agreements (SLAs) have long been the cornerstone for measuring and ensuring the quality of services provided to customers. However, as technology evolves and incidents become more complex, relying solely on SLAs may not be sufficient. This is where Service Level Objectives (SLOs) come into play, offering a more nuanced approach to Incident Response.

Launching Alert Studio

Apr 24, 2024 By Aditya Godbole In Last9

Modern monitoring systems depend heavily on alerting to reduce the Mean Time to Detect (MTTD) failures. But alerting hasn’t evolved to meet the demands of modern architectures. We’re changing that with Alert Studio.

Bridging the IT-business comms gap comes down to this one word: Ask

Apr 24, 2024 By Leo Vasiliou In Catchpoint

A highlight of the SRE Report is the insightful analysis based on the organizational ranks of respondents. The 2023 installment exposed significant misalignment between practitioners and management in several key areas, including the benefits of AIOps, the challenge of tool sprawl, and attitudes towards blamelessness. While the 2024 SRE Report showed a rare consensus on the importance of monitoring external endpoints, it uncovered yet more ongoing differences. Let’s dive in.

Streamlining Incident Management with Squadcast's Workflows

Apr 24, 2024 By Squadcast In Squadcast

Watch this webinar to understand how automating with Squadcast's 'Workflows' can save your team over 1,000 productive hours. Learn about the power of automation in the Incident lifecycle and see a live demo on setting up and tailoring Workflows to boost efficiency. 🛠️

Just hired an SRE? Five onboarding tips

Apr 24, 2024 By Jorge Lainfiesta In Rootly

No matter how good a new teammate is, a lot of their success is in your hands.

SRE and the Enterprise: Building a Culture of Reliability at Scale

Apr 23, 2024 By Vishal Padghan In Squadcast

As the digital landscape evolves at breakneck speed, enterprises face an increasingly complex challenge: how to ensure their systems remain reliable and available amidst the chaos of modern technology. In this journey, Site Reliability Engineering (SRE) emerges as a beacon of hope, offering a pragmatic approach to building a culture of reliability at scale.

Unleashing the Change Maker Within: Secrets to Driving Change in Your Organization

Apr 18, 2024 By Blameless In Blameless

Hello, Innovators! If you've ever believed in the potential for change within your organization but weren’t sure how to advocate for it, this webinar is designed with you in mind. “Unleashing the Change Maker Within: Secrets to Driving Change in Your Organization” is not just another webinar; it's a beacon for engineers, SREs, and tech enthusiasts eager to make a tangible difference in their companies.

What Is Denormalized Data?

Apr 17, 2024 By Anjali Udasi In Zenduty

Traditional database design prioritizes data integrity through normalization. However, for read-heavy workloads, normalized data structures can lead to complex queries and slower performance. Denormalization offers an alternative approach to optimize query execution and improve efficiency. A study concluded that denormalization can improve query performance when implemented with a thorough understanding of application requirements.
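
To make the trade-off concrete, here is a minimal Python sketch. It is not taken from the post, and it assumes a hypothetical customers/orders schema held in plain dictionaries. In the normalized layout, each read looks up the customer for every order (a join). The denormalized layout copies the customer name into every order row, so reads need no lookup. The cost shows up on writes: renaming a customer must update every duplicated copy.

```python
# Hypothetical normalized schema: customer data is stored once,
# and each order references it by customer_id.
customers = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 30.0},
    {"order_id": 101, "customer_id": 2, "total": 12.5},
    {"order_id": 102, "customer_id": 1, "total": 7.5},
]


def read_normalized():
    # Read path performs a join: one customer lookup per order.
    return [
        {
            "order_id": o["order_id"],
            "customer_name": customers[o["customer_id"]]["name"],
            "total": o["total"],
        }
        for o in orders
    ]


def denormalize():
    # Write-time step: copy the customer name into each order row.
    return [dict(o, customer_name=customers[o["customer_id"]]["name"]) for o in orders]


denormalized_orders = denormalize()


def read_denormalized():
    # Read path is a plain scan over pre-joined rows: no lookups needed.
    return [
        {"order_id": o["order_id"], "customer_name": o["customer_name"], "total": o["total"]}
        for o in denormalized_orders
    ]


def rename_customer(customer_id, new_name):
    # The trade-off: every duplicated copy must be kept in sync.
    # Returns how many denormalized rows had to be rewritten.
    customers[customer_id]["name"] = new_name
    touched = 0
    for o in denormalized_orders:
        if o["customer_id"] == customer_id:
            o["customer_name"] = new_name
            touched += 1
    return touched
```

Both read paths return the same rows. The denormalized one simply moves the join work from query time to write time, which suits read-heavy workloads.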

Navigating On-Call Compensation for SREs: Strategies and Insights

Apr 17, 2024 By Jorge Lainfiesta In Rootly

Discover five models of on-call compensation.

Rootly Demo - April 2024

Apr 15, 2024 By Rootly In Rootly

Squadcast Ranks in the Top 10 Incident Management Tools Report by G2

Apr 12, 2024 By Sanjog Sandhu In Squadcast

Reaching the top 10 tools in the Incident Management category marks an important milestone for Squadcast. This accomplishment underscores our commitment to actively incorporating customer feedback into our product development process and vision. From the outset, our objective has been to design a platform that streamlines Incident Response workflows by integrating On-Call Management, Incident Response, SRE, AIOps, and Automation into one cohesive system.

Streamline Incident Resolution with Squadcast's Outgoing Webhooks

Apr 12, 2024 By Chitra Bisht In Squadcast

Incident responders often find themselves under pressure to resolve issues quickly and efficiently. Once the alert comes in and the incident resolution starts, the actions taken in the next few minutes can make all the difference. Essential actions involve collaborating with team members and invoking specialized scripts for common issues like disk space shortages or server restarts.

PagerDuty Alternatives: Which is the Best for Your Team?

Apr 12, 2024 By OpsMatters In OpsMatters

PagerDuty is a SaaS incident management platform that helps businesses prevent and manage business-impacting problems while maintaining a smooth customer experience. Used by developers, IT professionals, and DevOps teams, PagerDuty gives businesses the data they need to manage events that can impact their brand reputation and revenue. Its business-wide incident response, hundreds of integrations, machine learning, on-call scheduling, and escalations make PagerDuty a popular incident management platform.

The real cost of a blameful culture

Apr 10, 2024 By Lee Atchison In Blameless

In the fast-paced world of IT operations, the culture permeating an organization is critical to its success. It drives behavior, efficiency, and organizational accomplishment. A blame-centric culture is particularly detrimental, creating an environment where finger-pointing is more important than problem-solving and fear reduces innovation. This negative culture damages individual morale and erodes the organization's collective resilience.

Buy, Build, or Adopt: Seasoned Platform Engineers Weigh In

Apr 9, 2024 By Jorge Lainfiesta In Rootly

What's the secret to achieving the right balance for your platform?

Introducing Squadcast and ServiceNow Bidirectional Integration For Enhanced Operational Efficiency

Apr 4, 2024 By Squadcast In Squadcast

Discover the powerful Squadcast and ServiceNow bidirectional integration: its key features and benefits, designed to streamline incident resolution and enhance collaboration within your DevOps and IT teams. Key takeaways: (1) Accelerate Incident Response: streamline incident response and accelerate resolution directly through Squadcast and ServiceNow. (2) Enhanced Learning and Retrospectives: simplify tracking, retrospectives, and learning for your engineering team, ensuring a more efficient and productive incident management process.

Datadog on Site Reliability Engineering #shorts #datadog #observability

Apr 3, 2024 By Datadog In Datadog

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has evolved from a startup into a rapidly growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.

An SRE's Most Important Skill? Communication

Apr 3, 2024 By Nočnica Mellifera In Checkly

I wish someone had told me that I shouldn’t hop between frameworks. In my experience, context switching as a beginner is wasted effort, much like learning four programming languages in your first year. If I’d spent a solid year learning how to deploy services on AWS, then when it was time to learn Azure, I’d have seen more similarities than differences and found it a lot easier to pick up a second public cloud.

How Incidents Foster Leadership

Apr 3, 2024 By Zhuang (Strong) Liang In Rootly

To become battle-tested, you need to go through battles, not just read books or mentor newcomers. Both are helpful, but the stakes are low. On the other hand, high-stakes jobs, such as running a big project or managing a team, are hard to get when you lack experience. So how can we solve this dilemma? Enter incident response.

2024 SRE Report Insights: The Critical Role of Third-Party Monitoring in SRE

Apr 2, 2024 By Denton Chikura In Catchpoint

The 2024 SRE Report highlights a pivotal shift in how organizations approach the reliability and monitoring of their services, especially those that extend beyond their direct control. According to the report, 64% of organizations now recognize the importance of monitoring endpoints that can disrupt productivity or user experience, even those beyond their physical control.

Unleashing the Change Maker Within Webinar Preview

Apr 2, 2024 By Blameless In Blameless

Join us on April 16th at 10 a.m. PT for Unleashing the Change Maker Within, a 60-minute live webinar where we'll discuss the secrets to driving change in your organization. We'll tackle two of reliability's biggest issues: getting budget and garnering support. We'll show you how to empower yourself to drive organizational change and how to sell your boss on the tools you need to automate your workflow and streamline your processes. We'll equip you with the strategies and insights to turn your great ideas into actionable plans.

Why and how to use site reliability golden signals

Apr 1, 2024 By Cortex In Cortex

Software complexity makes it harder for teams to rapidly identify and resolve issues. IT service management has evolved from an afterthought to a central part of DevOps, and in microservices architectures such issues are prone to delayed or missed detection. Monitoring mechanisms need to keep up with these complex infrastructures. Maintaining reliability and performance while managing this complexity requires a considered, data-driven approach.

Future-Proofing IT Operations: Charter's Journey to Enhanced Reliability with Squadcast

Apr 1, 2024 By Squadcast In Squadcast

Discover the transformative journey of Charter, a leader in global IT services, towards achieving unmatched operational reliability through the strategic use of Squadcast in this insightful webinar recording. Chris Ardagh from Charter shares valuable insights and experiences, highlighting how advanced incident management practices with Squadcast have allowed the organization to redefine benchmarks in reliability engineering.
