April 2023

Scaling Site Reliability Engineering Teams the Right Way

Apr 28, 2023 By Biju Chacko In Squadcast

Most SRE teams eventually reach a point where they appear unable to meet all the demands placed upon them. This is when these teams may need to scale. However, it's important to understand that increasing team capacity is not the same as increasing the number of people on the team. Let's unpack what scaling a team is all about, what the indicators are, what steps you can take, and how you know if you're done.

SRE vs. DevOps vs. Platform Engineering: What's The Difference?

Apr 25, 2023 By Shanika Wickramasinghe In Splunk

SRE, DevOps and Platform Engineering are important concepts in today's world of software development. There are dedicated teams to manage these areas, each with a unique primary focus, set of responsibilities, tools and metrics used to gauge their performance requirements. This article explains SRE, DevOps, and Platform Engineering, including similarities and differences, and, most importantly, how these teams help streamline modern software development, delivery, and maintenance processes.

Learnings integrating jmxtrans

Apr 25, 2023 By Saurabh Hirani In Last9

JMX metrics give solid insights into the workings of your application. Integrating them with Levitate (our time series data warehouse) required us to jump through some hoops with vmagent.

2023 SRE Report

Apr 21, 2023 By Catchpoint

Now in its fifth year, The SRE Report has become the trusted source of trends and insights for reliability-as-a-feature practices. This year in partnership with Blameless, the report contains special contributions from Adrian Cockcroft and Steve McGhee and highlights findings from a global community of reliability practitioners, including SREs, managers, architects, and executives. As ever, we found some familiar trends and some thought-provoking anti-patterns.

Install Prometheus on Kubernetes: Tutorial & Examples

Apr 20, 2023 By Squadcast Community In Squadcast

As one of the most popular open-source Kubernetes monitoring solutions, Prometheus leverages a multidimensional data model of time-stamped metric data and labels. The platform uses a pull-based architecture to collect metrics from various targets. It stores the metrics in a time-series database and provides the powerful PromQL query language for efficient analysis and data visualization.

DevOps vs. SRE

Apr 20, 2023 By Sematext In Sematext

What is the difference between DevOps and SRE? In short, DevOps should be an all-encompassing term for connecting the development team and operations team. However, DevOps tends to focus more on deployment, whereas SRE focuses on reliability.

What is SRE?

Apr 18, 2023 By Sematext In Sematext

SRE stands for Site Reliability Engineering and focuses on making sure your systems are always up and running. SRE teams are very similar to DevOps teams but have a few noticeable differences.

Incident Response Guide

Apr 17, 2023 By Squadcast Community In Squadcast

Site reliability engineering (SRE) is a critical discipline that focuses on ensuring the continuous availability and performance of modern systems and applications. One of the most vital aspects of SRE is incident response, a structured process for identifying, assessing, and resolving system incidents that can lead to downtime, revenue loss, and brand reputation damage.

High Cardinality? No Problem! Stream Aggregation FTW

Apr 15, 2023 By Piyush Verma In Last9

High cardinality in time series data is challenging to manage. But it is necessary to unlock meaningful answers. Learn how streaming aggregations can rein in high cardinality using Levitate.

Is your incident management solution creating more problems than it solves?

Apr 11, 2023 By Aaron Lober In Blameless

When it comes to incident response, the ability to adapt and customize your approach is key. Every organization has unique needs and workflows, and a one-size-fits-all solution simply won't cut it. That's why Blameless is proud to offer a flexible platform that allows teams to tailor their incident response process to fit their exact requirements.

MTTF vs MTBF vs MTTD vs MTTR

Apr 6, 2023 By Last9 In Last9

This article covers what MTTF, MTBF, MTTD, and MTTR are, how they differ, how to adopt them, and their use cases.

Runbook Automation | What It Is & How To Do It

Apr 5, 2023 By Myra Nizami In Blameless

Looking into runbook automation? We explain how runbook automation works, with examples and tips on how to use it to streamline your incident response process.

Squadcast + HaloPSA Integration: Enabling Streamlined Incident Response & Alerting

Apr 3, 2023 By Vishal Padghan In Squadcast

HaloPSA is a modern and intuitive all-in-one professional services automation (PSA) solution, designed for service providers. HaloPSA’s cloud platform helps you manage your entire business, modernize customer experience and automate your service. If you use HaloPSA for PSA requirements, you can integrate it with Squadcast, an end-to-end Incident Response and Reliability Workflow platform, to route detailed alerts from HaloPSA to the right users in Squadcast.

Platform Engineering 101: Origins, Goals, DevOps vs SRE & Best Practices

Apr 3, 2023 By Muhammad Raza In Splunk

Platform engineering is the practice of automating infrastructure operations and enabling self-service infrastructure capabilities within collaborative Dev, Ops and QA teams. It involves designing and building platforms, technologies and workflows that enable self-service capabilities to automatically manage, provision and operate complex modern software architecture environments.

Reduce time to detect with AppDynamics Cloud Log Analytics

Apr 3, 2023 By Linda Zhou In AppDynamics

How machine learning in AppDynamics Cloud accelerates log analysis and reduces mean time to detect. Site reliability engineers (SREs) need to investigate unknown problems reported in production. The common approach is to search and filter log files to find the root cause, and we all know how painful it is to sift through log contents. It’s like finding a needle in a haystack. A machine learning approach is essential to help SREs quickly identify the root cause.
