The latest News and Information on Service Reliability Engineering and related technologies.

How Changelog monitors and optimizes website performance with Grafana Cloud

Developers around the world get their news from Changelog, an indie media company on a mission to create inspiring content for software developers. Through their popular podcasts, including The Changelog, Go Time, JS Party, and Ship It!, the team at Changelog helps listeners stay up-to-date on the latest happenings, trends, and tools in a constantly evolving industry.

How We Use Sloth to do SLO Monitoring and Alerting with Prometheus

One of the most challenging tasks for Site Reliability Engineers is to align the reliability of the systems with the business goals. There is a constant battle between delivering more features—which increases the product’s value—and keeping the system reliable and maintainable. A significant ally to achieve both objectives is the Service Level Objective Framework.

Differences between Site Reliability Engineer Vs. Software Engineer Vs. Cloud Engineer Vs. DevOps Engineer

The evolution of Software Engineering over the last decade has led to the emergence of numerous job roles. So how different are a Software Engineer, DevOps Engineer, Site Reliability Engineer, and Cloud Engineer from each other? In this blog, we drill down and compare the differences between these roles and their functions.

SRE vs. DevOps: What Are the Differences and How Can They Work Together?

The growing importance of technology in business success has forced practically all companies to hire competent, experienced IT professionals. As technology ecosystems become increasingly complex, organizations need a broader range of professionals to focus on tasks like product development, troubleshooting, and customer services. SRE and DevOps have emerged as two of the most critical approaches to success.

Top 13 Site Reliability Engineer (SRE) Tools

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization. For the most part, a site reliability engineer is focused on multiple tasks and projects at one time, so for most SREs, the various tools they use reflect their ever-evolving responsibilities. A typical SRE is busy automating, cleaning up code, upgrading servers, and continually monitoring dashboards for performance, so they are going to see more tools in that toolbelt.

Site Reliability Engineering: Top SRE Tools As Voted On By SREs

Catchpoint is proud to present the top SRE tools as voted on by SREs. In our fourth annual SRE Survey, compiled in partnership with VMware Tanzu Observability and DevOps Institute, we simply asked, “What are a few tools that every SRE should have available in their toolbelt?” Today, we are excited to share the findings with you. While some of the answers were not strictly tools, the analysis gives us valuable insight into the mindset of an SRE.

What SREs Can Learn from Facebook's Largest Outage

Facebook’s October 2021 outage was the type of event that gives SREs nightmares: A series of critical business apps crashed in minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing about 60 million dollars. As incidents go, this was a pretty big one.

4 xMatters Use Cases That May Surprise You

xMatters is part technology, part service reliability, and a little bit of magic. If you’ve spent time on the xMatters website, you’ll likely have seen a number of valuable use cases for the platform: it can alert SREs when there’s a website outage, accelerate product development for DevOps teams, and manage on-call schedules and alerts for support teams.