The latest News and Information on Service Reliability Engineering and related technologies.

Salesforce Cloud + Squadcast Integration: Routing Detailed Incident Alerts

Mar 17, 2022 By Vishal Padghan In Squadcast

Salesforce Cloud is one of the leading cloud-based customer relationship management (CRM) solutions. It provides a shared view of your customers and their relationship with the business. With Salesforce Cloud, users can automate service processes and streamline workflows. Squadcast is an end-to-end incident response tool. Built with an SRE mindset, it streamlines all incident response activities and aligns your teams towards a common organizational goal of better reliability.

Observability for SRE & DevOps Engineer

Mar 16, 2022 By Amartya Gupta In Motadata

Software development moves quickly to keep up with client requirements, and it needs to do so safely. DevOps engineers can help with this; however, it is not possible without observability.

How to Implement Global View and High Availability for Prometheus

Mar 11, 2022 By Ricardo Castro In Squadcast

Ensuring that systems run reliably is a critical function of a site reliability engineer. A big part of that is collecting metrics, creating alerts, and graphing data. It is of the utmost importance to gather system metrics from several locations and services and to correlate them, both to understand system functionality and to support troubleshooting.

What Does AIOps Mean for SREs? It's Complicated.

Mar 11, 2022 By JJ Tang In Rootly

If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter. Which perspective is right?

AppScope 1.0: Changing the Game for SREs and Devs

Mar 8, 2022 By The AppScope Team In Cribl

SREs and Devs are used to solving problems even when an awkward or inefficient way is the only way. In AppScope 1.0, SREs and Devs have a new alternative to standard methods, one that the AppScope team thinks will make that problem-solving a lot more fun. We on the AppScope team constantly hear firsthand about life in the SRE trenches. For this blog, we “interview” a fictional SRE/Dev whose thoughts and comments are a mash-up of things we’ve heard from real people we know.

ServiceNow + Squadcast Integration: Automate IT Ticketing and Project Tracking

Mar 4, 2022 By Nir Sharma In Squadcast

ServiceNow is a workflow automation platform used by organizations for their IT ticketing and project management needs. In contrast, Squadcast is an end-to-end incident management and SRE platform that is used by organizations for their reliability requirements.

What SREs Can Learn from Capt. Sully: When to Follow Playbooks

Mar 4, 2022 By Andre King In Rootly

When are you smarter than your playbooks, and when are your playbooks smarter than you? That’s a question that engineers rarely step back to consider. The rational, disciplined parts of our minds tell us that the playbooks we are supposed to follow were carefully designed and tested, and that we should stick to them at all costs.

Golden Signals - Monitoring from first principles

Mar 2, 2022 By Safeer CM In Squadcast

Building a successful monitoring process for your application is essential for high availability. In the first part of this three-part blog series, Safeer discusses the four key SRE Golden Signals for metrics-driven measurement and the role they play in the overall context of monitoring. Monitoring is the cornerstone of operating any software system or application effectively. The more visibility you have into the software and hardware systems, the better you are at serving your customers. It tells you whether you are on the right track and, if not, by how much you are missing the mark.

Site Reliability Chats (Mar 2, 2022)

Mar 2, 2022 By Gremlin In Gremlin

Welcome to the first episode of Site Reliability Chats with your hosts Jason Yee @gitbisect and Julie Gunderson @julie_gund.

Kubernetes Health Check Using Probes

Mar 2, 2022 By Squadcast Community In Squadcast

Kubernetes is an open source container orchestration platform that significantly simplifies an application's creation and management. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts and all of them must work for the system to function. Even if a small part breaks, it needs to be detected, routed and fixed. These actions also need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes.
