Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

Honeycomb + Squadcast Integration: Routing Incident Alerts Made Easy

Honeycomb is an application monitoring tool that helps DevOps and SRE teams to operate more efficiently by offering rich observability solutions and intuitive team collaboration. It helps understand complex relationships within your distributed systems and troubleshoot issues accordingly. Squadcast is an end-to-end incident response tool. Built with an SRE mindset, it streamlines all the incident response activities.

SRE Metrics: Four Golden Signals of Monitoring

SRE (site reliability engineering) is a discipline used by software engineering and IT teams to proactively build and maintain more reliable services. SRE is a functional way to apply software development solutions to IT operations problems. From IT monitoring to software delivery to incident response – site reliability engineers are focused on building and monitoring anything in production that improves service resiliency without harming development speed.

DevOps vs SRE - Reducing Technical Debt and Increasing Efficiency and Resiliency

One more blog topic stemming from our weekly office hours that we hold with the field team here at Shipa. In our last office hours, was asked a question about “what are the difference between DevOps Engineers and SREs?”. Both professions are emerging disciplines and cultures that continue to evolve and play an importance in technology organizations. I’ve been fortunate to have written and spoken about this before; though taking a fresh look at what the two domains try to accomplish.

Salesforce Cloud + Squadcast Integration: Routing Detailed Incident Alerts

Salesforce Cloud is one of the leading cloud-based customer relationship management (CRM) solutions. It provides a shared view of your customers and their relationship with the business. With Salesforce Cloud, users can automate service processes and streamline workflows. Squadcast is an end-to-end incident response tool. Built with an SRE mindset, it streamlines all the incident response activities. Squadcast aligns your teams towards a common organizational goal of better reliability.

How to Implement Global View and High Availability for Prometheus

Ensuring that systems run reliably is a critical function of a site reliability engineer. A big part of that is collecting metrics, creating alerts and graph data. It’s of the utmost importance to gather system metrics, from several locations and services, and correlate them to understand system functionality as well as to support troubleshooting.

What Does AIOps Mean for SREs? It's Complicated.

If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier. Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter. Which perspective is right?

AppScope 1.0: Changing the Game for SREs and Devs

SREs and Devs are used to solving problems even when an awkward or inefficient way is the only way. In AppScope 1.0, SREs and Devs have a new alternative to standard methods, that the AppScope team thinks will make that problem-solving a lot more fun. We in the AppScope team constantly hear firsthand about life in the SRE trenches. For this blog, we “interview” a fictional SRE/Dev whose thoughts and comments are a mash-up of things we’ve heard from real people we know.

ServiceNow + Squadcast Integration: Automate IT Ticketing and Project Tracking

ServiceNow is a workflow automation platform used by organizations for their IT ticketing and project management needs. In contrast, Squadcast is an end-to-end incident management and SRE platform that is used by organizations for their reliability requirements.