Portland, OR, USA
Apr 5, 2021   |  By Scott Fitzpatrick
Over the past decade, the processes for effective application delivery have evolved significantly. We have moved from waterfall to agile, from manual to automated, and from siloed IT operations and development teams to an approach that enables collaboration across domains. This movement to agile and DevOps has been, in part, about turning development practices into a continuous engineering workflow.
Apr 5, 2021   |  By Scott Fitzpatrick
When it comes to software, customer expectations have increased significantly in the past decade. Users expect applications that are high-performing and highly available, and this expectation is being bolstered by the highly competitive marketplace. In short, with a plethora of options available in nearly every industry, there is little reason for customers to compromise on user experience.
Mar 30, 2021   |  By Or Elimelech, Site Reliability Engineer
Glean six key insights you can apply to site reliability engineering from StackPulse's own SRE.
Mar 30, 2021   |  By Cordny Nederkoorn
In a previous article on Site Reliability Engineering (SRE), we discussed the SRE role and why it’s in high demand. At the end of the article, some advice was given on how to become an SRE. It is not easy. But with the right determination, it can be done. This article will highlight the skills you need to become an SRE.
Mar 22, 2021   |  By Chris Tozzi
The beauty of a generic mitigation is that it solves a wide variety of problems using a single solution. When executed smoothly, generic mitigations allow fast remediation without requiring a great deal of troubleshooting or incident analysis. But the key words there are “when executed smoothly.” If you don’t have the right plan in place for operationalizing generic mitigations, they may not deliver the fast, simple remediations that they should.
Mar 15, 2021   |  By Eldad Rudich
Recently, I started to read the new and invaluable book “Software Engineering at Google”. It’s a great book by Google, describing their engineering practices across many different domains. One of the first chapters discusses the matter of making a “scalable impact,” which I find very interesting, and something that I believe has been overlooked by many organizations.
Mar 9, 2021   |  By Steve Tidwell
In recent years, many engineering organizations have embraced DevOps as a means to improve the software development lifecycle and increase software quality. One of the key pillars of the DevOps philosophy is to “measure everything.” Site Reliability Engineering (SRE) implements processes and actions that seek to make DevOps and DevOps methodologies a reality.
Mar 4, 2021   |  By Steve Tidwell
Unreliable services can affect businesses in myriad ways, from slowed development velocity, to unhappy users, to impacted revenue streams. Reliability often takes a backseat to feature releases and other business initiatives that drive development requirements. This post will discuss key elements of SRE practice that you can use to instill a reliability-first culture in your organization, while also meeting business requirements and keeping your users happy.
Mar 2, 2021   |  By Scott Fitzpatrick
One of the most important aspects of a software system is its reliability, and for good reason. With so many digital options available in every industry, customers have little reason to keep using applications or services that experience frequent issues with quality or availability. Therefore, it's critical that organizations invest in the processes and tools necessary to ensure system reliability. Using a reliability platform is one way to increase or maintain the quality and reliability of an application. Keep reading for an overview of the functionality provided by reliability platforms and the specific ways in which such platforms provide value to the business.
Mar 2, 2021   |  By Theo Despoudis
Site Reliability Engineering (SRE) can mean different things to different companies, and operators who are responsible for reliability typically use a DevOps toolset. However, one thing is certain: SREs combine the skills of software engineers with those of production and operations management to achieve high reliability and ensure that SLO/SLA targets are met. So SREs need a firm grip not only on the technologies involved in the system, but also on the intricacies of production deployments.
Dec 1, 2020   |  By Jeannie Christensen
This two-minute video demo shows how StackPulse works with GitHub Actions.
Oct 12, 2020   |  By Nuaware
Leonid, co-founder and CTO of StackPulse, joins Luke to discuss SRE platform automation and why incident response as code is the future for enterprises.

StackPulse empowers SREs and developers to reduce toil, remediate incidents faster, and build more reliable services.

StackPulse gives engineers and SREs everything needed to build and run more reliable services - during incidents, at deployment, or when writing code. By centralizing and automating reliability across an entire environment, StackPulse makes it easy to manage services in production at any scale.

Let’s build a more reliable world:

  • Automated Alert Enrichment: When an alert is triggered, StackPulse automatically triages and enriches the alert with impact, environment details, and root cause analysis. This context is delivered in real time to on-call teams - simplifying remediation and helping drive down both MTTD and MTTR.
  • Powerful Playbooks to Reduce Toil: StackPulse playbooks are powerful code-based workflows that investigate, remediate, and maintain your software services - reducing toil for your teams. Playbooks can be imported from StackPulse’s playbook library, built via a drag-and-drop step builder, or deployed as part of a GitOps workflow.
  • Centralized Knowledge and Insight: StackPulse automatically documents and analyzes incident details and remediation patterns - centralizing tribal knowledge and delivering recommendations to proactively improve your services.
  • Easy to Integrate; Simple to Scale: StackPulse deploys in minutes, with out-of-the-box integrations with your existing alerting, on-call, and compute stacks. With a consistent framework for triggering and executing playbooks, StackPulse lets you modify or scale your underlying environment with no impact on your incident response practice.
  • Best Practice Playbooks: The StackPulse playbook library contains tested and verified steps and playbooks - making it easy to quickly improve your MTTR and MTTD in common scenarios. Playbooks can be easily exported or shared for portability.

Reliability at your fingertips.