Portland, OR, USA
May 24, 2021   |  By Mike Mackrory
Suppose your organization wants to optimize the availability of its services or is worried about relying on a single cloud provider. In that case, you might be considering a move to a multicloud solution. Migrating from a single provider to multiple providers is far more complicated than simply deploying your applications in a different environment, and it presents unique challenges for both deployment and support.
May 19, 2021   |  By Chris Tozzi
Many people would have you believe that multicloud is the way to go, and with good reason: a multicloud architecture has plenty of benefits. Yet if you’re an SRE, deploying workloads on multiple clouds isn’t always fun.
May 16, 2021   |  By Mike Mackrory
Site Reliability Engineering (SRE) is what happens when you ask software engineers to design an operations function. That is how Google VP Ben Treynor Sloss described the background and definition of SRE. The idea of SRE originated at Google, and since then, more and more organizations have adopted SRE practices to stay competitive in the rapidly evolving world of information technology.
May 12, 2021   |  By Chris Tozzi
If you start typing “DevOps vs.” in your search engine, you’ll probably see that “DevOps vs. SRE” is one of the top queries that people search for. But so are terms like “is SRE DevOps?” If Google autocomplete is to be believed, then, there is a fair amount of uncertainty out there about what, exactly, the SRE role has to do with DevOps. Some see SRE and DevOps as distinct concepts, while others apparently think that they mean more or less the same thing.
May 11, 2021   |  By Steve Tidwell
It’s 2021, and Site Reliability Engineering (SRE) has become one of the fastest-growing and most in-demand professions in the tech industry. With all of the attention on SRE, many software developers and operations engineers are now interested in moving into this burgeoning field. There is an enormous amount of information about SRE on the Internet, some of it helpful and some of it not so much, so it can be hard to know where to begin.
May 4, 2021   |  By Theo Despoudis
Kubernetes (K8s) is a container orchestration platform that runs application and service workloads at scale. Out of the box, and on the happy path, K8s is a wonderful platform that deploys and manages workloads, restarts containers that become unhealthy, and stops routing traffic to those that are not ready. Much of this happens behind the scenes, however, and there are many hidden caveats to keeping the platform stable and healthy.
Apr 29, 2021   |  By Konstantin Ostrovsky
This blog post builds on my previous gRPC posts (Part 1: gRPC vs. REST, Part 2: A Breakdown of gRPC in Practice, and Part 3: Using gRPC in your Front End Application); however, it can also be read on its own. More than a year has passed since we wrote the first line of code at StackPulse. Having previously worked with REST APIs, we wanted to use gRPC as our protocol of choice for both internal and external communication.
Apr 26, 2021   |  By Steve Tidwell
Site Reliability Engineering (SRE) has become a hot topic over the last few years. It seems like everyone has been talking about it, but if you ask different people or organizations what SRE means to them, you will likely receive different and highly nuanced answers. One thing that is consistent across all of these definitions, though, is the idea that there is a core set of best practices that engineering organizations can adopt to achieve more reliable systems.
Apr 21, 2021   |  By Chris Tozzi
If you’ve managed reliability for either a microservices or a monolithic app, you know that – as we detailed in an earlier blog post – both types of environments come with their own reliability challenges. What can you do about those challenges? Which best practices should SREs adopt in order to simplify reliability for both microservices and monoliths? Read on for guidance.
Apr 19, 2021   |  By Jonathan Brown
A customer-impacting incident can mean all hands on deck until the issue is mitigated. Typically, you’d follow a runbook or a wiki to begin impact assessment, triage, analysis, and so on. Somewhere in that process are steps to set up team communications for the incident at hand. These are basic administrative tasks that someone must complete before the team can really get the ball rolling on resolving the issue.
Dec 1, 2020   |  By Jeannie Christensen
This two-minute video demo shows how StackPulse works with GitHub Actions.
Oct 12, 2020   |  By Nuaware
Leonid, co-founder and CTO of StackPulse, joins Luke to discuss SRE platform automation and why incident response as code is the future for enterprises.

StackPulse empowers SREs and developers to reduce toil, remediate incidents faster, and build more reliable services.

StackPulse gives engineers and SREs everything they need to build and run more reliable services: during incidents, at deployment, or when writing code. By centralizing and automating reliability across an entire environment, StackPulse makes it easy to manage services in production at any scale.

Let’s build a more reliable world:

  • Automated Alert Enrichment: When an alert is triggered, StackPulse automatically triages and enriches the alert with impact, environment details, and root cause analysis. This context is delivered in real time to on-call teams, simplifying remediation and helping drive down both MTTD and MTTR.
  • Powerful Playbooks to Reduce Toil: StackPulse playbooks are powerful code-based workflows that investigate, remediate, and maintain your software services, reducing toil for your teams. Playbooks can be imported from StackPulse’s playbook library, built with a drag-and-drop step builder, or deployed as part of a GitOps workflow.
  • Centralized Knowledge and Insight: StackPulse automatically documents and analyzes incident details and remediation patterns, centralizing tribal knowledge and delivering recommendations to proactively improve your services.
  • Easy to Integrate; Simple to Scale: StackPulse deploys in minutes, with out-of-the-box integrations for your existing alerting, on-call, and compute stacks. With a consistent framework for triggering and executing playbooks, StackPulse lets you modify or scale your underlying environment with no impact on your incident response practice.
  • Best Practice Playbooks: The StackPulse playbook library contains tested and verified steps and playbooks, making it easy to quickly improve your MTTR and MTTD in common scenarios. Playbooks can be easily exported or shared for portability.

Reliability at your fingertips.