|
By Ashley Sawatsky
This blog post is adapted from my talk at SRECon EMEA 2023 - original slides are available here! Status pages are a simple yet underutilized element of incident communication. Done well, they’re a low-lift way to keep your customers and stakeholders informed when incidents impact them. But without a solid approach, updating status pages can easily become a tedious and often neglected task during incidents. In this post, we’ll cover some tips to get your status page right.
|
By Ashley Sawatsky
You’re in the incident channel rocking yet another incident. Comms are flowing, resolution is in sight, the team is grinding, and you’re feeling good. Then…
|
By Rohit Ghumare
In today’s world, resilience is no longer a nice-to-have or a methodology to experiment with; it is a necessity for sustained success in software development and IT operations. As DevOps and Agile teams push boundaries, develop new methodologies, and drive innovation, the ability to recover quickly from failures, adapt to changing conditions, and maintain high performance under pressure is now essential.
|
By JJ Tang
We’re proud to share that we've been recognized as a High Performer and Enterprise Leader in Incident Management for the sixth consecutive quarter in the G2 Summer 2023 Report! In total, Rootly received nine G2 awards in the Summer Report.
|
By Hans Chung
Let’s be honest. When you see an alert pop up on your phone, you aren’t thinking “according to section 12 of our most recent SRE handbook used at training 6 months ago I need to keep in mind who should be Incident Commander and who should be Ops Lead”. You’re an engineer at heart.
|
By Ashley Sawatsky
Very few SaaS products exist completely independently. Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. When these systems fail, it can leave you feeling pretty helpless. In some cases you might have fallback options, but oftentimes all you can do is wait for recovery and clean up the fallout.
|
By JJ Tang
We are excited to announce that we have raised a $12M round of financing led by Renegade Partners with participation from Google Gradient Ventures (Google’s AI-focused venture fund) and XYZ Ventures. This brings our total funding to date to $15.2M ($20M CAD) alongside our other existing investors Y Combinator and 8VC.
|
By Rajesh Tilwani
Creating just any infrastructure on Kubernetes is not enough. There are many basic configurations you could apply to stand up infrastructure for your application, and for the time being they might work just fine. But your incident response won’t always remain 100% reliable. You will run into new potholes, and that’s okay.
|
By Ashley Sawatsky
We’ve all seen it: a company experiencing a major incident and going radio silent, leaving their customers to wonder “Are they doing something about this?!”. If you’ve ever been on the inside of something like this, you know the answer is most likely yes, there are people working hard to put out the fire as quickly as possible. But when it comes to incidents, perception is reality for customers.
|
By Ashley Sawatsky
As new incidents emerge, there are often many unknowns about the size, severity, and cause of the problem. Sometimes it’s not clear if the problem is an incident at all. That’s where introducing a triage stage to your incident management process can help. In this post, we’ll look at the benefits of adding a triage layer to your incident management, and how Rootly’s Triage feature allows you to seamlessly transition from triage to real incident (or false alarm).
Rootly is a turnkey incident response command centre that brings the best reliability practices from Google, Netflix, and Amazon to those without a million-dollar budget.
Rootly is an all-in-one platform that streamlines collaboration, communication, and learning. It automates away the manual toil engineers suffer through today and captures data-driven insights. With Rootly, companies accelerate their incident resolution and learn how to prevent incidents in the future.
Teams depend on Rootly to improve their reliability:
- Collaborate: Seamlessly hand off alerts from PagerDuty and quickly declare incidents from your tool of choice, like Slack. Automatically involve all the right teams in seconds, not minutes. Loop in legal, support, and sales, not just engineering. With intelligent workflows, there's no more wondering which team owns which service or who should be responsible for what. Rootly does the heavy lifting for you.
- Communicate: Build your incident timeline through Web or Slack. Autolink war rooms with our Zoom & Google Meet integrations. Rich and customizable private and public status pages ensure everyone is updated while you focus on what you do best, fighting fires.
- Remediate: Enrich your timeline with automated Genius workflows. Fetch relevant information, such as recent git commits to your impacted services. Customize your workflows based on any incident condition.
- Retrospective: Learn from incidents with beautiful postmortems engineers want to write, without the manual toil of copying and pasting. Accurately replay past incidents to simulate real-world disaster scenarios, train engineers faster, and keep their skills sharp. Postmortems stay organized and easily shared, not buried in a Google Doc that can’t be found.
All-in-one incident response platform for humans.