The latest News and Information on Service Reliability Engineering and related technologies.

Pinpoint performance issues in downstream services with the Dependency Map Navigator

Visibility into the upstream and downstream dependencies of your services is key to maintaining a performant microservices environment. Application developers and SREs rely on this visibility to quickly trace issues back to the source. That matters during incidents, when time is of the essence, as well as throughout day-to-day operations and as systems evolve and scale.

Enhanced Incident Response: Maximizing Microsoft Teams with Squadcast

Of late, more and more businesses are relying on ChatOps tools like Microsoft Teams for a range of functions beyond simple communication. Incident management is no exception to this growing trend. However, Microsoft Teams alone may not possess all the necessary capabilities to efficiently perform these functions. To bridge this gap, integration with core applications becomes necessary.

Take back control of your Monitoring

The challenges in the monitoring world are widely known: we all know what the problems are and why they matter. While each problem has its own solution, it all boils down to one thing: COST. How do we balance the tradeoffs without worrying about the huge costs of solving these challenges? For high-precision monitoring and observability, you need efficient and high-precision control levers. Take back control of your Monitoring with Levitate, a managed time series data warehouse.

What Is Site Reliability Engineering? Understanding the complexities of this crucial function

Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.

Improve Visibility and Capture More Data with Triage Incidents

As new incidents emerge, there are often many unknowns about the size, severity, and cause of the problem. Sometimes it’s not clear if the problem is an incident at all. That’s where introducing a triage stage to your incident management process can help. In this post, we’ll look at the benefits of adding a triage layer to your incident management, and how Rootly’s Triage feature allows you to seamlessly transition from triage to real incident (or false alarm).