
The latest news and information on Site Reliability Engineering and related technologies.

Unveiling Past Incidents: Accelerating Incident Resolution with Historical Context

Knowing how similar issues were handled before can be invaluable. It helps incident responders understand recurring problems, their causes, and the fixes that actually worked. Squadcast's Past Incidents feature supports this by showing responders a list of similar past incidents on the same service they are currently investigating.
Sponsored Post

Status Pages 101: Everything You Need to Know About Status Pages

Status Pages are critical for effective Incident Management. Just as a poorly structured On-Call Schedule can wreak havoc, ineffective Status Pages can leave customers and stakeholders adrift, which is why they deserve a meticulous approach. Two organizations that use Squadcast Status Pages to strengthen their incident response are Matsuri Japon, a non-profit organization, and Sport1, a premier live-stream sports content platform; we cover both later in the article. Building effective Status Pages takes precision: they must deliver timely, dynamic updates and support collaboration.

Bill Kennedy: The mistake boot, building ACs, Black boxes & AI in software - The Reliability Podcast

The Reliability Podcast talks with engineers who have worked on large, complex systems and draws out the lessons they learned. Which best practices should you adopt? Which lessons are non-negotiable for getting better at the craft? What will engineering look like with the advent of AI? We answer these questions and more by tracing the personal journeys of engineers who have built stellar careers navigating the many intricacies of software engineering.

Top 5 Resiliency Trends of 2023

In today’s world, resilience is no longer a nice-to-have or an optional methodology; it has become a necessity for sustained success in software development and IT operations. As DevOps and Agile teams push boundaries, adopt new methodologies, and drive innovation, they need the ability to recover quickly from failures, adapt to changing conditions, and maintain high performance under pressure.

Streamlining Incident Management with our latest feature update: Merge Incidents

Hey folks! We’re back with another nifty addition to your Incident Management arsenal: you can now merge incidents with a few clicks! With this update, you can cut the noise while handling a complex incident by merging incidents from multiple services under a single parent incident. This is most useful when several incidents stem from the same underlying issue or root cause.

Elastic AI Assistant for Observability

Harness the power of generative AI to turn insights into actions. Powered by the Elasticsearch Relevance Engine™ (ESRE™), Elastic’s AI Assistant (in technical preview for Observability) transforms problem identification and resolution, replacing manual data chasing across silos with an interactive assistant that delivers accurate, context-aware remediation guidance for SREs.

Seven Models of Cloud Native Applications

In today's cloud-driven landscape, organizations are moving from legacy monolithic systems to agile, scalable, and secure cloud-native solutions, and some are building new cloud-native applications from scratch. However, cloud-native design remains loosely defined, with no universal blueprint. This blog aims to provide clarity and practical guidance on designing cloud-native applications and deploying them in containers.