AIOps Essentials: How to use Distributed Tracing for Root Cause Analysis | AIOps Use Cases (4/5)

Jan 12, 2023

Artificial intelligence for IT operations (AIOps) is a way to automate tasks that are typically carried out by site reliability engineers (SREs). It aims to make SREs' work easier by reducing the noise coming from monitored systems, surfacing issues faster, and supporting root cause analysis by correlating data across different systems.

In this video, we discuss using AIOps and machine learning for root cause analysis: finding the source of an issue in our systems. We use the OpenTelemetry demo, which has 11 microservices and includes a feature flag service with a UI that lets us trigger a failure whenever a specific product is fetched from the store. By enabling the failure and letting it run for a while, we can see how it propagates through the system using distributed tracing in Elastic APM. We also show how to use machine learning to correlate latency and failed transactions with their attributes, pinpointing the exact product causing the issue.
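In the video, the failed transaction correlation is done inside Elastic APM. As a rough illustration of the underlying idea only (this is not Elastic's actual algorithm, which the video does not detail), here is a minimal Python sketch that ranks attribute values by how over-represented they are among failed transactions. The transaction records, the `product_id` and `outcome` field names, and the `correlate_failures` helper are all hypothetical, and the score is a simple difference of proportions rather than a statistical significance test.

```python
from collections import Counter


def correlate_failures(transactions, field):
    """Rank values of `field` by how strongly they are associated with failures.

    Hypothetical heuristic: score = (share of failed transactions carrying the
    value) - (share of successful transactions carrying the value).
    """
    failed, succeeded = Counter(), Counter()
    for tx in transactions:
        value = tx.get(field)
        if value is None:
            continue  # this transaction does not carry the attribute
        (failed if tx["outcome"] == "failure" else succeeded)[value] += 1

    n_failed = sum(failed.values())
    n_succeeded = sum(succeeded.values())
    scores = {}
    for value in failed.keys() | succeeded.keys():
        fail_share = failed[value] / n_failed if n_failed else 0.0
        ok_share = succeeded[value] / n_succeeded if n_succeeded else 0.0
        scores[value] = fail_share - ok_share
    # Highest score first: the value most over-represented in failures.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic example: every failure involves the (hypothetical) "product-B".
transactions = (
    [{"product_id": "product-A", "outcome": "success"}] * 50
    + [{"product_id": "product-B", "outcome": "success"}] * 45
    + [{"product_id": "product-B", "outcome": "failure"}] * 20
    + [{"product_id": "product-C", "outcome": "success"}] * 5
)
ranking = correlate_failures(transactions, "product_id")
print(ranking[0][0])  # prints "product-B"
```

Note that "product-B" ranks first even though it also appears in many successful transactions: what matters is that it is far more common among failures than among successes, which is the kind of signal used to narrow a failure down to one product.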

Chapters:

00:00 - Root Cause Analysis / Introduction

02:11 - Simulating a failure

03:20 - APM Service Map

05:10 - Failure in Product Catalog Service

06:00 - Latency Distribution

06:38 - Failed Transactions Correlation

08:40 - Confirming the Root Cause

10:30 - Conclusion

Additional Resources:

Start the 14-day trial for free! No credit card required: https://cloud.elastic.co/registration
Subscribe to Elastic’s Community YouTube channel: https://www.youtube.com/c/OfficialElasticCommunity

Connect with us on social media:
LinkedIn: https://www.linkedin.com/company/elastic-co
Twitter: https://twitter.com/elastic
Facebook: https://www.facebook.com/elastic.co

About Elastic
Elastic is the leading platform for search-powered solutions. We help everyone (organizations, their employees, and their customers) find what they need faster while keeping applications running smoothly and protecting against cyber threats. When you tap into the power of Elastic Enterprise Search, Observability, and Security solutions, you’re in good company with brands like Netflix, Uber, Slack, Microsoft, and thousands of others who rely on us to accelerate results that matter.

#DistributedTracing #AIOps #Observability #DevOps #ElasticObservability #RootCauseAnalysis #MachineLearning #APM