Dynamic Service Graph | Tigera - Long

May 4, 2021

Challenge:
Downtime is expensive, and applications are hard to troubleshoot across a dynamic, distributed environment made up of Kubernetes clusters. While development teams and service owners typically understand the microservices they deploy, it is often difficult to get a complete, shared view of dependencies and of how all the services communicate with each other across a cluster. Limited observability makes it extremely difficult to troubleshoot end-to-end connectivity issues that can affect application deployment.

Solution:
The Dynamic Service Graph, available in Calico Enterprise and Calico Cloud, provides visibility across the stack from the network layer to the application layer (L3 – L7). It is based on the actual network activity observed in your cluster rather than on what is configured, so it provides the most accurate and relevant view of how services are operating in your Kubernetes cluster.

The Dynamic Service Graph generates a detailed visualization of the cluster environment that lets anyone easily understand how microservices are behaving and interacting with each other at runtime, which simplifies debugging. It provides DevOps engineers, SREs, and service owners with a point-to-point, topographical representation of network traffic within a cluster that shows how workloads are communicating, and across which namespaces.

Along with this information, the Dynamic Service Graph provides metadata on ports, protocols, how network policies are being evaluated, and other details that help Kubernetes teams understand how end-to-end communication is occurring. Performance hotspots are automatically identified and highlighted, and alerts are provided in the context of the Service Graph. The Dynamic Service Graph also includes advanced capabilities to filter resources, save views, and troubleshoot DNS issues.
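
To make the idea of a graph built from observed traffic more concrete, the following is a minimal Python sketch, not Calico's implementation. It assumes a hypothetical `Flow` record (the field names are invented for illustration and do not reflect the Calico flow log schema) and aggregates observed flows into point-to-point edges annotated with ports, protocols, policy verdicts, and whether the edge crosses namespaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical observed-flow record; field names are assumptions."""
    src_namespace: str
    src_workload: str
    dst_namespace: str
    dst_workload: str
    port: int
    protocol: str
    action: str       # policy verdict: "allow" or "deny"
    bytes_sent: int


def build_service_graph(flows):
    """Aggregate observed flows into edges keyed by (source, destination)."""
    edges = {}
    for f in flows:
        key = (
            f"{f.src_namespace}/{f.src_workload}",
            f"{f.dst_namespace}/{f.dst_workload}",
        )
        edge = edges.setdefault(key, {
            "ports": set(),
            "bytes": 0,
            "allowed": 0,
            "denied": 0,
            "cross_namespace": f.src_namespace != f.dst_namespace,
        })
        edge["ports"].add((f.protocol, f.port))
        edge["bytes"] += f.bytes_sent
        if f.action == "allow":
            edge["allowed"] += 1
        else:
            edge["denied"] += 1
    return edges


# Illustrative data only; these workloads and namespaces are made up.
sample_flows = [
    Flow("storefront", "frontend", "storefront", "cart", 8080, "tcp", "allow", 1200),
    Flow("storefront", "frontend", "storefront", "cart", 8080, "tcp", "allow", 800),
    Flow("storefront", "cart", "payments", "billing", 443, "tcp", "deny", 0),
]
graph = build_service_graph(sample_flows)
```

Because the edges come from recorded traffic rather than from declared configuration, a denied cross-namespace connection such as `storefront/cart` to `payments/billing` in this sample shows up as its own edge with a nonzero `denied` count, which is the kind of detail that helps when tracing a connectivity problem.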

Benefits:
The expansive topographic view provided by the Dynamic Service Graph speeds the identification and troubleshooting of connectivity issues that could be affecting applications running in a cluster. Software engineers can quickly identify the source of a problem at the application, process, and socket levels, and can investigate further with an automated packet capture function. Dev teams can design for downstream dependencies, and the shared view significantly decreases the time it takes new DevOps engineers and SREs to ramp up, increasing the overall productivity of the Kubernetes team.