Most SREs and IT Ops teams manage Java applications without access to the source code or a direct line to the AppDev teams. When an application has performance issues, the teams that deploy and maintain the infrastructure often have to prove that the application is at fault and give the app supplier evidence of the issue.
Tracing of “runnables” is a fairly new feature in Percepio Tracealyzer, added in v4.7.0. One of our automotive customers needed this feature to make ISO 26262 certification of their Electronic Control Unit (ECU) software easier. In order to properly allocate ECU functions to tasks and to cores, and to ensure that they meet the budgeted resources, it is useful to know execution times, response times and wait times for each task and runnable.
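To make the three timing metrics concrete, here is a minimal Python sketch that derives them for one task or runnable instance from trace timestamps. The definitions used are assumptions for illustration and not necessarily the exact ones Tracealyzer applies: execution time is the time actually spent running, response time runs from becoming ready to finishing, and wait time is the difference between the two. The `Instance` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    """One activation of a task or runnable (hypothetical trace record)."""
    ready_us: int                          # timestamp when it became ready to run
    run_intervals: List[Tuple[int, int]]   # (start, stop) pairs while executing

    def execution_time(self) -> int:
        # Total time actually spent on the CPU, excluding preemption gaps.
        return sum(stop - start for start, stop in self.run_intervals)

    def response_time(self) -> int:
        # From becoming ready until the last execution interval ends.
        return self.run_intervals[-1][1] - self.ready_us

    def wait_time(self) -> int:
        # Time spent ready or preempted but not executing (assumed definition).
        return self.response_time() - self.execution_time()

    def within_budget(self, budget_us: int) -> bool:
        # Compare measured execution time against an allocated budget.
        return self.execution_time() <= budget_us


# Ready at 1000 us, runs 1200-1500 us, is preempted, then runs 1800-2000 us.
inst = Instance(ready_us=1000, run_intervals=[(1200, 1500), (1800, 2000)])
print(inst.execution_time(), inst.response_time(), inst.wait_time())
```

Collecting these three numbers per instance is what allows a budget check such as `inst.within_budget(600)` to be run per task and per runnable.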
Years before founding Logz.io, I was a software engineer, working with various tools to ensure my products and services performed correctly. There were few tools I dreaded using more than application performance management (APM), and I know that I’m not alone. I hated traditional APM. It’s heavy. It’s hard to implement. It’s expensive. It takes a very long time to derive business value.
As applications in the cloud become more distributed and complex, the Mean Time To Resolution (MTTR) for production issues is getting longer. Modern systems are built with hundreds of distinct, ephemeral, and interconnected cloud components, which can make it exceptionally hard for engineers to understand the current state of their applications, what problems are impacting customers, and why those problems are occurring.
In a previous blog post, we explained how containers’ CPU and memory requests can affect how they are scheduled. We also introduced some of the effects CPU and memory limits can have on applications, assuming that CPU limits were enforced by the Completely Fair Scheduler (CFS) quota. In this post, we are going to dive a bit deeper into CPU and share some general recommendations for specifying CPU requests and limits.
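As a rough illustration of how a CPU limit becomes a CFS quota, the Python sketch below converts a Kubernetes CPU quantity into the quota a container gets per scheduling period, assuming the default 100 ms period. It then estimates how long a CPU-bound burst takes under that quota. The function names are hypothetical, and the throttling model is a simplification: a single thread, starting at a period boundary, with no other contention.

```python
DEFAULT_CFS_PERIOD_US = 100_000  # default CFS period: 100 ms


def parse_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("500m", "2", "1.5") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cfs_quota_us(cpu_limit: str, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may consume per CFS period."""
    return parse_millicores(cpu_limit) * period_us // 1000


def wall_clock_us(work_us: int, quota_us: int,
                  period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Estimate wall-clock time for a single-threaded burst of CPU work.

    Simplified model (an assumption): the thread runs until the quota is
    used up, is throttled until the next period starts, then continues.
    """
    if work_us <= 0:
        return 0
    if quota_us >= period_us:
        return work_us  # a single thread cannot exhaust this quota
    # Number of periods in which the full quota is used before the final one.
    full_periods = -(-work_us // quota_us) - 1
    return full_periods * period_us + (work_us - full_periods * quota_us)


quota = cfs_quota_us("500m")
print(quota)                          # quota per 100 ms period, in microseconds
print(wall_clock_us(200_000, quota))  # 200 ms of CPU work under that quota
```

Under this model, a 500m limit allows 50 ms of CPU time per 100 ms period, so 200 ms of CPU work spans four periods and finishes after about 350 ms of wall-clock time.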