If you are trying to figure out how to measure the performance of your application, you are in the right place. We spend a lot of time at Stackify thinking about application performance, especially about how to monitor and improve it. In this article, we cover some of the most important application performance metrics you should be tracking.
DevOps engineers are under intense pressure to provide reliable, high-quality services to teams and stakeholders. In large part, this is because end users today demand seamless access to software and a great user experience, a trend that will only grow as digital transformation accelerates. DevOps professionals rely on various metrics to meet performance and reliability goals, and service level objectives (SLOs) are among the most important.
It’s fair to say that delivering software faster has never been more relevant. But in doing so, it’s easy to let your bar for quality slip. Often, the guardrail against this is to hire dedicated QA engineers, whose sole job is to ensure your software works as it should and to spot any issues that arise. Seems sensible, right? Well, at incident.io, we take a different approach.
Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.
Since Grafana started 10 years ago, there have been more than 43,000 commits to the open source project. Grafana founder Torkel Ödegaard has made more than 7,600 of those commits, and he recently reflected on some personal favorites he’s worked on, ranging from early query builders to the latest navigation updates. Torkel isn’t the only one who has strong feelings.
Perhaps the worst IT scenario an organization can face is an unexpected and forced suspension of all its operations. The downtime experienced in such a situation can lead to financial damages that far exceed those from lost data or hits to reputation. While cyberattacks vary in intensity and approach, downtime and catastrophic loss of data come in many more forms and are equally difficult, if not more so, to avoid.