Incidents almost never happen in a vacuum. When you receive an alert about a potential issue, odds are pretty good that you’ll need to navigate between different tools and teams to get things resolved. Of course, timing is critical in these situations, so the easier it is to communicate — between both tools and teams — the better off you’ll be.
How do you know that your open source project has been enthusiastically adopted by the community? A) Engineers give you a raucous standing ovation when a feature is revealed. B) People form a long line to meet you at an industry event. C) Every time there is a release, social media notifications blow up your phone. If you’re Grafana founder Torkel Ödegaard, the answer is D) all of the above.
Five years ago today, Grafana Loki was introduced to the world on the KubeCon NA 2018 stage when David Kaltschmidt, now a Senior Director of Engineering at Grafana Labs, clicked the button to make the Loki repo public live in front of the sold-out crowd. At the time, Loki was a prototype: We bolted together Grafana as a UI, Cortex internals, and Prometheus labels to find out if there was a need for a new open source tool to manage logs.
This is the second part of our 12-day Advent of Monitoring series. In this series, Checkly's engineers will share practical monitoring tips from their own experience. We encountered a tricky issue with our public dashboards: they were experiencing sporadic outages, happening about once every two days. The infrequency and unpredictability of these outages made them particularly challenging to diagnose.