It is not uncommon to have multiple monitoring solutions for IT infrastructure these days as distributed architectures take hold for many enterprises. We often hear from Google Cloud Platform (GCP) customers that they use Stackdriver to monitor resources as well as Grafana and Prometheus for container monitoring. We’ve heard lots of requests from customers to be able to view Stackdriver data in Grafana effortlessly.
Every software organization faces challenges in keeping applications available and running reliably. At Google, we’ve developed and practiced a discipline known as Site Reliability Engineering (SRE). Following SRE practices lets us build and operate services reliably for our billions of users. Google has about 2,500 Site Reliability Engineers who support both internal and external services.