“Service outage! Help!” These words, or variations on them, have preceded losses of millions and even billions of dollars in the 21st century. From large corporations to SMBs, no one is immune to the effects of downtime, whether planned or unplanned. However, the earlier an issue is noticed, the faster it can be acted upon and resolved, with little or no customer impact.
Follow these steps to write a great SRE job resume.
Stanza is a robust log agent that GCP users can use to ingest large volumes of log data. It was built as a modernized alternative to Fluentd, Fluent Bit, and Logstash. GCP users can now install Stanza on their VMs or GKE clusters to ingest logs and route them to the GCP Logs Explorer. Before we dive into the configuration steps, here’s a matrix detailing the functional differences between the common log agents used by GCP users.
First, I’d like to say that pager duty isn’t something we should treat like chronic pain or diabetes, where you just constantly manage symptoms and tend to flare-ups day and night. Being paged out of hours is as serious as a fucking heart attack. It should be RARE and taken SERIOUSLY. Resources should be mustered, product cycles should be reassigned, until the problem is fixed.
IT organizations are challenged to deliver quick, effective resolution of customers’ database, hardware, or software downtime issues. Contractually binding service-level agreements (SLAs) place further pressure on IT engineers to shorten incident resolution time and minimize downtime. Engineers are obligated to meet their SLAs, but they cannot realistically do so without the help of an automated alerting system.