Alerting

blameless

2020 SRE Predictions

It’s a new year, so what will 2020 have in store for SRE? Here’s our two cents: SRE adoption will only continue to grow. However, the practice and culture shift, rather than the role, will take priority in 2020. More people (not just SREs) will have a reliability mindset, shifting reliability left through the software lifecycle. SLIs, SLOs, and error budget policies will become common practice to make this shift actionable.

victorops

How AIOps Drives Efficient DevOps and IT Operations Automation

Artificial Intelligence (AI) is a loaded term. While “Strong AI” – technology that can simulate and even surpass human brain activity – is still the stuff of sci-fi, machine learning is a real-world application of AI, a way to train machines to learn from specific sets of data. One example of ML in the wild is the Nest Thermostat. For the first few weeks after installation, users must manually regulate the temperature as desired at different times of the day.

opsgenie

Okta: Atlassian product suite most popular app of the year

Atlassian and Opsgenie are among the most popular apps in the Okta network this year, according to a new report from the security company. From the report: Okta’s Business @ Work 2020 Report takes an in-depth look at how organizations and people work, exploring industries and customers, and the applications and services they use to harness productivity.

victorops

Best Practices for Status Pages

Status pages have become the end user’s window into your team’s operations. Companies with status pages are doing the right thing for their users, building in some transparency while mitigating frustration and support contact. In order for the benefits of status pages to pay off, organizations need to treat them as something more than active wiki pages run by support.

bigpanda

Embracing Chaos With BigPanda's Root Cause Analysis Features

The ever-growing complexity, scale and pace of IT environments put a huge burden on IT Ops, NOC, and DevOps teams, who are tasked with keeping these environments up and running. One of the biggest challenges is Root Cause Analysis (RCA). When something breaks, these teams need to determine what broke it, and they need to do it fast.

victorops

When You Shouldn't Use Infrastructure Automation

Automation, automation, automation. If Steve Ballmer were doing his “developers” dance today, instead of in 2001, those are probably the words he would be shouting. After all, we’re told that automation is the key to building agile, cloud-native, highly available, high-performing systems (to name just a handful of buzzwords that go hand-in-hand with automation today).