
How to Reduce MTTR with PagerDuty and Relay

DevOps and SRE teams are under intense pressure to reduce Mean Time to Recovery (MTTR) when resolving incidents. With the proliferation of cloud services and the increasing complexity of DevOps toolchains, engineers today need to not only learn how to use these services but also troubleshoot them when an incident is raised at 2 AM. Incident response is still largely manual today: teams cobble together runbooks and ad hoc scripts and orchestrate people to respond.

Automation and changing needs, featuring Forrester

In an ever-changing world, the future of work is changing as well, and that shift has accelerated some areas of automation that we were already moving toward. I sat down with our guest speaker, Leslie Joseph, Principal Analyst Serving Application Development and Delivery at Forrester Research, for a webinar to discuss these questions and get a better understanding of how automation plays an important role in supporting companies through crises and preparing them for an uncertain future.

Datadog and Relay for Incident Response

Datadog is an awesome tool for aggregating and visualizing the metrics that matter to you. Recently, Datadog launched a new Incident Management feature, which allows you to coordinate the activities around a problem that affected your service. In this example, I’ll walk through using Relay to roll back a Kubernetes deployment that caused a service impact, and show how the Datadog Incident timeline can keep everyone working on the incident in sync.

The Event-Driven Web is Not the Future

When you see a notification on your smartphone, your brain processes the request quickly and determines how to react. It’s an efficient process and your nervous system is built for this use case. By contrast, most Internet-connected systems are far less event-driven. If there’s a change in one service, you won’t know about it until you check.

Authenticate Puppet Enterprise with FreeIPA using LDAP

Using a Linux domain controller such as Red Hat Identity Management or FreeIPA? If so, the fields are a bit different from those of some other LDAP interfaces, which can make it difficult to connect for authentication. Here is a quick how-to on setting up Puppet Enterprise with authentication from FreeIPA. I am assuming that you already have Puppet Enterprise installed with eyaml configured. If not, you may want to visit these prerequisites first.

How to Provision Cloud Infrastructure

One of the best things about cloud computing is how it converts technical efficiencies into cost savings. Some of those efficiencies are just part of the toolkit, like pay-per-use Lambda jobs. Good DevOps brings a lot of savings to the cloud as well. It can smooth out high-friction state management challenges. Sprucing up how you provision cloud services, for example, speeds up deployments. That’s where treating infrastructure the same as workflows from the rest of your codebase comes in.

Pulling the Strings: Event-Driven Automation with Relay!

Eric Sorenson and Melissa Sussmann discuss Puppet’s event-driven automation platform, Relay, and how it helps clean up the “DevOps Dumping Ground” left behind by a tangled web of GitOps and cloud events. Ditch your digital duct tape with a repeatable and reusable platform. Home-grown glue logic is expensive and high risk compared to the alternative.

Take Control of your DevOps Dumping Ground with Relay!

As the automation surface area grows to accommodate hundreds of interconnected APIs on the cloud, developers are using their own home-grown “digital duct tape” to manage a growing “DevOps dumping ground”. For a lot of organizations, home-grown glue logic is inconsistent, not repeatable, and expensive to maintain across hundreds of event-based workflows and thousands of combinations. We believe that the answer lies in automation workflows.