Operations | Monitoring | ITSM | DevOps | Cloud

Introducing the Lightstep Metrics plugin for Grafana

Chris Sackes is a Software Engineer at Lightstep. A New Yorker by birth, he loves public transportation, architecture photography, and urban exploration. He’s spent the last five years engineering delightful user experiences for a variety of applications. Lightstep’s powerful metrics reporting and analysis are now available for Grafana users. Using the new Lightstep Metrics plugin for Grafana, you can view metrics data reported to Lightstep directly in your Grafana instance.

Monitoring Amazon CloudFront with Graphite via Graphite APIs

MetricFire offers complete system, infrastructure, and application monitoring using a suite of open-source monitoring tools. With MetricFire, you can monitor all your infrastructure on a single dashboard. The platform displays metrics on the dashboard using either Hosted Prometheus or Graphite-as-a-Service.

How Lowe's SRE reduced its mean time to recovery (MTTR) by over 80 percent

The stakes of managing Lowes.com have never been higher, and that means spotting, troubleshooting and recovering from incidents as quickly as possible, so that customers can continue to do business on our site. To do that, it’s crucial to have solid incident engineering practices in place. Resolving an incident means mitigating the impact and/or restoring the service to its previous condition.

The selling doesn't stop once the contract is signed

I have a long-time N-able partner whose account I managed off and on over the years. Although I am no longer in that role, we still keep in touch, chatting regularly about how their business is doing, and discussing their successes or any challenges they might currently be facing. This year, they set some pretty aggressive growth targets for their organization. Their revenues were off due to the pandemic, so they needed to regroup and double down to make 2021 a more profitable year.

Introducing our open source SLO Tracker - A simple tool to track SLOs and Error Budget

One of the tools we use internally at Squadcast for SLO and Error Budget tracking is now open source. In keeping with the SRE ideology of automating as many ops tasks as possible, we built this SLO Tracker. We made it open source so that the SRE community can use it too. We look forward to getting your feedback, suggestions and patches :)

Top 8 uses of cloud computing

The cloud is gaining widespread adoption. For many organizations, cloud computing has become an indispensable tool for communication and collaboration across distributed teams. Whether you are on Amazon Web Services (AWS), Google Cloud, or Azure, the cloud can reduce costs, increase flexibility, and optimize resources. If you have spent your career in buzzing server rooms full of cable nests, you may be wondering what all the fuss is about.

Podcast: Break Things on Purpose | Omar Marrero, Chaos and Performance Engineering Lead at Kessel Run

In this episode, we chat with Omar Marrero, Chaos and Performance Engineering Lead at Kessel Run, a company at the forefront of delivering “combat capability that can sense and respond to any conflict in any domain, anytime, anywhere.” To say that Omar and Kessel Run are at the forefront is an understatement.

Team Spotlight

The #LifeatTorq Team Spotlight is a Q&A series dedicated to the talented and generally kick-ass team that forms the foundation of our growing company. Today we are spotlighting Ori Seri, an R&D team leader at Torq, based in our Tel Aviv office. Tell us a bit about your career path before Torq. Ori: I was an officer in an Israeli Defense Forces (IDF) Intelligence unit early on. Then I worked at a startup called Nuweba, where I began as an engineer and later led an R&D team.

Profiling newlib-nano's memcpy

Newlib is a very popular libc targeting embedded systems. It’s the libc that ships with the GNU Arm Embedded Toolchain published by ARM. This article takes a look at one of the commonly used functions provided by the Newlib C library: memcpy. We’ll examine the default nano implementation and the performance implications, comparing it against the faster non-default implementation.

What's new in Calico Enterprise 3.9: Live troubleshooting and resource-efficient application-level observability

We are excited to announce Calico Enterprise 3.9, which uses Dynamic Packet Capture to make live troubleshooting faster and simpler for organizations while meeting the regulatory and compliance requirements for access to the underlying data. The release makes application-level observability resource-efficient, less intrusive from a security standpoint, and easier to manage. It also includes pod-to-pod encryption with Microsoft AKS and AWS EKS with AWS CNI.