Blog

It's upstream-first with Ocean for Kops

Many of Spot’s AWS customers use Kubernetes Operations (kops) to self-manage their Kubernetes clusters. The tool significantly simplifies cluster setup, lifecycle management via instance groups, and Kubernetes Day 2 operations, and it can generate Terraform configurations, making it a popular choice for deploying production-grade k8s clusters.

Sumo Logic and ZeroFOX Join Forces to Improve Visibility and Protect your Public Attack Surface

Today’s organizations face the challenge of managing many different applications within their technology stack. The more public-facing platforms an organization uses, the larger its public attack surface. Without proper protection, the organization and its community can become easy targets for malicious actors.

How Playtech Fixed Metrics Over-Collection with Observability

According to Forbes, 2.5 quintillion bytes of data are created every day. Data volumes have grown exponentially in recent years due to the growth of the Internet of Things (IoT) and sensors. Most of the data collected to date was collected in the last two years alone. For example, the U.S. generates over 2.5 million gigabytes of Internet data every minute, and over half of the world’s online traffic comes from mobile devices.

Automated Root Cause Analysis & Anomaly Detection in Concert

Every day, IT operators try to prevent outages of business-critical applications. When prevention is not possible, they strive to reduce the mean time to repair (MTTR) as much as possible. Improving resolution time can be quite a challenge, but IT operators don't stand alone in it. They can use smart solutions that support Automated Root Cause Analysis and Anomaly Detection.

sFlow vs NetFlow: What's the Difference?

In any given network, switches, routers, and firewalls may support different flow protocols. After all, there’s NetFlow, sFlow, IPFIX, and J-Flow, to name a few. With so many options, you may be wondering, “Which flow protocol should I use?” It’s a common question, and it has a relatively simple answer: while some devices support multiple protocols, a device typically supports only one type of flow protocol, so you should use the protocol that both your device and your collector support.

Root Cause Changes: are they the "Elephant in the NOC?" Here's the CTO Perspective

Ask any IT Ops practitioner for the first question they ask when joining an emergency bridge call, and you’ll get the same answer: “What changed?” Our customers report that changes in their IT environments cause 60% to 90% of the incidents they see. Yet for some reason, enterprises still find it difficult to track changes and correlate them with the IT incidents they may have caused.

"Things get SREious": SRE from Home Recap

Without SRECon happening this year and the world turned upside down from COVID-19, we set out to hold a virtual event to bring SREs together to share their experiences of what has changed. Last week’s SRE from Home was exactly that. With 1,900 registrants, 20 lively Slack channels, six illuminating and entertaining talks from a diverse range of experts in the field, and our #askanSRE panel answering attendees’ questions with candid generosity, it was an amazing, jam-packed day.

New Integration: Create Zoom incident bridges automatically

Incident response doesn’t only happen in Slack, so today we’re happy to announce our integration with Zoom to create incident bridges automatically. Using the power of FireHydrant Runbooks, a Zoom meeting can be added with fully customizable titles and agendas based on your incident details. Let’s dive into how it works.

Bees Working Together: How ecobee's Engineers Adopted Honeycomb

At ecobee, adopting Honeycomb started as a grassroots effort. Engineers signed up for the free tier and quickly started sharing insights with teammates. When it came time for ecobee to make the “build vs. buy” decision for observability tooling, sticking with Honeycomb was the clear choice. Now on the enterprise plan, ecobee’s engineering squads rely on features like SLOs to support the business’s need to map engineering effort to user impact.

Mitigate risk with rolling deployments

Deploying a new feature to production is a momentous occasion. It's important to ensure that everything goes properly at this stage, as deployments tend to be error-prone when not handled correctly. To examine why this is and how you can avoid it, let's take a look at the different types of deployments available and where some of them fall short.