
Monitoring Kubernetes in Production

Monitoring Kubernetes, both the infrastructure platform and the running workloads, is on everyone’s checklist as we evolve beyond day zero and into production. Traditional monitoring tools and processes aren’t adequate, as they do not provide visibility into dynamic container environments. Given this, what tools can you use to monitor Kubernetes and your applications?

Best practices for alerting on Kubernetes

A step-by-step cookbook of best practices for alerting on the Kubernetes platform and orchestration, including PromQL alert examples. If you are new to Kubernetes and monitoring, we recommend that you first read Monitoring Kubernetes in Production, in which we cover monitoring fundamentals and open-source tools. Interested in Kubernetes monitoring?

Create Reproducible Security in Kubernetes with Helm 3 and Helm Charts

With the growing popularity of containerized applications, organizations and startups at all levels need to manage their Kubernetes deployments more safely at scale. Today, there is an expanding list of tools and services that can help do this. One of these services is the package manager known as Helm.

Chaos Engineering for a More Secure Kubernetes

Netflix, Amazon, Google, Facebook, and a host of other companies have adopted chaos engineering, which encourages designing systems to proactively ward off potential issues through testing and the anticipation of failure. When it comes to container orchestration tools like Kubernetes, chaos engineering is a vital tactic for enhancing security.

How to Create a Python Stack

All programming languages provide efficient data structures that allow you to logically or mathematically organize and model your data. Most of us are familiar with simpler data structures like lists (or arrays) and dictionaries (or associative arrays), but these basic array-based data structures act more as generic solutions to your programming needs and aren’t really optimized for performance on custom implementations. There’s much more that programming languages bring to the table.
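The teaser above does not show how the article builds its stack, so the following is only a minimal sketch of one common approach. It assumes a last-in, first-out container backed by `collections.deque` from the standard library; the class name `Stack` and its method names are hypothetical choices for illustration, not taken from the article.

```python
from collections import deque


class Stack:
    """A minimal last-in, first-out (LIFO) stack. Hypothetical illustration only."""

    def __init__(self):
        # deque gives O(1) appends and pops at the right end.
        self._items = deque()

    def push(self, item):
        """Place an item on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item; raise IndexError if the stack is empty."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it; raise IndexError if empty."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        """Return True when the stack holds no items."""
        return not self._items

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    stack = Stack()
    for word in ("first", "second", "third"):
        stack.push(word)
    # Items come back in the reverse of the order they were pushed.
    while not stack.is_empty():
        print(stack.pop())
```

A plain Python list with `append` and `pop` would behave the same way for small workloads; wrapping the container in a class mainly keeps callers from reaching into the middle of the data.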

Why SUSE Acquired Rancher Labs

My favorite ice cream store is just off Richmond Green, close to where I live in West London. On sunny days, locals queue around the block to buy their fantastic gelatos and sorbets. Every one of their customers knows that they could easily nip into the supermarket around the corner to buy hermetically sealed chocolate ice cream, but they queue anyway. Why?

Testing the reliability of your fulfillment center

Fulfillment pipelines for order management in e-commerce have a lot of intricate moving parts. Sales orders, fulfillment, negotiation, shipment, and receipt are closely interconnected and depend on one another, yet each requires different actions. You also need messaging around order statuses, conditions, actions, rules, and inventory, just to name a few of the important parts of these complex systems.

4 Ways to Improve Your Change Management Practices

Okay, good. You have a change management practice in place. You know how to define it, its benefits, how to get the process started, and how to measure its success. You also know that it makes business initiatives more successful, prepares the organization for the future, and drives consistency. But how can you improve your current change management practice? Whether you’re a change requestor or a change manager, improving your current practice depends on these four actions.

SRE Report 2020 - Balancing 'Dev' and 'Ops'

We recently released Catchpoint’s SRE Report 2020, which analyzes results from the SRE survey we conducted early this year along with a recent addendum survey. The report offers a detailed look at the current state of SRE and how the shift to an all-remote work environment has impacted SRE teams. In this blog post, we take a deeper look at one of the report highlights: ‘Heavy Ops Workload Comes at a Cost’.

Postmortems and More With J. Paul Reed

PagerDuty sat down with J. Paul Reed, a Senior Applied Resilience Engineer at Netflix, for an Ask Me Anything (AMA) to discuss best practices around postmortems. Reed is a prominent speaker and advocate of DevOps and operations complexity, and has over 15 years of experience in release engineering. His background in tech, along with his previous work at companies like Mozilla and VMware, gives him a unique perspective on the inner workings of innovative organizations.