The latest News and Information on DevOps, CI/CD, Automation and related technologies.

#046 - Simulating, Scheduling, and Saving: Optimizing Kubernetes with David Morrison (Applied Res...

In this episode, Itiel has an insightful conversation with Dr. David Morrison, a research scientist and founder specializing in Kubernetes scheduling and autoscaling. David shares his journey from operations research to leading distributed systems efforts at tech giants like Yelp and Airbnb. Learn about the transition from Apache Mesos to Kubernetes at Yelp, including the role of their open-source API layer, PaaSTA.

Kubernetes Costs: More Than Meets The Eye

As organizations expand their Kubernetes deployments and scale production workloads, effective cost management becomes an essential priority. The rapid innovation demanded from development teams often intersects with a shortage of advanced Kubernetes expertise, leading to resource inefficiencies and unnecessary expenses. This challenge is further amplified by the growing prevalence of AI/ML workloads and the intricate demands of GPU utilization.

How to Deploy Helm Charts on Kubernetes the Easy Way with Qovery

Deploying Helm charts on Kubernetes can be complex, especially when dealing with configuration overrides, security, and environment-specific setups. In this article, we show how Qovery simplifies Helm chart deployment through a seamless developer experience, robust security defaults, and powerful automation, without sacrificing flexibility.

Amazon SQS Metrics: Monitor, Debug, and Optimize Your Message Queues

Message queues quietly take care of a lot—buffering workloads, smoothing traffic spikes, and keeping services connected. But they don’t always get much attention until something feels off. Amazon SQS offers a solid set of metrics to help you understand how your queues are doing, whether you’re scaling well or nearing limits. This blog breaks down the key SQS metrics: where to find them, what they mean, and how to respond when things start to shift.

How to Configure Docker's Shared Memory Size (/dev/shm)

Your Node.js app runs fine on your machine. But inside Docker? You start getting weird crashes—ENOSPC: no space left on device. Chrome headless tests fail out of nowhere. PostgreSQL throws shared memory errors under load. The problem? It’s probably /dev/shm, the shared memory volume Docker sets up by default. Most containers get just 64MB of space here.
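
As a rough illustration of the diagnosis described above, the following Python sketch reports how large the filesystem mounted at /dev/shm is and flags it when it is no bigger than the 64 MB default mentioned in the teaser. The function names and the threshold constant are hypothetical helpers introduced for this example only; they do not come from the linked article.

```python
import shutil

# Assumption for illustration: the 64 MB default size cited in the text above.
DEFAULT_DOCKER_SHM_BYTES = 64 * 1024 * 1024


def shm_report(path="/dev/shm", min_bytes=DEFAULT_DOCKER_SHM_BYTES):
    """Return (total_bytes, free_bytes, is_small) for the filesystem at `path`.

    `is_small` is True when the total size is at or below `min_bytes`.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.total, usage.free, usage.total <= min_bytes


def format_mib(n_bytes):
    """Render a byte count as mebibytes with one decimal place."""
    return f"{n_bytes / (1024 * 1024):.1f} MiB"
```

Calling `shm_report()` inside a running container returns the current size and free space. If the mount turns out to be small, one common remedy is to start the container with a larger value for Docker's `--shm-size` option, for example `docker run --shm-size=1g ...`; the right size depends on the workload.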

Navigating Shopware logs and slow pages in a real-world scenario

A Shopware store goes from smooth to sluggish—pages take 10 seconds to load, even longer in some cases. What happened? In this post, we tell the true story of how one overlooked plugin setting nearly collapsed a storefront, and how it was resolved using native tools. If you’re shipping code in Shopware without clear performance observability, this is your wake-up call. Everything was working, until it wasn’t.

Enterprise Drupal: Why hosting all your apps on one platform matters

For many enterprises, Drupal has been the backbone of their web operations for years. It’s a battle-tested CMS that handles complex content needs with elegance. But business needs have evolved. Today, it’s rare for a company to rely only on Drupal. They are spinning up Python APIs, .NET backend services, Node.js apps, Java microservices — expanding their digital ecosystems around Drupal’s core.

SwiftPM, CocoaPods, and the Future of Enterprise Development for Apple Platforms

Swift is the default and preferred language for developing applications within the Apple ecosystem. The Swift Package Manager (SwiftPM) has become the de facto dependency manager for Swift, enabling developers to share and reuse code effortlessly. While its elegance lies in its simplicity, there’s a common concern about integrating SwiftPM into robust, enterprise-grade development workflows. This is where JFrog Artifactory shines.

GPU Powerhouse: Scaling an AI Cloud in the Heart of Europe

The AI revolution needs more than models: it needs massive infrastructure. And Julien Gauthier is building it. In this episode of Uplink, Julien, CEO of Arkane Cloud, joins host Michael Reid to unpack how his company scaled from 3D rendering and gaming to delivering GPU cloud services for AI workloads across the globe. We explore how Arkane built a 1,000-GPU cluster in Paris (with capacity for 6,000), the rise of inference workloads in Europe, and the real-world engineering and business challenges of deploying high-density infrastructure, including cutting-edge liquid cooling handling 135 kW per cabinet.

Puppet Infra Assistant: AI-Powered Natural Language Queries

Finding critical infrastructure insights shouldn't be a game of hide-and-seek. The new AI-powered Infra Assistant is a natural language interface that allows users of any skill level to chat with Puppet data and services for quick insights and reporting on infrastructure state. You don't need any Puppet experience to get started; it's safe to use in your infrastructure; and it's secured with explicit opt-in and robust role-based access control.