Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Every engineering org is taking an AI readiness test right now

Tamar Bercovici has been at Box for 15 years. She leads the core platform, the backend layer that storage, search, metadata, and AI capabilities all run on. When her systems go down, Box goes down. On a recent episode of the Braintrust podcast, she said the debate around AI-generated code tends to focus on whether the models will write clean code or introduce bugs. Tamar's focus is somewhere else entirely.

Building a single pane of glass for enterprise Kubernetes fleets

A Kubernetes single pane of glass is a centralized management layer that unifies visibility, access control, cost allocation, and policy enforcement across every cluster in an enterprise fleet, on all cloud providers. It replaces the fragmented practice of switching between AWS, GCP, and Azure consoles to govern infrastructure, giving platform teams a single source of truth for multi-cloud Kubernetes operations.

Load Testing Vs Stress Testing | Resilience Testing | Harness

Load testing and stress testing are two important parts of performance testing, but they serve very different purposes. Load testing checks how your application behaves when many users access it at the same time under normal or expected conditions. It helps you understand if your system can handle real-world traffic smoothly without slowing down.

Peak traffic without the panic: auto-scaling infrastructure for ecommerce flash sales

Key takeaway: Upsun replaces manual, high-stress peak traffic prep with automatic scaling, keeping your e-commerce site fast and available during flash sales while you only pay for the resources you consume. For every e-commerce team, an outage means lost revenue, failed checkouts, and a flood of support tickets. For most stores, this gets worse during peak events like Black Friday and flash sales.

RalphCI: The Self-Healing AI Coding Loop That Automatically Fixes CI Failures

RalphCI is an open-source, CI-enabled agentic coding loop built by the Loop Lab at CircleCI. You write a spec, and the agent breaks it down into tasks, builds your application step by step, commits to GitHub, and runs your full CI pipeline on every iteration. If anything fails—linting, tests, security scans, missing files—a CI Doctor sub-agent detects the failure, reads the stack trace, and fixes it automatically. In this video, Ryan Hamilton demos RalphCI by building a classic Snake game end-to-end with zero manual coding.

Testing AI with AI: Why Deterministic Frameworks Fail at Chatbot Validation and What Actually Works | Harness Blog

Chatbots are becoming ubiquitous. Customer support, internal knowledge bases, developer tools, healthcare portals - if it has a user interface, someone is shipping a conversational AI layer on top of it. And the pace is only accelerating. But here's the problem nobody wants to talk about: we still don’t have a reliable way to test these chatbots at scale. Not because testing is new to us. We've been testing software for decades.

AI Didn't Change the Game, It Just Exposed Your Bottlenecks w/ Ganesh Datta (CTO, Cortex)

Every engineering org says they want to improve reliability — but most can't even agree on what "good" looks like. Ganesh Datta, Co-Founder and CTO of Cortex, has spent the better part of a decade helping companies confront that gap.