Operations | Monitoring | ITSM | DevOps | Cloud

Komodor Provides Autonomous AI SRE Troubleshooting for ClusterAPI

Cluster API (CAPI) is transforming how organizations deploy and manage fleets of Kubernetes clusters by introducing declarative, Kubernetes-style APIs to automate cluster provisioning and lifecycle management. While CAPI excels at creating consistent and repeatable cluster deployments across different infrastructure providers, operating it at a massive scale introduces unique day-to-day challenges.

Uncertainty and Change Are Everywhere in Software Development

If you’re like everyone else who works in software development, it’s a good bet that almost every single thing that you thought you knew about your business and engineering has changed as a result of the advent of modern LLMs. How should you respond to these changes? How should you change how you and your team develop software?

Why Connected Platforms Will Power the Next Generation of AI in Engineering | Harness Blog

AI is quickly becoming part of the engineering workflow. Teams are experimenting with assistants and agents that can answer questions, investigate incidents, suggest changes, and automate parts of software delivery. But there is a problem hiding underneath all of that momentum. Most engineering environments were not built to give AI the context it needs. In many organizations, the service catalog lives in one place. Deployment data lives in another. Incident history sits in a separate system.

Testing AI with AI: Why Deterministic Frameworks Fail at Chatbot Validation and What Actually Works | Harness Blog

Chatbots are becoming ubiquitous. Customer support, internal knowledge bases, developer tools, healthcare portals - if it has a user interface, someone is shipping a conversational AI layer on top of it. And the pace is only accelerating. But here's the problem nobody wants to talk about: we still don’t have a reliable way to test these chatbots at scale. Not because testing is new to us. We've been testing software for decades.

The Path to AI-Ready Operations Begins with Truth

Enterprises expect AI to improve how they operate, yet many underestimate the level of clarity required for intelligent systems to perform reliably. AI-assisted operations demand input signals that are accurate, consistent, and interpretable. They require a unified understanding of how services behave, how disruptions originate, and how decisions influence downstream outcomes. This level of coherence is impossible without operational truth.

Peak traffic without the panic: auto-scaling infrastructure for ecommerce flash sales

Key takeaway: Upsun replaces manual, high-stress peak traffic prep with automatic scaling, keeping your e-commerce site fast and available during flash sales while you only pay for the resources you consume. For every e-commerce team, an outage means lost revenue, failed checkouts, and a flood of support tickets. For most stores, this gets worse during peak events like Black Friday and flash sales.

Building a single pane of glass for enterprise Kubernetes fleets

A Kubernetes single pane of glass is a centralized management layer that unifies visibility, access control, cost allocation, and policy enforcement across § cluster in an enterprise fleet for all cloud providers. It replaces the fragmented practice of switching between AWS, GCP, and Azure consoles to govern infrastructure, giving platform teams a single source of truth for multi-cloud Kubernetes operations.

Sample AI traces at 100% without sampling everything

A little while ago, when agents were telling me “You’re absolutely right!”, I was building webvitals.com. You put in a URL, it kicks off an API request to a Next.js API route that invokes an agent with a few tools to scan it and provide AI generated suggestions to improve your… you guessed it… Web Vitals. Do we even care about these anymore?

Spending More, Seeing Less: How Indexing Limits Capital Markets Visibility

Capital markets systems don’t scale linearly. A macro event, an earnings release, a sudden liquidity shift, and telemetry volume doubles in seconds. In most observability platforms today, that spike means one thing: every byte gets written to a high-cost index before a single query can touch it. There’s no middle ground. You pay full indexing cost for the compliance log that no one queries for six months, the same way you pay for the execution trace you need right now.

Every engineering org is taking an AI readiness test right now

Tamar Bercovici has been at Box for 15 years. She leads the core platform, the backend layer that storage, search, metadata, and AI capabilities all run on. When her systems go down, Box goes down. On a recent episode of the Braintrust podcast, she said the debate around AI-generated code tends to focus on whether the models will write clean code and/or introduce bugs. Tamar's focus is somewhere else entirely.