Operations | Monitoring | ITSM | DevOps | Cloud

A faster way to pinpoint performance bottlenecks: Using Profiles Drilldown with Grafana Cloud Knowledge Graph

When you identify CPU or memory spikes in your services, it’s critical to understand why they’re happening. But switching between tools or crafting complex queries can slow you down when trying to pinpoint a root cause. This is why we’re excited to share that Profiles Drilldown, an application that lets you easily explore profiling data through an intuitive, point-and-click interface (no queries required), is now integrated with Grafana Cloud Knowledge Graph.

Kubernetes GPU Resource Optimization: Top 10 Solutions in 2026

TL;DR: Most Kubernetes clusters waste GPU compute through over-provisioned pod requests and suboptimal node selection. This guide covers 10 tools that fix this across four layers: resource lifecycle (Kubex, ScaleOps, Cast.ai), hardware partitioning (GPU Operator, MIG, time-slicing), inference serving (Triton, KServe), and observability (DCGM Exporter, NFD). For most teams, the biggest gains are at the resource lifecycle layer: no model changes required.

AI Factories Will Be Won on Efficiency: Why the Kubex + Rafay Partnership Matters

The early era for AI was defined by experimentation, standing up isolated environments, and finding the first practical use cases. Today, the conversation is different. Enterprises are no longer asking whether AI matters. They are asking how to scale it sustainably, securely, and economically. That shift is giving rise to the AI factory: a repeatable, governed, production-ready environment where data scientists, platform teams, and application teams can build, train, deploy, and operate AI at scale.

From Stack Trace to Probable Cause: AI Root Cause Analysis Is Here

You know the drill. An error fires, you get the stack trace, and then you spend the next 45 minutes tracing it backward through four services, two config files, and a deploy that happened three hours ago. You eventually find the root cause, but the path to get there was manual, slow, and entirely dependent on how well you already knew the codebase. We built AI-powered root cause analysis (RCA) for that kind of slog.

Introducing Code Repositories in Kosli

Kosli gives your organization a complete picture of software delivery - every build, scan, deployment, and compliance event tracked. Until now that picture was most useful to the people managing governance. However, developers shipping code had to ask someone else what versions of their code were running, how long it was taking to get to production, or what their deployment frequency was. Repositories change that.

Hosted vs. self-hosted control planes

One of the first decisions teams face when adopting Konstruct is whether to run the control plane themselves or have it managed for them. While this can look like a simple deployment choice, it is really a question of operational responsibility, control, and how your platform needs to evolve over time. Both models exist to solve the same underlying problem: providing a consistent, GitOps-driven platform across teams and environments.

(AusBiz) JFrog teams up with Nvidia to manage AI agents

AI agents are making real-time decisions inside enterprises right now; pulling code, accessing tools, executing tasks. But most businesses have zero visibility into what those agents are actually using. In this interview on @ausbizTV, Sunny Rao, SVP APAC at JFrog, explains why the governance gap is one of the biggest risks facing enterprises today; and how JFrog and NVIDIA are building the trust layer to fix it.

The Data Problem Hiding Behind Every Agentforce Deployment Hiccup

AI without context is a hallucination engine waiting to deliver your customers the wrong answer with complete confidence. Every inaccurate response an autonomous agent produces traces back to data that was incomplete or trapped inside a silo. This dependency elevates the Data Cloud (now Data 360)–Agentforce relationship from a standard integration to the most critical architectural investment in your ecosystem.

How to Monitor a Shopify Store with Playwright and Checkly

This is a guest post by Vince Graics, Staff QA Engineer at World of Books. If you're running a Shopify storefront and want reliable synthetic monitoring, you'll hit a wall. Shopify's bot detection doesn't care that your headless browser is friendly; it sees datacenter IPs and acts accordingly. Cart API calls get hit with 429 rate limits, Cloudflare challenge pages pop up mid-check, and you're left wondering whether the bug is in your code or in the platform fighting you.