Managing AI Models and Datasets with Harness Artifact Registry | AI/ML Artifact Management

Building AI applications often means juggling multiple models, scattered datasets, and version chaos across local systems. But what if you could bring it all together, securely and efficiently, in one place? In this walkthrough, Shibam Dhar, DevRel Engineer at Harness, demonstrates how Harness Artifact Registry makes it easy to manage and govern your AI/ML assets, from models and datasets to prompts and agents, with built-in support for registry types such as Hugging Face and generic.

Inside the architecture: How Upsun delivers 99.99% uptime for AI

For a CTO, "four nines" represents a commitment to keeping production revenue live with no more than 0.01% downtime per year. As AI workloads move from pilot projects into core production services, the reliability requirements for infrastructure have shifted. AI agents, RAG pipelines, and automated LLM workflows depend on a consistent platform state.
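To make the "four nines" figure concrete, the arithmetic can be sketched as below. This is a minimal illustration, assuming a 365-day year; the function name `downtime_budget_minutes` is hypothetical and not taken from the article.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, for a given uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


budget = downtime_budget_minutes(99.99)
print(f"{budget:.2f} minutes of downtime per year")  # 52.56 minutes
```

Under that assumption, a 99.99% target leaves a budget of a little under 53 minutes per year for all incidents and maintenance combined.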

Stop Vibe Coding Everything: The Case for Spec-Driven Dev

Spec-driven development with AI coding agents could change how you build software. In this GitKon 2025 talk, Erik Hanchett, Senior Developer Advocate at AWS, breaks down why AI coding assistants perform dramatically better when they start with structured specifications instead of raw prompts. If you've been vibe coding your way through complex features and wondering why your AI keeps going off the rails, this is the video for you.

[Webinar] Conquering the Complexity of Self-Hosted Apps with Agentic AI SRE

Most enterprise SaaS products, like Komodor’s Autonomous AI SRE Platform, require installing a remote agent on the customer’s infrastructure, which varies significantly from one organization to another in architecture, configurations, permissions, processes, and more. This “unmanaged” model creates major blind spots, making daily operations, observability, debugging, and incident response challenging. When failures occur, limited visibility and bespoke systems make root-cause analysis slow, incomplete, or impossible.

AI-Powered LMS: Personalization, Analytics & Automation for Corporate Training

Corporate training systems change operationally once AI is embedded into their learning logic. In LMS environments used for onboarding and workforce development, AI shifts training from scheduled delivery toward continuous adjustment based on employee performance and role context. This shift affects how companies assign onboarding programs, detect skill gaps, and maintain compliance readiness across departments.

AI performance reviews for your app with the Flare CLI

The Flare CLI connects to your Flare performance monitoring data and uses AI to turn it into actionable insights, right from your terminal. In this video, you'll see how a single command pulls your real performance data from Flare, then generates a full review: identifying slow endpoints, spotting error trends, and suggesting concrete fixes.

Claude Code + OpenTelemetry: Per-Session Cost and Token Tracking

I was looking at our Claude Code spend in the Anthropic console the other day. Aggregate cost, aggregate tokens, but no breakdown by developer and no breakdown by session. I knew my hackathon team had been using it heavily to build out new features for the OpenTelemetry Distro Builder. But heavily how? I had no idea. Turns out Claude Code has been emitting OpenTelemetry signals the whole time: per-session cost, token counts, and every tool call it makes on your codebase.
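Once per-session signals are collected, the breakdown the console lacks amounts to a group-by on the session identifier. The sketch below is illustrative only: the record shape and the field names `session_id`, `cost_usd`, and `tokens` are hypothetical and are not Claude Code's actual telemetry schema.

```python
from collections import defaultdict

# Hypothetical, already-exported data points; the field names are assumptions
# made for illustration and do not reflect any real telemetry schema.
records = [
    {"session_id": "s1", "cost_usd": 0.12, "tokens": 4000},
    {"session_id": "s2", "cost_usd": 0.30, "tokens": 9500},
    {"session_id": "s1", "cost_usd": 0.08, "tokens": 2500},
]


def summarize_by_session(points):
    """Sum cost and token counts for each session identifier."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "tokens": 0})
    for point in points:
        session = totals[point["session_id"]]
        session["cost_usd"] += point["cost_usd"]
        session["tokens"] += point["tokens"]
    return dict(totals)


summary = summarize_by_session(records)
for session_id, total in sorted(summary.items()):
    print(f"{session_id}: ${total['cost_usd']:.2f}, {total['tokens']} tokens")
```

The same grouping applies to a per-developer view if each data point carries a user attribute.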