Operations | Monitoring | ITSM | DevOps | Cloud

The Three Pillars Were Built for Humans

It was 2am and I was paying for the privilege. Something was on fire in production, and I’d done the modern thing: I pointed an AI agent at it. It ingested the dashboards. It read the logs. It walked the traces. Then it handed me back a beautifully formatted paragraph that said, in effect, “latency is elevated on the checkout path.” I knew that. The page told me that.

How IT Teams Can Cut AI Token Costs with Deterministic Workflows

In our previous post on AI tokenomics, we looked at the rising cost challenge behind token-based AI systems. When enterprise IT teams rely on AI to reason through the same repeatable work over and over again, the costs to resolve those tasks may increase to an unreasonable level. That is where a deterministic IT automation platform becomes essential. A deterministic workflow follows predefined logic, meaning that given the same inputs and conditions, it produces the same expected result.

Is Your Network Holding Back Your Cloud Strategy?

Every layer of the modern network stack moves at cloud speed. If your connectivity doesn't, your entire strategy can stall. Co-authored by Fabio D’Avino. This blog includes insights from Fabio D’Avino, a specialist in Network as a Service (NaaS) with more than seven years of experience researching, designing, and building global network services. Fabio’s work explores how organizations can modernize connectivity as cloud, hybrid, and AI-ready infrastructure strategies evolve.

AI ROI Dispatches: How a non-engineer solved a $300K problem for under $1K

A year ago, the sentence “I just deployed an app on GitHub” wouldn’t have made sense coming from me. I’m the VP of People at CloudZero; code deployments and I were not close friends. That’s changed. In this AI era, non-engineers are building, and I think that’s a genuinely good thing. But only if it’s tied to something that matters.

Shipped: LiteLLM is probably under-counting your Claude spend

If you run Claude through LiteLLM, some of that spend is probably going uncounted – and you can’t see it, precisely because the data isn’t there. Routing through a gateway is messier than it looks: LiteLLM alone can carry Claude several ways – the OpenAI-compatible endpoint, and the Anthropic pass-through proxy that the native SDK and Claude Code use – and each path describes the same call differently.

Upsun Dispatch is available in prerelease

When we introduced Upsun Dispatch last week, we said we were building the platform layer for everything around the code. Today, you can apply to join as a founding design partner. Starting July 1, 2026, a number of engineering organizations will join us in prerelease. This is a selective, high-touch collaboration with teams who want to help shape what comes next. If you missed the introduction, you can catch up on Upsun Dispatch here.

From a $28,000 AI Bill to $0.60 Per Ticket

Engineering teams are burning through AI budgets with nothing to show for it — $100M across 10,000 engineers and no cost per run, no cost per outcome, just a number that keeps climbing. When it runs dry, your infrastructure upgrade gets cut. Harness ties every AI token to the outcome it created: cost per run, cost per resolved ticket, and anomaly detection before the invoice hits. One customer went from a $28,000 black box bill to $0.60 per ticket.

Cloud freedom with AI built in

Most cloud providers give you the hardware and leave you to figure out the rest. Civo AI is different. Chief Innovation Officer Josh Mesout explains how Civo thinks strategically about AI adoption, guiding organisations through the full lifecycle from planning and infrastructure through to running and scaling workloads, powered by best-in-class NVIDIA GPUs.

Autonomous Worker Agents: AI Agents in Your Pipelines | Harness Blog

AI is writing more of the code. Software delivery, the work between writing code and running it in production, is where most of the day still goes. Building, testing, scanning, deploying, remediating, and operating still require the same, if not more, effort as before AI. Today, we're introducing Autonomous Worker Agents for software delivery: the platform for enterprises to build and safely run AI agents that handle the work between writing code and shipping it to production.

Unlocking efficiency with Merge Queues in Bitbucket Cloud now GA

Earlier this year, we launched Merge Queues in open beta to help teams automate, sequence, and validate pull request merges. During the beta period, we incorporated feedback from hundreds of teams to improve reliability and simplify configuration. Today, we are excited to announce that Merge Queues is generally available for Standard and Premium plans on Bitbucket Cloud.

How to Choose a Cloud Migration Partner in New Jersey: What IT Leaders Need to Verify

A failed cloud migration does not announce itself in advance. Data loss, extended downtime, misconfigured security controls, and compliance gaps surface during or after the move, when reversing course is expensive and the business is already affected. For New Jersey organisations in financial services, healthcare, legal, and manufacturing, the stakes are high enough that choosing the right migration partner is at least as important as choosing the right cloud platform. The hard part is separating providers who can execute a migration cleanly from those who can describe one convincingly.

Which Bugs AI Agents Fix Better With Traffic

In the first experiment, I wanted a baseline: if an AI coding agent gets the same production signal a human would get, can it fix bugs in a codebase it has never seen? Yes, but only when I gave it better context. With only an alert, the agent passed 51% of the runtime tests. When I added captured traffic, the actual request and response for the failing call, it climbed to 77%. This post is the second pass.

Why your team keeps waiting for staging (and what to do about it)

The staging bottleneck: why your framework needs ephemeral preview environments. There's a specific kind of Friday afternoon that frontend and backend developers both recognize. A feature is ready to test. Staging is occupied. Someone else pushed a half-finished migration to the shared database last Tuesday and it's been "almost fixed" ever since. You either wait or you merge blind and hope. Most teams treat this as a scheduling problem. It isn't. It's an architecture problem.

Building a resilient workspace with an integrated security framework

Since 2020, the modern workspace has fundamentally changed: employees now operate across a mix of office, hybrid and remote locations. Critical systems are now distributed between data centres and public cloud platforms, and most corporate data lives in the cloud. This shift has expanded the attack surface for many businesses.

CI Can't Keep Up With AI | Blacksmith CEO & Co-Founder Aditya Jayaprakash

AI coding agents are writing software faster than ever. But what happens when the systems responsible for testing and validating that code can't keep up? In this episode of Uplink, Michael Reid sits down with Aditya "JP" Jayaprakash, Co-founder & CEO of Blacksmith, to explore why continuous integration (CI) has become one of the biggest bottlenecks in modern software development.

Mission-Critical Data Orchestration with Agentic AI | Automated SFTP, DataOps & Workflow Automation

How do you automate mission-critical data pipelines without risking downtime? In this Resolve Reels episode, see how Resolve's Agentic Automation Platform enables DataOps teams to build resilient, end-to-end workflows that automate secure SFTP transfers, preflight system validation, database operations, exception handling, intelligent retries, and self-healing remediation.

Never Touch Another IT Ticket Again | AI That Resolves IT Issues Automatically

What if your IT team never had to touch another password reset, VPN issue, or software request? This hilarious commercial imagines a world where IT tickets resolve themselves. See how agentic AI automates password resets, access requests, VPN troubleshooting, software installs, and more, so your service desk can focus on higher-value work instead of repetitive tickets. Resolve's AI-powered platform helps enterprises reduce ticket volume, improve first contact resolution, lower ITSM costs, and move toward Zero Ticket IT with autonomous resolution.

What if AI could resolve your IT tickets before they're ever created?

Watch how agentic AI automates password resets, VPN troubleshooting, access requests, software installations, and other repetitive IT service desk tasks without human intervention. Resolve helps enterprises reduce ticket volume, lower ITSM costs, improve employee experience, and move toward Zero Ticket IT. If you're researching AI for IT support, ServiceNow automation, ITSM automation, autonomous IT operations, or AI service desk solutions, this Short shows what's possible.

Quantum is the least interesting part of quantum certificates

On June 3, Let’s Encrypt announced that the post-quantum web is going to run on something called Merkle Tree Certificates. The internet did what it does and turned this into a doomsday Q-Day countdown. The quantum computers are coming, your certificates are about to break, panic! Unlike every other security vendor, I’m not worried about quantum computers. But the announcement is still worth your attention. Just not for the reason you’ve been told.

How to Prevent SEO Issues During Website Migrations

Website migrations are often necessary as businesses grow, modernize their platforms, or rebrand. Whether you're changing domains, redesigning your website, switching content management systems, or moving to a new hosting environment, a migration can improve performance and user experience. However, without proper planning, it can also lead to a significant loss in search engine visibility, organic traffic, and revenue.

What is the sovereignty tax, and is your organization paying it?

Most organizations know cloud costs are rising. Fewer realize that some of what they're paying isn't for infrastructure at all; it's a penalty for not being in control of it. That penalty has a name: Sovereignty Tax. It isn't a line item on your invoice. It won't appear in your cloud dashboard. But it's accumulating quietly, in egress fees, outage exposure, audit blind spots, and the creeping realization that leaving your current provider would be harder, and more expensive, than you ever anticipated.

SSIS Data Flow Components 5.0: New Features, API Updates, and Expanded Platform Support

We are thrilled to announce the release of SSIS Data Flow Components version 5.0. This release includes updates across database connectors, cloud services, and APIs. It adds new objects, improves data type support, introduces new authentication options, and expands API coverage for more than 20 platforms.

Language AI to physical AI explained

What is physical AI? Physical AI embeds machine learning directly into hardware, enabling algorithms to interact, move, and perform autonomous tasks in the physical world. Traditionally, robots relied on precise, hardcoded coordinates; if an object shifted by a single millimeter, the entire system failed. Today, robotics is moving past rigid automation toward truly adaptive architecture. Neural networks help machines process raw sensor data in real time. Consequently, machines can dynamically reason through the unpredictable physical world.

Sanctioned Isn't Secured: The AI Audit Logs Your SIEM Never Sees

Your organization has approved AI platforms for development, data science, and productivity. Procurement signed off. Legal reviewed the terms. Employees are using them. The tools are sanctioned. What isn’t sanctioned is invisibility. The administrative layer of every AI platform in your environment — OpenAI, Amazon Bedrock, Google Gemini, Cursor, Databricks, Glean and others — generates security-relevant events that your SIEM has never seen.

Building vs. Buying your platform: The honest framework nobody discusses

Most organizations get the build versus buy decision wrong in the same way. They underestimate the cost of building while overestimating the cost of buying. In the recent Konstruct monthly webinar with M R Rishi (Platform Engineer at Civo), we explored the discussion surrounding whether you should build or buy your platform. If you want to watch the full discussion, watch the recording here.

Challenges designers face in open source (and how to fix them)

Open source software (OSS) is a cornerstone of modern technology. According to the Linux Foundation, it powers up to 90% of software tools used today. Unlike proprietary software, OSS is developed collaboratively, meaning its code is available for anyone to use, change, and distribute. Because OSS projects have historically been driven by developers, they tend to be highly flexible and functional, but they can lack critical usability considerations.

The AI vendors just started watching the meter. CFOs need to watch the return.

On June 18, OpenAI gave ChatGPT Enterprise admins new credit usage analytics and spend controls. The release includes a single view of credit consumption broken down by user, product, and model, along with default workspace budgets, per-group limits, and a Cost API for pulling the data into their own systems. Two days earlier, Microsoft shipped Copilot Cowork with spending limits, budget allocation, usage alerts, and user-level caps. This is a step in the right direction.

Customer lifetime value (CLV): formula, calculation, and how to improve it

Customer lifetime value (CLV) is the total revenue a business expects from a single customer over the entire relationship, minus the costs of serving them. The standard SaaS CLV formula: Average Revenue Per Account x Gross Margin % / Monthly Churn Rate. For a $500/month customer with 75% gross margin and 5% churn: CLV = $7,500. That number can swing materially once AI spend per customer is built into gross margin, something many SaaS companies still don't do.
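The arithmetic in that excerpt can be checked with a short sketch. The function name and the 65% margin in the second call are illustrative assumptions, not figures from the post; the second call only shows how the result moves if AI spend per customer were counted against gross margin.

```python
def customer_lifetime_value(arpa_monthly: float, gross_margin: float, monthly_churn: float) -> float:
    """CLV = average revenue per account x gross margin % / monthly churn rate."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be a fraction in (0, 1]")
    if not 0 <= gross_margin <= 1:
        raise ValueError("gross_margin must be a fraction in [0, 1]")
    return arpa_monthly * gross_margin / monthly_churn


# The example from the excerpt: $500/month, 75% gross margin, 5% monthly churn.
baseline = customer_lifetime_value(500, 0.75, 0.05)
print(f"Baseline CLV: ${baseline:,.0f}")  # Baseline CLV: $7,500

# Hypothetical: AI spend per customer lowers gross margin from 75% to 65%.
with_ai_cost = customer_lifetime_value(500, 0.65, 0.05)
print(f"CLV with AI spend in margin: ${with_ai_cost:,.0f}")  # CLV with AI spend in margin: $6,500
```

Because churn sits in the denominator, a ten-point change in margin moves CLV by a full $1,000 in this example, which is why the margin input deserves scrutiny.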

Cortex Scorecards + GitHub Rule Sets: Branch Protection at Scale

Stop guessing whether your repos meet your branch policies. Start knowing. In this Feature Friday, Senior Engineering Manager Gabriel walks through Cortex's new native support for GitHub branch rule sets and how to use them in scorecards to enforce consistent policies across all your repos. Questions? Reach out to your CSM or drop a comment below.

High Cardinality in ClickHouse at Scale: What Actually Breaks

ClickHouse swallows high-cardinality telemetry at ingest, then breaks at query time weeks later. Here is what fails, and how we keep it fast in production. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Creating an agentic feedback loop with reliability guardrails

Reliability guardrails help make sure that your applications stay reliable without slowing down. In an earlier blog, we went into why agentic AI development needs reliability guardrails. It went over how the increased speed of AI development demands automated guardrails to verify resilience and what kinds of tests these guardrails should cover. But that’s only the beginning. By themselves, guardrails act as a gate to ensure resilience mechanisms hold under rapid changes.

The secret behind Carnegie's fortune and the lesson for the AI era

Point A: 1835. Andrew Carnegie is born in a weaver’s cottage in Dunfermline, Scotland. The cottage has one main room, which the Carnegies share with another family. Point B: 1901. Andrew Carnegie becomes the richest man in the world when Carnegie Steel Company wins the Iron vs. Steel industrialists’ war, and he sells the company to J.P. Morgan for the modern equivalent of $450 billion.

Azure FinOps with AI: What's New in Turbo360 v5.2

Turbo360 v5.2 is the biggest AI update we've shipped. Every module now has AI built in - not just to surface data, but to explain it, guide you through it, and help non-experts take action without needing to call in a specialist. In this video, Mike Stephenson walks through every new feature in v5.2, from AI agents that explain cost drivers and rightsizing recommendations, to a brand new Savings Tracker that gives you a better way to prove FinOps impact to management.

Turbo360 for System Integrators: Grow Your Azure Practice

If you deliver Azure integration solutions for clients, this video is for you. Fragmented tooling, unpredictable bills, and support incidents that eat into your consultancy time — these are the problems that limit how fast SI partners can scale. Turbo360 helps you solve all three, and turn each one into a business growth opportunity. In this video, Turbo360 CTO Mike Stephenson (Microsoft MVP) walks through how system integrator partners are using Turbo360 to deliver better outcomes for clients, reduce support overhead, and build managed service revenue alongside their integration practice.

Agentic Pipelines now supports OpenAI Codex

Bring your Codex agent into Bitbucket Pipelines. A few weeks ago, we announced support for Claude agents in Bitbucket Pipelines. Today, we’re adding OpenAI Codex as a supported agent. If your team is already using Codex on the desktop, you can now move that same workflow into your pipeline — triggered by a merge, a schedule, a failing build, or a pull request comment.

How AI is changing platform engineering

AI is changing software development fast. But what does that actually mean for platform engineering teams? In this conversation, Civo's John Dietz and M R Rishi dig into what they're seeing on the ground, the 10x effect of AI on app count, what it means for platform team workloads, the debugging skills that are quietly being lost, and whether Kubernetes itself might eventually become just another abstraction.

How Kubernetes Operators May Conflict With Resource Optimization (And How to Avoid It)

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application. It extends the native Kubernetes API by combining custom resources (CRDs) with a dedicated controller: a custom control loop that continuously watches the state of those resources. The primary purpose of an operator is to automate complex, stateful applications (like databases, message queues, or monitoring suites) that require human operational knowledge to maintain.
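The control loop described above can be sketched without any Kubernetes client at all. Everything here is a simplified assumption for illustration: `Spec`, `Observed`, `reconcile`, and `run_once` are hypothetical names, the "cluster" is an in-memory dictionary, and a real operator would watch the API server rather than be called by hand. The sketch also hints at why an operator may conflict with resource optimization: a value changed from outside the custom resource is treated as drift and put back.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Spec:
    """Stands in for the desired state declared in a custom resource."""
    name: str
    cpu_millicores: int


@dataclass
class Observed:
    """Stands in for what is actually running in the cluster."""
    name: str
    cpu_millicores: int


def reconcile(spec: Spec, observed: Optional[Observed]) -> List[str]:
    """Compare desired and observed state; return the actions needed to converge."""
    if observed is None:
        return [f"create {spec.name} cpu={spec.cpu_millicores}m"]
    if observed.cpu_millicores != spec.cpu_millicores:
        return [f"reset {spec.name} cpu {observed.cpu_millicores}m -> {spec.cpu_millicores}m"]
    return []


def run_once(spec: Spec, cluster: Dict[str, Observed]) -> List[str]:
    """One pass of the control loop against an in-memory 'cluster'."""
    actions = reconcile(spec, cluster.get(spec.name))
    if actions:
        # Applying the actions is simulated by overwriting the observed state.
        cluster[spec.name] = Observed(spec.name, spec.cpu_millicores)
    return actions


cluster: Dict[str, Observed] = {}
spec = Spec(name="orders-db", cpu_millicores=500)

print(run_once(spec, cluster))             # first pass creates the workload
cluster["orders-db"].cpu_millicores = 250  # an external optimizer lowers the request
print(run_once(spec, cluster))             # the next pass puts the declared value back
print(run_once(spec, cluster))             # nothing left to do
```

In this toy model the only way to make the lower CPU request stick is to change `spec` itself, which mirrors the general idea of routing optimization changes through the resource the operator watches.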

Cortex catalog data now flows into Rootly

Incident response is a context problem. The first minutes of any incident are spent reconstructing what the affected service is, what it depends on, and who owns it. That reconstruction happens during the worst possible window. The Cortex catalog already holds this data: services, teams, domains, and the relationships between them, maintained by the engineers who run those systems.

The debugging crisis nobody's talking about: AI, abstraction, and the skills gap

Here's a scenario that's playing out in engineering teams across the industry right now. A developer uses AI to rapidly prototype a microservice. The code works. They deploy it to production. Six months later, something breaks. The system is under load, a database connection pool saturates, and the service starts failing in subtle ways. The engineer pulls up the code, but here's the problem: they didn't write it. An AI assistant did. They don't understand the flow deeply. They don't know where to look first.

Replacing Your Legacy Monitoring Platform? Start with a Plan.

Whether you're using SolarWinds, PRTG, Datadog, or another long-standing monitoring solution, chances are your environment has evolved significantly since the platform was first deployed. New applications have been added. Infrastructure has expanded into cloud environments. Teams have developed custom dashboards, reports, alerts, and workflows. Over time, monitoring becomes deeply woven into daily operations. That's why many organizations continue using tools that no longer meet their needs.

Why you should use Language Server Protocol (LSP) with Claude Code

Agentic coding tools like Claude Code can write, refactor, and debug across an entire codebase, but by default they read code as plain text, the way grep does. The Language Server Protocol (LSP) changes that: it’s the same code-intelligence layer an IDE uses, and wiring it into an agent lets it read code by meaning instead of by string match. The bigger the codebase, the more a wrong guess about a symbol costs, and the more that structural view pays off.
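To make "reading code by meaning" concrete, here is a minimal sketch of what a go-to-definition query looks like on the wire. LSP messages are JSON-RPC 2.0 bodies preceded by a `Content-Length` header, and `textDocument/definition` is a standard LSP method. The helper name, file URI, and cursor position are hypothetical, and this says nothing about how Claude Code itself connects to a language server.

```python
import json


def frame_lsp_request(request_id: int, method: str, params: dict) -> bytes:
    """Encode a JSON-RPC 2.0 request with the Content-Length framing LSP uses."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode("utf-8")
    # The header counts bytes, not characters, and ends with a blank line.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Ask where the symbol under the cursor is defined (positions are zero-based).
message = frame_lsp_request(
    1,
    "textDocument/definition",
    {
        "textDocument": {"uri": "file:///workspace/app/checkout.py"},  # hypothetical file
        "position": {"line": 41, "character": 8},
    },
)
print(message.decode("utf-8"))
```

A text search for the same symbol would return every string match; the server's reply to this request points at the definition the language's own resolution rules select.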

CloudZero Dimension Studio: A drag-and-drop UI at the foundation of AI ROI

The core of ROI is visibility. If you can clearly see … 1. What it costs to produce the thing you make, and 2. How much money it makes you … then calculating ROI is easy. But with AI, as with the cloud before it, getting that visibility is extremely challenging. Why? Because the cost data associated with each is inherently chaotic.

New in Kubex: KAI Scheduler Integration for Shared GPU Inference

Today, we’re launching Kubex support for the KAI Scheduler and automated GPU sharing for inference workloads. As AI inference moves into production, platform teams are being asked to serve more models, support more teams, and control GPU costs at the same time. But many inference workloads do not need an entire GPU all the time. When teams reserve full GPUs or oversized GPU fractions to stay safe, expensive capacity can sit idle across the cluster.

Native Xet Protocol Support in JFrog Artifactory: How Enterprise Model Management Actually Works

Machine learning models are not like other software artifacts. A single fine-tuned LLM can weigh 70 GB. A model family may share 95% of its weights across dozens of variants. When hundreds of developers, training jobs, and GPU clusters all need the same model at the same time, the infrastructure underneath needs to be built for it.

Introducing Package triggers in Bitbucket Pipelines

In November 2025, we introduced new triggers and workflows to Bitbucket Pipelines to help teams manage and scale complex CI/CD workflows. We later extended that foundation with additional event-based triggers for pipeline, deployment, and pull request events. We’re now extending that model with a new package-artifact-created trigger.

Trace packages back to their source pipeline

When we introduced native Pipelines authentication for Bitbucket Packages, we made it easier to publish artifacts from CI/CD without relying on personal credentials. Now we’re extending that integration further: package artifacts published through the Pipelines integration can display a Source Pipeline, making it easy to trace an artifact back to the pipeline run that created it.

Bitbucket Packages adds PyPI and NuGet support

If your team builds with Python or .NET, your packages have likely lived separately from your code, stored in a registry with distinct login, permissions, and billing. Starting today, they don’t have to. Bitbucket Packages now supports the Python Package Index (PyPI) and NuGet, integrating Python and .NET package management into the platform where your team writes code, reviews pull requests, and runs pipelines.

Europe's Heatwave Is a Real-Time Stress Test for Data Center Infrastructure

Europe is in the middle of one of its most severe heat events on record, and the effects go well beyond public health. As temperatures across France, Spain, the UK, and Germany push into record territory, the strain on power grids, cooling systems, and physical infrastructure is becoming impossible to ignore. For data center operators, maintaining uptime in this environment demands the real-time visibility that modern Data Center Infrastructure Management (DCIM) software can provide.

Network Monitoring, the Netdata Way: Topology, NetFlow, SNMP, and Traps

Interface counters tell you a port is busy. Bytes in, bytes out, errors, drops. That’s enough to know a link is saturated, but not enough to know which conversations are saturating it, which devices are involved, or how a problem propagates across your network. For that you’ve traditionally needed dedicated network performance monitoring tools, usually expensive, usually a separate console from the rest of your monitoring.

How to migrate feature flags without breaking production

Feature flag migrations have a reputation problem. Ask anybody who’s been through one before and you’ll hear the stories, usually from someone still a little frustrated about a bad cutover, with a postmortem or two to show for it. The reputation is mostly undeserved. While the risks are real, they’re well understood and easily controlled. Getting a migration right doesn’t require a big coordinated effort.

Stop Treating Coding Agent Plugins Like Settings: Introducing Agent Plugins Repositories

Your developers install agent plugins every day: pulling from unmanaged GitHub repos, copying Cursor commands out of Slack, pointing Codex at a personal Git fork. Each of those is a new, uncontrolled distribution channel inside your software development lifecycle, and your platform team has zero visibility into any of it. A plugin is not a preference file. It is executable software, and right now it’s arriving on developer machines with no versioning, no provenance, and no audit trail.

Stop Token Maxing: The Future of AI Budget Management

The era of token maxing is over. When Claude Fable 5 launched last week at $10/$50 per million tokens - double the price of Opus 4.8 - it was a clear reminder that the most powerful model isn't always the right model. Not every task needs the Ferrari. The fastest way to burn your AI budget is sending every request to the most expensive model by default. The real question for the next phase of AI cost management isn't "can this model do the job?" but "is it the right model for the job?"

Canonical announces live kernel patching for Arm64

Canonical Livepatch now officially supports Arm64, further expanding its security patching automation capabilities. For the first time, Ubuntu on an Arm64 machine can apply critical kernel updates, without service interruption or rebooting. Starting with Ubuntu Core 26 for Arm64, and for Ubuntu Core 20 and onwards for AMD64 machines, a wider range of devices and cloud virtual machines can achieve timely vulnerability remediation through Canonical Livepatch.

Inside the Buyer's Decision: Governance, Trust, and Production-Ready Agentic AI

Why do so many AI pilots succeed in testing but fail to reach production? In this webinar, Resolve and IT leaders from RisePoint explore one of the biggest challenges facing enterprise AI adoption today: trust. While organizations are investing heavily in AI agents and automation, many initiatives stall before deployment due to governance concerns, compliance requirements, risk management, and lack of operational visibility.

What is an AI software factory?

Ask a software engineer what they do and the answer, for years, has been some version of "I write code." That assumption is unwinding fast. AI agents can now write code, review pull requests, run tests, and ship to production, and they're taking on a fast-growing share of that work. As agents absorb more of the execution, the human role shifts.

The New Software Creator: Why AI Changes the Governance Problem, Not Just the Speed Problem

The conversation about AI and software development has mostly been about velocity. Developers write code faster. Pull requests ship sooner. Backlogs shrink. That part is real, and it matters. But there's a bigger shift happening underneath it, and most engineering leaders I talk to are only just starting to feel its weight. AI hasn't just made developers faster. It has fundamentally expanded who can create and ship software. That changes things in ways that velocity metrics don't capture.

Cut your environment setup time in half with Chunk sidecar snapshots

When you’re building with AI, you can get a lot done in 30 seconds. Waiting minutes for CI feedback on your latest change can feel like an eternity. Chunk sidecars are designed to give you feedback fast, running your full test suite against the same Linux environment as CI, directly inside the agentic loop. Traditional CI pipelines can take five or ten minutes to catch a basic lint error or failing unit test.

Why we built relaxAI, and where your AI data actually goes

Sandboxing your AI agent is only half the story. The other half is where your data goes when it hits your LLM provider's API. In this clip from our secure execution agents webinar, Ben Norris, founding engineer at relaxAI, explains why the sovereignty of your AI provider matters just as much as the security of your agent's environment and why relaxAI was built on a sovereignty-first principle, with inference running exclusively in the UK and no foreign data transfer.

Escaping the AI Tokenomics Trap in Enterprise IT

AI adoption has accelerated faster than most organizations expected. What started with chatbots has quickly evolved into AI systems capable of making decisions across enterprise environments, with the promise of faster service and more efficient teams. But many organizations are discovering an unexpected challenge: as AI usage expands, costs become harder to predict. Most AI platforms operate on token-based pricing models.

Introducing Upsun Dispatch

AI has made writing code fast, and you can feel it. Commits are up, pull requests are up, new repos spin up over a weekend, and your engineers swear they are faster. But where are all the new products? If every team really got faster, the software you use every day should be getting visibly better. AI helped your engineers ship more code. It didn't help your team ship more products.

What nobody tells you about platform engineering at scale

Platform engineering has become one of the most discussed topics in cloud native infrastructure. Yet despite the rising focus, most conversations around platform engineering skip over the uncomfortable truths. What actually works at scale? When should you build versus buy? And how do you avoid the traps that trip up even experienced teams?

Retention Policies vs Retention Labels in SharePoint (2026): The Difference Admins Constantly Get Wrong

Retention policies apply to locations. Retention labels apply to items. Both live in Microsoft Purview, both retain content, and admins regularly use the wrong one. What each actually does and when to use which.
Featured Post

From firefighting to forward planning: a practical route to operational innovation

Operational innovation is often treated as a back-office efficiency exercise, but in practice, it is becoming a strategic discipline. As AI moves deeper into day-to-day operations, technical leaders need a clearer way to cut toil, reduce risk and build the capacity to innovate. For many operations teams, it starts with incident management. When responders are trapped in noisy alert streams, manual escalations and fragmented workflows, innovation is pushed aside by the urgent work of keeping services available.
Sponsored Post

Five things your logs will never tell you

A customer escalation hit my queue when I was on the customer smoke jumpers team at an observability vendor. My team was the group that parachutes into Fortune 500 accounts one bad week from churning and usually after a big customer outage. The customer had filed a billing dispute three weeks earlier and their on-call engineers were stuck. They had our full stack: logs, metrics, traces, end-to-end instrumentation, every product we sold and some we didn't. They could see the request came in. They could see it returned a 500. They could not see the body. The trace was sampled out. The log line was truncated at 4KB.

Governing AI Agents at Runtime: Open Source Zero-Trust with AGT | Ubuntu Summit 26.04

AI agents are moving from demos to production – but who governs what they do at runtime? The Agent Governance Toolkit (AGT) is an open source, MIT-licensed framework from Microsoft that enforces deterministic policy before every tool call, message, and action an agent takes. In this talk, Imran walks through how AGT brings zero-trust identity, policy-as-code, tamper-evident Merkle audit chains, and a Kubernetes sidecar model to any AI agent, regardless of framework.

Upsun included in IDC ProductScape on worldwide cloud deployment-centric platforms, 2026

Upsun is included in IDC ProductScape guide to worldwide cloud deployment-centric platform capabilities. Building and scaling applications has never been more complex or more critical. Engineering teams are under constant pressure to ship faster, manage increasingly complex infrastructure, and adapt to the rapid rise of agent and AI-powered development. Choosing the right platform to support these demands is an important decision for technology organizations.

We redesigned Spike

Last Christmas, after everyone had gone quiet for the holidays, I sat down with a pen and some paper and started drawing Spike. Not the Spike we actually had, but the Spike I wanted, the one I had been carrying around in my head for a long time without ever really putting it down anywhere. A little while later I brought a few of those screens into Figma and showed them to the team over coffee one afternoon.

How to track business expenses in 2026: methods, tools, and AI spend

How to track expenses for a business: categorize expense types (operating, software, cloud, travel, capital), choose a tracking method (spreadsheet, accounting software, expense management tool, or cost intelligence platform), connect data sources (bank feeds, cloud billing APIs, SaaS invoices), assign ownership per cost center, set a reporting schedule, and audit quarterly.

How AI Is Transforming Production Issue Investigation for Modern DevOps Teams?

Production failures don't announce themselves cleanly. They arrive at 2 AM, buried inside 40 million log lines, spread across a dozen microservices, and disguised as something that looks entirely unrelated to the actual root cause. For years, engineering teams absorbed this pain through process: runbooks, on-call rotations, dashboards, and a deep institutional knowledge that lived in the heads of their most senior engineers.

Ship From Where You Build: Harness Delivery Intelligence, Now Inside Antigravity | Harness Blog

The Harness MCP Server now connects directly inside Google Antigravity. Developers can link Harness in under two minutes and give the agent structured, real-time access to their pipelines, execution history, services, environments, and policies, without leaving the editor. What makes it reliable isn't the connection itself. It's the Harness Software Delivery Knowledge Graph underneath, which gives the agent the context to act accurately, fast, and within your guardrails.

AI Coding Security Risks Demand Dependency Firewalls | Harness Blog

AI coding assistants accelerate development but can rapidly introduce vulnerable, malicious, or non-compliant open-source dependencies into your codebase. Harness Artifact Registry's Dependency Firewall acts as a registry-level control point, evaluating and blocking risky external packages before they enter your CI/CD pipeline—essential protection against modern npm-style supply chain attacks.

Why ITSM Still Isn't Solving Tickets (And What Comes Next)

Most ITSM platforms make it easier to submit tickets. They don't make it easier to resolve them. As we said in our webinar: "A better front door without backbone orchestration is just a faster handoff." The future of IT isn't faster ticket creation. It's autonomous ticket resolution powered by AI, automation, and orchestration.

How to build a hybrid private cloud strategy that scales with your business

Most hybrid cloud strategies fail not at launch but at scale. The architecture works fine for the first year. The team's workloads are modest, the integration points are limited, and the operational overhead is manageable. Then the business grows. Workloads multiply, data volumes climb, the team expands, and the seams between public cloud and private infrastructure start showing.

How to build sustainable AI infrastructure on GPU cloud

AI's environmental cost is real, and it's growing. Training a large language model can consume the electricity of hundreds of households for weeks. Inference at production scale runs continuously, with GPU clusters drawing power around the clock. The data centers that house all of this are some of the most concentrated energy consumers in the modern technology stack.

Beware of PII in Testing Data: The Security Iceberg and Where PII Actually Hides

If you run a platform tools or security team, you have likely heard this request from developers: “I just need a copy of the production database for staging so I can run realistic load and integration tests.” It is a completely reasonable request. Production traffic and data contain the actual request shapes, real-world value distributions, long-tail anomalies, and timing patterns that make tests useful.

Integrating Digital Employee Experience (DEX) with ServiceNow: What IT Teams Need to Know

As CTO for Teneo, I get the opportunity to meet with many of our customers to talk about plans for the next few years. I often find we spend a lot of time talking about Digital Employee Experience, but far less time is spent fixing the operational friction that quietly erodes it. Slow devices, degraded application performance, and recurring service desk tickets are common themes in many organizations.

AWS Summit London & NYC: what engineers want

Across two AWS Summit events in London and New York City, we had the chance to speak with more than 1,000 engineers. They came from startups building their first production stack, and enterprises managing large AWS and multi-cloud deployments. The energy was exactly what you'd expect: major AWS launches, dozens of new service announcements, wall-to-wall cloud conversations. And HAProxy right in the middle of it.

DataStream 2.0: Faster, Smarter, Built for Scale

June 19, 2026. This is not a regular monthly update. DataStream Version 2.0 is a milestone — the result of relentless building, learning from customers, and pushing the platform toward what enterprise-scale security operations actually demand. The core has been rebuilt, new capabilities have been added across the board, and the platform is now faster, more resilient, and more extensible than ever. Here’s what’s new.

Why your PaaS choice is a governance commitment

Choosing a Platform-as-a-Service (PaaS) is not just an infrastructure decision. It is also a decision about how personal data will be handled over the life of the project. It's a governance commitment made early, with consequences that run late. A PaaS does not remove an organization’s accountability for privacy, security, or regulatory compliance. However, a well-architected PaaS can materially strengthen the control environment in which those obligations are managed.

Build WireMock mappings fast from real traffic

I’m a big fan of service mocking. I’ve been working in and around software for about 25 years, and one thing never changes: when you sit down to work on your code, you almost never have everything available. The database, the third-party API, the message queue, the service two teams over. Something’s missing. So you’ve got to stub it out or mock it out and keep moving.

Introducing the New Galileo Website: A Better Resource for IT Visibility, Optimization, and Planning

That's why we've launched a completely redesigned Galileo website. The new site isn't just a fresh look but rather a reflection of our commitment to helping IT teams gain the visibility, insight, and guidance they need to manage modern infrastructure more effectively.

Chunk sidecars: Inner Loop Validation for AI Coding Agents

Your agent writes code fast, but you shouldn't have to see it until it's right. Chunk sidecars are lightweight microVMs that work inside the agent loop, requiring agents to pass pre-push validation in a CI-like environment before they declare they're "done." That means no massive CI pile-ups, no long round-trips that risk resetting your agent's context. You're sending code you already know is good.

So you need to add microcontrollers to your fleet: now what?

Your Ubuntu Core fleet is running beautifully. OTA updates roll out in minutes. Every device is strictly confined, cryptographically attested, and carrying a 10 to 15 year long term support (LTS) commitment. The operational team sleeps soundly. Then the product roadmap meeting happens. The industrial floor needs vibration sensors on every motor. The smart building needs temperature nodes in every room. The cold chain system requires dozens of low-power Bluetooth tags. And someone just said the words.

Only ONE company has all of DevSecOps - and it's not who you think

Harness has been named a Leader in the 2026 Gartner Magic Quadrant for DevSecOps Platforms — for the third year in a row. Here's what stands out: almost every company in the Leaders quadrant is missing a piece. Some have Dev, some have Ops, but not the security. Harness is the one platform that brings Dev, Sec, AND Ops together — with AI built in. AI is changing how software gets built: more code, more automation, more agents — but also more complexity. Teams need delivery, security, testing, reliability, and cost in one place. That's what we're building every day.

How to build a secure AI agent sandbox with relaxAI and Claude Code

AI agents are powerful. They're also unpredictable, non-deterministic, and capable of doing things you didn't ask them to do, as the Rome Alibaba and Claude Mythos case studies make very clear. The answer isn't to avoid agentic AI. It's to run it properly. In this demo, Ben Norris, founding engineer at relaxAI, shows how to build a fully sandboxed AI agent environment from scratch, an ephemeral Civo VM provisioned via Terraform and GitHub Actions, locked down with egress policies, an unprivileged Linux user, and hard resource caps, running a Claude Code session pointed at the relaxAI API.

How to run an operational excellence review for software engineering

Most engineering organizations already run something they call an operational review. It usually looks like a cousin of the quarterly business review: a deck assembled every few months, walked through team by team, anchored on whatever incidents happened to land in the previous quarter. By the time leadership sees the data, the systems it describes have moved on and the next set of risks is already accumulating in the gap.

Klaudia Under the Hood: How We Built an AI SRE That Actually Earns Trust

In reliability engineering, being ‘mostly right’ is a liability. An AI SRE that sometimes misses the root cause or gives a confident, wrong answer at 2:17 AM has no place in an enterprise cloud environment. In this context, silence is better than noise. That’s the bar Klaudia is built to clear: genuine reliability that you can trust in production. The kind of reliability that earns a place alongside your best engineers. Getting there requires more than just a capable model.

Operational excellence (OpEx) reviews: the weekly meeting that actually changes behavior

Cortex co-founder and CTO Ganesh Datta sits down with Shawn Burke, Distinguished Engineer at Cortex, to explore what separates an operational excellence review that drives real engineering behavior from one that produces great conversation and nothing else. Shawn draws on experience from SoFi, Uber, and Microsoft to explain why these reviews so often fail—and how to build a process that actually sticks.

Platform engineering unplugged: What nobody tells you about platform engineering at scale

Most platform engineering stories are told in hindsight, with the rough edges smoothed out. On June 17th, we are doing it differently. Join us for Platform Engineering Unplugged, a frank conversation with a practitioner who has navigated the real challenges of building and scaling platform engineering. What worked, what didn't, and what they would do differently. If you lead engineering teams and are thinking seriously about platform engineering, this is the session for you.

Cooldown policies - Block malicious packages at the index

Every dependency pull is a trust decision. Public registries don't vet what they serve. Cooldown policies give you a gate at the moment that matters most: when a package first enters your environment. Dan McKinney (Solutions Engineering Manager) walks through how Cloudsmith's cooldown policies work and how to configure one in under five minutes.

What Is Database Software? Types, Examples (Including dbForge Edge)

Database software helps organize, manage, retrieve, and analyze data in databases. But what does that actually mean in practice? In this video, we explain what database software is using a simple library analogy, show how it helps add, edit, delete, and report on data. We also break down the main types of database tools used by developers, DBAs, analysts, and technical teams. You will also see examples of well-known database software, including SSMS, MySQL Workbench, pgAdmin, Oracle SQL Developer, JetBrains DataGrip, DBeaver, and dbForge Edge.

7 Best AI Search Tools Across Slack, Google Drive, and GitHub That Flag Stale Docs

An authoritative-looking snippet can be poisonous if it's two versions behind. A Gartner CX survey found that 56 percent of users complain about outdated documentation, and a 2026 Support Ops study attributes nearly 40 percent of tickets to articles that are stale or unclear. If a deployment script changes yet the old README still ranks first in Slack, you can lose an afternoon chasing errors. Multiply that across every lapsed policy, pricing deck, or support macro, and productivity shrinks, along with audit scores and customer trust.

AI Agents Are the New Employees: The Identity & Security Crisis Enterprise IT Must Solve

As AI agents become more autonomous, enterprises face a new challenge: How do you secure a workforce that isn't human? In this episode of Agents of IT, Fran Fernandez, Zach Austin, and Ian Coppock explore the growing identity and security challenges surrounding Agentic AI. From permissions and governance to digital identities and access controls, the team breaks down what enterprise leaders need to know before deploying AI agents at scale.

How to Fix Azure Integration Errors in Minutes Instead of Days

Azure integration errors can be difficult to diagnose when messages flow across multiple services such as Logic Apps, Service Bus, Azure Functions, APIs, and external systems. Support teams often spend hours searching through logs and correlating events across services just to identify where a transaction failed.

Why Multi-Agent AI Workflows Need a Control Plane

AI is transforming how infrastructure and platform teams design, deploy, and operate systems. As organizations move from experimentation to production, a clear pattern is emerging. AI can decide what should change, but it cannot safely control how those changes are executed. This creates a gap in modern architectures. That gap is filled by a control plane. That control plane already exists in Puppet Enterprise Advanced.

Why Day 2 Operations Are Harder Than Deployment (And What To Do About It)

Getting your application deployed feels like finishing a race. You push the code, the containers spin up, the health checks go green, and for a brief moment everything feels solved. Then Day 2 arrives. Day 2 is not a specific date. It is the entire operational life of your application after that first successful deployment. It is the stretch of time that can last years, and it is where most teams quietly discover that deployment was the easy part.

How one PM scaled customer discovery with AI

Customer interviews are one of the most powerful ways to build better products — but they’re also time-consuming. In this video, Avinoam “Avi” Zelenko, Principal Product Manager at Atlassian, shares how he transformed the way he runs customer interviews using AI automation and Rovo agents. What used to take hours of coordination, note-taking, and manual summaries now happens automatically. By stitching together the Teamwork Collection and Slack, Avi built a workflow that captures conversations, summarizes insights, and shares them across teams in real time.

The sovereignty debate explained with Nine23

Who really owns your data? Data sovereignty has become one of the defining issues shaping digital infrastructure, cloud strategy and AI adoption. But what does it actually mean, and why has it become a board-level discussion for so many organisations? In Episode 4 of Perspectives from the Edge, Pulsant's Wendy Shearer is joined by Steve Jewell, CEO of Nine23, to explore data sovereignty and its relationship to security, resilience and digital transformation.

Reduce Alert Fatigue with Composite Alerting in Hosted Graphite | Tutorial

Tired of noisy alerts waking you up for issues that are not actually impacting your services? In this tutorial, we walk through MetricFire's Composite Alerting capabilities and show how to combine multiple metric conditions into a single high-confidence alert using AND / OR logic. Learn how to: reduce alert fatigue and false positives; create service-level alerts in Graphite; combine CPU, latency, and database metrics into meaningful alerts; use conditional logic to improve signal quality; and build smarter observability workflows with Hosted Graphite.
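The AND / OR idea in this teaser can be illustrated with a small, vendor-neutral sketch. It is not MetricFire's or Hosted Graphite's API: the `Above`, `all_of`, and `any_of` helpers, the metric names, and the thresholds are all hypothetical, chosen only to show how combining conditions filters out a lone noisy signal.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Sample = Mapping[str, float]        # latest value per metric name
Rule = Callable[[Sample], bool]     # a condition evaluated against one sample


@dataclass(frozen=True)
class Above:
    """True when a metric's latest value exceeds a threshold."""
    metric: str
    threshold: float

    def __call__(self, sample: Sample) -> bool:
        return sample[self.metric] > self.threshold


def all_of(*rules: Rule) -> Rule:
    """AND: fire only if every sub-condition holds."""
    return lambda sample: all(rule(sample) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """OR: fire if at least one sub-condition holds."""
    return lambda sample: any(rule(sample) for rule in rules)


# Hypothetical composite: page only when users feel it (latency) AND
# there is a plausible resource cause (CPU OR database connections).
service_degraded = all_of(
    Above("api.latency_p95_ms", 800),
    any_of(Above("host.cpu_percent", 90), Above("db.active_connections", 450)),
)

cpu_spike_only = {"api.latency_p95_ms": 120, "host.cpu_percent": 97, "db.active_connections": 40}
real_incident = {"api.latency_p95_ms": 1350, "host.cpu_percent": 97, "db.active_connections": 40}

print(service_degraded(cpu_spike_only))  # False: CPU is high but latency is fine
print(service_degraded(real_incident))   # True: latency and CPU are both over threshold
```

A CPU-only alert would have paged on the first sample. The composite stays quiet until the user-facing metric degrades too, which is the trade the tutorial describes.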

We wrote the docs

Most security vendors hide their documentation behind a login. Some don’t write it at all. You get a sales page, a demo, and a request to install an agent on your servers, and you’re expected to trust that the thing does what the marketing says. That’s backwards. So we wrote the docs, and we put all of them at certkit.io/docs. No login, no account gate, no “contact us for details.” You can read every page before you create an account.

Real-Time CPU and Memory Insights for Harness CI Cloud Builds | Harness Blog

When a CI pipeline runs on cloud infrastructure, the build machine is ephemeral. It spins up, executes your build, and disappears. During that window, you have zero visibility into how much CPU and memory your pipeline actually consumes. This blind spot creates real problems. Teams over-provision VMs "just in case," wasting compute spend. Others under-provision and deal with silent OOM-kills or CPU throttling — the only clue being a cryptic exit code 137.
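As a brief aside on that "cryptic" exit code: by common shell convention, an exit status above 128 means the process was terminated by signal number (status minus 128), so 137 corresponds to signal 9, SIGKILL, which is what the Linux OOM killer sends. The sketch below decodes that convention with Python's standard `signal` module. The helper name `signal_from_exit_code` is hypothetical, and the code assumes a POSIX system; it is not part of any Harness tooling.

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name implied by a shell-style exit code, if any.

    Shells conventionally report "killed by signal N" as 128 + N.
    Codes of 128 or below are ordinary exit statuses, not signals.
    """
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


print(signal_from_exit_code(137))  # SIGKILL on POSIX systems (137 - 128 = 9)
print(signal_from_exit_code(1))    # None: a normal non-zero exit
```

The exit code alone cannot say who sent the SIGKILL. Confirming an out-of-memory kill still needs memory data from the build window, which is the visibility gap the post describes.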

What Is Coherent Routing?

Coherent routing, Routed Optical Networking, and Converged Optical Routing Architecture (CORA) are all names for the same concept: an advanced network architecture that integrates coherent optical transceivers directly into IP routers. This convergence of layers creates a simplified and highly efficient IP-over-DWDM (IP over Dense Wavelength Division Multiplexing) network.

Are AI Tools Actually Improving Developer Experience? (Experts Cut Through the Hype)

AI tools are spreading across the entire software development lifecycle - but are they actually making developers more productive, or just adding noise? In this panel from Context Conference, Najla Elmachtoub (Squadformers) moderates a sharp, no-fluff conversation with Nathen Harvey (Google, DORA program), Bill Harding (GitClear), and Jeremy Castile (GitKraken) on what's really working when it comes to AI and developer experience.

Harness Named a Leader in the 2026 Gartner Magic Quadrant for DevSecOps Platforms for the Third Consecutive Year | Harness Blog

Harness has been recognized as a Leader in the 2026 Gartner Magic Quadrant for DevSecOps Platforms for the third consecutive year. Harness was also positioned furthest on the Completeness of Vision axis in the report.

Lock-in is not theoretical: What UK organizations told us about cloud exit barriers

For years, vendor lock-in has been discussed as a theoretical risk. A concern to acknowledge in architecture reviews. A box to tick in compliance frameworks. A future problem that might need addressing. Our latest research reveals something more urgent. For UK organizations, lock-in isn't theoretical anymore. It's structural. It's measurable. And it's preventing organizations from acting on their own strategic priorities.

Scaling Android development with Anbox Cloud

Discover how Anbox Cloud helps engineering teams scale Android development by moving Android workloads from physical hardware into the cloud. In this video, we showcase how developers can run, test, validate, and share Android environments on demand using containerized and virtualized Android instances. We explore how both approaches work, key differences, and use cases.

Validating real-world skills through Canonical Academy

In an increasingly volatile job market, standing out from the competition is vital. For many in the open source community, formal recognition for self-taught skills is a significant challenge. These skills are often built through hands-on hobbies, side projects, and deep community contributions. While the market is flooded with certificates and certifications, most fail to reliably measure practical execution, or fall behind the rapid pace of industry changes.

EU Data Act Compliance for Cloud and DevOps Teams: What Changes You Need to Make Before the Deadline

If you have been following regulatory trends, you already know that the EU Data Act is quickly becoming one of the most important regulations for companies that manage business and customer data in Europe. Many companies have already spent the past few years adjusting to privacy laws. However, this new regulation brings different challenges.

The Godfather of AI Ready Data Centers | OCOLO CEO & Founder Tony Rossabi

AI is reshaping digital infrastructure, but the biggest challenge isn't always building bigger data centers, it's finding the power to run them. In this episode of Uplink, Michael Reid sits down with Tony Rossabi, Founder & CEO of OCOLO, to discuss how AI is changing the data center industry and what it takes to deliver the next generation of infrastructure.

AI is only one of four things driving the data center boom

Tony Rossabi, aka the Godfather, has spent 30 years in this industry. Car washes to Telx to building data centers. He sat down with our CEO Michael Reid to break down what’s actually happening underneath the AI headlines, from where the real demand is coming from, to why a single megawatt of power is so hard to find, and how a team of eight is building 19 ten-megawatt facilities across two continents in 24 months.

Track Deployment status for your PRs (Beta)

You shouldn’t have to leave your PR list to know where your code is deployed. Yet, developers constantly lose time context-switching just to see if a change hit staging or production. To solve this, we are launching the Beta version of Deployment Status Tracking for your PRs. This feature surfaces live deployment statuses directly within your PR list view as code moves through your pipeline.

Introducing Kepler | GitKraken's Agentic Development Environment (ADE)

Kepler is GitKraken's agentic development environment: mission control for running parallel coding agents at scale. Running one agent is easy. Running five of them across three repos is where things break: scattered terminals, no shared view, no idea what's done or stuck. Kepler puts every agent session on one surface so you can plan work, write code, and review what ships without losing track of anything.

GitKraken: The Code Flow Company

From plan to main. Software is no longer just a tool. It is the infrastructure of modern life. Software keeps airplanes in the sky and power flowing into our homes. It helps doctors save lives, scientists discover cures, farmers feed cities, and astronauts navigate space. It powers economies, protects supply chains, and connects billions of people across the world. Every major system humanity depends on now depends on software. Which means developers are no longer just building applications.

Measuring engineering organizations in the age of AI

Engineering leadership is in the middle of a real transition, and most of the leaders I talk to know it. AI has reshaped how software gets built quickly enough that the operating models many of us spent a decade refining no longer fit cleanly, and there is a great deal of serious work happening across the industry to figure out how these models should evolve. The teams I find most impressive right now are the ones treating their operating model as an open question rather than a settled one.

Template: Streamlining open source design contributions

As designers working at Canonical, we’re always thinking about open source. We believe that encouraging more designers to contribute to open source benefits everyone, from the project maintainers to the end users themselves. In the 2025 edition of FOSSBackstage conference, we presented our research findings on why designers don’t get involved in open source projects and found a particular breakdown between designers and project maintainers.

Beyond Mythos: responding to a new threat landscape

Canonical’s security philosophy has always been built on the premise that vulnerabilities exist and will be discovered. Our response relies on defense-in-depth architecture, rapid patch deployment, and strict adherence to Coordinated Vulnerability Disclosure (CVD). AI changes vulnerability discovery volume and speed. We have a robust vulnerability management process that is backed by rigorous compliance certifications.

Shipped: Stop rebuilding Views from scratch

In Explorer, you build a filter set and group-by to answer a cost question, and often that’s exactly the configuration you’d want to save for later. But saving it as a View meant navigating away from Explorer, opening the Views page, and rebuilding the same configuration from scratch: filter by filter, dimension by dimension. That friction was enough to discourage saving exploratory analysis as a View at all. You can now save any Explorer analysis as a View in place.

AI pricing explained: what AI actually costs and how providers charge for it in 2026

AI pricing covers the cost structures and billing models providers use to charge for AI products: per-token APIs (GPT-4o at $2.50/1M input tokens), per-seat subscriptions (Copilot at $30/user/month), per-conversation billing (Agentforce at $2/conversation), and consumption-based GPU compute (H100 instances at $55.04/hour). There is no standard. The total AI cost is almost always higher than the sticker price.
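To make the four billing models comparable, here is a rough arithmetic sketch using only the rates quoted in the excerpt above. The workload figures (token volume, seats, conversations, GPU hours) are invented for illustration, and the token line covers input tokens only because that is the only per-token rate the excerpt gives.

```python
# Rates as quoted in the excerpt above.
INPUT_TOKENS_USD_PER_MILLION = 2.50   # per-token API
SEAT_USD_PER_MONTH = 30.00            # per-seat subscription
CONVERSATION_USD = 2.00               # per-conversation billing
GPU_USD_PER_HOUR = 55.04              # consumption-based GPU compute

# Hypothetical monthly workload (assumed numbers, not from the article).
input_tokens = 400_000_000
seats = 50
conversations = 3_000
gpu_hours = 100

monthly_costs = {
    "per-token API (input only)": input_tokens / 1_000_000 * INPUT_TOKENS_USD_PER_MILLION,
    "per-seat subscription": seats * SEAT_USD_PER_MONTH,
    "per-conversation": conversations * CONVERSATION_USD,
    "GPU compute": gpu_hours * GPU_USD_PER_HOUR,
}

for model, cost in monthly_costs.items():
    print(f"{model:<28} ${cost:>10,.2f}")

total = sum(monthly_costs.values())
print(f"{'total':<28} ${total:>10,.2f}")
```

Even this toy total leaves out output tokens, retries, storage, and engineering time, which is one reason the real cost tends to exceed the sticker price.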

8 IT Infrastructure Automation Use Cases to Prioritize

IT infrastructure automation sounds simple enough on the surface, right? You take repetitive infrastructure work, turn it into automated workflows, and give engineers more time for higher-value problems. This may seem easy, but in practice, it gets more interesting. Modern IT environments are spread across cloud platforms, legacy systems, identity tools, ITSM platforms, monitoring systems, network devices, and business-critical applications.

The bottleneck has moved. AI is rewriting the Software Development Lifecycle

If you've read our previous piece on the 8 stages of AI engineering maturity, you know where your team sits. Turns out adopting AI is the easy part; adapting to its consequences is where most organizations struggle. For more than a decade, software organizations optimized around a single assumption: implementation capacity was scarce.

Why CI/CD Pipelines Miss Runtime Failures

A CI/CD pipeline does four things: it builds code, runs tests against mocked dependencies, lints for style violations, and scans for known vulnerability patterns. What it cannot do is validate how that code behaves under real users, real service responses, and real runtime constraints that staging was never configured to reproduce. That entire class of failure clears every gate cleanly and surfaces only in production.

Why route diversity is critical to resilient global connectivity

Subsea cables have long been the invisible backbone of the internet, carrying more than 95% of global data traffic beneath the ocean’s surface. Today, they are no longer just background infrastructure, they sit at the centre of an increasingly complex digital and geopolitical landscape. The rise of artificial intelligence, alongside continued cloud expansion and hyperscale data centre growth, is driving unprecedented demand for high-capacity, low-latency connectivity.

LightMesh DHCP Integration: Always Know What's on Your Network

Dynamic Host Configuration Protocol (DHCP) activity changes faster than most IP inventory systems can keep up. Devices reconnect. Leases expire. Infrastructure changes constantly across servers, endpoints, and cloud environments. If your IP inventory cannot reflect those changes automatically, teams quickly lose confidence in the data they rely on to operate the network.

ClickHouse LowCardinality: When It Helps and When It Hurts

ClickHouse LowCardinality cuts storage and speeds up queries on low-cardinality columns, but backfires on trace IDs. How to tell the difference. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

The Miasma worm explained: How it Hit Red Hat and Microsoft

Miasma has already hit Red Hat and 73 Microsoft GitHub repos. Here's how it works and what your team can do right now. Nigel Douglas, Head of Developer Relations at Cloudsmith, breaks down the Miasma worm – a self-replicating supply chain attack and evolved variant of Mini Shai-Hulud from threat group TeamPCP. Learn how Miasma uses the yo-yo attack method to move laterally across registries and workstations, why conventional scanners missed it, and the practical steps security teams can take today, including cooldown policies and continuous risk assessment.

New Pipeline Triggers | Bitbucket Blitz | Atlassian

Bitbucket Pipelines already supports triggering on repository pushes and pull request pushes. But if you wanted to react to other events like a deployment finishing, a build failing, or a PR getting merged, you had to wire that up yourself with webhooks or external glue code. New trigger types let you define those conditions directly in your YAML. You can fire custom pipelines on events like pipeline-completed, deployment-completed, pullrequest-created, pullrequest-fulfilled, and more.

Introducing Kepler: The Delivery Engine for Agent-Driven Development

You’re no longer writing code. You’re managing a pipeline of agents writing it for you. If you’ve been running two, three, or four AI coding agents in parallel, you already know the problem. The agents are fast. The orchestration is chaos. You’re bouncing between terminal windows, manually rebasing branches, cleaning up messy commits, and trying to remember which agent is touching which repo.

Why Digital Operational Resilience Act (DORA) Compliance Requires Auditable Database Change Management

This article examines DORA's requirements for database change management and explains how Redgate Flyway Enterprise addresses them. The EU's Digital Operational Resilience Act (DORA) came into full effect in January 2025. It is designed to strengthen the ability of financial institutions to withstand operational disruption, whether caused by technology failures, data corruption, human error, or a cyberattack.

Without Governance, AI Is Just Faster Failure

Kellyn Gorman is a Database and AI Advocate and Engineer at Redgate. She's the previous director of Data and AI at Silk, and the Oracle SME in Azure at Microsoft. With a robust background in cloud technology and a passion for promoting its merits and potential, I am thrilled to spearhead conversations and actions that help shape the future of this industry. Kellyn has authored numerous technical books, white papers and solution repositories in GitHub on database, AI and engineering topics.

Anthropic Fable 5 & Mythos 5 Suspended AI Risk Revealed!

Your entire AI stack ran on a model that disappeared in three days. The US government issued a directive suspending all access — a few hours' notice, no deprecation window, no roadmap. Launched Tuesday. Gone by Friday. And every enterprise that had built workflows on top of it just found out what the real risk was: not the model itself, but the absence of a governance layer underneath it.

What is AWS Cloud WAN? Benefits, Use Cases, and Adoption Best Practices

Learn how AWS Cloud WAN works, key benefits, limitations, use cases, and adoption best practices. As AWS environments grow across multiple regions and accounts, networking can become increasingly difficult to manage. What starts as a handful of virtual private clouds (VPCs) can quickly evolve into a complex web of connectivity, routing policies, and security requirements.

Closing remarks | Ubuntu Summit 26.04

The closing remark at the Ubuntu Summit reflects on two days of open source innovation and community milestone announcements. Save the dates for the next Ubuntu Summit: November 12-13, 2026! About Diogo: Diogo Sousa is the Security Engineering Manager at Canonical. Ubuntu Summit 26.04 is a showcase for the innovative and the ambitious.

The cloud bill explained: A guide for finance and engineering

The cloud bill arrives at the end of every month, and somewhere in it sits a line item that nobody outside the infrastructure team really understands. It might be called "data transfer," "egress," or "outbound bandwidth," and it might be 5% of the total or even 25%. Whatever it is, it tends to be the line that finance asks engineering about, and engineering struggles to explain in a way that finance can act on. The problem is that egress is a fee that hides in plain sight. It's not on the marketing page.

Why Traditional DCIM Systems Fall Short: A Look at Cost and Complexity Solutions with Hyperview

Traditional DCIM systems often become a tangle of complexity just when your team needs clear, actionable insight. You’re stuck managing bulky software that slows onboarding and drives up costs without delivering the real-time infrastructure visibility you need. Hyperview offers a different path: a cloud-based DCIM platform powered by AI that cuts through the noise to give you faster decisions, lower operational drag, and smarter control.

Why developer teams are rethinking their cloud provider this year

The default cloud choice for technically literate teams has shifted. It hasn't shifted dramatically; the major hyperscalers aren't going anywhere, and their enterprise position is still strong, but the conversation that used to start with "which hyperscaler" now genuinely starts with "what do we actually need." That's new.

Capture once, test forever

We’ve gotten used to understanding our applications through signals, summaries, and traces. Tiny little bits of information about how the app really works. Not because that’s the best way to do it, but because it’s been too hard to get the real thing. The real information exists. It’s on the network. How people called your app and what your code did. What other systems it called, the database queries it made, and the result sets that came back.

Fixing 403 auth errors when you replay traffic

Trigger warning: this one is about Java, authentication, and Docker Compose files. If that is not your thing, I am sorry, but they are part of life and they are honestly not that hard to work with. Everything here is open source on our GitHub repo, so you can follow along. Recording an authenticated Java flow, replaying it, hitting the dreaded 403, and fixing it with a proxymock recommendation.

Excel Add-Ins 3.0 Updates: Excel 2024 Compatibility, Expanded Database and Cloud Coverage, and Modern Security Enhancements

We are pleased to announce the release of Excel Add-ins 3.0, a major update to our Excel Add-ins for databases and cloud applications. The new version adds support for Microsoft Excel 2024 across all products. It also includes new database versions, expanded object and report support, improved data type handling, and enhanced connection security.

Shipped: You're emitting AI telemetry. Point it at an engine that turns it into allocated spend.

Your AI calls already emit OpenTelemetry: your LLM gateway exports it, and it’s the open standard your own services can speak. But you don’t have anywhere to turn those spans into spend you can allocate to an outcome. Now you can. CloudZero exposes an OpenTelemetry endpoint that doesn’t care what’s on the other end.

How to monitor and optimize GPU utilization in the cloud

GPU utilization is one of the most expensive metrics in cloud infrastructure to get wrong. A GPU running at 30% utilization costs the same as one running at 90%, but it's doing a third of the useful work. For workloads measured in tens of thousands of GPU-hours, the difference between average utilization in the 30s and average utilization in the 70s is hundreds of thousands of dollars across the life of the workload.
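
To make that arithmetic concrete, here is a minimal sketch of the calculation. The hourly rate and the amount of useful work are hypothetical numbers chosen for illustration, not figures from the article, and the function names are made up for this example. The idea is simply that billed GPU-hours equal useful GPU-hours divided by average utilization.

```python
def billed_gpu_hours(useful_gpu_hours: float, utilization: float) -> float:
    """GPU-hours you pay for in order to get a given amount of useful work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return useful_gpu_hours / utilization


def workload_cost(useful_gpu_hours: float, utilization: float, hourly_rate: float) -> float:
    """Total spend: every billed hour costs the same, busy or idle."""
    return billed_gpu_hours(useful_gpu_hours, utilization) * hourly_rate


# Hypothetical inputs, for illustration only.
USEFUL_GPU_HOURS = 30_000   # work the job actually needs
HOURLY_RATE = 4.00          # assumed price per GPU-hour in dollars

low = workload_cost(USEFUL_GPU_HOURS, 0.30, HOURLY_RATE)
high = workload_cost(USEFUL_GPU_HOURS, 0.70, HOURLY_RATE)
print(f"cost at 30% utilization: ${low:,.0f}")
print(f"cost at 70% utilization: ${high:,.0f}")
print(f"difference:              ${low - high:,.0f}")
```

Under these assumed inputs the same work costs more than twice as much at 30% utilization as at 70%, which is the gap the excerpt describes.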

A New Console for Qovery

We rebuilt large parts of the Qovery Console: new navigation, overviews at every level, dark mode, and a modernized frontend architecture with TanStack Router and React Suspense. Rémi is a staff frontend engineer at Qovery. He writes about frontend architecture, developer experience, and building scalable UI systems for platform engineering tools. Théo is a senior product designer at Qovery.

The alerts worth your time. Resolved faster

It's 7am. An alert fired overnight. You open your monitoring solution, navigate to the alert, cross-reference the waits, check the query plans. Twenty minutes later: it should not have fired. You knew that before you started, but you had to check anyway. The feeling of being overwhelmed by alerts is real. And so is the cost. Thresholds set once and forgotten, firing on patterns that have been normal for months. The inbox fills. DBAs learn to ignore most alerts. The workaround becomes the workflow.

Why database governance in financial services is falling behind where it matters most

If anyone knows how to operate under scrutiny, it’s database teams within finance organizations. It’s a given considering the more rigorous compliance requirements and processes they must follow. But the 2026 State of the Database Landscape: Finance Edition reveals something more specific, and more uncomfortable, than the familiar story of regulatory pressure.

AI Found 18 OpenSSL Vulnerabilities. Now Your Team Has to Patch Them.

On June 9, 2026, the OpenSSL project released patches covering 18 vulnerabilities across its supported releases. The headline flaw, CVE-2026-45447, is rated high severity and has the potential for remote code execution. Not too long ago, a security advisory with 18 vulnerabilities would have been routine. Microsoft’s Patch Tuesday provided a predictable cycle, and organizations operated with the expectation of a meaningful remediation window. That model is under pressure.

dbForge: AI-Powered Multi-Database Tools for SQL Development & Management

dbForge is an AI-powered multi-database ecosystem for SQL development, database design, data management, testing, administration, reporting, and automation. In this video, you will see how dbForge helps database developers, DBAs, and technical teams reduce tool switching and work across SQL Server, MySQL, MariaDB, PostgreSQL, Oracle, cloud databases, and on-premises environments from one connected ecosystem.

dbForge - AI-Powered Database Ecosystem for Developers & DBAs

Managing different databases, tools, and environments can slow down your workflow… but dbForge brings everything together. Work with SQL Server, MySQL, MariaDB, PostgreSQL, Oracle, and cloud databases. Use AI to generate, explain, fix, and optimize SQL queries. Design, develop, test, manage, and automate databases. Choose from dbForge Edge, dedicated Studios, standalone tools, and SSMS/Visual Studio add-ins.

PagerDuty Report Finds Two-Thirds (66%) of Office Professionals Have Used Unauthorized AI Tools at Work

Three-quarters of office professionals (75%) say they would be likely to look for a new job that offered better AI skills development, a figure that climbs to 80% at companies with $1 billion or more in revenue.

The Next Evolution of Infrastructure Observability

Operational visibility is becoming increasingly important as infrastructure teams are asked to support AI initiatives, automation goals, cost accountability, modernization efforts, and growing operational complexity at the same time. Most are expected to do it without expanding headcount, introducing additional risk, or rebuilding the environment from scratch. Those expectations are changing the role of infrastructure operations.

3 Platform Engineering Shifts From Devoxx France 2026

Three days, 20 talks at Devoxx France 2026. The through-line wasn't AI hype - it was discipline. Context engineering, code review under AI volume, and the local-vs-remote question now shaping security, cost, and sovereignty. Fabien is a senior software engineer at Qovery. He writes about platform engineering, AI tooling, context engineering, and the practical realities of running modern developer infrastructure.

Modernizing Communications For Mission-Critical Networks

Mission-critical networks are changing fast. Utilities, transport operators, and critical infrastructure providers are under pressure to deliver more data, more automation, and more resilience—without ever compromising reliability. The challenge is simple: legacy SDH/SONET networks were built for a different era. They still deliver reliability. But they can’t support what comes next.

Shipped: What did the feature cost to ship? What does this customer cost to serve?

You can already split AI spend by team and by model. But that’s not what your CEO asks in the QBR. The question is what you got for it: what did it cost to ship that feature, to launch that campaign, to serve that customer. And is the AI bet behind it paying off? Now you can allocate AI spend to the outcomes you own: customer, product, feature, the strategic bet on the P&L. Not just the team that spent it.

The next era of telco clouds: get open infrastructure choice with Sylva and Canonical Kubernetes

The telco industry is undergoing a fundamental change. Over the past few years, the increasing maturity of cloud-native infrastructure has accelerated the movement from manually operated and hardware-centric systems to automated, software-defined platforms. Underpinning this change are open source initiatives such as the Sylva project. Sylva is hosted by Linux Foundation Europe and heavily backed by major telecom operators and vendors.

AI at the edge: simplifying infrastructure with Cisco and Canonical

Legacy infrastructure was not designed for the requirements of the AI era. While large-scale model training remains centralized in data centers, test-time inference is rapidly shifting to the edge to reduce latency and bandwidth consumption. This shift creates a new frontier for enterprise AI, but deploying at the edge introduces significant manual complexity, interoperability issues, and security vulnerabilities.

Turning down grad school, self-learning Power BI, and Lego! (Kristyna Ferris) | Simple Talk Podcast

Kristyna Ferris turned down grad school, learned Power BI, moved into the data world - and never looked back. In this chat with Steve Jones, Kristyna explains why she did it, what she’s learned, and even why her first DBA changed her password! Plus: being a Microsoft MVP, the importance of self-learning, being inspired to get involved with the community, and Kristyna’s passion for Lego, movies, and more!

#060 - Beyond ELK: Elastic's 10-Year Evolution, Open-Source Licensing, and the AI Frontier with P...

In this episode of the Kubernetes for Humans podcast, Philipp shares his incredible 10-year journey at Elastic, witnessing the company's massive growth from 300 to 4,000 employees. Discover the fascinating origin story of how Elastic evolved from a simple recipe search project into a global powerhouse for observability, security, and vector databases.

Introducing the Rootly Agent

During an incident, ask the Rootly Agent anything and it'll respond (and act) based on context and your data. Use the Rootly Agent to: The Rootly Agent performs actions on your behalf, so it is bound by the permissions assigned to your user. It will also ask for confirmation before taking significant actions. Rootly admins can turn it on for their workplaces and start running incidents even more efficiently.

How Clover moved beyond blue-green deployments with HAProxy Fusion Control Plane

Clover’s platform handles more than just payments: inventory, employee management, online sales, and customer loyalty programs are all running on a single monolith called the Clover Operating System (COS). Releasing updates to that platform reliably and without disrupting merchants is one of the hardest operational problems a platform team can face. For a decade, Clover ran HAProxy at the center of its infrastructure.

Atlassian's HR team leads AI transformation

AI transformation doesn’t succeed without people at the center. At Atlassian, HR is leading the way. Our People team believes that the best AI culture isn’t mandated from the top. It’s built by meeting employees where they are, partnering with leaders across the business, and making AI part of how work gets done from day one. See how Atlassian’s HR team is building a culture of experimentation where everyone builds, and what that looks like in practice.

AI Made Infrastructure Weird Again | Ubuntu Summit 26.04

For years, we were told we were escaping hardware. Virtualization, containers, and Kubernetes made the underlying servers practically invisible to the average application developer. Then came the AI boom and infrastructure got incredibly weird again. In this fast-paced lightning talk, Billy Olson from Canonical breaks down why the modern AI server is no longer just a machine, but a volatile distributed system packed inside a single chassis.

Tokenmaxxing: The AI Productivity Lie

Your best engineer spent 500,000 tokens last week. Nothing shipped. There's a name for it now: tokenmaxxing. Failed prompts, dead PRs, code that never reaches production — it looks like productivity, but it isn't. Most engineering leaders can't tell you what percentage of AI-generated code actually ships, or where the budget went. You should be able to say "that bug cost me $2,700 in tokens to fix."

How to run self-hosted AI on your own infrastructure with Konstruct

Civo Platform Engineer M R Rishi demonstrates how to go from zero to self-hosted AI in minutes using Konstruct. While most teams are stuck managing thousands of configuration values across multiple models and tools, Rishi shows how Konstruct eliminates that complexity with GPU cluster provisioning, GitOps catalog deployments, and production-ready infrastructure on day zero.

Why Small Business IT Disasters Are Almost Always Preventable

A server goes down on a Tuesday morning. A ransomware file starts encrypting documents at 2 a.m. A key employee clicks a link in what looked like a vendor invoice, and by the time anyone notices, credentials have been sitting in the wrong hands for six hours.

We won't train on your data is not a security architecture

Every enterprise contract I’ve signed in the last two years has the same clause. “Vendor will not use Customer Data to train machine learning models.” Sometimes it’s a paragraph. Sometimes it’s a whole section. The language varies but the intent is identical: don’t feed our production data into your AI. I get it. I sign the same clause as a vendor. But here’s what’s been bothering me: that clause is a promise, not an architecture.

Secret Manager Integration: One Source of Truth for Humans and Agents.

Production secrets should live in one place and stay there, whether your next deployment is triggered by a developer or an AI agent. The Secret Manager integration connects AWS Secrets Manager, AWS SSM, or GCP Secret Manager to Qovery so secrets are referenced, never copied, and enterprise governance holds regardless of who deploys. Alessandro leads product at Qovery. He drives the changelog, roadmap, and product strategy - turning customer feedback into platform capabilities.

How Managed Digital Employee Experience (DEX) Supports Smarter Device Refresh Decisions

Let’s face it, refreshing devices used to be a guessing game. IT teams would swap out laptops and desktops on a fixed schedule, hoping to keep everyone happy and productive. But in today’s hybrid, cloud-first world, that old approach just doesn’t work. Employees expect a seamless experience, and businesses can’t afford to waste money on unnecessary upgrades or risk productivity dips from outdated tech. That’s where Digital Employee Experience (DEX) comes in.

How Teams Work Faster with Puppet AI

Can AI actually improve infrastructure operations? Without sacrificing control? In this webinar, see how teams use Puppet AI to understand infrastructure with natural language, reduce operational effort, and move from insight to action faster—all within trusted automation workflows. Watch a live demo of detecting and mitigating a real-world vulnerability, and learn how context-aware AI helps teams scale safely with built-in governance.

The Two-Sided Scheduling Problem: Reaching the Next Layer of Cloud Savings

You’ve deployed Karpenter or Cluster Autoscaler and tightened your resource requests, but while you saw an initial dip in your cloud bill, your savings have flatlined. Organizations that thought they had the fundamentals of cloud cost under control are now seeing stagnation. The problem isn’t that they need another FinOps tool or better visibility. The problem is that the current state of enterprise cloud cost optimization strategy is fundamentally reactive.

The Inference Paradox: How Split-Brain LLMs Are Killing Your GPU ROI

During the Toronto KCD (Kubernetes Community Days), I attended an insightful talk on AI resource optimization that highlighted a staggering Gartner study: “AI infrastructure is adding $401 billion in new spending this year alone. Yet, real-world audits tell a much darker story, revealing that average GPU utilization in the enterprise is stuck at a dismal 5%”. While many people in the audience were shocked by that number, the data didn’t come as a surprise to us.

Centralize DHCP Visibility with the Windows Discovery Agent

Your Dynamic Host Configuration Protocol (DHCP) server already knows what’s connected to your network. The problem is that DHCP data rarely stays aligned with the rest of your infrastructure systems. Instead, it becomes fragmented across Windows servers, branch offices, spreadsheets, and disconnected operational tools. Lease data ages, assignments go untracked, and teams lose confidence in their network inventory.

Mainframe DevOps: Modern CI/CD for Big Iron | Harness Blog

For Platform Engineering teams, the goal has always been clear: build a secure, scalable internal developer platform that reduces cognitive load and accelerates time-to-market. Yet, a massive obstacle often remains hidden in plain sight: the mainframe. While your distributed teams are shipping cloud-native microservices multiple times a day, your core backend mainframe applications frequently remain locked in an isolated silo, lagging behind on slow monthly or quarterly cadences.

Agent Hooks + Chunk sidecars: Stop Broken AI Code Before It Hits CI

AI agents write code fast, but the feedback loop usually can't keep up. In this tutorial, you'll see how to wire Chunk sidecars into your agent's hooks so basic failures get caught before they ever reach your CI pipeline. We'll walk through the two hooks that chunk init writes automatically: Both hooks return exit 2 on failure, blocking the commit or keeping the turn open so the agent can fix its own mistakes with no manual prompting required.
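
As a rough illustration of the exit-code convention described above, here is a minimal sketch of a hook-style check in Python. It is not the hook that chunk init generates. The function name, the constant, and the example check command are all hypothetical. The only detail taken from the tutorial is that a failing hook finishes with exit status 2.

```python
import subprocess
import sys

# Exit status the tutorial says both hooks return on failure.
BLOCKING_EXIT_CODE = 2


def run_check(command: list[str]) -> int:
    """Run one check command and translate its result into a hook exit status.

    Returns 0 when the check passes and BLOCKING_EXIT_CODE otherwise, so a
    caller that treats 2 as "blocked" can stop the commit or keep the turn open.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as exc:
        # The check itself could not be started; treat that as a failure too.
        sys.stderr.write(f"could not run check: {exc}\n")
        return BLOCKING_EXIT_CODE

    if result.returncode == 0:
        return 0

    # Put the check's output on stderr so whoever invoked the hook can read it.
    sys.stderr.write(result.stdout + result.stderr)
    return BLOCKING_EXIT_CODE


# Hypothetical check: byte-compile the current directory to catch syntax errors.
EXAMPLE_CHECK = [sys.executable, "-m", "compileall", "-q", "."]
```

A script built on this sketch would end with `sys.exit(run_check(EXAMPLE_CHECK))`, so any failing check surfaces as exit status 2 rather than whatever code the underlying tool happened to return.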

Claude Mythos pricing in 2026: Fable 5 costs, Mythos 5 costs, and what every model actually runs

Claude Mythos is now available to the public through Claude Fable 5, released June 9, 2026. Claude Fable 5 pricing is $10 per million input tokens and $50 per million output tokens, exactly 2x Claude Opus 4.8 ($5/$25). Claude Mythos 5 (the restricted Project Glasswing version) has identical pricing. Prompt caching cuts input spend by 90%. Batch API pricing is $5/$25 (50% off). In April 2026, Anthropic announced a model it said was too dangerous to release.
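
The per-token rates quoted above can be turned into a quick estimator. This is a sketch under stated assumptions, not an official pricing calculator: the rates are the ones given in the excerpt, while the function name, the idea that the 90% caching reduction applies only to the cached share of input tokens, and the idea that the batch discount stacks with caching are assumptions made for illustration.

```python
# Rates as quoted in the excerpt, in dollars per million tokens.
INPUT_RATE_PER_MTOK = 10.00
OUTPUT_RATE_PER_MTOK = 50.00

# Discounts as quoted in the excerpt.
CACHE_DISCOUNT = 0.90   # assumed to apply only to cached input tokens
BATCH_DISCOUNT = 0.50   # assumed to apply to the whole request


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    batch: bool = False,
) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    fresh_input = input_tokens - cached_input_tokens
    discounted_cached = cached_input_tokens * (1 - CACHE_DISCOUNT)

    input_cost = (fresh_input + discounted_cached) * INPUT_RATE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_MTOK / 1_000_000

    total = input_cost + output_cost
    if batch:
        total *= 1 - BATCH_DISCOUNT
    return total


# One million tokens in and one million out, at list price and via batch.
print(f"list price: ${estimate_cost(1_000_000, 1_000_000):.2f}")
print(f"batch:      ${estimate_cost(1_000_000, 1_000_000, batch=True):.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, caching alone moves the total far less than trimming output length does.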

Shipped: Catch the runaway agent while it's still running.

AI spend has no ceiling. An engineer can burn $5,000 in an hour, and a team that spins up an agent on Friday can loop it on a bad prompt all weekend. You find out when the bill lands: the money is already gone, the damage pieced back together from logs. Cloud spend had a natural limit. Tokens don’t. Now you see it as it happens. Connect a source and the calls stream in within seconds. Within minutes they’re broken out by model, provider, agent, and user.

Your Metrics Look Fine. Your Engineers Are About to Quit.

Developer experience predicts what's coming 3 to 6 months before it shows up in your delivery metrics. So why are most engineering leaders measuring it last? In this session, GitKraken VP of Developer Research Jeremy Castile breaks down what developer experience (DevX) actually is, how to measure it across 6 key dimensions, and how it connects to velocity, code quality, and AI impact data your team is already tracking.

Kubeflow MLOps tutorial: from notebook development to production inference

In this video, our engineering team takes you through a full end-to-end Kubeflow implementation, step by step – from data exploration to production inference. Follow the journey of a house price prediction use case and see how modern MLOps components work together: Kubeflow architectures and starter repositories; notebook-based development workflows; data exploration and model development; MLflow for experiment tracking; Katib for hyperparameter optimization; Kubeflow Pipelines for automated preprocessing and training; and KServe for scalable model inference.

What Is Enterprise Service Management (ESM)? Explained

Enterprise service management (ESM) applies the proven model of IT service management (catalogs, workflows, self-service, and SLAs) to the whole business: HR, facilities, finance, and more. Here is what it is and how it works. What is enterprise service management, and how is it different from ITSM? In this explainer we define ESM, show how it works across departments, clarify how it builds on IT service management, and cover the mistake most teams make: copying IT ticket forms instead of orchestrating work across teams.

How to Size Infrastructure When Hardware Delays and Cost Pressure Change the Equation

Sizing infrastructure has always required a balance between performance, capacity, and risk. What has changed is the level of precision required to make those decisions. Hardware timelines are less predictable. Costs are under closer review. Decisions that were once routine now require clear justification. In many cases, the question is no longer just how much capacity is needed, but whether that capacity can be delivered when it is needed and whether the investment will hold up under scrutiny.

Agentic AI Governance: 5 Controls Enterprises Need for Safe Automation

The promise of agentic AI is dead simple to understand. Instead of waiting for a human to draft every instruction, an AI agent can interpret a goal, take action, and work across systems until the task is done. For IT teams, that motion sounds like the next logical phase of automation. That promise is real... but it’s also where the risk starts. Traditional automation followed instructions. Agentic AI, by contrast, pursues outcomes. That difference turns the entire governance model on its head.

Why the fastest teams standardize first

There's a version of this conversation that plays out in engineering organizations everywhere. Leadership pushes for standardization. Developers push back. The argument from developers is reasonable on its face: every codebase has different needs, every team has tools they're good at, and adding process feels like slowing down to go faster. It's a genuine tension, and it's also a false one. The teams that ship the most aren't the ones with the most infrastructure freedom.

The 8 stages of AI engineering maturity: a framework for teams

A few months ago, Steve Yegge published his 8 levels of AI-assisted development, and it clicked the moment I read it, because I had lived that exact progression myself, moving from autocomplete to running agents one step at a time. Framed as an AI trust gradient, it finally gave the industry a vocabulary for something most of us were already going through without a name for it. If you haven’t read it, save it for later.

How to land on the right side of the AI divide

AI changed how code gets written before it changed how code gets operated. Generation accelerated; the downstream controls that turn that output into reliable, secure software at a reasonable cost did not keep pace. The result is elevated risk, distributed unevenly across engineering organizations. A recent survey explains why the distribution is so uneven.

AI Economics Pulse: Your AI line item is winning, but is it working?

This edition of the Pulse is shifting lanes. We’re calling it the AI Economics Pulse now, because the question on every finance leader’s mind is whether AI spend and the returns on it can be made to pair at all. That question came to a head over the last few weeks. The bills came due, and they came due in public. Uber burned through its entire 2026 AI budget in four months and capped employee spending on Claude Code and Cursor at $1,500 a month.

Shipped: The AI spend on your team's laptops is the part you can't see.

Your engineers run Claude Code. Your designers are in Cowork. Half the company has Claude open in a browser tab, and a few are on Cursor. It’s on their laptops, each person authenticated a different way, and none of it touches your gateway. The only record you get is one lump-sum bill at the end of the month. Now you can capture it where it happens – on the laptop.

Claude Code alternatives in 2026: 10 AI coding tools compared on cost, features, and AI ROI

Something unusual happened in the first half of 2026: the most productive AI coding tool on the market became the most financially dangerous. And the companies that discovered this the hard way read like a Fortune 50 roll call.

Policy as Code Tools & Examples to Make Better Infrastructure Easier, Anywhere

You’re scaling your IT infrastructure so you can do more: deploying across clouds and data centers, adding servers, coding like crazy. Great! But how do you keep it all from falling apart? Policy as code is an approach to managing IT that strategically leverages infrastructure as code (IaC) and compliance as code to manage consistent policies across complex IT environments. Sounds perfect, right?

An Architect's Guide to IPoDWDM

IPoDWDM is an architecture that integrates IP routing and Dense Wavelength Division Multiplexing into a single, converged platform. This integration is achieved by placing coherent optics directly into the ports of IP routers and switches, a fundamental shift from traditional network designs. Consequently, this approach eliminates the need for a separate, dedicated layer of optical transponders and their associated shelving.

AI Cost Savings Unlocking Hidden Engineering Value

Bain says AI cost savings aren't arriving. But the value isn't missing, it's invisible. Most engineering teams can see token spend. They can see AI usage. What they can't see is whether any of it shipped, and whether it moved the needle on delivery. That's the measurement gap. And until it closes, AI ROI will keep looking worse than it should.

Coding Agents Write the Code. Who Verifies It Works? We Built the Answer.

Coding agents are good at reading a spec and producing code. But producing code is one step in a longer process. The real loop is Spec -> Code -> Deploy -> Test -> Verify -> Ship. Agents stop at step two. Romaric founded Qovery to make Kubernetes accessible to every engineering team. He writes about platform strategy, developer experience, and the future of cloud infrastructure.

Enforce Artifact Governance with OPA Policy-as-Code | Harness Artifact Registry

Artifact governance should not depend on manual checks. But for many teams, container images, software packages, and open-source dependencies are imported into registries from multiple internal and external sources. Without automated guardrails, vulnerable images, untrusted packages, end-of-life dependencies, or non-compliant artifacts can reach developers and delivery pipelines.

Who really controls your data?

Digital sovereignty has moved from buzzword to boardroom priority. But most organisations are still asking the wrong question. Civo CEO Mark Boost cuts through the noise. Digital sovereignty isn't about marketing; it's about jurisdiction, accountability, and operational certainty. And it starts with where your data is hosted and how it's processed. Civo's UK Sovereign Cloud delivers public cloud, private cloud, and AI services, hosted and operated exclusively within the United Kingdom, under UK legal authority, with no exposure to foreign control.

Why Security Teams Spend So Much Time Reconciling Data

Security teams today are managing growing volumes of cybersecurity data across increasingly complex environments. This blog explores the hidden operational cost of disconnected tools, manual data reconciliation, and fragmented reporting, and how Teneo’s Cyber Asset Attack Surface Management (CAASM), powered by ThreatAware, helps organizations create a more unified and trusted view across their security estate. Most organizations are not short of security tools.

Why Most Organizations Still Don't Know What's Protected

Organizations invest heavily in cybersecurity tools, yet many still struggle to confidently understand what is actually protected across their environment. This blog explores how disconnected systems, unknown assets, and inconsistent data create blind spots, and how Teneo’s Cyber Asset Attack Surface Management (CAASM), powered by ThreatAware, helps organizations gain a trusted view of security coverage.

What is RDMA over Converged Ethernet (RoCE)?

Previous articles walked through RDMA (Remote Direct Memory Access) as a programming model and InfiniBand as the fabric that was built around it. Both led to the same conclusion, even if it was never stated outright: moving data, not compute, becomes the bottleneck once systems scale. So what happens when you want RDMA, but you’re already running an Ethernet network you’re not keen to replace? That’s usually where RDMA over Converged Ethernet (RoCE) enters the conversation.

From Commit to Approval, Without Leaving VS Code | Harness Blog

The Harness VS Code Extension is now on the Marketplace. Monitor pipelines, debug logs, approve deployments, and query failures with Claude Code, Copilot, or Cursor, without leaving VS Code. Your Harness pipelines, logs, and deployment approvals are now a sidebar panel away inside VS Code. The Harness VS Code Extension is live on the VS Code Marketplace today, no .vsix download, no manual install.

Azure Deployment Strategies & CI/CD Best Practices | Harness Blog

Learn how to master Azure deployment with CI/CD pipelines, progressive delivery, and feature flags. See how Harness helps engineering teams ship faster and safer on Azure. Azure deployment sounds straightforward. Push code, it runs in the cloud. But if you've managed a 2 a.m. production incident because a deployment went sideways on AKS, you know the gap between "it deploys" and "it deploys safely at scale" is significant.

Shipped: Counting tokens isn't enough. Start connecting them to outcomes.

You’re funding AI across four billing relationships – Anthropic direct, OpenAI, Claude through Bedrock, Claude through Vertex – and the spend climbs every month. When your CEO asks what it’s producing, you have a number and no answer. Not which product it built, which customer it served, or which bet it’s paying off. And you’re being asked to approve more of it.

From Visibility to Real Savings: Turning FinOps Insights into Measurable Cost Reduction

FinOps programs are maturing, and most organizations have better visibility into cloud spend than ever before. Dashboards are full of data. And yet costs keep climbing. The problem isn’t the data. It’s the gap between knowing where the waste is and actually eliminating it. In this joint session, Tangoe and Kubex come together to bridge that gap. Tangoe brings deep expertise in spend management and FinOps discipline, while Kubex delivers infrastructure-level optimization across cloud, Kubernetes, and the AI and GPU workloads that are rapidly becoming the next frontier of cost pressure.

Beneath the Stack: A Software Engineer's Journey into Infrastructure

A software engineer's hands-on journey building a private cloud on bare-metal: Incus clustering, K3s, OVN networking, the Gateway API, and everything that breaks along the way — and what it taught them about why platforms like Qovery exist. Antoine is a senior software engineer at Qovery. He writes about hands-on infrastructure engineering, Kubernetes internals, and the realities of running production systems.

A practical guide to standardizing app delivery without rebuilding everything internally

Standardize the route from code to production. Everything else is a team decision, not a platform problem. Most app delivery problems do not start with bad engineering. They start with too much variation. One team provisions environments manually. Another keeps deployment notes in a wiki. A third has a staging setup that only one engineer understands. Security reviews happen late because the platform does not make the safe path obvious.

Top Considerations When Evaluating DCIM Vendors

Choosing a Data Center Infrastructure Management (DCIM) platform is one of the more consequential decisions a data center team will make. Get it right, and you gain an accurate digital twin of your physical infrastructure, a single source of truth across teams, improved operational visibility, and a platform for planning, reporting, and automation. Get it wrong, and you risk a failed deployment, a platform that doesn't fit your needs, or a shelfware investment that's hard to justify renewing.

The Future of FinOps: Engineering, Applications & Cloud Cost Accountability

In this episode of the FinOps on Azure Podcast, Michael Stephenson is joined by Ben DeBow, Founder and CEO of Fortified, to discuss the next evolution of FinOps and why cloud cost management needs to move beyond dashboards, reporting, and allocation. Ben shares insights from years of helping enterprises optimize cloud spend and explains why the biggest savings opportunities are often hidden inside applications, workloads, and engineering decisions—not infrastructure.

Certificate lineage: the concept your tools already use but nobody named

The word “certificate” means too many different things. When someone says “the certificate for example.com,” they might mean the public key the CA signed. They might mean the key-pair sitting on the filesystem. They might mean the signature that expires in 47 days. Or they might mean all the things together, that you’ve been renewing for the last 10 years. That last one doesn’t have a name in any PKI standard. And it should.

Why More SysAdmins Are Moving to aaPanel in 2026

Server management doesn't look like it did five years ago. Everything's moving fast, and sysadmins are under more pressure than ever to keep things smooth without blowing budgets or eating up resources. Lately, one name keeps popping up across every forum and tech chat: aaPanel. People who spent years with the same old paid panels are jumping ship. I'll break down exactly why that's happening, and why you might want to join them.

Validate Spring Boot Upgrades with Traffic Replay

Spring Boot version upgrades—whether moving from 2.x to 3.x, 3.x to 4.x, or even minor bumps like 3.2.5 to 3.3.1—regularly introduce subtle, breaking changes that unit and integration tests miss. JSON serialization shifts, autoconfiguration reordering, and transitive dependency conflicts can silently alter your API contract.

Sovereign GPU cloud: Data residency across training, inference, and model weights

Sovereign cloud conversations usually center on where customer data sits at rest. The provider points at a UK data center, the contract gets signed, and procurement marks the box. For most workloads, that's a defensible position. For GPU workloads, it isn't.

GPU cloud for AI inference in production: How infrastructure requirements change after training

Training a model is a project with an end date. Inference is what happens for the rest of the model's working life. The two workloads share GPUs, frameworks, and a lot of vocabulary, but the infrastructure decisions that make sense during training are usually the wrong ones in production. Teams that treat inference as "training, but smaller" tend to discover the gap somewhere around their first traffic spike.

TikTok Challenges: Trend or Danger? What Every Parent and Teen Should Know

Everyone seems to be doing TikTok challenges, but are they always harmless? From positive movements like the Ice Bucket Challenge to risky viral trends that have led to serious injuries, social media challenges can influence how teens think, behave, and seek attention online. In this video, we'll explore why teens are drawn to TikTok challenges and the hidden pressure of fitting in and going viral.

AI Agent Governance: The Missing Piece of Autonomous IT

AI agents are making decisions, accessing systems, and resolving issues autonomously. But as organizations deploy more agents, one challenge becomes impossible to ignore: governance. Who has access? What changed? Who is accountable? The future of Autonomous IT requires autonomy with accountability.

A package manager for AI assets (and why the lock file is per-user)

Sometime in the last two years your repos quietly filled up with a new category of file. Not code, not config exactly: prompts. A .claude/skills/ directory here. A .cursor/rules/ folder there. A CLAUDE.md at the root, an AGENTS.md next to it, a .mcp.json listing the servers your agent is allowed to call. These are the things that make a coding agent useful on your codebase, and they're sprawling.

MongoDB Changelog Automation Explained: How Harness Database DevOps Works

Managing MongoDB database changes shouldn't require manually creating and maintaining changelogs. In this video, you'll learn how Harness Database DevOps automatically generates MongoDB changelogs, helping teams capture existing database changes and bring them into version control for reliable CI/CD workflows. As a modern database schema migration tool, Harness Database DevOps helps teams automate database change management across relational and NoSQL databases, reducing manual effort and deployment risk.

NVIDIA Approach for Achieving ASIL B Qualified Linux | Ubuntu Summit 26.04

Can a general purpose, open source operating system like Linux be deployed in safety-critical products? Can it achieve certifications to standards like ISO 26262? This question has become increasingly common in recent years. In this talk, Bryan provides a safety integrity qualification approach for Linux. It is composed of Linux Kernel, user space libraries (like libc) and user-space components (like init processes), up to ASIL B according to ISO 26262:2018.

Agentic validation needs different infrastructure

Previously, I described some core approaches to validating agent-written code: feedforward and feedback techniques. Feedforward techniques are about avoiding errors up front, for example by coming up with better prompts and planning strategies. Feedback gives agents a signal that they have actually achieved a task. Feedback is a key part of common agentic patterns like Ralph loops or the /goal commands in Codex and Claude Code: keep working until some known condition passes.

Protecting against HTTP/2 Bomb vulnerability (CVE-2026-49975) with HAProxy

On June 2, 2026, security researchers disclosed a remote denial-of-service (DoS) exploit named the HTTP/2 Bomb. This flaw allows unauthenticated remote attackers to rapidly exhaust server memory, rendering major web servers inaccessible.

5 questions you should be asking about cloud dependency

Cloud infrastructure has become the backbone of modern business operations. But as organizations deepen their reliance on cloud providers, a critical question often goes unasked: just how dependent are we, and at what cost? For years, the cloud adoption narrative focused on agility, scalability, and cost efficiency. Those benefits remain real. But the landscape is shifting.

Network Device Monitoring: Topology Maps and NetFlow

Most teams run one tool for SNMP polling, another for topology, and a third for flow analysis, then spend their time stitching the views together. This webinar shows how Netdata brings all three into a single dashboard, with 100+ vendor profiles out of the box, automatic Layer 2 topology mapping, and a flow collector that auto-detects NetFlow, IPFIX, and sFlow on a single port.

Shai-Hulud Miasma: Inside the Compromise of Red Hat's Packages | Harness Blog

The Shai-Hulud lineage has a new face. On June 1, 2026, security teams independently flagged a fresh supply chain compromise inside the @redhat-cloud-services npm namespace. 32 packages and 96 versions were all republished with a credential-stealing worm. These aren't typosquats. They are the official packages in a trusted scope, pulling somewhere between 80,000 and 117,000 average weekly downloads.

Checklist: how to reduce environment drift without slowing devs or AI agents

Environment drift persists when teams standardize code but leave infrastructure, data, and access decisions to individual teams and manual setup. Most teams know their environments are not identical. What they underestimate is how quietly the gap widens. A database version is out of sync between production and staging; an environment variable is added manually to one server but never tracked; a cron job runs in production but was never captured in the dev config.
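The kind of drift described above can be made visible with a simple comparison. The following is a minimal sketch, assuming each environment can be exported as a flat dictionary of settings; the keys, values, and the `find_drift` helper are hypothetical and only illustrate the idea.

```python
# Sentinel used when a setting exists in one environment but not the other.
MISSING = "<missing>"


def find_drift(reference: dict, candidate: dict) -> dict:
    """Compare two flat environment descriptions and report every difference.

    Returns a mapping of setting name -> (reference value, candidate value).
    """
    drift = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref_val = reference.get(key, MISSING)
        cand_val = candidate.get(key, MISSING)
        if ref_val != cand_val:
            drift[key] = (ref_val, cand_val)
    return drift


# Hypothetical snapshots: a database version mismatch, an untracked
# environment variable, and a cron job that exists only in production.
production = {
    "postgres_version": "16.2",
    "FEATURE_X": "on",
    "cron.cleanup": "0 3 * * *",
}
staging = {"postgres_version": "15.6"}

for name, (prod_value, staging_value) in find_drift(production, staging).items():
    print(f"{name}: production={prod_value} staging={staging_value}")
```

Run on a schedule, a check like this turns quiet divergence into a reviewable report. In practice the snapshots would come from whatever source of truth a team already has.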

Bridging AI and Infrastructure: Introducing the Megaport MCP Server for Agentic Networking

Discover the Megaport MCP Server and how it enables AI-powered, agentic networking through natural language access to network infrastructure. By Miwa Fujii, Community Manager - Terraform and Ryan Tucker, Solutions Architect In the cloud networking era, we’ve moved from manual configurations in the Portal to Infrastructure as Code (IaC), Terraform. But the next frontier isn’t just code, it’s intelligence. We are pleased to announce the release of the Megaport MCP Server (Open Beta).

Beyond tokens per watt - using Ubuntu 26.04 LTS for AI

Tokens per watt (TpW) – the measure of useful AI work produced per watt of energy consumed – is the metric at top of mind for CEOs, heads of AI, and infrastructure teams alike. With the tremendous cost of GPU clusters, extracting as much value as possible from the expense is critical. But in the pursuit of tokens, it’s important to remember that hardware efficiency isn’t the only factor influencing data center operating costs, or the output of useful, revenue-generating AI work.

[Webinar] Building Regulated Infrastructure: How Lucis Standardized Security for Global Care

In Healthtech, downtime is more than a loss of revenue, it is a disruption to patient care. Whether supporting digital health platforms or AI-driven healthcare applications, infrastructure must remain secure, compliant, and highly available. Join Lucis and Qovery for a technical breakdown of building compliant and secure infrastructure that scales AI and healthcare workloads, handles traffic peaks, and maintains SOC 2, HDS, and HIPAA standards.

Enforce your team's database standards automatically with Custom Policy Checks in Redgate Flyway Enterprise

Every engineering team has a list of “things we don’t do”. No TRUNCATE TABLE in production. Every audit table must end in _audit. Foreign keys follow a naming convention. But until now, enforcing those standards has meant relying on pull request checklists, tribal knowledge, or a separate linting tool bolted onto the pipeline.
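To make the idea of codified standards concrete, here is a minimal sketch of two of the rules above written as automated checks. It is not Flyway's Custom Policy Checks API; the `check_migration` function, the regular expressions, and the heuristic that any table whose name contains "audit" counts as an audit table are all assumptions made for illustration.

```python
import re

# Hypothetical rules for illustration; these are not Flyway's built-in checks.
TRUNCATE_RE = re.compile(r"\bTRUNCATE\s+TABLE\b", re.IGNORECASE)
CREATE_TABLE_RE = re.compile(
    r"\bCREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.\[\]\"]+)", re.IGNORECASE
)


def check_migration(sql: str) -> list[str]:
    """Return a list of human-readable violations found in a migration script."""
    violations = []
    if TRUNCATE_RE.search(sql):
        violations.append("TRUNCATE TABLE is not allowed")
    for match in CREATE_TABLE_RE.finditer(sql):
        # Drop quoting and any schema prefix to get the bare table name.
        name = match.group(1).strip('[]"').split(".")[-1].strip('[]"').lower()
        # Assumption: a table with "audit" in its name is meant to be an audit table.
        if "audit" in name and not name.endswith("_audit"):
            violations.append(f"audit table '{name}' must end in _audit")
    return violations


print(check_migration("TRUNCATE TABLE orders;"))
print(check_migration("CREATE TABLE dbo.audit_orders (id INT);"))
```

A regex pass like this is deliberately crude; a real policy engine would work from parsed SQL rather than raw text. The point is only that each "thing we don't do" can become a rule that fails the pipeline.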

ADO.NET SQL Connection Providers for SQL Server Compared

Selecting an ADO.NET provider may seem like a simple, one-time decision, but it can affect performance, compatibility, and long-term maintainability for years. The provider sits between your application and SQL Server and affects everything from connection management and authentication to support for new database features. Today, SQL Server 2025 delivers new cloud-optimized features and modern security, while System.Data.SqlClient has become legacy software.

The AI Code Explosion: Why Your Mocking Strategy is Breaking Down

The rise of AI-assisted coding has transformed how software is built. With tools generating entire features in seconds, the bottleneck is no longer writing code—it’s verifying it. Because AI can generate boilerplate and handle API integrations instantly, more service changes are being pushed into authentication logic, API calls, and configurations. Teams desperately need a way to verify these changes before merging, especially when the code touches external dependencies.

Run CI Tests Without Pushing: Microbuilds with Chunk sidecars

AI coding agents write code faster than your pipeline can catch mistakes. What if the agent could validate against CI before you ever push? In this 5-minute demo, we set up CircleCI's Chunk CLI and run a microbuild using Chunk sidecars, secure Linux microVMs that spin up in ~1 second in your CircleCI account, mirror your working directory (no git push required), and give your agent CI-grade feedback while it's still in context.

Should platform, SRE, and security merge into one function?

Platform, SRE, and security are three distinct functions in modern engineering orgs, each shaped by a different problem. SRE was the operations function's answer to scale: how to keep systems reliable when the systems get big. Platform answered a different problem: how to let developers ship without becoming infrastructure experts. Security drew the line on what could safely reach production.

Why Observability Is Essential for Platform Engineers?

Observability is how platform teams stop being the answer to every question and start building platforms that answer those questions themselves. This article explains specifically how observability enables platform engineers to support development teams better by reducing ticket volume, cutting MTTR, enabling SLO ownership, and making microservice debugging something devs can do without escalating to you.

UK GDPR compliance for cloud and hosting: requirements, risks and responsibilities

UK organisations using cloud services carry a clear legal obligation: they must demonstrate compliance with UK GDPR and the Data Protection Act 2018, not simply assert it. The shift to cloud and hosted infrastructure does not transfer that responsibility to a provider. It distributes it across a chain of controllers and processors that regulators expect you to understand and manage. Post-Brexit, that obligation is set within a distinct legal framework.

What Is IPoDWDM? A Guide to Converged IP and Optical Networking

IP over Dense Wavelength Division Multiplexing (IPoDWDM) is a network architecture that integrates optical transmission capabilities directly into IP networking equipment such as routers and switches. This approach represents a significant evolution from traditional network designs, where IP and optical layers were managed as separate domains with distinct hardware and operational teams.

Shipped: Keep your cost allocation logic out of the wrong hands

CostFormation is how your organization models cost allocation. As more teams adopt it, protecting that logic matters. RBAC for CostFormation Namespaces lets you scope access at the namespace level, so the right people can view and edit Dimensions, and everyone else can’t.

A look into Ubuntu Core 26: Deploying AI models on Renesas RZ/V series for production

Welcome to this blog series which explores innovative uses of Ubuntu Core. Throughout this series, Canonical’s Engineers will show what you can build with our releases, highlighting the features and tools available to you. In this blog, Asa Mirzaieva, engineer from the Silicon Alliances team, will show you how to deploy optimised AI models on Renesas RZ/V series hardware using the Dynamically Reconfigurable Processor for AI (DRP-AI).

Variable Sharing and Dynamic Step Conditions | Bitbucket Blitz | Atlassian

Bitbucket Pipelines lets you invoke child pipelines from a parent step, but until now there was no way to pass information between them. Variable sharing changes that. You can define variables in a parent step and pass them directly to child pipelines as custom pipeline variables. With dynamic step conditions, those child pipelines can make decisions at runtime based on the values they receive, like skipping a deployment when a security scan detects critical vulnerabilities.

Prevent container image overwrites with immutable tags in Bitbucket Packages

We’re excited to announce that immutable tags are now available for the Bitbucket Packages container registry. With immutable tags, workspace admins can prevent container image tags from being overwritten, moved, or modified after they’re first pushed.

Testing AI Code is a Security Nightmare? #Speedscale #DevOps #Kubernetes #AICoding #SoftwareTesting

AI can write a feature in seconds, but where are you testing it? Sending production traffic, API payloads, and auth headers to a third-party SaaS is a massive security risk. In this video, we break down why the Bring Your Own Cloud (BYOC) model is the ultimate fix for DevSecOps. Learn how to safely test AI-generated code against real production traffic entirely within your own VPC or Kubernetes cluster. No data leaks, no massive DLP pipelines, and no endless masking rules.

Minga cut infra costs 30-40% - and it scales itself | Control Plane

Minga checks in 1.5 million students across the eastern seaboard by 8:30 AM Eastern — then lets that infrastructure wind down an hour later. After migrating to Control Plane, they cut infrastructure costs 30–40% and traded fragile, manual scaling for a platform that scales itself.

Configure Ubuntu with YAML | Ubuntu Summit 26.04

Learn how to configure Ubuntu at launch using declarative, idempotent instructions stored in a version-controlled YAML file. In this talk, Rajan explains how this approach minimizes arbitrary commands, reduces risks of command injection and privilege escalation, and ensures validation and error handling. This is relevant on major public and private clouds, and virtualization solutions ranging from VMware, WSL, LXD, Multipass, Proxmox, and more.

Megaport Storage Marks the Next Step Toward Automated Infrastructure at Scale

Built as a globally distributed storage platform integrated directly into the Megaport backbone and co-located with Latitude.sh compute infrastructure, Megaport Storage simplifies how organizations store, move, and access data across distributed environments with a unified infrastructure experience spanning compute, network, and storage. For years, enterprise infrastructure has moved toward abstraction. Compute became elastic. Networks became software-defined.

uPKI: improving certificate revocation on Linux | Ubuntu Summit 26.04

What is uPKI? While web browsers automatically check if an HTTPS certificate has been revoked, other Linux command-line tools and applications usually skip this check. That leaves applications vulnerable to compromised or misissued certificates many months after this is discovered. In their talk, Joe Birr-Pixton and Dirkjan Ochtman will be introducing uPKI: a new effort to bring browser-grade certificate infrastructure to Linux. This effort is funded by Canonical, engineered by the maintainers of rustls, and builds on foundational work from Mozilla.

AI inference vs. training: What they are and how they differ

AI inference and training are terms you will have run into if you have been around software engineering or even just scrolled through the news. Both are integral to delivering the AI-powered experiences we have come to expect from many of the applications we use daily. According to McKinsey, by 2030 inference will overtake training as the dominant workload in AI data centers, making up more than half of all AI compute and roughly 30-40% of total data center demand.

GitHub Copilot Price Hike Developers Outraged! V2

What used to be $50 a month is now $3,000 — overnight. Microsoft just moved GitHub Copilot to token-based billing, and devs are split between calling it a "rug pull" and admitting someone always had to pay the bill. Here's the part that should worry every engineering leader: most can't tell you what percentage of their AI-generated code actually ships, or where the tokens went. When the meter is running on every prompt, "it feels productive" isn't good enough — you need to know that bug cost you $2,700 in tokens to fix.

Detecting Data Masking Gaps in a CI Pipeline | The Tony and Tonie show Ep46

Your schema changed. Did your masking rules keep up? Here’s how Flyway and Test Data Manager can catch gaps and prevent PII exposure in dev and test. Tony and Tonie discuss how Flyway and Redgate Test Data Manager can work together in a CI pipeline to detect schema changes that introduce unmasked sensitive columns, helping teams keep production-derived test data protected as the database evolves.

CloudZero AI Hub: The nexus of autonomous AI cost control

CloudZero originated as a way to make sense of your cloud costs. Costs spread across bills with billions of line items belonging to resources that might or might not have been tagged (or taggable), spun up by engineers working across teams, on different microservices, features, and products, that served a wide range of customers. Kubernetes. Multi-cloud. Check, check, check.

AI ROI: How to measure and provide the return on AI investments in 2026

Every quarter, the same scene plays out in boardrooms across the Fortune 500. The CEO asks: “What is the return on everything the company is spending on AI?” The CTO talks about productivity gains and developer velocity. The CFO points at a cloud bill that doubled but cannot isolate which line items are AI. The board nods politely and tables the discussion until next quarter, when the same question will produce the same non-answer. (If this sounds familiar, you are not alone. Keep reading.)

NVIDIA Earth-2: OSS and Science for AI Weather and Climate | Ubuntu Summit 26.04

Discover how NVIDIA Earth-2 brings open source software and open science to weather and climate forecasting. Niall Robinson (NVIDIA) introduces a new way of making production-ready weather AI fully accessible for organizations to run, fine-tune, and deploy on their own infrastructure: NVIDIA Earth-2.

Level up your Code on Arm and Ubuntu | Ubuntu Summit 26.04

What are the latest developments in Arm tooling on Ubuntu? In this talk, David explores Arm tooling to analyze and optimize workload performance, and how AI-assisted development using agentic AI and static analysis can accelerate porting and tuning applications for the Arm architecture. About David: David Haikney is a Technical Product Director at Arm. He is responsible for Arm Performix, a free performance toolkit that helps developers understand and improve real-world performance on Arm architectures.

Improving Digital Employee Experience with Intelligent Automation | Reduce IT Tickets by 70%

Are your employees still waiting hours or days for IT issues to be resolved? Many organizations have invested heavily in ITSM platforms, self-service portals, and chatbots. Yet service desks continue to struggle with growing ticket volumes, rising costs, slow resolution times, and poor digital employee experiences. In this webinar, Resolve and Redington explore why traditional service management approaches often stop at ticket creation instead of ticket resolution, and how intelligent automation is helping enterprises move toward a Zero Ticket IT model.

How IT Teams Can Start Their AI Automation Journey | Agentic AI, ITSM & Zero Ticket IT

How should IT leaders approach automation and AI? Where should they start, and how can they drive measurable results without getting caught up in the hype? In this episode of Agents of IT, Fran Fernandez and Zach Austin sit down with Chris Ellis, Senior Technology Solutions Specialist at RICOne, to discuss practical IT automation strategies, agentic AI, service desk transformation, and the journey toward autonomous operations.

Premium self-hosted runners are generally available

In December, we shared our plans to introduce pricing for self-hosted runners. You told us loud and clear that a free option matters. Today, as Premium Runners become generally available, we are happy to share that we will continue to have a free tier, which includes the use of up to 100 self-hosted runners as part of your plan. If your team needs more scale, dedicated support, or advanced management features, you can upgrade to Premium Runners when you’re ready.

Best APM for Small Teams Without Dedicated DevOps in 2026

You don’t have an SRE. There’s no platform team. Your “monitoring strategy” is someone checking Slack for error alerts. When production breaks, the same two or three senior devs drop everything to debug. Sound familiar? Most APM tools are built for organizations with dedicated operations staff. They assume someone has time to configure dashboards, tune alert thresholds, and learn a complex query language. That person does not exist on your team.

Get Ship Done: Everything We Shipped in May 2026 | Harness Blog

AI coding tools promise faster development. What they don't show you is the queue forming at the pipeline, the security scanner you bypassed to stay fast, or the cost dashboard with a line now labeled "unknown" that is steadily growing. In May, we shipped 60+ features in 31 days across the entire delivery system: not just the editor, but everything downstream of it.

AI Dev Tools: What 100K Engineers at Google Really Taught Us

AI developer productivity, agentic workflows, and the lessons learned running engineering tools for 100,000+ software engineers at Google. John Montgomery, CCO at GitKraken, sits down with Asim Hussain, co-founder of Alterion AI and former Google VP of Engineering Productivity, to get real about what AI actually changes for engineering teams in 2025.

10 Enterprise AI Infrastructure Voices Worth Following

Enterprise AI has crossed an inflection point. The model problem is largely covered. What remains unsolved is the operational impact: how to run AI inference and agentic processes continuously, reliably, and at a cost that doesn’t cancel out the value. Most enterprises are discovering this the hard way. GPU utilization dashboards show 80%. Actual compute efficiency is half that. Token demand is compounding at 200-500% annually as agents multiply every action into dozens of model calls.

How to ship a POC in an afternoon: a Claude Code and Upsun walkthrough for product and product marketing

I have an Upsun project that's nothing but proofs of concept. It's a dashboard, basically. Each POC gets its own tile. Click in, and you land on a page with three tabs. The first tab is a written explanation of what the POC argues. The second tab is the POC itself, with a built-in demo that automates a walkthrough of the feature so the recipient can watch it run without me on the call.

Running AI at Enterprise Scale w/ Anthropic, Descope, Port, Rootly and Twingate

The debate about whether AI can write production code is over. Companies are handing work to fleets of agents, and for many, they write most of the code that ships to production. The next challenge is everything that happens once an entire engineering organization runs this way, at full speed. Teams that generate code 10x faster still review it at human speed, and that mismatch is now the constraint. Code ownership is also becoming an issue, as developers learn to trust agentic processes a little too much. When an agent breaks production, who is responsible?

RISC-V profiles - why is RVA23 significant?

One of the important offerings of the RISC-V Instruction Set Architecture (ISA) is the ability to customize and extend the base instruction set. An initial reaction to hearing this is often to worry about software portability and compatibility, since if every RISC-V CPU offers a slightly different set of instructions, software won’t be portable.

What is Azure Cost Management? Complete Beginner's Guide (2026)

What is Azure Cost Management and how can it help you control cloud spending? In this complete beginner's guide, Michael Stephenson explains the fundamentals of Azure Cost Management and walks through the core features available in Azure. You'll learn how Azure organizes billing accounts, subscriptions, resource groups, and management groups, and how these structures affect cost tracking and reporting.

Announcing HAProxy 3.4

HAProxy 3.4 is a milestone release that significantly advances HAProxy’s legendary flexibility, performance, security, reliability, and observability. Dynamic backend management simplifies integration with modern architectures, memory efficiency improves across a broader range of workloads, native cryptographic operations at the proxy layer open new possibilities for API security architectures, and OpenTelemetry support makes HAProxy a first-class participant in distributed tracing pipelines.

21 AI concepts every beginner should know before their first interview

If you’re prepping for your first AI or MLOps interview, the hardest part usually isn’t the hands-on element. For me, it’s the vocabulary. Interviewers sometimes lob single-word concepts at you (“what’s quantization?”) and watch how far you can carry the thread. The questions sound clear-cut, but each one is really a doorway into a bigger topic, and the interviewer is judging how cleanly you walk through it.

The Bug Hiding in Your Production Traffic

Your logs showed 500 errors. The traces showed the dependency graph. Neither showed the actual bug, a DEL control character getting appended to the query string. This is how I found it. In this video I walk through Speedscale BYOC (bring your own cloud): capture real production traffic, store it in your own Elasticsearch cluster inside your VPC, pull it down locally with a single script, and reproduce the exact bug using proxymock. The data never leaves your environment.
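A bug like this is easy to miss because a DEL character (0x7F) is invisible in most log viewers. The sketch below is not Speedscale or proxymock tooling. It is a hypothetical helper, using only the Python standard library, that scans a captured URL's query string for control characters once percent-encoding has been decoded. The example URL is invented.

```python
from urllib.parse import unquote, urlsplit


def control_chars_in_query(url: str) -> list[tuple[int, str]]:
    """Return (position, hex code) for each control character in the decoded query."""
    query = unquote(urlsplit(url).query)
    return [
        (index, f"0x{ord(ch):02X}")
        for index, ch in enumerate(query)
        # C0 control characters are below 0x20; DEL is 0x7F.
        if ord(ch) < 0x20 or ord(ch) == 0x7F
    ]


# Hypothetical captured request with a percent-encoded DEL at the end.
print(control_chars_in_query("https://example.com/search?q=shoes%7F"))
```

Running a check like this over replayed traffic flags the offending request directly, instead of leaving it to be inferred from a 500 in the logs.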

IBM Think 2026 Infrastructure Insights for IT Leaders

IBM Think 2026 made one thing clear: infrastructure leaders are being asked to support more AI, more automation, and faster decision-making without adding unnecessary complexity or risk. Held earlier this month in Boston, IBM Think 2026 focused heavily on enterprise AI, hybrid cloud, automation, governance, and operational transformation.

Logs told me something broke. Traffic showed me what.

Here’s a problem I run into constantly: something breaks in production, I can see the 500 errors in my logs, but I can’t reproduce it locally. The trace shows me the dependency graph but not the actual request that failed. This is especially painful in microservices. I was looking at a CNCF example the other day (a simple demo app, like 4 pods) and it already had so many cross-service dependencies that understanding what broke required looking at the whole system at once.

Claude Opus 4.8: Pricing, benchmarks, and which model to actually run

Anthropic shipped Claude Opus 4.8 on May 28, 2026, exactly 41 days after Opus 4.7. The SERP was empty for two days after launch. Not because nobody cared. Because engineering managers and finance teams were doing the math on whether the bill changes.

The AI ROI Company's new groove: CloudZero's new UI, and what it means for customers

Customizability. Feature velocity. Performance. Capabilities that are critically important to all B2B software users. And capabilities in which CloudZero’s brand-new platform specializes. Pitching a total frontend overhaul didn’t necessarily make me CloudZero’s most popular new PM. But it’s made CloudZero faster, more customizable for a wider range of personas, and easier to update with the new features that matter most to our customers. And, if I may say, it also looks beautiful.

Atlassian Transforms Product Development with AI

What used to take months now takes weeks, and it’s changing what it means to build great products. At Atlassian, product managers and designers are using Rovo and Jira Product Discovery to move faster at every stage of the development lifecycle. From running deep research across all their tools and documents to capturing ideas, surfacing insights, and prioritizing what to build next, AI is transforming how product decisions get made.

Agent governance starts with the service catalog you already run

Last month, an AI agent running inside Cursor wiped PocketOS's entire production database, including its backups, in roughly nine seconds. The agent found an API token in an unrelated file, originally created for managing custom domains, and used that token to execute the deletion. The backups sat inside the same blast radius as the database the agent was operating against. Nine months earlier, a Replit AI agent had done the same thing to a SaaStr database during a designated code freeze.

AI Spend Hit $297B. Nobody Knows Where It Goes.

AI spend doubled to $297B in two years — and most companies can't tell you what any of it shipped. Token spend is disconnected from outcomes on the dev side. Agents in production? The invoice is the only signal. Harness Cloud & AI Cost Management (CACM) gives teams unit economics at the inference level, cross-provider visibility across OpenAI, Anthropic, Bedrock, and Vertex AI, and request-level attribution to the agent, session, or workflow that triggered the spend.

Blackwell sold out in weeks. Here's what Rubin demand will look like.

"Blackwell sales are off the charts, and cloud GPUs are sold out. Compute demand keeps accelerating and compounding across training and inference, each growing exponentially. We've entered the virtuous cycle of AI." Jensen Huang, CEO, NVIDIA When NVIDIA's CEO makes that statement in a quarterly earnings release, it is not marketing language.

What is InfiniBand?

When distributed workloads stall because nodes cannot exchange small messages quickly and consistently, the network is the limiting factor. How do you solve that problem? InfiniBand offers one solution. InfiniBand is an interconnect, meaning the end-to-end communication system that links compute, storage, and accelerator nodes. It is implemented as a purpose-built network fabric, the switching and transport layer engineered to deliver high bandwidth and low, predictable latency between those nodes.

Code isn't cheap, but POCs are

I keep hearing the phrase "code is cheap." I don't know who came up with it. Whoever it was clearly has not seen an Anthropic bill. I get what they mean. The cost of writing a line of code has cratered, AI does most of the typing, you know the rest. Fine. But the phrase is combative in a way that doesn't help anyone, especially the engineers in the room. "Code is accessible" lands better. Less swagger, more honesty. Either way, here's the line my friend Guillaume gave me that finally cracked it open.

Massive Open Source Success: A Step-By-Step Guide | Ubuntu Summit 26.04

Not all open source projects gain traction -- but a few become movements. In this talk, Nariman, Founder of Puter, shares what actually separates the two, based on his experience of growing Puter to 40K+ stars, gaining hundreds of contributors, and over 500K installations. He breaks down how to gain momentum from a project's foundation, attract contributors, and design projects that capture the imagination.

How to deploy Canonical Managed Kubeflow on Microsoft Azure?

Learn how to deploy Canonical Managed Kubeflow on Microsoft Azure step by step. Canonical's Managed Kubeflow on Azure gives enterprise and startup AI teams a fully operational, open source MLOps platform in under an hour. It is managed 24/7 by Canonical's engineers. This means you can focus entirely on building models rather than running infrastructure.

What High-Performing DevOps Teams Get Right About Cloud Security

Most DevOps teams understand that cloud security matters, but the gap between understanding the problem and operationalizing it effectively remains fairly large. Cloud environments move quickly, infrastructure changes constantly, and teams are under pressure to deploy faster without creating unnecessary friction inside development pipelines.

Optimizing Server Bandwidth and CDN Routing for High-Performance IPTV Networks

When a stream plays instantly without buffering, most users don't think twice. But behind that smooth playback lies a carefully tuned system of servers, bandwidth layers, and global routing paths working in sync like an invisible orchestra. Modern IPTV platforms such as the best IPTV in Canada depend heavily on this silent engineering layer, where even a small inefficiency can turn a perfect stream into a frustrating experience.

Your AI agent is fixing the wrong service

Everyone wants an AI agent factory in 2026. Autonomous agents fixing bugs and shipping features while you sleep. I’ve been building toward that myself. But the error rates don’t support the fantasy. The best AI coding agents in the world fix about 50% of real bugs on SWE-bench Verified. Half the time they fail. And AI-generated code produces 1.7x more issues than human-written code.

Microsoft DNS management in OpUtils: One console for complete control

For network administrators, managing DNS has traditionally meant juggling zones and records across separate server interfaces, manually tracking changes, and responding to resolution failures after they’ve already caused disruption. We’re excited to introduce Microsoft DNS management in ManageEngine OpUtils, bringing DNS zone and record administration directly into the same console you already use for IP address management (IPAM).

Software Delivery Context, Now Inside Claude | Harness Blog

Key Takeaway: The Harness MCP Server is now in the official Claude Connectors Directory. Developers using Claude can now discover and connect to Harness, gaining structured, real-time access to their pipelines, deployments, approvals, and delivery workflows. What makes this different from a typical API integration is what's underneath: the Harness Software Delivery Knowledge Graph, which gives Claude the context it needs to make decisions that are accurate, fast, and safe.

Apple doesn't care who signed your certificate

The pitch for private PKI gets more compelling every year. Public certificate lifetimes are down to 200 days, dropping to 47 by 2029. If you run your own private certificate authority, you make your own rules. Issue certificates for as long as you want, skip the renewal churn. Let’s Encrypt and DigiCert don’t get to tell you what to do. Apple does though.

How platform standardization will help you deliver on your KPIs

IT leaders rarely think they have an infrastructure problem. When a roadmap slips or an audit finding lands, the reflex is to hire more senior engineers, a bigger platform team, another DevOps lead. But headcount is rarely the real lever. The bottleneck is the "hidden factory": the undocumented, invisible work that sits between a developer writing code and that code reaching customers. It doesn't show up in post-mortems because engineers treat the workarounds as normal.

Introducing Cycle's European Control Plane: Strict data sovereignty, lower latencies, and more

We're thrilled to announce that Cycle's European Control Plane is now live! While a few organizations have been utilizing it over the past month, we're eager to officially open access to all teams. Before diving deeper into the "why," let's clarify what a Cycle Control Plane actually is. If you visit our status page, you'll see a list of the core services powering Cycle. These services include everything from our APIs to our 'factory' build systems.

15 DevOps Metrics Every Engineering Team Should Track in 2026

Software moves from code to production more quickly today, but it is still difficult to tell whether delivery is actually improving or just becoming more active. Most teams rely on dashboards filled with metrics like deployments, uptime, failures, and tickets. The numbers are available, but the meaning behind them is often unclear. DevOps metrics become useful only when grouped into clear categories: DORA metrics cover only delivery speed and stability, which is just part of the picture.
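To make the speed and stability categories concrete, here is a minimal Python sketch that computes two of the DORA metrics, deployment frequency and change failure rate, from a list of deployment records. The `Deployment` record, its `caused_failure` flag, and the sample data are assumptions made for illustration; a real team would pull these from its own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    """Hypothetical deployment record used only for this sketch."""
    day: date
    caused_failure: bool  # assumed flag: deploy led to an incident or rollback


def deployments_per_week(deploys: list[Deployment], period_days: int) -> float:
    """Deployment frequency (speed): average deploys per 7-day week."""
    return len(deploys) / (period_days / 7)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Change failure rate (stability): share of deploys that caused a failure."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)


if __name__ == "__main__":
    # Made-up sample: four deploys over a 28-day window, one of which failed.
    sample = [
        Deployment(date(2026, 1, 5), False),
        Deployment(date(2026, 1, 12), True),
        Deployment(date(2026, 1, 19), False),
        Deployment(date(2026, 1, 26), False),
    ]
    print(deployments_per_week(sample, period_days=28))
    print(change_failure_rate(sample))
```

Reading the two numbers side by side is what gives them meaning: a rising deploy count with a rising failure rate is more activity, not better delivery.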

How Canonical Support solves hard Linux performance bugs - even in 12-year old code

Some support cases are straightforward. Others lead deep into legacy code, where a single logic bug can quietly turn a routine command into a major performance problem. This series looks at how Canonical Support and Sustaining Engineering work together to investigate, patch, and upstream difficult issues that standard troubleshooting alone cannot solve.

Scaling Your App

Every application starts the same way: One server. One database. One optimistic engineer saying: “We’ll scale later.” And honestly? That’s usually the right call. Premature scaling is how perfectly normal applications end up with: But eventually, growth happens. Traffic increases. Queries slow down. Deployments get riskier. Your infrastructure starts making unfamiliar noises. This is where scaling enters the picture. Not scaling for conference talks.

AI ROI is an allocation problem

AI spend is going parabolic, and the labels on the bill (OpenAI, Anthropic, Gemini) are about all a CXO gets to work with. The hard part of tying that spend to outcomes is structural. A major portion of AI spend isn’t COGS. It’s the spend on coding agents producing the software, the spend on building marketing content, the spend on custom sales tooling, the spend on Intercom agents and Sybill analysis.