Sponsored Post

When AI Becomes the Judge: Understanding "LLM-as-a-Judge"

Imagine building a chatbot or code generator that not only writes answers - but also grades them. In the past, ensuring AI quality meant recruiting human reviewers or using simple metrics (BLEU, ROUGE) that miss nuance. Today, we can leverage Generative AI itself to evaluate its own work. LLM-as-a-Judge means using one Large Language Model (LLM) - like GPT-4.1 or Claude 4 Sonnet/Opus - to assess the outputs of another. Instead of relying on a human grader, we prompt an LLM with questions like "Is this answer correct?" or "Is it on-topic?" and have it return a score or label. This approach is automated, fast, and surprisingly effective.
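As a rough illustration of the idea, the sketch below builds a grading prompt, sends it to a judge model, and parses a 1-5 score from the reply. The prompt wording, the `judge` function, and the `llm` callable are all assumptions made for this example (they are not from the original post); `fake_llm` is a stub so the code runs without any API access.

```python
import re
from typing import Callable

# Hypothetical grading prompt; real rubrics are usually more detailed.
JUDGE_TEMPLATE = (
    "You are grading an AI assistant's answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Is this answer correct and on-topic? "
    "Reply with a single integer score from 1 (poor) to 5 (excellent)."
)


def judge(question: str, answer: str, llm: Callable[[str], str]) -> int:
    """Ask a judge model to score an answer and return the score as an int.

    `llm` is a stand-in for whatever client you use: any function that
    takes a prompt string and returns the model's text reply.
    """
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    reply = llm(prompt)
    # Take the first standalone digit between 1 and 5 in the reply.
    match = re.search(r"\b([1-5])\b", reply)
    if match is None:
        raise ValueError(f"judge reply contained no score: {reply!r}")
    return int(match.group(1))


def fake_llm(prompt: str) -> str:
    # Stub judge so the sketch runs offline; swap in a real model call.
    return "Score: 4"


score = judge("What is 2 + 2?", "4", fake_llm)
```

Passing the model in as a plain callable keeps the scoring logic independent of any particular provider, and it makes the parser easy to test against canned replies.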

Beyond AI hype: put reliability at the forefront

Reliability is a constant for every technology, whether it’s cloud, microservices, or AI. Full transcript: Just a few years ago everybody was screaming about microservices, "That's the wave of the future," and now everybody's looking at AI. No matter what the change in technology hot topic is, your reliability should still be at the forefront of everything that you're doing.

How To Start A FinOps Career: Roles, Skills, Jobs, And Growth Paths

Want to know how to get a job in FinOps? You’re not alone. FinOps careers are rapidly emerging as essential roles in tech, helping companies manage cloud costs without slowing down innovation. These roles sit at the intersection of finance, engineering, and cloud operations. FinOps roles and responsibilities are expanding fast. In this guide, you’ll learn what FinOps professionals do, how to frame your skills for the job, what certifications help, and how to grow your FinOps career over time.

10 Best Live Call Routing Software for Incident Management

I curated a list of the 10 best Live Call Routing software for incident management. To compare them, I created a checklist of essential features. I then read their documentation to see how they stack up against my checklist. And finally, I encapsulated the results in three tables. If you are new to live call routing, I’ve included a section that covers the basics for you. Let’s get started!

Security Compliance Management Scanning in Puppet Enterprise

In this session, Jason and Nelson provide a walkthrough of Security Compliance Management (SCM) scanning in Puppet Enterprise, emphasizing its seamless integration and ability to check compliance against CIS benchmarks. The session highlights the compliance dashboard, which offers immediate insights into security status and supports scanning for up to a hundred thousand nodes.

Provision & Deploy Applications in Minutes with Resolve

Automate end-to-end application provisioning with Resolve Actions. In this demo, we walk through how Resolve automates the entire process of provisioning virtual machines (VMs) and deploying applications—starting from a simple request form, all the way to fully configured and monitored servers. You'll see how Resolve.

Autoscaling Made Easy with Rancher Cluster API

Kubernetes has revolutionized application deployment and management. However, manually adjusting cluster sizes to meet fluctuating workloads, without constantly under- or over-provisioning resources, quickly drains platform teams’ time and energy. While traditional cloud provider autoscaling tools are functional, they often fall short when it comes to truly dynamic, Kubernetes-aware scaling, especially in a world with diverse infrastructure.

Is on-prem the top choice to run AI?

In this episode, we break down what we’ve learned from teams running AI at scale, and why on-premises infrastructure is making a strong comeback. We’re seeing a shift: performance, cost control, data sovereignty, and platform flexibility are driving conversations about on-prem strategies for AI. No one-size-fits-all answers, but if you’re building or scaling AI, this might help you think a few steps ahead.

Are you running AI the smart way?

Data locality: AI models often rely on large datasets. Locating compute close to the data reduces transfer times and improves training performance. Latency sensitivity: Real-time AI applications, like recommendation systems or edge analytics, depend on low-latency environments. This can be more easily tuned in private or hybrid setups. Hardware specialization: Some AI workloads benefit from custom hardware like GPUs or TPUs. Private cloud allows more control over this, while public cloud offers broader access but less customization.
Sponsored Post

Incident Management Software for 2025: Revolutionizing Efficiency in Crisis Handling

With the growing reliance on technology and complex IT infrastructures, having robust incident management software is no longer a luxury but a necessity. As we step into 2025, organizations are seeking more sophisticated, intuitive, and scalable solutions to streamline their incident response workflows and ensure uninterrupted service delivery.

FinOps Is Not A Side Hustle

When rideshare drivers talk about a “side hustle”, they mean working a few hours on weekends to make extra cash. That’s fine for pocket money, but it’s catastrophic when the “hustle” is controlling your cloud and AI spend. Right now, too many companies run FinOps the way they run the office coffee pot: A volunteer refills it when things look empty.

Collecting and Visualizing Metrics in Puppet Enterprise

This walkthrough covers how to enable and leverage metrics collection in Puppet Enterprise for monitoring and troubleshooting your Puppet infrastructure. Presented by Barr Iserloth and Tony Green, it shows how to activate metrics logging, integrate with visualization tools like Grafana, and diagnose Puppet Server, PuppetDB, and system-level behaviors with real-time observability.

Rancher Live: What is Developer Advocacy?

Join us for an engaging Rancher live stream hosted by Orlin Vasilev, as we dive into the world of Developer Advocacy—what it really means, why it matters, and how it's evolving in the cloud-native space. Orlin will be joined by two powerhouse guests in the field: Jorge Castro – a community strategist and long-time open source advocate, known for his work with Kubernetes and cloud-native ecosystems. Jorge brings deep insights from years of building developer communities and bridging the gap between engineers and users.

The Second Wave of Private Cloud

Over the past decade, the public cloud became the default way to run software. Its flexibility, on-demand pricing, and global reach made it the obvious choice for many teams. Startups could move fast, and enterprises could avoid long procurement cycles and complex hardware management. As teams gain more experience with cloud infrastructure, unintended consequences start to rear their costly heads. Bills grow quickly and are difficult to predict.

Netdata Overview: All You Need to Know in Under 3 Minutes

In just a few minutes, this walkthrough will show you how to unlock the full power of Netdata during your trial period. From real-time metrics to AI-powered insights, learn how to get immediate value without any guesswork. Whether you're running a Homelab or managing production systems at scale, this video will help you hit the ground running and make every minute of your trial count. Let’s turn your trial into insight, clarity, and control.

9 Best Incident Response Tools (Plus 4 Open-Source Options)

I’ve curated a list of the 9 best incident response tools, plus 4 open-source options for you. But first, a quick note: Many people mix up alerting, monitoring, and incident response. Incident response is what you do after receiving an alert. It includes alert acknowledgment, escalations, incident communication, post-incident analysis, and response automation. Yes, some of these (incident communication and post-incident analysis) overlap with incident management.

Kubernetes Is Powerful-But It's Slowing You Down. Here's How to Fix It.

Ask any SRE what slows them down in a Kubernetes incident, and the answer is usually too much information in too many different places. Kubernetes has changed the way we run software. It’s given us incredible flexibility, scalability, and power. But in the years I’ve worked in cloud operations and platform engineering, I’ve also seen how that power comes at a price: complexity.

Developing Modules for Puppet and the Forge in 2025

Since announcing changes to our OSS plans as well as introducing the new licensing starting with PDK 3.5.0, the team has received questions from the community around how the changes will affect them. In this article, we’ll highlight some helpful resources about how you can develop and contribute to modules on the Forge and ensure compatibility with Puppet Core and Puppet Enterprise.

SQS Vs. SNS: Choosing The Right AWS Messaging Service

Picture this. You recently shipped a new feature, and things were working smoothly — until they didn’t. Now, one service is timing out. Another is overloaded. You dig in and realize the issue is with how your systems communicate. Messages are not arriving when or where they should. Your team had set up Amazon SNS for notifications and Amazon SQS for processing tasks. But somewhere along the way, the difference between SQS vs. SNS (and how they’re wired together) got lost in translation.

Factors That Define a Scalable Reseller Hosting Plan

Many entrepreneurs are drawn to reseller hosting as an accessible and profitable business model. As you explore various options, it's important to understand the factors that contribute to a scalable reseller hosting plan. A plan that supports growth must include key elements like performance, flexibility, price, and support. Let's break down these crucial aspects in more detail.

The Tech Behind Europe's Space Missions | Canonical x ESA

“Open source software is… the glue for everything that everyone does, from sending an email through to managing critical operations, not just space operations.” The European Space Agency (ESA) runs missions ranging from investigating Earth’s forests, to exploring Jupiter’s moons, to deflecting incoming asteroids.

Reliability is not about mythical perfection

See what reliability means to Ganesh Seetharaman, Managing Director at Deloitte, and why it's more than high uptime. Full transcript: Reliability to me is not about achieving mythical perfection. It's about embracing complexity, recovering quickly from failures or incidents, and building trust through transparency and adaptability.

DevAIOps: A Call To Action For The Heroes Among Us

The year is 2025, and I’ve been watching teams discover what happens when you give developers AI superpowers without giving them AI super-governance. It’s like the merchandising scene from Spaceballs: “Vibe Coding: The Flamethrower. The kids love this one.” But here’s the thing: I’m not here to take away the flamethrowers. I’m here to hand out fire extinguishers and maybe suggest we practice in a safe room instead of the living room.

Getting started with the relaxAI API: Sovereign, cost-effective AI

Earlier this year, we launched relaxAI, an AI assistant designed with one paramount focus: your privacy. We’re now excited to announce the relaxAI API is in General Availability (GA) offering an OpenAI interface. This gives UK organizations up to 90% cost savings versus leading providers while ensuring data never leaves UK jurisdiction.

Introducing the Cortex MCP Server

Cortex gives engineering teams full visibility and control over their services, from ownership and standards to service history and production readiness. Our goal is to help teams stay aligned and move faster so they are ready for whatever is ahead. The reality for any engineering team is that developers spend most of their time in their IDE, not their IDP. And while developers love the context Cortex provides, they don’t love context switching.

Smarter Insights and Pipeline Control - New in DataStream

We’re constantly improving DataStream to make security data management simpler, smarter, and more efficient for modern SOCs. This latest update introduces new capabilities that bring even more visibility and flexibility to your telemetry pipelines. Let’s take a closer look at what’s new.

New in OTel: Auto-Instrument Your Apps with the OTel Injector

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

Why Your Loki Metrics Are Disappearing (And How to Fix It)

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.

The Digital Sovereignty Revolution: Why Big Tech Is Losing UK Trust

In a world where data is power, UK businesses are demanding control. Civo’s latest whitepaper, The Digital Sovereignty Revolution, uncovers the top challenges UK businesses face in securing true data sovereignty, from eroding trust in Big Tech to geopolitical instability. Our latest research, based on a survey of 1,000+ UK IT decision-makers, reveals the extent to which sovereignty is reshaping the UK's tech sector. From the risks of relying on US-based providers to the benefits of multi-cloud strategies, the whitepaper explores key trends and what they mean for UK businesses.

Stop guessing! Speedscale's Notebook finds anything in your traffic.

Debugging complex microservices just got an upgrade. This video demonstrates Speedscale's innovative Notebook capability, allowing you to perform advanced substring searches and filter production traffic based on deeply nested JSON fields within request and response bodies. Unlike traditional observability tools that only record telemetry, Speedscale's always-on recorder captures full traffic payloads, empowering you to precisely pinpoint issues, identify specific user calls, or validate API versions. Streamline your troubleshooting, enhance your testing, and gain unprecedented visibility into your production environment.

Introducing Schedule Rotations: One Schedule, Many Rotations, Total Coverage

When coverage gets complicated, Schedule Rotations keeps it simple. On-call can get real messy, real fast. One minute you’ve got a neat little schedule for the two people rotating primary and secondary. Next thing you know, you’ve got engineers in three time zones, a new hire shadowing incidents, and your “simple” rotation has turned into a board game with no rules. So we fixed it.

How to monitor and manage front-end observability in Blackfire

In this video, we'll guide you through the process of monitoring and managing your usage of front-end observability features in Blackfire. Learn how to access your Browser usage dashboard to view browser traces collected per environment, track your quota consumption, and understand the concept of spike protection. You'll discover how Blackfire's automatic detection of abnormal traffic spikes protects your monthly quota and ensures continuous data collection.

How to Enable and Configure Front-end Observability in Blackfire

In this video, learn how to enable and configure Front-end Observability in Blackfire. The tutorial covers steps to enable features across multiple environments via the Organization settings / Front-end usage in the Blackfire dashboard. Control front-end observability by enabling or disabling Browser Monitoring and Analytics per environment, using a JavaScript probe and a unique browser key. The video emphasizes the importance of naming transactions and explains how to manually add tracking snippets to HTML for better control.

Zero Ticket Video Series: How to Automate Password Resets with Resolve

Struggling with repetitive IT tickets like password resets and account unlocks? You're not alone — these make up nearly 30% of all service desk requests. In this demo, learn how RITA, the AI-powered IT Agent from Resolve, can eliminate these issues entirely — no ticket required.

Scaling Online Game Infrastructure for High-Engagement PvM Content

The explosive popularity of player-versus-monster (PvM) content in online games brings significant backend challenges, particularly as titles scale globally. Instanced boss fights, real-time combat logic, and mass player concurrency demand robust, responsive server infrastructure that can scale both horizontally and vertically - without degrading the player experience.

From Wallpaper to Web Servers: How One Immigrant Switched from Walls to DevOps in Just Two Years

He landed in the U.S. with a suitcase, a scraper, and a strong back. No tech degree. No connections. Just a willingness to work and a sense that something bigger might be possible. At first, he did what he knew best - he worked as a wallpaper installer. The job was honest, physical, and surprisingly calming. "There's something meditative about smoothing out bubbles," he says. "But after a while, I realized I wanted to build something that didn't peel off the wall."

Introducing parent/child pipelines

We’re excited to announce the launch of parent/child pipelines for Bitbucket Pipelines. This powerful new capability lets you define a step within a pipeline that triggers and encapsulates a whole other pipeline, which can help to streamline more complex workflows into modular pieces and achieve greater parallelism within your pipeline.

Why Cloud-Based DCIM Software Outperforms Legacy Systems

The data center industry’s rapid evolution requires innovative tools to address its ever-changing demands. Cloud-based Data Center Infrastructure Management (DCIM) solutions have emerged as a powerful alternative to traditional on-premise systems, offering unmatched scalability, cost savings, real-time monitoring, and AI-driven insights.

Set up preview deployments for pull requests using CircleCI and Vercel

Working in front-end development involves writing features and bug fixes in different branches. But how do you ensure that reviewers, testers, and other stakeholders find it easy to view changes? Using preview deployments is one solution. Preview deployments allow you to automatically create a live URL each time someone opens a pull request (PR). It’s like giving every branch its mini website so that changes can be tested and proven in isolation.

30,000 updates per day: Dynamic Kubernetes routing with HAProxy Map API at PayPal

Learn how PayPal Genesis scales with HAProxy Fusion via dynamic configuration, rapid deployments, and automated service discovery for 10,000 test environments. HAProxy is the company behind HAProxy One, the world’s fastest application delivery and security platform, and HAProxy, the most widely used software load balancer. Leading platforms and cloud providers trust HAProxy to simplify, scale, and secure modern applications, APIs, and AI services in any environment.

Console Connect expands Azure ExpressRoute reach with new global locations

Console Connect has expanded its global footprint with five additional Microsoft Azure ExpressRoute locations, bringing the total to 16 locations worldwide. This significant growth gives customers even more options to directly and securely connect to Azure from strategic data centre hubs in key international markets.

PostgreSQL Table partitions now supported in Flyway

This blog post was originally authored by Prajakta Tamhankar, whose insights and expertise shaped much of the content you’ll read here. We are thrilled to announce the General Availability (GA) of Table Partitions for PostgreSQL users in Flyway v8.0.2. This new functionality is designed to enhance your database management experience by providing robust support for table partitions, including sub-partitions and range partitions.

How Prometheus 3.0 Fixes Resource Attributes for OTel Metrics

When you export OpenTelemetry metrics to Prometheus, resource fields like service.name or deployment.environment don’t show up as metric labels. Prometheus drops them. To use them in queries, you’d have to join with target_info, which makes filtering and grouping more difficult than necessary. Prometheus 3.0 changes that. It supports resource attribute promotion, automatically converting OpenTelemetry resource fields into Prometheus labels.
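To make the target_info join concrete, here is a small Python sketch that mimics what the join does to labels. It assumes the common pattern of matching on `job` and `instance` and copying a label such as `deployment_environment` across; the function name, the sample series, and the label values are illustrative, not taken from the original article.

```python
from typing import Dict, List

Labels = Dict[str, str]


def join_target_info(
    series: List[Labels],
    target_info: List[Labels],
    copy_labels: List[str],
) -> List[Labels]:
    """Copy selected labels from target_info onto series with the same job/instance.

    Mimics the label effect of a PromQL join along the lines of:
        my_metric * on (job, instance) group_left (deployment_environment) target_info
    (target_info has the value 1, so the multiplication leaves sample values unchanged.)
    """
    info_by_target = {(t["job"], t["instance"]): t for t in target_info}
    joined = []
    for labels in series:
        info = info_by_target.get((labels["job"], labels["instance"]))
        if info is None:
            continue  # series with no matching target_info drop out of the result
        extra = {name: info[name] for name in copy_labels if name in info}
        joined.append({**labels, **extra})
    return joined


# Hypothetical data: one request metric series and its target_info series.
series = [{"job": "checkout", "instance": "pod-1", "http_route": "/pay"}]
target_info = [
    {"job": "checkout", "instance": "pod-1", "deployment_environment": "prod"}
]
joined = join_target_info(series, target_info, ["deployment_environment"])
```

With attribute promotion, the label would already be present on the metric series, so a join like this becomes unnecessary for the promoted attributes.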

OTel Weaver: Consistent Observability with Semantic Conventions

Deploying a new service shouldn’t break dashboards. But it happens, usually because metric names or labels aren’t consistent across teams. You end up with traces that don’t link, metrics that don’t align, and queries that take hours to debug, not because the system is complex, but because the telemetry is fragmented. OTel Weaver addresses this by enforcing OpenTelemetry semantic conventions at the source.

Are Egress Fees Holding Your AI Business Back?

For AI companies, the landscape of cloud computing has always been a balancing act between innovation, costs, and compliance. That’s where Civo comes in. Offering a full cloud offering with GPUs, but without the usual headaches, Civo provides a rare combination: true data sovereignty and zero data egress charges. Let’s break down why these two features should be non-negotiables for your AI infrastructure.

9 Best IT Alerting Software in 2025 (Plus 3 Open-Source Options)

I’ve curated a list of the 9 best IT alerting software options and 3 open-source alternatives for you. Every tool on this list handles the core alerting functions you need: incident detection, fast alert delivery, clear escalation paths, and reliable incident logging. Since all these tools tick those boxes, I focused on what makes each tool special. You’ll find their unique features under “Standout Alerting Features of ” for each option.

JFrog Deployed on AWS: The Foundation for Cloud-Native Excellence

We are delighted to share the exciting news that JFrog has earned the “Deployed on AWS” badge in AWS Marketplace, marking yet another milestone in our journey of innovation and collaboration with Amazon Web Services (AWS). This achievement underscores our commitment to providing cutting-edge solutions that leverage AWS’s robust infrastructure to enhance the user experience and drive efficiency.

Git Rebase -i: Clean Your Commit History

Messy commit history? Git lets you clean it up before anyone sees it, and it’s easier than you think. In the final episode of Wait… Git Can Do That? - Volume 1, we walk through git rebase -i: clean up the last N commits, then squash, rename, or drop them. Bonus: GitKraken Desktop lets you do it all visually. You've just unlocked 8 Git powers most devs don’t even use.

Azure Reserved Instances: Saving Smart, Maximizing ROI

Many teams buy RIs with the best of intentions (predictability and up to 72% savings) only to realize later that they’ve either overcommitted or left money on the table. Without clear visibility, what starts as a smart cost-saving move can slither into silent waste. This guide will help you get ahead of that. We’ll walk you through the ins and outs of Azure Reserved Instances, compare them to other savings options, and share best practices to help you avoid common pitfalls.
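The overcommitment risk can be sketched with simple arithmetic. The example below assumes a reservation is billed for every hour of its term whether or not the VM runs, and it uses the headline 72% figure as the discount; actual discounts, terms, and billing rules vary, and the function names and numbers are made up for illustration.

```python
def reservation_net_savings(
    payg_hourly: float, discount: float, term_hours: float, used_hours: float
) -> float:
    """Return pay-as-you-go cost minus reservation cost (positive means money saved)."""
    # Assumption: the reservation is billed for the full term, used or not.
    reserved_cost = payg_hourly * (1 - discount) * term_hours
    # Pay-as-you-go is billed only for hours the VM actually runs.
    payg_cost = payg_hourly * used_hours
    return payg_cost - reserved_cost


def break_even_utilization(discount: float) -> float:
    """Fraction of the term a VM must run before the reservation pays off."""
    return 1 - discount


# Hypothetical VM at 1.00 per hour, 1000-hour term, 72% discount.
well_used = reservation_net_savings(1.0, 0.72, term_hours=1000, used_hours=900)
overcommitted = reservation_net_savings(1.0, 0.72, term_hours=1000, used_hours=200)
```

Under these assumptions the reservation saves roughly 620 when the VM runs 900 of the 1000 hours, but loses roughly 80 when it runs only 200, because the break-even point sits at about 28% utilization. Smaller discounts push that break-even point higher.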

How sum_over_time Works in Prometheus

The sum_over_time() function in Prometheus gives you a way to aggregate counter resets, gauge fluctuations, and histogram samples across specific time windows. Instead of seeing point-in-time values, you get the cumulative total of all data points within your chosen range—useful for calculating totals from rate data, tracking accumulated errors, or understanding resource consumption patterns over custom intervals.
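A minimal Python sketch of the idea: add up every raw sample whose timestamp falls inside the window. The sample data is invented, and the left-open, right-closed window is an assumption that matches how Prometheus 3.0 treats range selectors as far as I know; earlier versions also included a sample sitting exactly on the left boundary.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def sum_over_time(samples: List[Sample], now: float, window: float) -> float:
    """Sum every sample value whose timestamp falls in (now - window, now]."""
    return sum(value for ts, value in samples if now - window < ts <= now)


# Hypothetical series scraped every 15 seconds.
samples = [(0, 2.0), (15, 3.0), (30, 5.0), (45, 1.0), (60, 4.0)]
total = sum_over_time(samples, now=60, window=30)  # samples at t=45 and t=60 -> 5.0
```

Because the function adds raw sample values, the result depends on how many samples land in the window, so the same range over a series scraped twice as often yields roughly double the total.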

SD-WAN, SASE, SSE, and the Coffee Shop Network: From Distraction to AI Superpower

Back in 2018, I wondered (perhaps loudly) if SD-WAN was just IT’s hype-of-the-year, destined for the same eye-rolls as signature-based antivirus and GDPR compliance drives. Even then, I knew we couldn’t let messaging fatigue blind us to real technology shifts. Fast-forward to 2025: SD-WAN (Software-Defined Wide Area Network) not only stuck around, but became the springboard to something far bigger – SASE (Secure Access Service Edge).

Tutorial: Visualize Your Puppet Data in Grafana with the Observability Data Connector

When you manage complex IT infrastructure, it becomes critical to use tooling to understand what’s happening across all of your systems in terms of performance, reliability, and compliance. Monitoring key indicators manually is simply no longer possible at that scale. Puppet has long been known as a solution for managing large environments and collecting a vast amount of data about your infrastructure, but accessing and visualizing that data in a meaningful way can be a challenge.

Understanding GPUs for AI success: Insights from our panel discussion

This blog is based on the webinar “Panel Discussion: Understanding the importance of GPUs for AI success”, which is also available as a full recording. Last week, we hosted a panel discussion surrounding the importance of GPUs for AI success that featured Kunal Kushwaha (Field CTO), Ben Norris (AI Engineer), and Kendall Miller (Strategic Business Development).

Is your cloud data truly sovereign? The CLOUD Act & FISA 702 reality check

As UK public sector bodies, financial institutions, and enterprises accelerate cloud adoption, a pivotal question emerges: Who truly controls your data, and under which laws? With data breaches and regulatory scrutiny intensifying, storing data and workloads in a host country alone doesn't guarantee sovereignty. U.S.

Early preview: Auto-translation in Mattermost channels

In this demo, we’re showcasing an early prototype of channel-based auto-translation in Mattermost — designed to break down language barriers in global operations. Watch how users can seamlessly read messages in their preferred language across any channel, enabling inclusive, multilingual collaboration in real time. Note: This is an early prototype demo. The capability has not yet been released in Mattermost but is on our roadmap for the near future (release date TBD).

Building bridges across clouds: the PayPal approach to unified business communication

Learn how PayPal solved multi-cloud connectivity challenges using HAProxy to build Meridian, achieving 24% latency reduction and seamless integration across AWS, GCP, and Azure with overlapping IP spaces.

Easily Ensure Security Compliance with Puppet Enterprise

Learn how Puppet Enterprise simplifies security compliance management, helping regulated enterprises strengthen their security posture through automated monitoring and actionable insights. Upgrade your compliance workflow today! Streamline your processes and gain confidence in your security operations with Puppet Enterprise.

Git Shortlog: Who Really Wrote the Code?

Wondering who’s actually writing the code on your team? Git knows, and it’s not subtle. In this episode of Wait… Git Can Do That?, we show you how git shortlog -sne breaks down commit counts by contributor. See who’s active and who’s dropped off, sorted by commit count with names and emails. Bonus: GitKraken Desktop shows this visually with diffs, additions, and deletions. Useful for standups, retros, or just flexing.
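If you want to post-process that output, a short Python sketch like the one below can turn it into rows. It assumes each line has the shape "count, whitespace, name, email in angle brackets", which is what `git shortlog -sne` typically prints; the sample text and the `parse_shortlog` helper are invented for illustration.

```python
import re
from typing import List, Tuple

# Assumed line shape: leading spaces, commit count, whitespace, "Name <email>".
LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s+<([^>]*)>$")


def parse_shortlog(output: str) -> List[Tuple[int, str, str]]:
    """Parse `git shortlog -sne` style text into (commits, name, email) rows."""
    rows = []
    for line in output.splitlines():
        match = LINE.match(line)
        if match:  # skip blank or unexpected lines rather than failing
            rows.append((int(match.group(1)), match.group(2), match.group(3)))
    return rows


# Made-up sample in the assumed format; real output comes from your repository.
sample = (
    "    42\tAda Example <ada@example.com>\n"
    "     7\tBob Example <bob@example.com>\n"
)
rows = parse_shortlog(sample)
```

In practice the text would come from running the command in a repository, for example via `subprocess.run(["git", "shortlog", "-sne", "HEAD"], capture_output=True, text=True)`.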

Live PostgreSQL Monitoring on Azure & AWS with pgNow

Managing credentials gets trickier as your Flyway project grows, especially when databases contain sensitive data. In this episode, Tony and Tonie break down how Flyway’s Property Resolvers help you keep secrets safe by pulling them securely from local or cloud-based stores, avoiding hardcoding them into config files.

Seeing the Bigger Picture: Why Security Needs Depth, Not Just Products

A recent BBC article, “Weak password allowed hackers to sink a 158-year-old company,” outlined a serious security lapse. This case reinforces the message that we, at Teneo, advocate every day: true resilience comes from defense in depth, i.e. policy, product and process, not just tools at the edge. In a recent customer engagement, we discussed a transition from VPN to ZTNA. While ZTNA offers enhanced security including continual checking, improved segmentation and a minimized attack surface.

Use Telegraf Without the Prometheus Complexity

Every system needs observability. You need to know what your CPU, memory, disk, and network are doing, and maybe keep an eye on database query latency or Redis connection counts. But setting that up isn’t always simple. You start with a couple of shell scripts. Then come exporters. Then Prometheus. Before long, you’re managing scrape configs, tuning retention, and watching dashboards fail under load after two days of data.

Lessons from Alaska's outage: Redundant ≠ resilient

Last Sunday, Alaska Airlines suffered a three-hour outage that led to more than 200 flight cancellations and disrupted 15,600 passengers. The culprit? “A critical piece of multi-redundant hardware at our data centers, manufactured by a third-party, experienced an unexpected failure. When that happened, it impacted several of our key systems that enable us to run various operations, necessitating the implementation of a ground stop to keep aircraft in position.”

Automate Disk Space Management on Windows with Resolve

Struggling with managing disk space issues on your servers or virtual machines? See how you can use Resolve to automate disk space addition and expansion on Windows systems, saving time, reducing manual errors, and eliminating the need for high-level administrative access. Whether you’re a system admin, IT operations engineer, or automation enthusiast, this demo highlights how you can streamline infrastructure tasks using intelligent automation.

Automating Linux Disk Expansion with Resolve: Add & Extend VM Disks in Minutes!

Running into disk space issues on your Linux servers or virtual machines? In this step-by-step demo, we show how Resolve’s powerful automation platform can help you automatically add and expand disk space on Linux systems, eliminating manual processes, reducing human error, and improving operational efficiency. Whether you're a system admin, IT operations engineer, or automation specialist, this demo highlights how to streamline critical disk management tasks that normally require elevated access and technical knowledge.

Build. Release. Run. Repeat. But Where's the Control?

In every engineering organization, from fintech unicorns to 20,000-seat global banks, delivery happens in a loop. Code gets built. Releases get pushed. Systems run 24/7. Then it all happens again. This cycle isn’t an opinionated lifecycle dreamed up by a consultant or vendor; it’s just the reality of software delivery today.

What to expect in a Gremlin workshop

Gremlin workshops give your team hands-on training with Gremlin so they can get real results and dramatically improve your reliability. Full transcript: The goal of our workshops is really to accelerate you and the team in your reliability journey. Whether you're starting out for the first time, or you're a more advanced user, this workshop is really designed for you to take you to the next level.

From Dial-Up to Colo: The Impact of AI on Data Center Design

In this episode of Uplink, we’re joined by Jay Smith, VP of Data Center Operations and Engineering at Evocative. With nearly 30 years in the industry, Jay unpacks how data centers are adapting to support AI’s massive power and cooling demands. This episode covers why colo is thriving in the AI era, liquid cooling and rear-door heat exchangers, powering 275kW racks and beyond, how AI inference is shifting compute to the edge, and career opportunities in infrastructure without a degree.

What is Java Performance Monitoring? [A Guide to DevOps Engineers]

You rolled out a Java application that worked fine in development. Fast, clean, no errors. However, once it went into production, things began to change. Suddenly, the app feels slow. CPU usage climbs without warning. Some users start getting timeouts. You check the dashboards, but nothing jumps out. You look through the logs, but it's mostly noise. And then the questions start coming in - "Is the JVM the problem?" If you've been in that situation, you're not alone.

Build EF Core Models Visually with Entity Developer - No More Manual Mapping!

Want to simplify your EF Core development? Discover how Entity Developer, a powerful visual ORM tool from Devart, helps .NET developers design, generate, and maintain EF Core models faster and with fewer errors. Entity Developer works as both a standalone app and a Visual Studio plugin, giving you flexibility across any development environment. With support for EF Core, NHibernate, and LinqConnect, it's your all-in-one solution for ORM design in .NET.

How To Sell Cloud Cost Optimization To Your CFO

You know you’re bleeding money in the cloud. Maybe not everywhere, but enough to feel it. Your engineers know it too. You’ve got idle resources humming away, AI workloads scaling like wildfire, and nobody can quite explain why last month’s bill jumped by 17%. So, you bring up the idea of investing in a cloud cost optimization product. Cue the skeptical glance from your CFO.

Optimizing HAProxy at scale: Liftoff's path to efficiency, performance, and cost savings

HAProxy is the company behind HAProxy One, the world’s fastest application delivery and security platform, and HAProxy, the most widely used software load balancer. Leading platforms and cloud providers trust HAProxy to simplify, scale, and secure modern applications, APIs, and AI services in any environment.

Zero-Downtime TLS: Automating HAProxy Certificate Management with ACME

How To Hire In FinOps: Roles, Responsibilities, Skills, Interview Questions, And More

FinOps is booming as a function. The global cloud FinOps market will grow from $13.5 billion in 2024 to $23.3 billion in 2029 — a compounded annual growth rate (CAGR) of 11.4%, according to Research and Markets. That’s in response to sharp increases in cloud spend. About $723 billion is expected to be spent on public cloud services in 2025, up from $596 billion the year before according to a Gartner report.
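
As a quick sanity check on the growth figures quoted above, here is a minimal sketch of the standard compound annual growth rate formula applied to the market numbers in the blurb. The function name `cagr` and the choice of a five-year span (2024 to 2029) are assumptions made for illustration; the result lands near the quoted 11.4%, and the small gap is presumably down to rounding in the published figures.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the blurb, in billions of US dollars.
market_cagr = cagr(13.5, 23.3, 2029 - 2024)

# Year-over-year growth in public cloud spend (596 -> 723).
cloud_growth = 723 / 596 - 1

print(f"FinOps market CAGR: {market_cagr:.1%}")
print(f"Public cloud spend growth: {cloud_growth:.1%}")
```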

Enginears Podcast: Redefining Developer Productivity in the AI Era

Developer productivity is top of mind for all engineering organizations. As AI accelerates software development, leaders face a fundamental question: Are we truly building faster, or just building more? And more importantly, are we building the right things, and building them well? In this new era, speed alone isn’t enough. High-impact teams must ensure their work aligns with customer value and is delivered with exceptional quality.

Measure your reliability risk, not your engineers

Do you know the current reliability risk of your systems? Do you know right now how your services will react to common failures like a dependency going down? Sadly, most organizations don’t have answers to these questions, relying on QA tests and the skill of their engineers to deploy code they assume won’t break. But this is a process problem, which means you can’t hire your way out of it.

What's Driving Traffic in Rural America - Broadband, Wireless & DCI for AI

As consumers continue to shift away from traditional cable broadcasting towards streaming video services such as Netflix, Disney+, Hulu, and Paramount+, demand for robust middle-mile solutions will keep growing. This growth in data traffic requires rural providers not only to enhance their Internet infrastructure but also to strengthen their data center interconnectivity (DCI) to metropolitan hubs. In this context, rural markets require connectivity solutions that can support 100G and even 400G connections to efficiently handle rising bandwidth needs tied to cloud applications and content delivery networks.

Upgrading to Puppet 8 - What You Need to Know

Discover how Puppet 8 takes performance, modernization, and automation to the next level! This video offers a comprehensive overview of all the latest changes and benefits, including faster catalog compilation, streamlined agent execution, and improved debugging capabilities with enhanced logging and error messages. Key moments in this video include the importance of using PDK. Explore key upgrades like these and see how tools like PDK empower your team by identifying and resolving compatibility needs quickly.

Maximizing Peering Through Flow Analysis

Discover how to use flow data to pinpoint your most valuable traffic, identify missing peer opportunities, and make smarter peering choices across your internet exchanges. In previous peering blogs, we’ve shared how you can maximize the value of your connection to an IX by peering with the IX route servers, and identify and contact specific peers via bilateral sessions.

TLS and HAProxy 3.2: From Stunnel to native TLS support

Database Naming Standards: SQL Conventions for Tables, Columns, and Keys

SQL naming conventions are often at the root of database headaches. One careless name like temp1 or new_table can lead to broken queries, failed deployments, and hours of avoidable debugging. To prevent this, teams need clear, enforceable naming standards: rules they can rely on as databases evolve. Many organizations also turn to purpose-built SQL Server IDEs to apply the standards programmatically, visualize relationships, and manage schema changes across environments.

Speedscale: Avoid Regulatory Icebergs with Traffic Replay, and Save Money

It has never been more critical to establish a solid foundation for regulatory compliance. Regulations govern a wide range of functions. Some of them are obvious, such as health and human services, patient data, medical devices, and credit payments. Some of them are less obvious, especially with the ever-changing definition of what constitutes private and identifiable data. This article provides an overview of regulatory compliance challenges and the hidden risks organizations face beneath the surface.

Building Systems For AI: Lessons On Governance From DevOps History

In 2008, Nuance hired me to join their Healthcare Speech Recognition team as a “Release Engineer.” DevOps wasn’t a thing yet — Patrick Debois and Andrew Shafer wouldn’t hold their first “DevOpsDays” until 2009. But I was lucky that “Release Engineer” at Nuance meant “jack of all trades” who wrote Makefiles, bash scripts, Perl, and Java to build and release code to a fleet of hundreds of on-premise Linux machines.

Ship Confluent Cloud Observability in Minutes

You're running Kafka on Confluent Cloud. You care about lag, throughput, retries, and replication. But where do you see those metrics? Confluent gives you metrics, sure, but not all in one place. Some live behind a metrics API, others behind Connect clusters or Schema Registries. You either wire them manually or give up. What if you could stream those metrics to a platform built for high-frequency, high-cardinality time series, and do it in minutes?

HAProxy Enterprise WAF protects against Microsoft SharePoint CVE-2025-53770 / CVE-2025-53771

Critical vulnerabilities in Microsoft SharePoint (CVE-2025-53770 and CVE-2025-53771) are currently being exploited in the wild. Disclosed on July 19, 2025, these vulnerabilities have CVSS scores of 9.8 and 7.1 respectively, indicating severe and high risk. CVE-2025-53770 affects on-premises Microsoft SharePoint Servers, allowing unauthorized attackers to execute code over a network. CVE-2025-53771 affects Microsoft Office SharePoint, allowing authorized attackers to perform spoofing over a network.

Platform Team Toolkit: Governance that accelerates developer velocity

Platform engineering teams face a critical challenge: scaling software delivery across dozens of development teams without killing innovation and velocity. The traditional approach forces an impossible choice: rigid standardization or operational chaos. Platform teams get buried in manual configuration requests, security updates take weeks to roll out, and compliance gaps emerge from inconsistent practices and developer workarounds.

FireHydrant MCP Server User Guide

Tips and best practices to help you get up and running with FireHydrant's Model Context Protocol integration, so you can manage incidents, alerts, and retrospectives directly through AI assistants like Claude or Cursor.

Reliability is about more than uptime

Reliability is about more than whether your application is up; it's about proactive measurement and keeping it up. Full transcript: Reliability results in my earlier career was, "Is there any downtime? Are there any errors that are getting thrown?" It's not a proactive way to measure your reliability. If you're measuring it in time of production, it's not gonna be an accurate reflection of what your reliability is. The way that my mindset has changed over time has been a proactive measurement. Before we ship something out, is this gonna be reliable from the start?

The Quest For The Five Minute Deploy

Speed is everything at incident.io. The faster we can test and ship code, the faster we can get new products and features out to customers. Over the last three years, as our codebase grew and our test suite expanded, we drifted away from our own goals: "We aim for less than 5 minutes between merging a PR and getting it into production." This is the story of how we got back on track.

Agentic Workflows That Actually Work (and Don't Take Six Months to Deploy)

There’s a hard truth in IT: everyone wants automation, but nobody wants to wait six months (or more) to get it off the ground. Traditional automation initiatives are often bogged down in backlog, scripting complexity, and integration chaos. That’s exactly what makes agentic workflows different. Agentic workflows don’t just automate tasks; they act. They understand intent, operate autonomously, and improve over time.

Zero downtime deployments to Render using CircleCI

Downtime during deployments can affect the performance of your work. Data can be lost, and trust in your application can be destroyed. Luckily, zero downtime deployments do not need to be complex or involve a big infrastructure. This tutorial will teach you to establish a stable CI/CD pipeline with CircleCI and Render to automatically test and deploy a basic React application.

Less Overhead, More Impact: The Cycle Approach

Every company is now a software company. While the industry gets caught up in buzzwords and complexity, the core question remains: How can my organization reduce costs without creating long-term problems, and without giving up security or speed? The Cycle platform was built to answer this. It offers a lower total cost of ownership, simplifies operations at scale through automation and standards, and is secure by default without slowing down development.

Free for the Community, Built by JFrog: Introducing the DSSE Attestation Online Decoder

Attestations, or as we like to call them, evidence, are a critical piece to proving software supply chain integrity and security. However, without the right tools and processes, reviewing and verifying attestations can be time-consuming. At JFrog, we’re deeply committed to empowering developers, DevOps, and Security teams to make these complex workstreams as simple as possible.

Platform Team Toolkit demo

Platform teams face an impossible choice: rigid standardization that slows developers down, or operational chaos that creates security gaps. CircleCI's new Platform Team Toolkit eliminates this tradeoff by delivering self-service developer experiences with built-in governance. This demo is perfect for platform engineers, DevOps teams, and engineering leaders who need to scale software delivery without sacrificing speed or safety.

Stop Committing Too Soon With This Git Hack!

Need just one commit from another branch, but don’t want to commit it yet? In this episode of Wait… Git Can Do That?, we show you how to use git cherry-pick -n to stage changes without committing. It's perfect for bundling, editing, or staging carefully, and it keeps history clean. Bonus: in GitKraken Desktop, cherry-pick visually and decide when to commit. More control. Less commit anxiety. Subscribe for more Git tricks and GitKraken power moves.

400 Million Reasons Hackers Will Target Microsoft Again...

Yesterday, like many others in the tech community, I found myself pausing to fully grasp the implications of the Microsoft SharePoint hack. As one of the most widely adopted document management and collaboration platforms globally, SharePoint’s compromise inevitably sends ripples of concern through businesses everywhere. This news reminded me of a conversation I had just last week with an enterprise customer. We were discussing how one might approach cybersecurity from a hacker’s perspective.

Vendor lock-in and the fight for UK digital sovereignty

To read more on the findings from this research, see the Digital Sovereignty Revolution whitepaper. For years, global hyperscalers have been the backbone of cloud infrastructure for UK businesses. Their scale, reach, and performance made them the default choice. But as geopolitical uncertainty grows and concerns around data governance deepen, the cracks in this model are beginning to show.

How to Set Up Real User Monitoring

Synthetic monitoring provides consistent, repeatable results, 2.1s load times, passing Lighthouse scores, and minimal variability. But those numbers reflect lab conditions. On slower networks, like 3G in Southeast Asia, real users may see much higher load times, 5.8s or more. This isn’t a fault of the tools. It’s a difference in testing context. Synthetic tests run on fast machines, stable connections, and clean environments.

VirtualMetric Achieves SOC 2 Certification: A Milestone in Trust and Security

We’re excited to announce that VirtualMetric has achieved SOC 2 Type 2 certification. This is a key step in our mission to deliver secure, resilient, and efficient telemetry solutions. This certification confirms that our controls for security, availability, confidentiality, and data integrity don’t just look good on paper — they work in practice, over time.

VirtualMetric in the 2025 Comprehensive Market Guide: Rising Data Pipeline Security

Over the past year, much of cybersecurity’s attention has centered on the promise of AI-powered SOCs. But as the Market Guide 2025 by Francis Odum reveals, the true foundation of modern security success lies in the data layer. “Without clean, well-routed telemetry, even the smartest AI is starved of context,” points out the researcher. And that’s where Security Data Pipeline Platforms (SDPPs) have become essential.

VirtualMetric Earns ISO 27001:2022 Certification: Security at Every Level

We’re excited to share that VirtualMetric has officially achieved ISO 27001:2022 certification, a globally recognized standard for building and managing an effective Information Security Management System (ISMS). This confirms that we’ve implemented robust controls to protect data, manage risks, and ensure the resilience of our infrastructure in today’s security landscape.

Build Smarter With Cloud-Native Tools: Your 2025 Guide

Cloud-native tools promise speed, scalability, and resilience. The catch is you have to pick the right ones and use them well. Without the right foundation, they can mean more complexity, hidden costs, and a false sense of control. In this guide, we’ll help you avoid that trap. From infrastructure to observability and CI/CD tools, we’ll cover the solutions shaping modern cloud stacks.

Stop Scrolling Through Git History Forever!

Trying to find when a specific function changed, or disappeared? In this episode of Wait… Git Can Do That?, we show you how to use git log -S'string' to search your Git history for code-level changes. Use -S to find string adds/removals, and add -p to view the diffs. Bonus: in GitKraken Desktop, search visually and jump straight to changes. Less guessing.

Core Values for a Better Developer Experience

What does dev onboarding look like after 17 acquisitions? At Appfire, it's not chaos, it's a unified developer experience that actually scales. In this GitKon session, CTO Ed Frederici shares how Appfire integrates 60+ teams and 160+ products without losing developer velocity or culture. For dev leads, managers, and anyone building a better onboarding playbook...this one's gold.

Why Appfire's Onboarding Process is So Fast

How do you onboard 800+ developers, integrate 17 acquisitions, and still keep dev velocity high? Ed Frederici, CTO of Appfire, shares how their unified developer experience scales productivity and preserves team culture. Whether you're leading a team or just joining one, this talk will change how you think about onboarding, autonomy, and dev growth. GitKraken Desktop: gitkraken.com/git-client GitKraken CLI: gitkraken.com/cli.

Tech Lead Journal: The CTO playbook for engineering excellence

Engineering excellence is more than code quality or tool choice. It’s about aligning engineering with business goals, improving systems systematically, and building a culture of continuous progress. In a recent episode of the Tech Lead Journal podcast, Cortex CTO and co-founder Ganesh Datta shared lessons from his engineering career.

Speed up PR reviews with actionable code suggestions

Hello, Bitbucket fans! It’s Dave from the Bitbucket Cloud product team. We’re happy to introduce another enhancement to help your team collaborate better around code reviews and save valuable time: the ability to propose specific code suggestions within a pull request. Code authors can view and apply the proposed code changes without switching contexts, helping teams get pull requests completed even more efficiently. This feature is available today to all teams using Bitbucket Cloud.

Accelerate Your Deployment Frequency: Strategies to Remove Bottlenecks

Is slow deployment hindering your mid-size organization? This guide tackles common deployment bottlenecks like manual processes and inconsistent environments head-on. Discover actionable strategies for faster, safer releases, including CI/CD automation, Infrastructure-as-Code (IaC), GitOps, and cultivating a strong DevOps culture.

Monitor Nginx with OpenTelemetry Tracing

At 3:47 AM, your NGINX logs show a 500 error. Around the same time, your APM flags a spike in API latency. But what's the root cause, and why is it so hard to correlate logs, traces, and metrics? When API response times cross 3 seconds, identifying whether the slowdown is at the NGINX layer, the application, or the database shouldn't require guesswork. That's where OpenTelemetry instrumentation for NGINX becomes essential.

Credentials and Flyway: Keep Secrets Safe with Resolvers

Managing credentials gets trickier as your Flyway project grows, especially when databases contain sensitive data. In this episode, Tony and Tonie break down how Flyway’s Property Resolvers help you keep secrets safe by pulling them securely from local or cloud-based stores, avoiding hardcoding them into config files.

API Security: Validating Auth and Access with Traffic Simulation Starts with Behavior

Security breaches rarely begin with a hidden zero-day exploit or a complex web of escalated hacks. They often start in very simple ways – an internal team member is breached, a permission is misconfigured, an overly permissive API endpoint is overlooked, or a JWT simply doesn’t expire. An API, or application programming interface, is a set of protocols and tools that enable different software systems to communicate and exchange data, making them essential in modern software development.

Set Up ClickHouse with Docker Compose

ClickHouse is built for high-performance OLAP workloads, capable of scanning billions of rows in seconds. If your analytical queries are bottlenecked on PostgreSQL or MySQL, or you're burning too much on Elasticsearch infrastructure, ClickHouse offers a faster and more cost-efficient alternative. This blog walks through setting up ClickHouse locally with Docker Compose and scaling toward a production-grade cluster with monitoring in place.

Stream AWS Metrics to Grafana with Last9 in 10 minutes

It’s 2:47 AM and your Lambda functions are timing out. API response times are spiking. You’re flipping between the CloudWatch console, your APM tool, and your logs, trying to figure out what’s going wrong. CloudWatch has the metrics you need: CPU usage, memory pressure, and request rates — but connecting that data to what your app is doing takes time. The delay in stitching it all together slows down your incident response.

Being on-call at incident.io

At incident.io, we are building a product that our users rely on 24/7, all year round. This means it is crucial that it is always working, and that is where our on-call rotation comes in. We believe that everyone should be on-call because it tightens the feedback loop between shipping new features and maintaining what we have, leading to more pragmatic engineering decisions.

Release v2.6: MCP Server, AI Insights Enhancement, Okta SCIM Integration, SNMP Monitoring and more.

Netdata 2.6.0 is here and it’s our most intelligent release yet! This version brings AI-powered monitoring, easier network visibility, and smoother enterprise integrations, all designed to help you troubleshoot faster and scale smarter. What's new: the Netdata Referral Program. Every referred user will get a 10% discount when they subscribe to Netdata Business or Homelab, and you will receive 10% of their subscription value (up to a maximum of $1,000 per space). You can refer an unlimited number of users, so there's no real limit to how much you can earn with the referral program.

Let Git Find the Bug for You (No Guessing)

Somewhere in your commit history, a bug snuck in. You could scroll. Panic. Guess. Or you could let Git find the exact commit that broke your code. In this episode of Wait… Git Can Do That?, we show you how git bisect binary-searches your history to isolate the problem: fast, clean, and testable. Use git bisect start, good, and bad; test each step to narrow it down; or automate it with git bisect run.
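
To make the binary-search idea concrete, here is a minimal Python sketch of the search a bisect performs. It is an illustration of the technique, not Git's actual implementation: the function `first_bad_commit`, the commit names, and the `check` callback are all hypothetical, and the sketch assumes a linear history in which every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits are ordered oldest to newest, the first is known good,
    the last is known bad, and badness never reverts once introduced.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug is at mid or earlier
        else:
            lo = mid  # bug arrived after mid
    return commits[hi]


# Hypothetical ten-commit history where the bug arrives in "c5".
history = [f"c{i}" for i in range(10)]
tested = []


def check(commit: str) -> bool:
    """Stand-in for running the test suite against a checked-out commit."""
    tested.append(commit)
    return int(commit[1:]) >= 5


culprit = first_bad_commit(history, check)
print(culprit, "found after testing", len(tested), "commits")
```

Each test halves the remaining range, so the number of commits to check grows roughly with the logarithm of the history length rather than with the history itself.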

How to ensure your AWS workloads are resilient

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Cloud providers like AWS give you plenty of tools to make your workloads more resilient, but it’s up to you to apply them. However, considering how complex some of these tools are, where do you start? And how can you be sure your systems are more reliable as a result?

Kubernetes Clusters Break in the Weirdest Ways

If you’ve ever spent hours chasing a weird issue in your Kubernetes cluster, you’re in good company. Reddit’s r/kubernetes is full of hilarious and painful stories about clusters going off the rails for reasons no monitoring dashboard ever predicted. And while it’s easy to laugh after the fact, each of these moments highlights just how important observability is because these kinds of problems don’t show up on your radar until it’s too late.

Looking beyond dev productivity to increase speed ft. Brian Guthrie of Justworks

Speed isn't just about developer productivity—it's about market dominance. Rob sits down with Brian Guthrie, Director of Engineering at Justworks and former ThoughtWorks consultant, to explore why lead time from conception to production should be your organization's north star metric.

Fix Vulnerabilities Faster: Puppet's Advanced Patching Solution

Break down patching silos and remediate vulnerabilities faster with Puppet. Most CVEs sit unaddressed for weeks, even after your scanner picks them up. Vulnerability Remediation in Advanced Patching (a Puppet Enterprise Advanced exclusive) gives Security and Ops teams an easy-to-use dashboard for finding, fixing, and reporting on vulnerabilities. No more tossing CVEs over the fence. No more finger-pointing when things go wrong. Just swift, efficient vulnerability management.

Evaluating Serverless Vs. Containers And How To Choose

Containers and serverless computing are two of the most popular methods for deploying applications. With the rise of microservices and modern DevOps, teams need faster, leaner ways to build and release software. However, selecting the wrong architecture can slow down delivery, increase cloud costs, or lock you into tools that don’t scale with your business. Both methods have their advantages and disadvantages.

Flyway code analysis - These are a few of my favorite rules

Clean, consistent SQL code isn’t just a preference. It’s a pathway to healthier applications, faster debugging, and happier teams. Whether you’re onboarding new developers or optimizing legacy systems, having a clear set of standards can make all the difference. Flyway’s code analysis is a powerful ally in keeping your database code tidy and secure, and making sure best practices are being followed.

Why Branding Still Matters in the Age of DevOps and SaaS Automation

In a landscape dominated by automation, CI/CD pipelines, and observability dashboards, branding can feel... secondary. After all, if your platform ships fast, scales reliably, and integrates with everything - why should anyone care what it looks or sounds like? The answer is simple: they do.

Trace Go Apps Using Runtime Tracing and OpenTelemetry

When your Go service hits 500ms latencies but CPU usage is flat, tracing gives you visibility into what the profiler misses. With 1–2% runtime overhead, Go’s built-in tracing tools make it easier to debug performance regressions that don’t leave a clear footprint.

Introducing JFrog's MCP Server: Better vibes and easier AI automation

Good news! You no longer have to be a DevOps or JFrog expert to harness the power of the JFrog Software Supply Chain Platform. With the introduction of JFrog’s MCP Server, we’re making the JFrog Platform accessible to your favorite large language models (LLMs). Now, every developer can take advantage of the detailed security and package information available in JFrog, such as vulnerability data from the JFrog Catalog, without needing to context-switch.

Introduction to Cloudsmith: Platform Overview

Learn how to control, secure, and distribute software artifacts with this full on-demand platform demo of Cloudsmith. In this video, Solutions Engineers Dan and Ciara walk you through key features, including web app setup, logging, policy enforcement, signing, and global distribution. Through live demos, you'll see how to integrate Cloudsmith into your CI/CD pipeline, enforce security and compliance, control access with entitlement tokens, and automate everything using the API.

Navigating Change: How VMware's VCSP Partner Reduction Affects Your Business

VMware customers are once again facing major change. Under Broadcom’s ownership, VMware is closing its Broadcom Advantage Partner Program for VMware Cloud Service Provider (VCSP) partners on October 31, 2025. This change will also include any VCSP Partner contracts not being renewed after this date. For many, this shift is more than an internal reshuffle.

AirDroid Business for IT & MSPs Mastering Remote POS Device Management

In this video, we will explore how to efficiently manage POS devices and enhance operational efficiency. Our focus is on tackling common issues faced by IT service providers and MSPs, including device crashes, firmware issues, and the need for manual configurations. With AirDroid Business, you can remotely troubleshoot devices, distribute apps seamlessly, enforce device usage restrictions, and track device locations in real-time. Scalability is effortless—whether you manage 300 or 3,000 devices, we’ve got you covered!

Reliability isn't a metric, it's a mindset

As someone with Type 1 diabetes, reliability is a way of life for Nick Mason, Sr. Solutions Architect at Gremlin. Full transcript: Reliability isn't just a metric, to me, it's a mindset. As someone that works in site reliability engineering and also someone who lives with type one diabetes, the concept of reliability is deeply personal to me. In tech, reliability means building systems that are going to recover gracefully and in life with a chronic condition like diabetes, it's the same thing.

In The AI Era, The Winning Teams Track Cloud Unit Costs From Day 1

Everyone’s obsessed with speed right now. Ship fast. Stack features. Slap an LLM on it and call it v1. Amirite? But in the AI era, where cloud costs can spiral in a weekend, moving fast isn’t enough. The teams that track cloud unit costs from Day 1? They’re the ones who come out ahead. Most teams don’t start there though. They focus on building features and chasing traction, and the cloud bill just shows up like that subscription you forgot to cancel. Maybe someone glances at it.

Stop Losing Your Git Stash With This Easy Trick!

Got 12 unnamed stashes and no idea what’s in any of them? In this episode of Wait… Git Can Do That?, we show you how to list and pop a specific stash entry using stash@{n}. You’ll learn how to orient yourself with git stash list, pop a targeted stash with stash@{2}, and keep it around using apply instead of pop. No more mystery stashing. Just clean, precise Git workflows. Subscribe for more ways to make Git suck less.

2025 Guide & Template: Automating Production Readiness

When launches are delayed or incidents occur, it’s often due to a breakdown in production readiness. Maybe documentation is outdated. Maybe no one’s on-call. Maybe a critical dependency isn’t even known. The truth is, production readiness shouldn’t be a manual checklist. Production readiness needs to be as dynamic as the software being evaluated.

Panel Discussion: Understanding the importance of GPUs for AI success

Are you curious about the role of GPUs in AI and how they can accelerate your projects? Join Kunal Kushwaha (Field CTO), Ben Norris (AI Engineer), and Kendall Miller (Strategic Business Development) in this upcoming panel discussion as they dive into the world of GPUs and their significance in AI.

Query and Analyze Logs Visually, Without Writing LogQL

It’s 2 AM. An incident’s in progress. Error rates are climbing. You jump into the logs, filter by service, adjust the time window… and now you need a LogQL query. You write one. It errors out. You fix the syntax, try again, only to realize you need a different filter or a new aggregation. Back to rewriting. By the time you’ve got the query right, you’ve already lost 10–15 minutes. The system is still broken, and you still don’t know why.

The Silent API Killer: Data Coupling in Your Tests

In API testing, speed, accuracy, and confidence in test results are everything. Regardless of whether you’re validating functionality, testing performance under load, or ensuring compliance with your security posture and standards, the ultimate goal is the same: catching problems before they reach production. But what if your tests are lying to you? Lurking beneath even the most sophisticated test suites is a subtle, pervasive threat: data coupling.

What's Next for Cloud in India? Help Shape the Future with Our Cost Survey

As the cloud computing industry continues to evolve in India, it's becoming increasingly important for organizations to understand the complexities and challenges associated with it. Last year, we released a whitepaper on the cost of cloud that gathered insights from over 500 industry professionals. While this helped us understand more about the rising cloud costs, complex billing models, and vendor lock-in within the UK, it was unclear how this differed for the Indian market.

5 Ways to Accelerate Product Delivery Without Managing Infrastructure

Is slow product delivery holding you back? This article explores how traditional infrastructure management creates significant bottlenecks, from time-consuming provisioning to inconsistent environments. Discover 5 strategies to streamline your delivery without managing infrastructure, including fully managed services, on-demand ephemeral environments, GitOps, self-service deployment platforms, and intelligent container orchestration.

Introducing Live Call Routing for Incident Response

Today, we are introducing Live Call Routing, a direct phone line that connects incoming calls to on-call engineers. It captures human-reported incidents that monitoring tools might miss—closing the loop between automated alerts and real-world observations so nothing falls through the cracks. It helps you respond to critical incidents faster by eliminating manual call routing, reducing response times from minutes to seconds.

Live Call Routing - Getting started

Live Call Routing is a direct line that connects incoming calls to on-call engineers. It captures human-reported incidents that monitoring tools might miss—closing the loop between automated alerts and real-world observations so nothing falls through the cracks. It helps you respond to critical incidents faster by eliminating manual call routing, reducing response times from minutes to seconds.

Top Terraform Alternatives And Competitors To Know

A few weeks ago, a lead DevOps engineer at a fast-growing SaaS company hit an unexpected wall. “It used to just work… until we scaled,” the lead noted after their Terraform setup began buckling under the weight of a growing cloud footprint. Another chimed in: “We’re spending thousands on infrastructure every week, but we can’t trace it back to who deployed what, or why.” Sound familiar? You’re not alone.

Kibana Logs: Advanced Query Patterns and Visualization Techniques

Kibana gives you a structured way to explore log data indexed in Elasticsearch. With the right queries and visualizations, you can identify anomalies, debug issues more quickly, and track trends across services. This blog covers practical ways to query logs using Kibana’s Lucene and KQL syntax, build visualizations that surface meaningful signals, and set up dashboards for ongoing log-based monitoring.

Enable Kong Gateway Tracing in 5 Minutes

Kong Gateway is a popular API gateway that sits at the edge of your infrastructure, routing and shaping traffic across microservices. It’s fast, pluggable, and battle-tested, but for many teams, it remains a black box. You might have OpenTelemetry set up across your application stack. Traces flow from your app servers, databases, and third-party APIs. But the moment a request enters through Kong, observability drops off.

Build Log Automation with Last9's Query API

Manual log investigation is one of those engineering tasks that quietly drains hours without offering much real value. You're debugging an incident. Monitoring shows elevated error rates. Now begins the familiar drill. It’s a tedious cycle, and it doesn’t scale. The whole process breaks down when you’re trying to automate incident response, run continuous security monitoring, or generate compliance reports.

Control IP Requests with Ease: LightMesh Requestor Workflows

In growing organizations, network requests pile up and so does the risk of delays, miscommunication, or misconfigurations. Whether a developer needs a static IP for a staging server or a project team is scaling infrastructure, manual processes don’t scale with you. LightMesh Requestor Workflow gives your users a secure, controlled way to request static IP addresses — without granting access to your entire IP address management (IPAM) platform. Admins stay in control. IT team stays productive.

Prevent Non-Person Accounts (NPA) Expiration Outages with Resolve

Expired non-person accounts can silently break system integrations and cause major outages—often overnight. In this video, see how Resolve automates the detection and extension of non-person account expirations across systems. Learn how to eliminate manual tracking, reduce risk, and keep your critical services running smoothly.

Playwright fixtures: A deep dive

Fixtures may be one of Playwright’s most powerful yet under-used features. Playwright fixtures can be used to simplify repetitive setup or teardown in your tests and to manage test data and test state better. Fixtures are key if your objective is to write cleaner, more maintainable, and more manageable Playwright tests. This tutorial aims to help you master Playwright fixtures, understand their purpose, and use them most effectively in your tests.

Flyway Autopilot: Ingeniously Simple Database DevOps

Can Flyway Autopilot really help you get up and running with database migrations, CI/CD, and deployment best practices in under 20 minutes? In this special episode, Tonie and Tony sit down with Huxley Kendell, the co-creator of Flyway Autopilot, to find out what it does, how it works, and what inspired him to build it. Whether you're new to Flyway or looking to improve how your team handles database changes, this is your shortcut to getting started.

Getting Started Guide with Netdata

New to Netdata? Start here. In this quick and practical guide, we’ll help you get set up and confident with Netdata in just a few minutes. You’ll learn how to: access your Netdata Space; connect your nodes (servers, VMs, containers, network devices, and more); organize your infrastructure with Spaces and Rooms; collaborate with your team in real time; explore alerting and integrations; and customize notifications so you’re only alerted when it truly matters.

Overcoming the common networking challenges when connecting the "big three" clouds

Adopting a multi-cloud strategy is common for many businesses as it promises flexibility in managing their data. However, this isn’t without complexity, especially when it comes to networking. In a recent podcast episode, Trent Blakely and Jay Turner unpacked some of the most frequently missed challenges that come with multi-cloud deployments.

Reliability means being there right when your customer needs you

When your systems are reliable, it means your customers can count on your applications to be there for them. Full transcript:  To me reliability means a good night's sleep, and being able to confidently go to bed and wake up the next day feeling ready to get out there and do my best work and not worry about the experience that our customers might have had through the night.

The Rise of Tech Events in India: A New Era for Cloud-Native Computing

As India emerges as a significant player in the global public cloud landscape, with its public cloud services market projected to reach $25.5 billion by 2028 at a CAGR of 24.3% for 2023-28, the country is witnessing a surge in tech events. This growth is mirrored in the live events market, which is experiencing a 15% YoY growth, fostering a stronger community and facilitating the exchange of ideas and innovation in the public cloud sector.

FinOps For AI: How Crawl, Walk, Run Works For Managing AI Costs

“It started as an experiment.” That’s how it begins at most companies. A small team spins up a few GPU instances to train a proof-of-concept model. Maybe it’s a fraud detection algorithm. Maybe it’s GenAI for support tickets. Either way, it’s just a test. Then the results come in, and they’re promising. Suddenly, that model is powering new features. Teams are fine-tuning LLMs in parallel.

How to Build Resilient Networks for AI Production Workloads

Production AI needs a network that can keep up. Learn why private, scalable connectivity is the key in our webinar recap with Vultr. AI is no longer a proof-of-concept hiding in a developer lab. It’s a full-fledged production workload, and it’s hungry for data. But as enterprises move their AI strategies from theory to reality, they’re hitting a wall that isn’t about algorithms or processing power – it’s about the network.

Lost Your Work? This Git Trick Saves The Day!

Ever reset too far? Deleted a branch you needed? Thought you lost a commit forever? In this episode of Wait… Git Can Do That?, we explore git reflog, Git’s local time machine. You’ll learn how to: view every local Git action, even the messy ones; recover unreachable commits; and navigate using HEAD@{n}. Just remember: it’s local, it’s time-limited, and it’s seriously underrated. Subscribe for more Git features you didn’t know you needed.

Demo: Running a Patch Job with Puppet Advanced Patching

With Puppet, patching is faster and easier than ever. Watch this video to learn how to set up and run a patch job with Advanced Patching in Puppet Enterprise Advanced. Puppet's Barr Iserloth and Liam Sexton cover activating Advanced Patching, creating a patch group, and running a patch job from the Puppet Enterprise console. Highlights include the easy-to-use patching GUI, custom patch groups for cross-OS patching, streamlined scheduling that obeys your defined maintenance and blackout windows, and reporting that shows you where each patch was applied.

Automating High CPU Utilization Remediation with Resolve

High CPU utilization alerts can overwhelm IT teams and disrupt user productivity—especially in virtualized environments. In this video, see how Resolve automates the end-to-end remediation process for sustained CPU spikes. From detecting alerts and creating incidents to gathering host data, verifying VM configurations, and dynamically adding vCPUs—watch how Resolve eliminates manual effort and speeds up incident resolution.

Jaeger Metrics: Internal Operations and Service Performance Monitoring

You're monitoring a microservices-based system. Alerts trigger when response times exceed 2 seconds. But when you open Jaeger, you're faced with thousands of traces. Identifying which service or operation is responsible becomes time-consuming. Jaeger metrics help reduce this friction by exposing aggregated telemetry. Instead of scanning individual traces, you get service-level and operation-level performance metrics (latency, throughput, and error rates) that highlight where the issue lies.
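To make the idea of aggregated telemetry concrete, here is a minimal sketch of rolling individual spans up into per-operation latency, throughput, and error rate. It does not use Jaeger's API: the span records, field names, and the `aggregate` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified span records (not Jaeger's actual data model).
spans = [
    {"service": "checkout", "operation": "POST /pay", "duration_ms": 2400, "error": True},
    {"service": "checkout", "operation": "POST /pay", "duration_ms": 2600, "error": False},
    {"service": "cart", "operation": "GET /cart", "duration_ms": 120, "error": False},
    {"service": "cart", "operation": "GET /cart", "duration_ms": 80, "error": False},
]


def aggregate(spans, window_seconds):
    """Group spans by (service, operation) and compute summary metrics."""
    groups = defaultdict(list)
    for span in spans:
        groups[(span["service"], span["operation"])].append(span)

    summary = {}
    for key, items in groups.items():
        durations = [item["duration_ms"] for item in items]
        errors = sum(1 for item in items if item["error"])
        summary[key] = {
            "count": len(items),
            "throughput_per_s": len(items) / window_seconds,
            "error_rate": errors / len(items),
            "avg_ms": sum(durations) / len(durations),
            "max_ms": max(durations),
        }
    return summary


# Assume the spans above were collected over a 60-second window.
summary = aggregate(spans, window_seconds=60)

# The operation with the highest average latency is the first place to look.
slowest = max(summary, key=lambda key: summary[key]["avg_ms"])
print(slowest, summary[slowest])
```

With a summary like this, an alert on the 2-second threshold points at one operation rather than at thousands of individual traces.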

IT Process Improvement Is Great... If You Can Find Someone to Build It

IT leaders know the value of process improvement. Smoother onboarding, faster incident resolution, streamlined change management, etc. It’s not for lack of ideas that IT teams fall short; it’s almost always a lack of bandwidth. Because of that, most process improvement efforts stall before they scale. Great ideas get captured in diagrams, Confluence pages, and strategy decks, but they rarely make it into production. Why?

Building Better with DevOps: How to Optimize WordPress for Speed, Security, and Uptime

WordPress is powerful. It's easy to use, packed with plugins, and flexible enough for blogs, shops, or full-scale business sites. But under the hood, things can get messy. Especially if you're not thinking like a DevOps pro. For teams that want fast, secure, always-up websites, you need more than a pretty theme - you need smart systems behind it. That's where DevOps comes in. In this post, we'll explore how DevOps practices can level up your WordPress site - from speed and uptime to security and scalability.

Docker Layer Caching: Speed Up CI/CD Builds

Docker layer caching (DLC) is a powerful technique that can significantly accelerate your CI/CD pipelines. By reusing unchanged image layers across builds, DLC not only cuts down on build times but also reduces cloud costs and boosts developer productivity. In this article, we’ll break down how Docker layer caching works, how to implement it effectively, and how to combine it with ephemeral environments for maximum impact.

API Staging Is Not Production - But Speedscale Makes It Close

Staging environments are often looked at as the testing ground ahead of the “real” production environment. The idea is simple – build a duplicate of your production environment, run your tests, and ship with confidence. But the reality of using staging in the real world as part of a holistic API testing strategy is rarely that clean. No matter how meticulously you mirror production services, staging always falls a little short.

Undo a Git Commit - Without Losing Your Code

Think you have to reset hard or revert every time you mess up a Git commit? Nope. In this episode of Wait… Git Can Do That?, we show you how to undo your last commit without losing any changes — using git reset --soft HEAD~1. Perfect for devs who move fast, commit early, and want cleaner history. Subscribe for more Git tricks they don’t teach you in tutorials. GitKraken Desktop: gitkraken.com/git-client.

Grok 4 Sets Records - But I'm Focused on Microsoft's 9% Sales Growth

The recent launch of Grok 4 has set the AI community buzzing. With an impressive score of 73 on TLDR’s AI benchmark, Grok 4 edges ahead of OpenAI’s o3 and Google’s Gemini 2.5 Pro, both scoring 70. Elon and the xAI team deserve praise for this breakthrough, reinforcing Grok 4’s status as potentially the most powerful LLM yet.

The Three Constraints Of AI Adoption: Code, Servers, And Wallets

Earlier this year, OpenAI’s CEO Sam Altman admitted something that should make every engineering leader pause: they’re “currently losing money” on ChatGPT Pro subscriptions, which run $200 per month. Let that sink in. A company charging two hundred dollars a month for AI access — 10 times what most SaaS products dare to ask — is still bleeding cash on every user. This isn’t a pricing problem. It’s a physics problem.

Golden Paths Made Easy With Cloudsmith

Over the past few years, Platform Engineering has taken off as more and more enterprise organisations adopt the practice of creating a centralised, self-service interface through which developers access the tools they need to do the job they were meant to do: build amazing software. At the heart of every Golden Path lies the ability to reliably produce, store, and consume build artifacts, from container images to internal libraries.

How to Get Grafana Iframe Embedding Right

Adding Grafana dashboards directly into your app lets users see monitoring data without switching tabs or tools. Using an iframe to embed Grafana does work, but it brings along some tricky authentication and security issues that aren’t always obvious at first. In this blog, we’ll go over the practical ways to embed Grafana dashboards, from easy public snapshots to secure, private dashboards that need authentication.

Optimize LangChain Performance with Trace Analytics

You’ve instrumented your LangChain app, and traces are now flowing into Last9. Now the issues are visible: API costs are crossing $200/day, average response times exceed 3 seconds, and performance degrades under 100 concurrent users. A single tool call adds over 2 seconds. Bloated context windows are pushing up token usage, wasting $50/day. Here’s how to use trace data to identify and fix these inefficiencies, systematically and at scale.

What is Linux Support?

In the world of enterprise IT, “support” can mean many things. For some, it’s a safety net – insurance for the day something breaks. For others, it’s the difference between a minor hiccup and a full-scale outage. At Canonical, it means a simple, comprehensive subscription that takes care of everything, so that everything you build works the way you want it to, for all the people who love to use it.

Observability as Code: Why You Should Use OaC

In the fast-moving world of CI/CD pipelines, microservice architectures, and container orchestration, software changes rapidly. What exists in a codebase today might be gone next week. At this scale and speed, it’s impossible for development teams to manually track every line of code and every new piece of functionality.

How Miscommunication Can Break Your Code!

Security isn’t just about scanners and firewalls...it’s about people. In this session, Stefania Chaplin (founder of DevStefOps) explores how developers and security teams can collaborate more effectively to build stronger, more resilient systems. You’ll learn why empathy, trust, and psychological safety are just as essential as any security tool. What you’ll learn: why people are the heart of effective security; the real-world cost of miscommunication (including the Equifax breach); how to align dev and security mindsets; and strategies for building collaborative, security-first teams.

Top 6 Multi-CDN Solutions

Speed and reliability are crucial for online businesses. Whether you're streaming, gaming, e-commerce, or running media-rich websites, downtime or slow content delivery can drive users away. This is where Multi-CDN solutions become essential. A Multi-CDN setup dynamically manages requests across several CDN services, combining the strengths of each and minimizing weaknesses such as regional limitations, service disruptions, or bandwidth saturation. These systems provide resilience and optimization, often using advanced analytics, traffic-routing logic, and real-time load balancing.

Building a Multi-Agent Containerization System at Bunnyshell

At Bunnyshell, we’re building the environment layer for modern software delivery. One of the hardest problems our users face is converting arbitrary codebases into production-ready environments, especially when dealing with monoliths, microservices, ML workloads, and non-standard frameworks. To solve this, we built MACS: a multi-agent system that automates containerization and deployment from any Git repo.

Sponsored Post

Boba Paradox

It's 2 PM on a Thursday. Your engineering team is knee-deep in bugs from a recent release. But what's the Slack channel buzzing about? Not flaky tests. Not integration coverage. Not mocking services. It's whether to order brown sugar boba or taro with oat milk. Let's be honest: for many companies, it's easier to justify $8 on boba than $800 on testing tools. And we're not here to judge - we're here to understand why.

From Guesswork to Guarantees: How Traffic Replay Improves Release Confidence

In modern software development, the pressure to move fast is matched only by the need to get it right. Teams working within the software development lifecycle (SDLC) must constantly balance velocity and quality, ensuring releases are stable, secure, and performant. Traditional software development models often relied on manual verification and human intuition to validate releases; however, as systems have grown in complexity, guesswork is no longer sufficient to meet these rising needs.

Deploying secure AI: Canonical + SpectroCloud for federal missions

As mission requirements evolve, federal agencies and defense teams need infrastructure supporting AI/ML workloads anywhere, from secure cloud environments to disconnected edge locations. In this fireside chat, Mark Lewis (VP, Application Services at Canonical) and William Crum (Senior Defense Success Engineer at SpectroCloud) discuss how their organizations are helping federal customers deploy secure, scalable, and consistent Kubernetes and AI infrastructure across hybrid and edge environments.

4 Chaos Engineering recommendations from Gartner

Gartner recently published their annual Hype Cycle reports, including the Hype Cycle for Infrastructure Platforms. Designed to help heads of infrastructure and IT operations make informed decisions about infrastructure platforms, it includes over thirty different topics covering everything from platform engineering to distributed cloud to policy as code—including Chaos Engineering and Site Reliability Engineering.

Elasticsearch with Python: A Detailed Guide to Search and Analytics

If you’re using Python for search, log aggregation, or analytics, you’ve probably worked with Elasticsearch. It’s fast, scalable, and fairly complex once you go beyond the basics. The official Python client gives you raw access to Elasticsearch’s REST API. But getting it to work the way you want, especially under load, can be tricky. This blog walks through practical ways to index, query, and monitor Elasticsearch from Python code, without getting lost in the docs.

How Much Power Does a NVIDIA GB300 NVL72 Need?

Note: As of publication, NVIDIA has not released final specifications for the GB300 NVL72. The details shared here are based on projections informed by GB200 benchmarks, industry analysis, and expected generational improvements. As the AI arms race accelerates, data center professionals are already preparing for what’s next: the anticipated NVIDIA GB300 NVL72.

A New Era of Cooperation: How the UK-India Free Trade Deal Can Benefit India's Digital Economy

India’s digital economy is on a historic growth trajectory. According to the Digital Infrastructure Providers Association (DIPA), it’s projected to reach $1 trillion by the end of 2025, driven by rising internet penetration, data consumption, and cloud adoption.

Streamline API testing with Proxy Mock! Capture, mock, and replay API calls locally

Alan Mon introduces Proxy Mock, a powerful tool for capturing and replaying API calls. Learn how to effortlessly record inbound and outbound API requests and responses. The demonstration highlights how Proxy Mock operates entirely on your local machine, eliminating the need for cloud services or internet connectivity for testing. See how to set up Proxy Mock, inspect captured API calls (including request/response headers, body, and unique signatures), and leverage it to mock API responses for seamless local testing, ultimately boosting productivity and reducing the need for costly non-production environments.

From painted doors to real prototypes - a mindset shift

The economics of building software are changing everything. For years, entrepreneurs used "painted doors" - fake features to test demand - because building was too expensive. But when AI drops development costs, you can create real prototypes and gather genuine user data instead of pretending. This mindset revolution treats experiments like cheap option contracts - the lower the cost, the more you can explore. Ready to abandon painted doors for unlimited experimentation?

How to think about quality in the age of cheap prototypes

When AI makes prototyping incredibly cheap, your old quality standards become a bottleneck. The key mindset shift? Quality doesn't matter equally everywhere. You can experiment with lower-quality prototypes to learn faster, then apply high standards only to what customers actually see. This isn't about lowering standards - it's about applying the right quality mindset at the right stage. Stop letting perfectionism slow down your learning phase.

Cloud Log Management: A Developer's Guide to Scalable Observability

As systems move to microservices, serverless, and multi-cloud setups, debugging gets harder. You’re no longer dealing with a single log file; you’re looking at logs from dozens of services, running across different environments. Traditional debugging methods like SSH-ing into servers or adding print statements don’t scale in these environments. Cloud log management tools help by collecting logs from all your services into one place.

What is Log Loss and Cross-Entropy

You're building a classification model, and your framework throws around terms like "log loss" and "cross-entropy loss." Are they the same thing? When should you use binary cross-entropy versus categorical cross-entropy? What about focal loss? This blog breaks down these loss functions with practical examples and real-world implementations.
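As a quick illustration of the first question, the sketch below computes both losses by hand in plain Python and shows that, for a two-class problem, binary cross-entropy (log loss) and categorical cross-entropy give the same value. The function names, labels, and probabilities are made up for this example and are not taken from the linked post or from any framework.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Mean log loss for 0/1 labels and predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_onehot, y_prob, eps=1e-15):
    """Mean cross-entropy for one-hot labels and per-class probabilities."""
    total = 0.0
    for truth, probs in zip(y_onehot, y_prob):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(truth, probs))
    return total / len(y_onehot)


# Illustrative data: three samples with predicted P(class = 1).
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]

bce = binary_cross_entropy(labels, probs)

# The same problem written as two classes: [P(class 0), P(class 1)].
onehot = [[1 - y, y] for y in labels]
two_class_probs = [[1 - p, p] for p in probs]
cce = categorical_cross_entropy(onehot, two_class_probs)

print(f"binary: {bce:.6f}  categorical: {cce:.6f}")
```

The two numbers agree because binary cross-entropy is the two-class special case of categorical cross-entropy; "log loss" is the name commonly used for the same quantity.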

Canonical announces Charmed Feast: A production-grade feature store for your open source MLOps stack

July 10, 2025: Today, Canonical announced the release of Charmed Feast, an enterprise solution for feature management with seamless integration with Charmed Kubeflow, Canonical’s distribution of the popular open source MLOps platform. Charmed Feast provides the full breadth of the upstream Feast capabilities, adding multi-cloud capabilities and comprehensive support.

OWASP CI/CD Part 9: Improper Artifact Integrity Validation

Improper artifact integrity validation is a critical vulnerability in CI/CD pipelines characterised by insufficient mechanisms to cryptographically verify the authenticity and integrity of code and build artifacts traversing the pipeline. When these controls are weak or absent, adversaries with access to any pipeline stage can inject malicious or tampered artifacts that appear legitimate, enabling undetected propagation through the pipeline and eventual deployment into production environments.

ITOps vs DevOps: Understanding Their Roles in Modern IT Environments

The conversation around ITOps vs DevOps continues as organizations pursue agile development and responsive service delivery. While both practices share the goal of improving software and infrastructure management, they emerge from distinct historical, operational, and cultural backgrounds. Understanding how these models differ at their core helps decision-makers choose the most suitable operating strategy and align their teams for smoother collaboration.

HAProxyConf 2025 Recap

A lot can change in three years. The world of 2022 was quite a different place. Queen Elizabeth II was the longest-serving living monarch, the world population hadn’t yet cracked eight billion, and many of us were still emerging from the strangeness of the Covid years. Meanwhile, at HAProxyConf 2022, we unveiled HAProxy Fusion Control Plane for the first time.

How to Get Logs from Docker Containers

When a container misbehaves, logs are the first place to look. Whether you're debugging a crash, tracking API errors, or verifying app behavior—docker logs gives you direct access to what's happening inside. This blog covers the full workflow: how to retrieve logs, filter them by time or service, and set up logging for production environments.

Troubleshooting LangChain/LangGraph Traces: Common Issues and Fixes

We’ve covered how to get LangChain traces up and running. But even when everything’s instrumented, traces can still go missing, show up half-broken, or look nothing like what you expected. This guide is about what happens after setup, when traces exist, but something’s off.

Security is a leading priority for 2025

The Cloudsmith 2025 Artifact Management Report offers timely insights into how engineering and DevOps teams are evolving their approach to software artifact management and software supply chain security. With supply chain attacks on the rise and Generative AI reshaping development practices, teams are reevaluating how they manage, secure, and scale their artifact repository infrastructure.

AI-Enabled Network Management: Revolutionize Operator Workflows with AI Agents

For today's leading service providers and large enterprises, ensuring peak performance requires navigating a labyrinth of data streams, monitoring tools, and legacy systems. This often leaves network operators spending more time searching for information than acting on it. A new era of AI-enabled network management is dawning, promising to upend these cumbersome workflows.

Critical RCE Vulnerability in mcp-remote: CVE-2025-6514 Threatens LLM Clients

The JFrog Security Research team has recently discovered and disclosed CVE-2025-6514 – a critical (CVSS 9.6) security vulnerability in the mcp-remote project – a popular tool used by Model Context Protocol clients. The vulnerability allows attackers to trigger arbitrary OS command execution on the machine running mcp-remote when it initiates a connection to an untrusted MCP server, posing a significant risk to users – a full system compromise.

Raising the bar for automotive cybersecurity in open source - Canonical's ISO/SAE 21434 certification

Cybersecurity in the automotive world isn’t just a best practice anymore – it’s a regulatory imperative. With vehicles becoming software-defined platforms, connected to everything from mobile phones to cloud services, the attack surface has expanded dramatically. The cybersecurity risk is serious, and concrete. And with regulations like UNECE R155 making cybersecurity compliance mandatory, the automotive industry needs suppliers it can trust.

Can GitKraken AI Fix My Rebase Disaster?

Rebasing can be risky, but with GitKraken AI, it’s faster, smarter, and way less stressful. In this video, we walk through how GitKraken AI auto-resolves merge conflicts during a rebase, complete with confidence levels and clear explanations. Get conflict suggestions, edit AI output directly, and finish rebases with confidence. Now until July 11, try all GitKraken AI features FREE during AI All Access Week.

Visualize Your Puppet Data in Splunk, Datadog & More [Demo]

Get more from your Puppet data and easily visualize events in your favorite observability tool. The Observability Data Connector in Puppet Enterprise Advanced empowers DevSecOps teams with real-time insights by exporting critical data to popular observability tools. It gives DevOps teams access to the real-time data they need to plan, adapt, minimize downtime, and support system reliability and compliance.

Kubernetes Monitoring 101: 25 Tools And Must-Know Tips

The Kubernetes platform is the standard for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. However, monitoring Kubernetes infrastructure can be notoriously challenging. This guide will cover Kubernetes monitoring in more detail, including what metrics to track to improve visibility and control over your K8s containers, apps, microservices, etc.

LATAM Rising: Building the AI-Ready Digital Frontier

What happens when an entire region rethinks its digital future? In this episode of Uplink, Gabriel Del Campo, VP of Data Center, Security & Cloud at Cirion Technologies, joins host Michael Reid to explore how Latin America is transforming into a global tech player. From sustainable energy and AI-optimized data centers to regional regulatory reform, LATAM is moving fast - and with intention. This episode dives into the tech trends shaping the continent and what they mean for global cloud infrastructure, investment, and connectivity.

Beyond event badges: How autonomy and growth create space for creative experiments

Hi, Riley Durham here. I’m an Event Marketing Manager and Social Media Marketer at Cortex, and I’m here to share how a culture of autonomy and excellence has opened doors for career growth at Cortex. Before joining Cortex, I worked in the DevSec tool space, wearing multiple hats managing events, demand generation, social media, and direct mail marketing. That variety gave me a solid foundation in full-funnel marketing, but it was joining Cortex that really accelerated my growth trajectory.

dotConnect for Oracle: Powerful .NET data access with full ORM support

Elevate your .NET development with dotConnect for Oracle, a high-performance ADO.NET data provider built for enterprise-grade Oracle data access. In this video, discover how dotConnect for Oracle streamlines database connectivity, boosts developer productivity, and enables robust, scalable applications across the .NET ecosystem. Whether you're building enterprise applications or high-throughput data services, dotConnect for Oracle delivers the performance, flexibility, and reliability your projects demand.

Why is the Trust in Big Tech Fracturing? Next Steps for the Cloud Industry

To read more on the findings from this research, see The Digital Sovereignty Revolution whitepaper. For years, US tech giants have enjoyed near-unquestioned dominance over the global cloud market. However, this dominance is being challenged as trust in Big Tech begins to fracture. According to our latest whitepaper, "The Digital Sovereignty Revolution", the foundation of trust in Big Tech's infrastructure and services is cracking.

Customizing your Azure DevOps DORA metrics dashboard

Looking to configure and customize a DORA metrics dashboard? Our Director of Engineering Services, Tim Wheeler, demonstrates how to customize the DORA Metrics dashboard in Azure DevOps for SquaredUp. He shows how to populate key metrics like deployment frequency and change failure rate by selecting a pipeline, specifically the SquaredUp multi-stage pipeline.

Cloud Cost Optimization Strategies: How Mid-Size Organisations Can Reduce Cloud Infra Costs

Learn how mid-size companies can dramatically cut cloud infrastructure costs using practical strategies like compute rightsizing, serverless, storage tiering, and automated scaling. This guide also explores how Qovery simplifies and automates cost optimization for growing teams - no full DevOps team required.

DORA Compliance: How Upsun supports our financial services customers

The Digital Operational Resilience Act (DORA) is set to reshape how financial institutions in the EU manage and contract with their technology providers. Since January 17, 2025, DORA requires financial entities to meet stricter rules for managing digital risks, especially when it comes to the third-party ICT (Information and Communication Technology) service providers they rely on.

6 OpsGenie Alternatives for On-Call Management

You’re likely here because you heard the news: Atlassian ended new sales for OpsGenie on June 4, 2025, with a complete shutdown scheduled for April 2027. For years, OpsGenie has been the backbone of on-call management for countless teams. It might have been your team’s trusted solution too. But now, that chapter is closing. The pressure to find an OpsGenie alternative for on-call is real. However, you can’t just pick any tool and hope it works for your team.

Improve Consistency Across Signals with OTel Semantic Conventions

It’s 2 AM. Your API is timing out. Logs show a slow query. Metrics flag a spike in DB connections. Traces reveal a 5-second delay on a database call. But then the questions start: Which database? Does the query match the delay? Why doesn’t this align with the connection pool metrics? Each tool uses different labels: db.name, database, sometimes nothing at all. Without a shared schema, connecting the dots is slow and frustrating.

How Replicas Work in Kubernetes

Replicas in Kubernetes control how many copies of your pods run simultaneously. They're the foundation of scaling, availability, and recovery in your cluster. Whether you're running a stateless API or a background worker, understanding how replicas work directly impacts your application's reliability and performance. This blog walks through replica management, from basic concepts to production monitoring patterns that help you maintain healthy, scalable applications.

See System Logs Alongside your Metrics Using Loki, Grafana, and Graphite

In this quick demo, we show how you can transform logs collected by Grafana Loki into actionable Graphite metrics using MetricFire. Watch as we convert structured logs into performance insights. Perfect for teams looking to bridge the gap between logging and monitoring. This workflow helps you move beyond basic log storage and turn raw logs into meaningful metrics for alerts, dashboards, and capacity planning.

Why we're talking to people about reliability

Reliability means a lot of things to a lot of people, but it’s also essential for every digital business. That’s why we’re talking to reliability experts from all over to find out what reliability means to them and how you can improve it. Transcript:  You know, we're all out here building and operating digital businesses and like nobody's talking about reliability enough. We gotta talk about it. I can't stop talking about it and I've been on call for like 20 years.

Liquid Cooling vs. Air Cooling: What's Right For Your Data Center?

As power-hungry workloads like AI and HPC become the norm, data centers face mounting pressure to rethink their thermal strategies. Traditional air cooling has long been the industry standard, but with rising rack densities and energy costs, many operators are exploring liquid cooling as a more efficient alternative. In 2024, the global liquid cooling market was valued around $4.18 billion and is projected to reach $13.2 billion by 2029.

Resolve COO, Ari Stowe speaks at ONUG AI Networking Summit 2025 #itautomation #agenticai #ai #tech

Our COO Ari Stowe spoke at @onugcommunity's AI Networking Summit on how AI and Zero Ticket IT are transforming enterprise IT. From tickets to autonomous resolution—AI, automation, and intelligent agents are changing the game. Hear why AI is now essential in today’s complex IT environments.

Introducing MetricFire Logging: Visualize Logs Alongside Metrics

As modern infrastructure grows more dynamic and distributed, collecting logs alongside metrics becomes a critical part of any observability strategy. To make this easy and powerful, MetricFire now supports a direct logging pipeline using Grafana Loki. This allows you to forward system logs from your servers to Hosted Graphite's Loki backend and visualize them in your Hosted Grafana dashboards with full control over queries, filtering, and alerting.

What our users make with Ubuntu Pro - Episode 1

Ubuntu Pro isn’t just for enterprises – it’s for the passionate community that powers and supports open source every day. From secure remote access to homelab hardening, Ubuntu Pro helps users get more from their systems, whether at work or at home. In this series, we talk to real users about how they use Ubuntu Pro in their personal and professional lives. We begin with Marc Grondin, a longtime Linux user and Ubuntu Pro subscriber based in Quebec, Canada.

Got AI Fear? You Shouldn't; It's Coming for Your Busywork, Not Your Job

Artificial Intelligence (AI) has rapidly become a cornerstone of modern IT operations. Yet, despite its transformative potential, many IT professionals harbor apprehensions about integrating AI into their workflows. This growing AI fear, while understandable, often stems from misconceptions and a lack of clarity about AI's role and capabilities. This discussion aims to address and debunk common fears associated with agentic AI.

Navigating the Complexities of Data Sovereignty: A Guide for UK Businesses

To read the full findings from this research, visit The Digital Sovereignty Revolution whitepaper by clicking here. As the digital landscape continues to evolve, one question is becoming increasingly pressing: are you in control of your digital future? With growing concerns around data sovereignty and the impact of geopolitical risks on cloud strategies, it's time to assess your organization's digital infrastructure.

FinOps Is The Margin Lever SaaS CEOs Keep Ignoring

You’re probably not combing through cloud bills. That’s not your job as CEO. But if no one on your executive team can tell you what it costs to serve a customer, ship a feature, or launch a new product line, that’s a problem. Not a someday problem. A right-now, quietly-draining-your-margins kind of problem. FinOps tends to get lumped in with cost-cutting — some finance thing, some DevOps thing. But that framing misses the point. Done right, FinOps is a growth enabler.

Top SaaS Companies Defining The Future Of SaaS

Picture this. Gartner forecasts worldwide end-user spending for public cloud usage to total more than $720 billion in 2025 — up from $595 billion in 2024. Out of that spend, SaaS will make up a chunky $299 billion. For comparison, Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) will make up nearly $212 billion and $209 billion, respectively. Elsewhere, BetterCloud’s State of SaaS 2025 report found that the average organization uses 106 different SaaS tools.

How Attending Conferences Can Boost Your Skills and Career

If you’ve ever wondered whether attending a conference is worth the time and budget, you’re not alone. But for data professionals, the right event can be a game-changer, not just for your technical skills, but for your career trajectory. At Redgate, we’ve seen firsthand how events like PASS Data Community Summit help attendees grow, connect, and return to work with fresh ideas and renewed energy.

SQL Prompt and other Tools now use a Dedicated Entra ID Application for Azure SQL Databases - Update Required

If you use Microsoft Entra ID to connect Redgate tools, such as SQL Prompt, to Azure SQL Databases, please update to the versions listed below before July 31, 2025. These versions use a new, dedicated Entra ID app to authenticate. Earlier versions use an authentication method that will no longer work after July 31st. This change only affects connections to Azure SQL Databases.

Understanding GCP Availability Zones And How To Use Them

If you’ve ever deployed a cloud application and wondered why some workloads seem faster, more resilient, or more expensive than others, the answer often lies in how you’ve used availability zones. In this guide, we’ll break down how GCP availability zones work, why they matter, and how to use them strategically to balance availability, compliance, and Google Cloud costs.

AI Won't Be Productive By Default (And That's OK)

Remember when we thought deploying from our laptops was efficient? When FTPing files directly to production at 2 AM felt like peak productivity? We’ve been here before. As AI transforms how we write code, we’re about to learn the same lesson all over again — but this time with much bigger bills.

Instrument LangChain and LangGraph Apps with OpenTelemetry

In our previous blog, we talked about how LangChain and LangGraph help structure your agent’s behavior. But structure isn’t the same as visibility. This one’s about fixing that. Not with more logs. Not with generic dashboards. You need to see what your agent did, step by step, tool by tool, so you can understand how a simple query turned into a long, expensive run.

Prometheus Group By Label: Advanced Aggregation Techniques for Monitoring

Your Prometheus dashboard shows 847 CPU metrics. The alert fired—but is the problem in us-east or us-west? You're trying to rule out whether that new feature caused a latency spike, but the sheer number of time series isn’t helping. Grouping can make this manageable. By organizing metrics by shared label values, you can quickly spot which service or region is behaving differently, without digging through every metric.
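
To make the idea of grouping by label concrete, here is a minimal sketch in plain Python rather than PromQL. The sample data, the label names (`region`, `service`), and the helper `sum_by` are all hypothetical and exist only for illustration; the helper mimics what an aggregation such as `sum by (region) (...)` does conceptually, collapsing many series into one value per distinct label value.

```python
from collections import defaultdict

# Hypothetical samples: each entry is (labels, value), standing in for
# the latest value of one time series. Values are CPU percentages.
samples = [
    ({"region": "us-east", "service": "api"}, 72),
    ({"region": "us-east", "service": "worker"}, 41),
    ({"region": "us-west", "service": "api"}, 93),
]


def sum_by(series, label):
    """Sum values across series that share the same value for `label`."""
    totals = defaultdict(int)
    for labels, value in series:
        # Series missing the label are grouped under the empty string.
        totals[labels.get(label, "")] += value
    return dict(totals)


by_region = sum_by(samples, "region")
by_service = sum_by(samples, "service")
print(by_region)
print(by_service)
```

Grouping the same samples by a different label answers a different question, which is why the choice of grouping label matters more than the number of series.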

Boost Your AI Projects with GPUs: Live Expert Insights Webinar

Are you ready to supercharge your AI initiatives? Join our live webinar on July 16, 2025, at 05:00 PM, where Kunal Kushwaha, Ben Norris, and Kendall Miller will dive into the world of GPUs and their critical role in AI. Get ready to explore the latest insights and trends in GPU technology. This webinar is perfect for AI enthusiasts, startup founders, and engineers looking to stay ahead of the curve.

Introducing Packet Filtering on Megaport Cloud Router

Megaport Cloud Router's new feature is here to cut complexity and secure your network even faster. If you’re already a Megaport customer, Megaport Cloud Router (MCR) likely needs no introduction. This popular and long-standing solution allows customers to spin up routing capabilities on-demand within Megaport’s global Network as a Service (NaaS) platform and can be utilized across a variety of use cases.

TOP Salesforce Integration Tools

Looking to streamline your business workflows and boost Salesforce capabilities? In this video, we explore the top Salesforce integration tools that help connect Salesforce with popular apps such as DBeaver, Power BI, and Excel. Whether you're a developer, admin, or business leader, these tools can help you automate tasks, improve data flow, and enhance productivity.

What Makes Dev Tools Spark Joy for Developers?

Jovana Dunisijevic shares insights on how the right choice of Dev Tools can spark joy and boost developer velocity. She highlights the impact of company culture and innovation on team happiness. This talk emphasizes the importance of software engineering practices for a positive developer experience. GitKraken Desktop: gitkraken.com/git-client GitKraken CLI: gitkraken.com/cli GitLens for VS Code: gitkraken.com/gitlens.

Application Performance Monitoring (APM) Use Cases Every DevOps Team Should Know

Modern applications are built using distributed architectures, microservices, and cloud-native technologies. As these systems grow in complexity, it becomes harder for DevOps teams to maintain performance, track issues, and ensure a consistent user experience across all environments. Application Performance Monitoring (APM) helps solve these challenges by providing real-time visibility into how applications behave, from user interactions to backend services and infrastructure.

Docker Status Unhealthy: What It Means and How to Fix It

If your container shows Status: unhealthy, Docker's health check is failing. The container is still running, but something inside, usually your app, isn’t responding as expected. This doesn’t always mean a crash. It just means Docker can’t verify the app is working. Here’s how to debug the issue and restore the container to a healthy state.

Top Rancher Alternatives To Consider In 2025

Kubernetes orchestration isn’t getting any simpler. Today, teams are pushing into AI/ML, edge computing, and multi-cloud automation. And with that, you may be looking beyond Rancher. This guide walks you through today’s top Rancher alternatives, from enterprise-grade platforms like OpenShift to leaner developer-first tools like Lens and Portainer. With this intel, you can then decide which one fits your evolving stack, budget, and business goals.

Introducing UptimeRobot's official Terraform provider

We’re excited to announce the official release of the UptimeRobot Terraform provider, a feature that many of you have been requesting. Starting today, you can manage your UptimeRobot resources, including monitors, alerting integrations, maintenance windows, and public status pages, directly in your Terraform configuration. Let’s take a closer look.

Building Resilient Government IT: Strategies for Secure, Compliant, and Scalable Connectivity

As Australian government agencies progress in their digital transformation journeys, how can IT leaders innovate without compromising compliance, sovereignty, or operational stability? This blog was originally published on PublicSectorNetwork.com.au on 11th June 2025 and republished with permission.

Building Stronger Tech Communities: Foundations, Success Stories, and Future Directions

Discover the secrets to building thriving tech communities in Austin. Join industry experts Cherie Werner (FIESTA), Kasey Randall (Good Code), Cyndi Schultz (Capital Factory), Laura Santamaria (Red Hat), and Emily Gupton (SKG Texas) as they share their insights on community building, personal success stories, and the future of tech communities. Recorded at Civo Navigate Austin 2025, this panel discussion offers a unique perspective on the challenges and opportunities of building strong, inclusive, and supportive communities.

AI for Data Analytics: Unlocking the Power of Data Insights

According to McKinsey, 78% of organizations have implemented AI in at least one core function, with data analytics leading that transformation. AI no longer supports analytics from the sidelines; it now directs how data is queried, modeled, and delivered. It forecasts outcomes, detects anomalies, and reveals real-time insights, often before a dashboard loads. For SQL developers and DBAs, this marks a new phase in data work.

SQL Server AI Guide: Best Tools, Benefits & Use Cases

SQL Server AI marks the beginning of a new era for data-driven businesses. What was once thought of as a future prediction is now actively transforming database management. With outstanding features like natural language processing, predictive analytics, and automated error handling, managing data with artificial intelligence in SQL Server is faster, easier, and better. If you are a database developer, data analyst, or DBA, using SQL Server AI is no longer optional; it’s a productivity necessity.

How to Use AI for MySQL: Optimizing Queries and Database Management

Imagine telling your database to get the best five customers by order from six months ago, and you get a well-optimized query instantly. Zero coding. No Googling syntax. Just results. Welcome to the future of database management with MySQL AI. MySQL Artificial Intelligence is a strategic solution transforming the traditional method of managing databases.

Best Ways to Find Troublesome Containers and Virtual Machines Using Cycle's Portal

The best problems are the ones you never have to deal with. That's why smart teams catch issues early on, before they impact production. Cycle gives great visibility to spot troublesome workloads, control resource usage, and take action before things go sideways.

Addressing Security Concerns in Mobile Device Management with AirDroid Business

In this video, we will explore how AirDroid Business addresses security concerns in mobile device management, allowing IT professionals to manage multiple devices seamlessly. AirDroid Business offers robust features like remote lock, remote wipe, and application management to ensure your mobile data is secure. With efficient policy deployment, you can enforce security protocols effortlessly across all devices.

Scaled Kubernetes Resource Management Requires Cross-Team Collaboration

As organizations scale their Kubernetes infrastructure, one truth becomes clear: no single team can optimize it alone. Efficiency, resilience, and cost-effectiveness in Kubernetes environments depend on the collective effort of multiple personas, each bringing essential knowledge and responsibility. But it’s not just about division of labor. It’s about active collaboration across roles to unlock the full potential of the platform.

Automating Kubernetes Resource Optimization: Strategies for Efficient, Scalable Workloads

Kubernetes gives you the amazing power to deploy and manage containerized applications. But this power comes with a trade-off. Instead of letting you focus only on writing code and delivering features, Kubernetes also shifts the burden of resource optimization (that is, cost control, performance, and scalability) directly onto your shoulders. The answer to these challenges is automation. Automated optimization takes the guesswork out of resource allocation.

LangChain Observability: From Zero to Production in 10 Minutes

LangChain apps are powerful, but they’re not easy to monitor. A single request might pass through an LLM, a vector store, external APIs, and a custom chain of tools. And when something slows down or silently fails, debugging is often guesswork. In one instance, a developer ended up with an unexpected $30,000 OpenAI bill, with no visibility into what triggered it. This blog shows how to avoid that using OpenTelemetry and LangSmith.

Balancing Reliability at the Crypto-Finance Frontier with Brian Shaw (Uphold)

Sylvain Kalache sits down with Brian Shaw, Senior Engineering Leader at Uphold, to explore the reliability challenges that arise when operating at the intersection of traditional finance and crypto markets. Brian shares how unexpected market events can create massive traffic spikes, how their platform architecture and Kubernetes setup help them stay resilient, and why Uphold's transparency and regulatory approach make them both trustworthy and a high-profile target.

How to Create a Database from Scratch: A Step-by-Step Guide

Knowing how to create a database is no longer just a backend task; it’s a core skill for building systems that perform under pressure and scale with complexity. With global data volumes expected to reach 175 zettabytes by 2025, developers must design systems that turn raw information into something meaningful, accessible, and usable. But what does that look like in practice? How do you go from raw data requirements to a reliable schema? Which engine should you choose?

Stop paying for Microsoft 365 licenses

When someone leaves your company, the natural step is to disable their Microsoft 365 account. But what many businesses don’t realize is that they often continue paying for that user’s license — just to retain access to their OneDrive files, Teams chats, and emails. Over time, this adds up to thousands in unnecessary costs. In this article, we’ll explain.

LangChain & LangGraph: The Frameworks Powering Production AI Agents

Your AI agent worked flawlessly in development, with fast responses, clean tool use, and nothing out of place. Then it hit production. A simple "What's our pricing?" query triggered six API calls, took 8 seconds, and returned the wrong answer. No errors. No stack traces. Unlike traditional systems, AI agents don't crash, they drift. They make poor decisions quietly, and your monitoring says everything's fine.

How to Run Elasticsearch on Kubernetes

Elasticsearch stands as one of the most robust open-source search engines available today. Built on Apache Lucene, it handles complex search operations, real-time analytics, and large-scale data processing with impressive speed and accuracy. Kubernetes has transformed how we deploy and manage containerized applications. This orchestration platform automates deployment, scaling, and operations of application containers across clusters of hosts.

The Complete Guide to APM Best Practices for Developers, DevOps & SREs

Application Performance Monitoring (APM) is no longer optional; it is essential for delivering fast, reliable, and seamless digital experiences. But simply installing an APM tool isn’t enough. To truly realize its potential, IT teams need to follow APM best practices. Best practices for APM refer to the most effective ways to monitor, analyze, and optimize your application’s performance using APM tools.

Introducing Netdata Insights

Now in research preview: Netdata Insights. The problem: Incident? You're jumping between dashboards, piecing together timelines. Reporting? You're copy-pasting charts and correlating trends by hand. The data’s there, but turning it into a narrative doesn’t scale. The solution: Netdata Insights. It synthesizes high-fidelity telemetry using the latest LLMs into AI-powered reports with natural-language explanations, visuals, and clear recommendations.

Netdata: The Fastest Path to Full Stack Observability. AI Powered.

Netdata is a real-time, high-performance, on-premises observability platform designed to monitor metrics and logs with unparalleled efficiency. Netdata requires zero configuration to get started and offers alerts, anomaly detection, and AI-assisted troubleshooting out of the box, providing a powerful and comprehensive infrastructure monitoring experience. Netdata is known for its distributed design. Instead of funneling all data into a few central databases like most traditional monitoring solutions, Netdata processes data at the edge, keeping it close to the source.

10 Essential Things to Know Before Diving into Database DevOps

In today’s rapidly evolving world of development, Database DevOps is becoming an essential practice. It combines the agility of DevOps with the intricacies of databases, all with the goal of enhancing speed, stability and collaboration when it comes to database changes. However, before diving into Database DevOps there are some key concepts your team should get acquainted with. Here are 10 key things you should know to truly understand and reap the benefits from Database DevOps.

Rewriting the Same Controls - Over and Over Again? How FINOS and Kosli Are Fixing Software Compliance

Every bank needs to prove it’s compliant. So why is every bank reinventing the same rules? Manual, duplicative compliance across teams. Engineers stuck gathering screenshots for audits. Custom rules for common risks. Missed opportunity to define shared standards. Mike joins FINOS's Aaron Griswold and explains why Kosli joined FINOS, and how defining shared SDLC controls can help regulated organizations stop wasting time and start delivering software faster and safer. Unpacking the real problems in regulated software delivery.

Is AI About to Create Its Own Language? Here's What You Need to Know!

This panel brings together experts Josh Mesout (Civo), Nobel Chowdary Mandepudi (Arm), Jimil Patel (Intuit), Numa Dhamani (iVerify), and James Gress (Accenture) to discuss the cutting edge of AI and machine learning. They explore when AI might develop its own language beyond human syntax, the evolving landscape of ML frameworks such as MLIR, Mojo, and JAX, and the challenges involved in bridging the gap from AI research to production while optimizing models for deployment.

Live Linux kernel patching with progressive timestamped rollouts

The apt package manager is responsible for installing .deb packages on Ubuntu LTS (long-term support) and interim releases, including the .deb package for the Linux kernel. Updating the kernel package requires a system restart, leaving systems vulnerable between the moment the Linux kernel package is installed and when the machine is rebooted.

30+ Essential Cloud Metrics For SaaS And FinOps Teams

Author Jeff Duntemann said a good tool improves how you work, whereas a great tool transforms your thinking. Companies that want to improve their cloud-based operations can rely on cloud metrics as an effective tool for transforming their cloud operations. You can’t fix what you don’t measure. Cloud metrics are the logs of data that a cloud infrastructure or application generates.

Top NetSuite Connectors in 2025 and How Businesses Use Them

NetSuite connectors unlock the real power of your ERP by bridging gaps between systems, data, and decisions. According to McKinsey, only 1 in 5 companies capture more than half the expected value from their ERP investments. The missing link? Integration. Core modules alone are not enough. Today’s ERPs must serve as connected command centers, linking systems, syncing data, and supporting real-time decision-making. NetSuite is no exception.

Zero Ticket IT Process Automation: Beyond the Service Desk

Traditional IT process automation has always promised faster, more efficient operations. But for years, it’s been largely synonymous with service desk workflows: password resets, access requests, and the like. Those are important, no doubt. But limiting automation to the service desk is like only automating the assembly line in a factory while leaving the rest of the production floor manual.

Logging in Docker Swarm: Visibility Across Distributed Services

Docker Swarm's logging model shifts from individual container logs to service-level aggregation. The docker service logs command batch-retrieves logs present at the time of execution, pulling data from all containers that belong to a service across your cluster. This approach gives you a unified view of distributed applications, but it comes with its own patterns and considerations for effective observability.

How to Write Logs to a File in Go

When your Go application moves beyond development, you need structured logging that persists. Writing logs to files gives you the control and reliability that stdout can't match, especially when you're debugging production issues or need to meet compliance requirements. This blog walks through the practical approaches, from Go's standard library to structured logging with popular packages.

Departed M365 Users

When someone leaves your organization, the first step IT usually takes is to disable their Microsoft 365 account. But have you ever stopped to ask what happens to their data? The answer might surprise you. If you’re not actively managing this, Microsoft will automatically delete that data, often in as little as 30 days. This post explains exactly what gets deleted (and when), why this is a problem, and what you can do to protect that data without paying for unnecessary licenses.

When Will We See the First $1 Billion Company Run by a Single Individual?

It’s only a matter of time. OpenAI CEO Sam Altman said in 2024 that he thought this could be achieved by the end of 2026. Personally, I feel this is a little optimistic; however, based on the evidence I’ve seen, it won’t be long after that. Consider Telegram: a global messaging giant with just 30 employees, already achieving a remarkable $1 billion in revenue. Or Midjourney, revolutionizing creative industries with only 40 employees and generating an impressive $500 million.

Making AI scalable with database change management and Redgate Flyway

With the rise of AI and machine learning comes data. Lots of it. For organizations today, AI is radically changing the way data is accessed, maintained and operationalized. For heads of architecture and development teams, it offers opportunity and responsibility.

What Impacts GKE Pricing? A Guide To Kubernetes Spending

Google Cloud released Google Kubernetes Engine (GKE) as a commercial version of native Kubernetes (K8s). GKE promises a user-friendly, reliable, and cost-effective service. Yet calculating GKE costs can be daunting, including understanding what you’re paying for and maximizing your return on investment. In this GKE pricing guide, we’ll discuss how GKE pricing works, what it costs, and more.

Chiseled Ubuntu containers for OpenJRE 8, 17 and 21

Today we are announcing chiseled containers for OpenJRE 8, 17 and 21 (Open Java Runtime Environment), coming from the OpenJDK project. These images are highly optimized for size and security, containing only the dependencies that are strictly necessary. They are available for both AMD64 and ARM64 architectures and benefit from 12 years of security support.

Signals Is Lighting Up the Future of On-Call: Eight (Yes, 8!) New Features Just Released

We’re going beyond notifications — and building the most powerful, flexible, and team-first on-call experience on the market. When we launched Signals, it was because alerting and on-call desperately needed a reset. Legacy tools hadn’t evolved with the way modern teams work — they were individual-centric, inflexible, and wildly overpriced. Signals changed that.

Do You Know How Many IPs Your CIDR Block Really Has? Understanding Network Capacity and Allocation

Many people use CIDR blocks every day without knowing exactly how many IP addresses they actually have. A CIDR block like /24 gives exactly 256 possible IP addresses, while a /29 block gives 8, and each size gives a different number you can count on for planning. Not understanding this can leave networks overcrowded or wasteful, leading to problems later.
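
The arithmetic behind those numbers is simple: an IPv4 address has 32 bits, so a block with prefix length n contains 2^(32 - n) addresses. The sketch below checks this with Python's standard `ipaddress` module. The helper name `cidr_size` and the example `10.0.0.0` blocks are illustrative assumptions, not taken from the article.

```python
import ipaddress


def cidr_size(block: str) -> int:
    """Return the total number of addresses in an IPv4 CIDR block."""
    # strict=False tolerates host bits being set, e.g. "10.0.0.5/24".
    network = ipaddress.ip_network(block, strict=False)
    return network.num_addresses


for block in ("10.0.0.0/24", "10.0.0.0/29"):
    network = ipaddress.ip_network(block)
    # Cross-check the library's count against the 2^(32 - prefix) formula.
    by_formula = 2 ** (32 - network.prefixlen)
    print(block, cidr_size(block), by_formula)
```

Note that this counts every address in the block. In a conventional IPv4 subnet the network and broadcast addresses are not assignable to hosts, so the usable count is typically smaller than the total.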