Operations | Monitoring | ITSM | DevOps | Cloud

Modernizing Data Centers for AI: Bridging Observability, Cost Control, and Intelligent Automation

Attend our webinar on April 3 to see our latest innovations live. IT operations are more complex than ever, with modern data centers spanning on-premises, containers, multi-cloud environments, and AI-powered infrastructure. The rapid expansion of data sources has created an overwhelming volume of information, making manual monitoring across multiple tools impractical. Visibility gaps slow down troubleshooting and delay critical decisions, impacting business performance.

Server Monitoring Explained: How to Outwit Downtime Before it Strikes

Server monitoring is the practice of continuously tracking server health, performance, and resource usage to catch issues before they cause downtime. When a server crashes, it can mean lost revenue, frustrated users, and a mad scramble to fix the problem. The right server monitoring tool helps your IT team stay ahead by providing real-time alerts and visibility into critical metrics. In this guide, we’ll break down how server monitoring works, why it matters, and what to look for in a solution.

Building optimized LLM chatbots with Canonical and NVIDIA

The landscape of generative AI is rapidly evolving, and building robust, scalable large language model (LLM) applications is becoming a critical need for many organizations. Canonical, in collaboration with NVIDIA, is excited to introduce a reference architecture designed to streamline and optimize the creation of powerful LLM chatbots. This solution leverages the latest NVIDIA AI technology, offering a production-ready AI pipeline built on Kubernetes.

Unlocking Edge AI: a collaborative reference architecture with NVIDIA

The world of edge AI is rapidly transforming how devices and data centers work together. Imagine healthcare tools powered by AI, or self-driving vehicles making real-time decisions. These advancements rely on bringing AI directly to edge devices. However, building a robust architecture for diverse edge environments presents significant hurdles. This blog introduces our new reference architecture, designed to simplify edge AI deployment.

Using CircleCI to test and deploy Python serverless functions on Microsoft Azure

Serverless computing simplifies app development by abstracting away server management. Azure Functions provides a robust platform for event-driven, on-demand code execution. In this tutorial, we’ll create and deploy a Python-based Azure Function, one that parses incoming JSON, using CircleCI. For more granular, programmatic access to Azure resources, we’ll use a service principal for secure authentication and the Azure CLI orb to streamline our CI/CD pipeline.
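As a rough illustration of the "parses incoming JSON" step, here is a minimal sketch of the parsing logic such a function might contain. It is not the tutorial's actual code: the name `parse_json_body`, the response shape, and the status codes are assumptions made for this example. The sketch uses only the standard library, so it can be unit tested in a CI job without the Azure runtime.

```python
import json
from typing import Any, Dict, Tuple


def parse_json_body(raw_body: bytes) -> Tuple[int, Dict[str, Any]]:
    """Parse a raw request body as JSON.

    Returns a (status_code, payload) pair. The name, status codes, and
    payload shape are hypothetical choices for this sketch.
    """
    try:
        data = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Malformed input: report a client error instead of raising.
        return 400, {"error": f"invalid JSON: {exc}"}

    if not isinstance(data, dict):
        # Valid JSON, but not the object this sketch expects.
        return 400, {"error": "expected a JSON object"}

    # Echo back the top-level keys so a caller can confirm what was parsed.
    return 200, {"received_keys": sorted(data.keys())}


if __name__ == "__main__":
    print(parse_json_body(b'{"name": "demo", "count": 3}'))
    print(parse_json_body(b"not json"))
```

In an HTTP-triggered function, a helper like this would typically be called with the request's raw body bytes, and its result turned into the HTTP response. Keeping the parsing separate from the trigger code is one way to let a pipeline test it in isolation before deployment.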

Proactive Monitoring: How DinoCloud Uses CloudWatch to Save Clients Money

At MetricFire, we love talking with engineers about their tech stacks, SRE challenges, and how they approach infrastructure monitoring. Recently, we had a great chat with Yoimer Roman from DinoCloud, a Latin American company that helps clients make smarter business decisions by leveraging AWS CloudWatch monitoring. Yoimer wears many hats: mentoring his team on all things AWS, designing custom cloud environments, and bridging the gap between technical challenges and non-technical stakeholders.

IT Governance Software: Best Options and Key Features to Look For

IT governance software helps organizations take control of their IT strategy by providing frameworks, automation, and monitoring tools to oversee IT operations. This software is a strong ally for managing IT effectively, a process that requires more than just keeping systems running: it involves aligning technology with business goals, addressing security risks, and meeting compliance standards.

Rethinking WhatsApp Alerts - A Data-Driven Approach

WhatsApp has become a major alerting channel for incident response teams. It's popular and, for many, a great alternative to SMS. In our 2024 recap, we mentioned how Spike sent over 25,000 alerts on WhatsApp. It is now the 2nd most used alert channel for responders on Spike (rising from 4th spot in 2023). But I will be the first to admit that the WhatsApp alerts experience needed work to help responders react to incidents more quickly!

What Your Mobile Devices and Favorite Jeans Might Have in Common

We can all agree that we need our mobile devices to be as secure as possible. No one wants to be hacked. No one wants to deal with the fallout of a breach. If you’re a small business owner, you could be out of business in six months because of how hard it is to recover from a single cybersecurity incident. If you’re in charge of a larger business, you might have to clean up the damage caused by leaked data for years.

Announcing HAProxy ALOHA 17.0

HAProxy ALOHA 17.0 is now available, delivering powerful new features that improve UDP load balancing, simplify network management, and enhance performance. With this release, we’re introducing the new UDP Module and extending network management to the Data Plane API, a new API-based approach to network configuration. The Network Management CLI is enhanced with exit status codes and contextual help.