The latest news and information on DevOps, CI/CD, automation, and related technologies.

Save Hours on Troubleshooting with Automated Investigations

How many times has your team stared at a dashboard, pointed to a spike, and asked a question that charts alone can’t answer? “What was the real impact of that deployment?” “Why are our Kubernetes pods in the us-east-1 cluster suddenly crashing?” “Are we wasting money on overprovisioned servers?” Answering these questions is the real work of operations and SRE.

Tutorial: How to Remediate Vulnerabilities with Puppet Enterprise Advanced Patching

Vulnerabilities are being exploited faster than ever. VulnCheck, a company that specializes in vulnerability intelligence, found that in Q1 2025, 28.3% of vulnerabilities were exploited within one day of CVE disclosure. Keeping your systems up to date is more important than ever. In practice, though, many security teams run scans and export the results to giant spreadsheets, which are “tossed over the wall” to the Operations team with little context.

How to Block Apps on Android Business Devices?

Are you an IT administrator looking for an efficient way to manage company-owned Android devices? This video provides a step-by-step guide on how to block apps on Android devices to boost employee productivity and maintain security. In a business environment, a clear app usage policy is essential for compliance and focus. We'll show you how to easily set up an App Blocklist using the AirDroid Business MDM solution.

Product Klip: Istio Developer Dashboard

Troubleshooting issues in a complex service mesh environment, such as traffic failures or authorization problems, often requires the expertise of an SRE or DevOps professional. Komodor simplifies this process by giving developers the visibility they need to diagnose service mesh issues on their own. It helps them identify blocked connections and understand the root cause without having to review logs or configuration files.

Netdata Now Troubleshoots Your Alerts for You

The 2 AM pager alert. For anyone in Ops, SRE, or IT administration, those words trigger a familiar sense of dread. An alert has fired. Is it a real fire, or another false alarm waking you from a dead sleep? The pressure is on. Every minute of downtime costs money and reputation, but troubleshooting a complex system when you’re sleep-deprived is a Herculean task.

AI Agent Is Hitting Your APIs - Are You Ready?

It’s no longer theoretical: artificial intelligence has left research labs and entered production systems, creating a new breed of consumer, the autonomous, intelligent agent. These AI agents increasingly interact with real-world APIs (application programming interfaces), the sets of protocols and tools used to build and integrate software applications.

The strategic art of build vs. buy in software delivery ft. Tara Hernandez of MongoDB

Rob Zuber sits down with Tara Hernandez, VP of Developer Productivity at MongoDB and former Netscape engineer who helped create early continuous integration systems, to explore strategic frameworks for build vs. buy decisions in modern software delivery.

Jaeger Monitoring: Essential Metrics and Alerting for Production Tracing Systems

Your Jaeger setup is running. Traces are coming in, and the UI is helping you spot slow services or debug broken flows. But just like any part of your observability stack, Jaeger needs some basic monitoring to stay reliable. If the collector starts queueing spans or the agent runs out of buffer, it can lead to dropped traces, sometimes without any obvious sign in the UI. This blog focuses on the operational side of Jaeger.
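As a rough illustration of the kind of check described above, the sketch below parses a metrics scrape in the Prometheus text format and flags a growing collector queue or newly dropped spans. The metric names, the threshold, and the helper functions are assumptions made for this example; confirm the names against your own collector's metrics output before relying on them.

```python
from typing import Dict, List

# Assumed metric names for illustration; verify against your collector's output.
QUEUE_LENGTH = "jaeger_collector_queue_length"
SPANS_DROPPED = "jaeger_collector_spans_dropped_total"


def parse_metrics(text: str) -> Dict[str, float]:
    """Sum sample values per metric name from Prometheus text exposition format."""
    totals: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # name{label="value"} sample [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            # name sample [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue  # ignore malformed samples
        totals[name] = totals.get(name, 0.0) + value
    return totals


def check_collector(
    prev: Dict[str, float],
    curr: Dict[str, float],
    max_queue: float = 1000.0,  # hypothetical threshold; tune for your setup
) -> List[str]:
    """Compare two scrapes and return human-readable warnings."""
    warnings: List[str] = []
    dropped = curr.get(SPANS_DROPPED, 0.0) - prev.get(SPANS_DROPPED, 0.0)
    if dropped > 0:
        warnings.append(f"collector dropped {dropped:.0f} spans since last scrape")
    queue = curr.get(QUEUE_LENGTH, 0.0)
    if queue > max_queue:
        warnings.append(f"collector queue length {queue:.0f} exceeds {max_queue:.0f}")
    return warnings
```

In a real deployment this logic would more likely live in alerting rules than in a script; the point is only that dropped spans show up as a counter increase between scrapes, even when the UI looks healthy.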

Building your AI infra, our tips

Modular architecture: Decouple compute from storage so each can scale independently. This makes it easier to adapt to growing or shifting workloads over time.
Future-ready hardware: Select GPUs and CPUs not just for current workloads but with an eye on scalability, including support for newer accelerator types.
Scalable design: Ensure the system allows seamless addition of compute nodes or storage without a full redesign.