
Implementing Grafana Play privacy policies with Grafana k6: A behind-the-scenes look

Grafana Play is a free and publicly accessible sandbox environment that allows users to explore and learn Grafana without setting up their own instance. Grafana Play comes preloaded with ready-made sample dashboards, and showcases how to work with different data sources, create visualizations, and use advanced Grafana features.

Apache Spark security: start with a solid foundation

Everyone agrees security matters, yet when it comes to big data analytics with Apache Spark, it’s not just another checkbox. Spark’s open source, JVM-based architecture introduces specific security concerns that, if neglected, can quietly expose sensitive information and disrupt vital functions.

The Future of IT Is Human + Agentic: How Zero Ticket IT Is Reshaping Tech Careers

Automation has always stirred up fears of job loss. For IT professionals, the conversation has only grown louder with the rise of AI. But the truth is that the future of IT is not about replacement—it’s about reinvention. For decades, IT has been defined by its firefighting: manually resolving tickets, managing endless alerts, and fielding repetitive service requests. These tasks are ripe for automation, but automation doesn’t eliminate the need for IT talent.

How to Monitor Kafka Producer Metrics

Your Kafka producer pushed a million messages yesterday. Nice. But can you tell whether they all made it? Or why latency spiked at 2 PM? Producer metrics help you answer both questions. They expose how long messages take to send, whether messages are getting stuck, and whether retries are piling up. Let’s go over which metrics help most while debugging and how to monitor them.

Optimize and troubleshoot AI infrastructure with Datadog GPU Monitoring

As organizations bring more AI and LLM workloads into production, the underlying GPU infrastructure becomes even more critical to keeping those workloads fast, reliable, and scalable. Inefficient GPU resource usage, for instance, can lead to longer runtimes and reduced throughput, negatively impacting overall model performance. Additionally, idle and underutilized GPUs can quickly drive up costs.

Datadog MCP Server: Connect your AI agents to Datadog tools and context

As development teams adopt AI-powered tools and build services that make use of AI agents, they want to extend their AI capabilities to incorporate familiar tools and observability data. However, AI agents struggle with regular API endpoints: they frequently fail when parsing complex nested JSON hierarchies, or they handle errors incorrectly. As a result, these agents often fail to retrieve relevant results.

Moving from Relational to Time Series Databases

I’ve been building apps with SQL Server for years. Everything worked well until I started dealing with sensor data, stock trade volume, and IoT telemetry. As the volume of time-stamped records grew into the millions, I saw relational databases struggling with workloads they weren’t designed for. That’s when I explored time series databases. The performance improvements were significant, but what surprised me was the mental shift required.

Opsgenie Is Shutting Down: Why FireHydrant Is the Natural Evolution

Opsgenie set a high bar. For years, it helped teams respond faster and stay on top of incidents with reliable alerting and on-call management. At FireHydrant, we’ve always admired how Opsgenie modeled incident data and structured its workflows — it was one of the best in the game. But as Atlassian sunsets Opsgenie and teams face the pressure to migrate, there’s a real decision to make: move into Jira Service Management, or find a new solution that fits your team’s needs and scale.