Declarative Configuration in OTel (Grafana OpenTelemetry Community Call #1)

We’re kicking off a brand-new Grafana OpenTelemetry Community Call! Join us as we look at getting observability into your apps and infrastructure with Grafana, powered by OpenTelemetry. In this session, we’ll dive into Declarative Config, the new way to make OpenTelemetry onboarding simple and powerful. Instead of juggling environment variables or boilerplate in your startup code, declarative config gives you a clean, language-agnostic approach that works across SDKs and unlocks future possibilities like remote configuration. Marylia Gutierrez (OTel JavaScript approver & core contributor) joins us to explore it.

How Atlassian built a smarter observability system with Grafana and OpenTelemetry

Discover how Atlassian built OpsDeck, an observability platform powered by Grafana, to automate incident detection, improve response time, and reduce troubleshooting time from one hour to under a minute. Hear how the Observability Insights team scaled OpenTelemetry, broke silos, and built smarter workflows for both engineers and support.

Demystifying WMI Permissions

Network administrators are always seeking a deeper understanding of their Windows-based environments. Windows Management Instrumentation (WMI) enables their network monitoring tools to access system information, manage configurations and automate tasks. It plays a vital role in network monitoring by providing a standardized interface for querying and controlling system components. A complex set of permissions governs WMI access.

Clarity in the Dojo: The power of the Summary Agent

In the dojo, not every role is about throwing punches. Some roles are about awareness, the unmistakable voice that tells the fighter when to move, where the strike is coming from, and why the opponent matters. That’s the role of the Summary Agent in Sumo Logic Dojo AI. Unlike a traditional agent, it doesn’t launch queries or carry out actions on its own. Its purpose is to narrate, not act. In doing so, it becomes the foundation for every other decision in the dojo.

How to manage ilert call flows via Terraform

Call flows let you design voice workflows with nodes like “Audio message,” “Support hours,” “Voicemail,” “Route call,” and more. The ilert Terraform provider now includes an ilert_call_flow resource so you can version and promote these flows across environments. This blog post offers an overview of managing call flows in Terraform, detailing the benefits and key scenarios.

Why your Kubernetes clusters and GPUs should live under one roof

The world remains abuzz with AI hype, but the reality is that most modern applications aren’t purely AI workloads. The average company will have web services, APIs, databases, and background jobs running alongside its machine learning inference or training components. An architecture question everyone faces: should your Kubernetes cluster and GPU compute live in the same data center, or can you split them across providers?

What Is Incident Response Lifecycle?

The Incident Response Lifecycle is a step-by-step process that helps engineering teams detect, respond to, and recover from unexpected system disruptions or outages. It includes a series of six practical stages: Detection, Analysis, Impact Mitigation, Incident Resolution, Service Restoration, and Post-Incident Analysis. By following this lifecycle, teams can minimize downtime, reduce business impact, and continuously strengthen system reliability.
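As a rough illustration, the six stages named above can be modeled as an ordered sequence that an incident moves through one step at a time. This is only a minimal sketch: the `Stage` enum and the `next_stage` helper are hypothetical names chosen for this example, not part of any tool mentioned in the article.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The six lifecycle stages, numbered in the order listed above."""
    DETECTION = 1
    ANALYSIS = 2
    IMPACT_MITIGATION = 3
    INCIDENT_RESOLUTION = 4
    SERVICE_RESTORATION = 5
    POST_INCIDENT_ANALYSIS = 6


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the final stage.

    Hypothetical helper: assumes an incident advances strictly in order.
    """
    if stage is Stage.POST_INCIDENT_ANALYSIS:
        return None
    return Stage(stage + 1)


if __name__ == "__main__":
    # Walk an incident through the whole lifecycle, printing each stage name.
    current: Optional[Stage] = Stage.DETECTION
    while current is not None:
        print(current.name)
        current = next_stage(current)
```

In practice teams often loop back (for example, from mitigation to further analysis), so a strictly linear walk like this is a simplification.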

What Is Business Continuity?

A single outage can stop operations, affect customers, and damage trust. In a world of pandemics, cyberattacks, weather events, and supply chain delays, your team cannot simply hope that nothing breaks. Business continuity helps your team stay ready, recover faster, and keep downtime low. In this blog, we’ll explain what business continuity means, how to create a solid business continuity plan, and which approaches help teams stay operational during a disruption.

Why SELinux Matters in Enterprise Security

When evaluating cybersecurity products, it's easy to focus on surface-level features like dashboards, alerts and integrations. But real strength often lies deeper, in the architecture itself. One embedded capability that demonstrates rigorous security design principles is Security-Enhanced Linux (SELinux). Originally developed by the U.S. National Security Agency (NSA) and released to the open-source community, SELinux is a mandatory access control (MAC) framework built into the Linux kernel.