
Analyze wait events and in-flight queries with the Datadog Database List

When you’re operating databases at scale, being able to get real-time insights across all your databases is essential for addressing issues and identifying areas for optimization. Datadog Database Monitoring’s Database List allows you to monitor your entire database fleet in one place, so you can quickly identify and troubleshoot overloaded hosts and gauge the impact of problematic queries throughout your infrastructure.

Hardware vs. IT vs. Software Asset Management - Why the need for specific asset monitoring tools?

Hardware asset management (HAM), IT asset management (ITAM), and software asset management (SAM) are closely related fields that deal with the maintenance of IT assets in an enterprise. Asset management means managing and tracking the lifecycle of all your enterprise’s assets, from physical to digital. Together, these practices give your enterprise full visibility into its assets so it can make informed decisions about what to do with them.

Application Snapshots: A Valuable Observability Signal for Developers

Monitoring is often not the first thing on the mind of the modern developer. Yet, it’s necessary at many points of the software development lifecycle, including: before deprecating an API, before launching a new feature, after launching the feature, and more. In fact, developers’ monitoring needs can vary much more than those of classic Ops monitoring.

3 Enterprise IT Factors That Will Make MSPs More Successful in 2022

A version of this blog first appeared in APMdigest. A new study by OpsRamp on the state of the managed service provider (MSP) market concludes that MSPs face a market of bountiful opportunities but must prepare for growth by embracing complex technologies like hybrid cloud management, root cause analysis, and automation.

Why More Incidents Are Better

Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be “zero.” After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing actual incidents by as much as possible is a noble goal. However, it’s important to recognize that incidents aren’t an SRE’s number one enemy.

Why Operational Maturity Helps Businesses Reduce the Great Resignation Trend

The past few years have led to fundamental business and cultural shifts for both companies and employees. COVID-19 has brought opportunities for companies that invested early in digital operations, while others struggled to maintain the status quo. The latter gave rise to record employee burnout and to what is now commonly referred to as the Great Resignation.

How to monitor nginx in Kubernetes with Prometheus

nginx is an open source web server often used as a reverse proxy, load balancer, and web cache. Designed for high loads of concurrent connections, it’s fast, versatile, reliable, and most importantly, very light on resources. In this article, you’ll learn how to monitor nginx in Kubernetes with Prometheus, and also how to troubleshoot different issues related to latency, saturation, etc.

How to Get Ahead of Recurring IT Tickets (Use Case)

Learn how one financial institution stopped a flood of recurring IT tickets in its tracks with an automated 1-click fix. When an L1 agent faces a mounting pile of IT tickets, it is hard to be anything but reactive. They need to resolve the issue as fast as possible and restore employee productivity. But when it comes to resolving the same IT ticket over and over again, no Service Desk team should have to do that more than once.