New in Grafana Alerting: a faster, more scalable way to manage your alerts in Grafana

Effective alerting is the backbone of any observability strategy. But as your systems grow, managing hundreds or even thousands of rules can become a significant challenge. And when something goes wrong, the last thing you want is to fight with your tooling. That’s why we’re thrilled to announce the launch of our brand new alert rules list page, which we built to provide a faster, more intuitive, and scalable experience for teams of all sizes!

Getting started with MongoDB dashboards

MongoDB is a popular NoSQL database used by many modern web applications. Once your web application is up and running, you might find you need to monitor the application data for operational purposes. For example, you may need to report on user sign-ups, or monitor for problems like invalid data. SquaredUp is an easy-to-use dashboard that plugs directly into your MongoDB database to visualize and monitor your data.

Patterns for safe and efficient cache purging in CI/CD pipelines

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton) In the age of increasingly frequent deploys, edge caching, and Jamstack adoption, caching plays a key role across the software delivery life cycle. In build and CI pipelines, caching compiled assets or dependencies helps reduce compute costs, speed up job runtimes, and lower the energy usage, and with it the environmental impact, of repeated builds.

Breaking through the Senior Engineer ceiling

You’ve made it to Senior engineer. Now what? You’re now staring at the next level, Staff typically, sometimes Principal, or whatever your company calls it. The path feels murky. Your manager gives you feedback like “show more technical leadership” or “think bigger picture”, but what does that actually mean day-to-day? I’ve been there. I’ve also been on the other side, helping engineers grow through whatever explicit (or implicit) levels a company has.

Secure by Design: IT Modernization for Government

As government agencies modernize IT infrastructure, many are shifting to hybrid and multicloud environments. But this evolution brings heightened exposure to cyber threats. For the public sector, where data protection is tied to national security and public trust, compliance is more than a box to check—it’s the front line of defense. FedRAMP (Federal Risk and Authorization Management Program) provides a standardized framework for securing cloud services used by U.S. agencies.

Resilience with Zero Data Loss in High-Volume Telemetry Pipelines with OpenTelemetry and Bindplane

One Bindplane customer ran into a problem processing enormous S3-stored log files. Our engineering team tackled it head-on, enhancing the S3 event receiver with offset tracking and applying chaos testing methodologies.

Goodput vs Throughput: The Differences and How They Affect Your Network

Two key metrics that often come up in discussions about network performance are throughput and goodput. While these terms may seem similar, they highlight different aspects of your network’s efficiency, and misunderstanding them can lead to poor decisions about how you manage your network and your business’s resources.
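
As a rough illustration of the difference: throughput counts every bit that crosses the link, while goodput counts only the application payload that is usefully delivered, leaving out protocol headers and retransmitted packets. The sketch below is a simplified model and is not taken from the article. The function name `link_rates`, the fixed per-packet header size, and the sample numbers are all assumptions made for illustration, and lower-layer framing is ignored.

```python
import math


def link_rates(payload_bytes, payload_per_packet, header_per_packet,
               retransmitted_packets, seconds):
    """Return (throughput_bps, goodput_bps) for one transfer.

    Simplified model (an assumption, not a measurement method):
    every packet carries a fixed-size header, and every retransmitted
    packet is a full-size duplicate that adds no useful payload.
    """
    # Packets needed to carry the payload once.
    unique_packets = math.ceil(payload_bytes / payload_per_packet)
    # Packets actually sent, including duplicates.
    total_packets = unique_packets + retransmitted_packets

    # Bytes on the wire: payload plus headers, plus full-size duplicates.
    wire_bytes = (payload_bytes
                  + unique_packets * header_per_packet
                  + retransmitted_packets * (payload_per_packet + header_per_packet))

    throughput_bps = wire_bytes * 8 / seconds    # everything that was sent
    goodput_bps = payload_bytes * 8 / seconds    # only useful application data
    return throughput_bps, goodput_bps


if __name__ == "__main__":
    # Hypothetical transfer: 100 MB of payload in 1460-byte segments,
    # 40 bytes of headers per packet, 500 retransmissions, 10 seconds.
    throughput, goodput = link_rates(100_000_000, 1460, 40, 500, 10.0)
    print(f"throughput: {throughput / 1e6:.2f} Mbit/s")
    print(f"goodput:    {goodput / 1e6:.2f} Mbit/s")
    print(f"efficiency: {goodput / throughput:.1%}")
```

In this model, goodput can never exceed throughput, and the gap widens as packets get smaller or retransmissions increase.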

PostgreSQL Performance: Faster Queries and Better Throughput

A PostgreSQL setup that performed well with 10,000 users starts to show strain at 100,000. Queries that once returned in under 50ms now take over 2 seconds. The connection pool regularly hits its limit during peak usage, leading to timeouts and degraded performance. This blog focuses on practical ways to reduce query latency by 50–80% and increase throughput for high-concurrency environments.

Leaning into AI, ML, and observability to manage your ever-growing infrastructure

The complexity and scale of modern infrastructure require an equally intelligent set of observability tools to effectively monitor it. Remember when scaling meant ordering new servers and racking them in a data center? Remember when cloud providers first offered access to seemingly infinite virtual machines at the click of a button? Remember when Kubernetes made it trivial for infrastructure to automatically scale itself based on demand?