
Three Years a Leader. Thank You.

Dear Nexthink community,

We are excited to be named a Leader in the 2026 Gartner Magic Quadrant for Digital Employee Experience Tools for the third year in a row. I want to share this recognition with our customers, our partners and ecosystem, and every Nexthinker across the world. As a founder, it’s a true honor to work alongside so many talented people. To us, this recognition is also yours.

It Can Only Goodhart Happen

"When a measure becomes a target, it ceases to be a good measure." (Charles Goodhart, 1975)

You’ve probably read this quote in relation to any number of things over the years: people complaining about arbitrary metrics like PRs merged, lines of code produced, and now, token usage. But is the era of tokenmaxxing over before it even began? The rise and fall of token leaderboards at companies like Amazon seems to have taken place in less than three months!

What is SRE Observability and Key Pillars You Should Know?

What happens when a critical service slows down, but nothing is technically “broken”? Most teams have monitoring in place. They know when something goes down. But when performance drops or issues spread across services, finding the real cause becomes slow and unclear. Engineering teams end up switching between dashboards, logs, and alerts just to understand what changed. This delays response and increases pressure on on-call teams. This is where SRE observability becomes essential.

New: Introducing the StatusGator Chrome extension

We’re excited to announce the launch of the StatusGator Chrome extension, a new way to check the status of websites and online services directly from your browser. Whether you’re troubleshooting an issue, wondering if a website is down, or looking for more information about an ongoing incident, the extension gives you instant access to service status information with a single click. Simply install the extension and start checking the status of websites and services as you browse.

API update: Full board management now available

We’re excited to announce expanded functionality for the StatusGator Boards API. You can now create new boards, update existing boards, and delete boards directly through the API. Previously, the Boards API only supported listing boards and retrieving board details. With these new capabilities, you can automate the complete board lifecycle – from provisioning new boards to managing ownership and cleaning up boards that are no longer needed.

Errors, traces, logs, metrics: when to reach for what

When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It sounds like it should be obvious. Errors, traces, logs, and metrics are the four kinds of telemetry most apps run on, four tools in one box, and they overlap enough that the honest answer is every developer’s favourite: it depends. You can stuff context into span attributes instead of logging it. You can count log events instead of emitting a metric.
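
To make the last point concrete, here is a minimal sketch of counting log events instead of emitting a separate metric, using only Python's standard `logging` module. The `CountingHandler` class and the `checkout` logger name are hypothetical and chosen for illustration; they are not taken from the article.

```python
import logging
from collections import Counter


class CountingHandler(logging.Handler):
    """Hypothetical handler that tallies log records instead of printing them."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        # Key on logger name and level so the tally behaves like a labelled counter.
        self.counts[(record.name, record.levelname)] += 1


logger = logging.getLogger("checkout")  # assumed logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example quiet: nothing reaches the root logger
handler = CountingHandler()
logger.addHandler(handler)

for _ in range(3):
    logger.warning("payment retry")
logger.info("order placed")
logger.debug("dropped: below the logger's INFO threshold")

print(dict(handler.counts))
```

The trade-off this illustrates is that the count only exists for events that were actually logged at an enabled level, so a change in log level silently changes the "metric".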

Progress Wins at the Network Computing Awards

Progress has been named a winner at this year's Network Computing Awards, earning industry recognition for its ongoing commitment to innovation and delivering real-world value to customers. A standout event in the UK technology calendar, the Network Computing Awards celebrate organizations and solutions that are driving measurable impact across the industry.

11 Incident Management Best Practices Every IT Team Should Follow

A well-defined incident management process can mean the difference between a minor disruption and a major business outage. When critical services fail, every minute of downtime matters. Yet many IT teams still face challenges such as unclear ownership, poor prioritization, communication gaps, alert fatigue, and manual processes that delay resolution. The result is longer outages, missed SLAs, and frustrated users.

Turning Disconnected Alerts into Actionable Insights

The previous post in this series focused on shared context and why hybrid operations depend on a connected view across cloud, network, and infrastructure. Once that context is in place, the operational benefits become easier to see—especially during incident response, where signal volume and fragmented tooling can slow teams down. Alert noise remains one of the most persistent challenges in hybrid environments. Every layer of the stack can generate its own warnings, anomalies, and service events.