The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Making the Most of MQTT - Native Collector or Telegraf?

When it comes to IoT data, MQTT is a superstar. With so many IoT devices generating data out in the world, developers need ways to access it. After all, data lies at the heart of every application. But data doesn’t just magically manifest itself into your datastore, and building the right data pipeline can make or break an application. Data collection is not a one-size-fits-all problem to solve.

5 Ways to Report on Your SLA Obligations

Service Level Agreements are designed to foster trust between your customers and your business. They help define the maximum amount of downtime your team finds acceptable. While they can have legal repercussions, SLAs are fundamentally about trust. Your customers use your service because you’re the best at what you do. They remain loyal because they trust you to do what they need. Retention in SaaS is very fickle, and competition in certain spaces is quite stiff.
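As a rough illustration of how an availability target translates into a downtime budget, here is a minimal Python sketch. The function name and the 30-day reporting period are assumptions made for this example; they are not taken from the article.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime an availability target permits.

    Assumes a fixed-length reporting period (30 days by default); real SLAs
    may define the period, and what counts as downtime, differently.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    # Compare the downtime budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.1f} minutes per 30 days")
```

Reporting measured downtime against this budget is one simple way to show customers how much headroom remains in a given period.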

Is your plugin compatible with Grafana? There's a tool for that!

Here at Grafana Labs, we’re always striving to reduce the amount of effort needed to maintain plugins across different versions of Grafana. That is why we’re excited to provide you with a tool to check the compatibility of your plugin with the latest Grafana plugins API. We know that it can be frustrating for developers to find out people can’t use their plugins. Over the past few months, we’ve been working on detecting the breaking changes as soon as they happen.

IT Monitoring for Government

Today’s blog comes from Kevin Howell, CEO of UK partner Howell Technology Group (HTG), about their work supplying secure cloud technologies and remote working solutions to government and regulated customers. HTG is a trusted industry leader in the UK, offering virtual desktops, managed services and efficient modern workplace solutions. Their solutions are also available through the UK Government’s Digital Marketplace under the G-Cloud Framework.

AIOps for Real: Characteristics of a Platform That Add Value and Drive Change

When you’re investing in automation solutions, tangible results need to follow quickly. Waiting two years for a return on investment (ROI) from an automation project would have been acceptable in the not-so-distant past, but it no longer is. With the current speed of change, where new technologies come and go and existing ones evolve at lightning speed, IT teams require much faster time to value on automation investments.

What is Distributed Tracing vs OpenTelemetry?

There are a few key differences between distributed tracing and OpenTelemetry. One is that OpenTelemetry offers a more unified approach to instrumentation, while distributed tracing takes a more granular approach. This means that OpenTelemetry can be less time-consuming to set up, but it doesn’t necessarily offer as much visibility into your system as distributed tracing does.

Online Learning: a Novel Approach to Applying Machine Learning in Splunk

Most classical, batch-oriented machine learning systems follow the paradigm of “fit and apply”. In an earlier blog post, I discussed a few patterns on how to better organize data pipelines and machine learning workflows in Splunk. In this blog, we’ll review how you can organize your machine learning model in a new way: online learning.
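To make the contrast with “fit and apply” concrete, here is a minimal Python sketch of an online model that is updated one event at a time instead of being refit on a stored batch. It uses Welford's running mean and variance as a stand-in for a model. The `RunningStats` class is hypothetical and is not the Splunk implementation discussed in the blog.

```python
import math


class RunningStats:
    """Online estimate of mean and variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the model; no history is stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        """Score a value against the current model (0.0 if no spread yet)."""
        std = math.sqrt(self.variance)
        return (x - self.mean) / std if std > 0 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for value in [12.0, 11.5, 12.3, 11.9, 30.0]:
        # Apply the current model first, then learn from the new event.
        print(f"value={value} z={stats.zscore(value):.2f}")
        stats.update(value)
```

Each event is scored against the current state and then absorbed into it, so the model adapts continuously without a separate retraining step.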

Accurately Forecasting Cloud Costs for FinOps

Companies are investing heavily in the cloud for its operational and financial benefits. But without a robust cloud cost management strategy in place, the complexity of cloud services and billing can lead to overspending and unnecessary cloud waste. Being able to accurately predict future cloud spend is one way to optimize that spend and inform budgets.
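As a deliberately naive illustration of spend prediction, the Python sketch below fits a straight-line trend to past monthly totals and extrapolates one month ahead. The function name and the sample figures are made up for this example. Real forecasting would also need to account for seasonality, commitments and planned changes, which this sketch ignores.

```python
def forecast_next_month(monthly_spend: list[float]) -> float:
    """Extrapolate the next month's spend from a least-squares linear trend.

    Months are indexed 0..n-1; the forecast is the fitted line at index n.
    """
    n = len(monthly_spend)
    if n < 2:
        raise ValueError("need at least two months of history")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_spend) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_spend))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


if __name__ == "__main__":
    history = [10_200.0, 10_900.0, 11_400.0, 12_300.0]  # hypothetical figures
    print(f"Projected next month: {forecast_next_month(history):.2f}")
```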

Web Endpoint Monitoring

In today’s world, a significant fraction of a software business’s reputation depends on its web application and its speed. It all comes down to how fast your server responds to client requests (assuming your application is reliable and reasonably user-friendly). Therefore, you could argue that the server endpoint is the centerpoint of all the server-side action — the operations here primarily determine the performance of your application.