The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

MTTR vs. MTBF vs. MTTF: Understanding Failure Metrics

In the dynamic landscape of software and web applications, failures can have severe consequences, impacting user experience, business continuity, and overall performance. To proactively address these challenges, organizations rely on robust monitoring practices supported by failure metrics. Failure metrics, specifically tailored to software and web application monitoring, provide crucial insights into system health, reliability, and optimization opportunities.

Experience This! What is the Importance of Application Experience?

If you have ever worked in a kitchen, you know how tough it is to be short-staffed. Cooks have to work twice as hard and their performance suffers, leaving not-so-happy customers and comped meals for the complainers. Applications operate in a similar way. Maybe an application is glitching because of poor coding on the backend, or bogging down under an influx of incoming data. In either case, the end-user application experience suffers.

Q2 Round Up: Roadmap Review & Q3 2023 Look Ahead

Many thanks to everyone who joined us for our recent virtual meetup, where we discussed some of our Q2 2023 highlights, including feature highlights, the 2023 roadmap for VictoriaMetrics and, of course, the launch of VictoriaLogs! In this blog post, we’d like to share a summary of these highlights.

Micro-Outages Uncovered: Exploring the Real Cost of Downtime for Your Business

Unplanned downtime is an eventuality every business tries to avoid but will face. In today’s digitally interconnected world, outages can be particularly damaging, especially if the business is unprepared. Outages frustrate employees and anger customers, which brings intangible costs such as lower satisfaction and a damaged reputation. The loss of employee productivity caused by unplanned downtime can also significantly affect the bottom line.

Should you DIY your OpenTelemetry Monitoring?

I recently read this thread in the CNCF Slack from someone wanting to send metrics and traces directly to Postgres. Reasonable enough, right? After all, once your data is in Postgres you can query it to your heart’s content. And isn’t the general culture of OpenTelemetry that you should be able to do all of observability without resorting to SaaS tools? The thread, however, is pretty universally opposed to this approach, and I have to say that I agree.

What is Chronograf?

InfluxDB is an open-source time-series database, i.e. a database optimized for storing data points collected across an interval of time. Developed by InfluxData, InfluxDB is intended for fast, high-availability storage and retrieval of many different system metrics. The entire InfluxDB project, which is housed at influxdata.com, includes several tools for collecting and processing time-series data. Yet with all of these tools, there's still one step missing: visualizing the data. That's where Chronograf comes in.

Server performance metrics: 11 to consider for actionable monitoring

With the DevOps movement becoming mainstream, more and more developers are getting involved with the end-to-end delivery of web applications, including deployment, monitoring performance, and maintenance. As an application gains more users in a production environment, it’s increasingly critical that you understand the role of the server.

Azure Unit Cost Analysis for Cloud cost optimization

As organizations embrace cloud computing, understanding and optimizing costs becomes essential. Azure, Microsoft’s cloud platform, offers various services and features to help you manage your cloud expenses effectively. One powerful technique for this is Azure unit cost analysis. In this blog post, we will explore the concept of unit cost analysis and provide a step-by-step guide on performing it in Azure.

Distributed tracing for testing with Grafana Tempo and Tracetest (Grafana Office Hours #05)

Did you know you can use distributed tracing for testing with Grafana Tempo and Tracetest? Distributed tracing can really help you drill down from metrics to root causes, but how can you automate it? Adnan Rahić, Senior Developer Advocate at Tracetest.io, shares how you can do just that, using Grafana + Grafana Tempo + Tracetest.