The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Chasing the Rainbow: Towards Unified Service Metrics

As Zendesk migrated from a monolithic application to an ecosystem of hundreds of services, fully unified and standardized observability became a chief concern. In this talk, Senior Principal Engineer Daniel Schierbeck shares how adopting a service mesh has helped Zendesk teams manage the company's growing number of services while standardizing their observability. He also explains how Zendesk’s approach to monitoring service interactions has evolved as it adopted Datadog metrics and Datadog APM.

Manage metrics & logging costs with Grafana Cloud + Log Volume Explorer demo | ObservabilityCON

Are your SRE and platform teams under pressure to ingest fewer metrics and logs in the name of cost savings? Reducing costs does not have to mean reduced observability. This recording walks through the cost management features in Grafana Cloud that allow you to analyze, attribute, monitor, and optimize your metrics and logs usage – and lower costs – without compromising your observability strategy.

How Pipedrive switched its observability stack to OpenTelemetry & LGTM | ObservabilityCON 2023

The cloud-based CRM company Pipedrive has been relentlessly modernising its observability stack. It first adopted Grafana visualisation and Grafana Mimir for Prometheus metrics, then recently completed the migration of its distributed tracing from a third-party SaaS provider to OpenTelemetry and Grafana Tempo, and of its logging stack from Graylog to Grafana Loki. Along the way, the team developed its own in-house library to include OpenTelemetry in its roughly 750 microservices.

Grafana SLO Demo: Prioritize critical resources with SLO-driven IRM | ObservabilityCON 2023

A majority of respondents in our Observability Survey said they were using SLOs or moving in that direction. For good reason: By highlighting the most critical error budget burndown, service level objectives (SLOs) can help you prioritize performance issues based on business impact. In this recording, Josh Abreu Mesa and Reem Tariq walk through how Grafana Cloud’s integrated SLO and Incident Response Management (IRM) capabilities can help you identify the most important issues and resolve them quickly.
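As a rough illustration of how error budget burn can drive prioritization, the sketch below computes a burn rate per service and sorts services by it. This is not Grafana Cloud's implementation; the function names, the service names, and the request counts are all hypothetical and chosen only to make the arithmetic visible.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO (e.g. 0.001 for 99.9%)."""
    return 1.0 - slo_target


def burn_rate(bad_requests: int, total_requests: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO allows; values above 1.0 mean it is being spent faster.
    """
    if total_requests == 0:
        return 0.0
    observed_error_rate = bad_requests / total_requests
    return observed_error_rate / error_budget(slo_target)


def rank_by_burn(services: list[dict]) -> list[str]:
    """Return service names ordered from fastest to slowest budget burn."""
    ranked = sorted(
        services,
        key=lambda s: burn_rate(s["bad"], s["total"], s["slo"]),
        reverse=True,
    )
    return [s["name"] for s in ranked]


# Hypothetical services and counts, for illustration only.
services = [
    {"name": "search", "bad": 50, "total": 10_000, "slo": 0.99},
    {"name": "checkout", "bad": 20, "total": 10_000, "slo": 0.999},
]
priority = rank_by_burn(services)
```

In this made-up data, "search" has more failed requests in absolute terms, but "checkout" has the stricter SLO and is burning its budget faster, so it ranks first.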

User-centered observability: load testing, real user monitoring & synthetics | ObservabilityCON 2023

Understanding your end users’ experience with your applications and services is critical, and there are a variety of tools to help. But there are also a number of different use cases: During development or in production? Simulate user behavior or monitor real user behavior? What should you use and when? This recorded session explores when and how to apply load testing, synthetic monitoring, and real user monitoring to gain insights into the end user experience of your critical applications.

Application Observability and Beyla Demo | ObservabilityCON 2023

In cloud native environments, finding and resolving issues across services and between application and infrastructure dependencies can be challenging. In this recording, we provide demos of Grafana Cloud’s latest capabilities for correlating application and infrastructure observability: Application Observability and Beyla, both generally available. You will hear how Grafana unifies and contextualizes service relationships and application and infrastructure dependencies to help you resolve problems faster.

5 Simple Steps to Reduce Your AWS S3 Bill

Understanding your AWS S3 billing is crucial to effectively managing and reducing your costs. Charges in AWS S3 are primarily based on three factors: the amount of data you store, the number of requests you make, and the amount of data you transfer. Storage costs are calculated per gigabyte (GB) stored, with rates tiered according to the total size of your data. Request costs are incurred with each PUT, GET, or LIST operation on your objects, with prices varying by request type.
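The three billing factors can be sketched as a small estimator. Every rate and tier boundary below is a made-up placeholder, not an actual AWS price, and the function names are hypothetical; substitute the figures from the current S3 pricing page for your region and storage class before relying on any result.

```python
# Placeholder rates for illustration only; these are NOT real AWS prices.
# Each storage tier is (size of the tier in GB, price per GB-month).
STORAGE_TIERS = [
    (1_000, 0.03),         # first 1,000 GB
    (4_000, 0.02),         # next 4,000 GB
    (float("inf"), 0.01),  # everything beyond that
]
REQUEST_PRICE_PER_1000 = {"put": 0.005, "get": 0.0004, "list": 0.005}
TRANSFER_OUT_PRICE_PER_GB = 0.09


def storage_cost(stored_gb: float) -> float:
    """Walk the tiers, charging each slice of data at its own tier's rate."""
    cost = 0.0
    remaining = stored_gb
    for tier_size_gb, price_per_gb in STORAGE_TIERS:
        if remaining <= 0:
            break
        billed_gb = min(remaining, tier_size_gb)
        cost += billed_gb * price_per_gb
        remaining -= billed_gb
    return cost


def request_cost(request_counts: dict[str, int]) -> float:
    """Charge each request type at its own per-1,000-requests price."""
    return sum(
        REQUEST_PRICE_PER_1000[kind] * count / 1000
        for kind, count in request_counts.items()
    )


def estimate_monthly_bill(stored_gb, request_counts, transfer_out_gb):
    """Break the estimate down by the three factors described above."""
    bill = {
        "storage": storage_cost(stored_gb),
        "requests": request_cost(request_counts),
        "transfer": transfer_out_gb * TRANSFER_OUT_PRICE_PER_GB,
    }
    bill["total"] = sum(bill.values())
    return bill


bill = estimate_monthly_bill(
    stored_gb=1_500,
    request_counts={"put": 1_000, "get": 10_000},
    transfer_out_gb=100,
)
```

Keeping the breakdown per factor, rather than returning only a total, makes it easier to see which of the three charges dominates a given workload.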

Centrally govern and remotely manage Datadog Agents at scale with Fleet Automation

As customers scale to thousands of hosts and deploy increasingly complex applications, it can be difficult to ensure that every host is configured to give you the visibility you need to monitor your infrastructure and applications. To ensure visibility across a growing number of hosts, you need to know that your observability strategy is implemented uniformly across your entire fleet of Datadog Agents installed on these hosts.

Secure and monitor infrastructure networking with Buoyant Enterprise for Linkerd in the Datadog Marketplace

As organizations adopt Kubernetes, they face gaps in security, reliability, and observability such as unencrypted communication, lack of multi-cluster support, and missing reliability features like circuit breaking. Buoyant Cloud is the dashboarding and automated monitoring component of Buoyant Enterprise for Linkerd, which helps organizations secure and monitor communication between Kubernetes workloads.