The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

FinOps and Cloud Cost Optimization

Nov 8, 2022 By Datadog In Datadog

As companies scale, it’s become increasingly important to keep cloud cost management and optimization top of mind. In this talk, Yuval Yogev from Sygnia walks you through Sygnia’s optimization journey of cutting their total cloud costs in half. Yogev also shares insights into how you can optimize your own organization’s cloud usage and spend.

Ask a Site Reliability Engineer (SRE)

Nov 8, 2022 By Datadog In Datadog

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.

Scaling Up, One Network Bottleneck at a Time

Nov 8, 2022 By Datadog In Datadog

Processing data at scale involves moving packets through a network—but what happens when that network isn't cooperative? Anatole Beuzon, a Software Engineer at Datadog, discusses how he investigated and resolved network issues in Datadog’s larger data-processing apps and how you can apply these same methods to your own production workloads.

I've Made a Huge Mistake: Implementing Agile on Infrastructure Teams

Nov 8, 2022 By Datadog In Datadog

Bad planning methods can damage team morale and prevent teams from improving the systems they maintain. In this talk, Sam Handler from Shopify explains how his attempts to fix poor infrastructure planning processes through Agile methods failed. Drawing from this experience, he offers several principles that can help infrastructure teams improve the way they work.

Empower the SREs - Conclusions from The SRE Report 2023

Nov 8, 2022 By Steve McGhee In Catchpoint

Let's be honest, nobody loves surveys. Ok, well I sure don't. But surveys satisfy a huge need in our demand for insights into complex human-computer, sociotechnical systems. It turns out that we've been measuring the computer part pretty well, but the humans – not as easy to keep track of. When Google SRE first defined toil as a metric we wanted to reduce, we spent far too long trying to quantify it numerically based on tooling and insights from computer systems.

Observability is Still Broken. Here are 6 Reasons Why.

Nov 8, 2022 By Tomer Levy In logz.io

In an era with no shortage of established best practices and tools, engineering teams consistently find that preventing, detecting, and resolving production issues is only getting harder. Why is this the case? Our most recent DevOps Pulse Survey highlighted alarming trends to this end.

3 best practices to reduce application downtime with Google Cloud's API monitoring tools

Nov 8, 2022 By Varun Krovvidi In Google Operations

Maintain high uptime and performance for your APIs without any overhead using Google Cloud’s API monitoring tools.

What is AIOps (Artificial Intelligence for IT Operations)? AIOps Use Cases

Nov 7, 2022 By meshIQ In meshIQ

The volume of data that IT systems generate nowadays is overwhelming, and without intelligent monitoring and analysis tools, it can result in missed opportunities, alerts, and expensive downtime. However, with the advent of Machine Learning and Big Data, a new category of IT operations tools called AIOps has emerged. AIOps can be defined as the practical application of Artificial Intelligence to augment, support, and automate IT processes.

Monitoring Docker Containers with cAdvisor

Nov 7, 2022 By Ryan Tendonge In MetricFire

Docker is one of the most popular tools for containerization, and several tools have been developed by the open-source community to monitor what happens inside of Docker containers. This guide focuses on one tool specifically: cAdvisor.

How to monitor Windows logs with the updated Windows integration for Grafana Cloud

Nov 7, 2022 By Muhammad Shahzeb In Grafana

As we all know, Windows is one of the most popular operating systems in the world. It has a dominant share of the desktop computer market, with more than 70% of machines running it. It makes sense, then, that the Windows integration is also one of the most popular and widely used integrations in Grafana Cloud.
