Latest Posts

Monitor AWS Trainium and AWS Inferentia with Datadog for holistic visibility into ML infrastructure

Dec 3, 2024 By Anjali Thatte In Datadog

AWS Inferentia and AWS Trainium are purpose-built AI chips that—with the AWS Neuron SDK—are used to build and deploy generative AI models. As models increasingly require a larger number of accelerated compute instances, observability plays a critical role in ML operations, empowering users to improve performance, diagnose and fix failures, and optimize resource utilization.

How Datadog migrated its Kubernetes fleet on AWS to Arm at scale

Dec 2, 2024 By Matthieu Jaillais In Datadog

Over the past few years, Arm has surged to the forefront of computing. For decades, Arm processors were mainly associated with a handful of specific use cases, such as smartphones, IoT devices, and the Raspberry Pi. But the introduction of AWS Graviton2 in 2019 and the adoption of Arm-based hardware platforms by Apple and others helped bring about a dramatic shift, and Arm is now the most widely used processor architecture in the world.

Achieve total app visibility in minutes with Single Step Instrumentation

Dec 2, 2024 By Evan Pandya In Datadog

Datadog APM and distributed tracing provide teams with an end-to-end view of requests across services, uncovering dependencies and performance bottlenecks to enable real-time troubleshooting and optimization. However, traditional manual instrumentation, while customizable, is often time-consuming, error-prone, and resource-intensive, requiring developers to configure each service individually and collaborate closely with SRE teams.

Monitor your OpenAI LLM spend with cost insights from Datadog

Dec 2, 2024 By Thomas Sobolik In Datadog

Managing LLM provider costs has become a chief concern for organizations building and deploying custom applications that consume services like OpenAI. These applications often rely on multiple backend LLM calls to handle a single initial prompt, leading to rapid token consumption—and consequently, rising costs. But shortening prompts or chunking documents to reduce token consumption can be difficult and introduce performance trade-offs, including an increased risk of hallucinations.

Secure your cloud environment from end to end with Datadog Infrastructure-as-Code Security

Dec 2, 2024 By Cliff Kim In Datadog

Infrastructure-as-code (IaC) tools like Terraform and CloudFormation allow teams to define, manage, and provision their cloud infrastructure using code, as opposed to clicking through consoles or executing commands via a CLI. IaC adoption is now widespread and helps teams increase productivity and efficiency, but it also introduces new surface area for mistakes, defects, and other risks.

Centrally manage Agent upgrades and configurations with Datadog Fleet Automation

Dec 2, 2024 By Vignesh Palaniappan In Datadog

Teams can gain deep visibility into their applications and infrastructure by installing Datadog’s client-side agent software, the Datadog Agent, throughout their environment. To help ensure the Agent is deployed correctly and consistently, Datadog’s Fleet Automation feature already lets teams centrally view Agent installations and configurations. But teams also need an easier way to manage the deployment and configuration of the Agent at scale.

Build Datadog workflows and apps in minutes with our AI assistant

Nov 26, 2024 By Amber Tunnell In Datadog

Datadog is a central hub of information—enabling you to see logs, traces, and metrics from across your stack and providing a centralized source of notifications about potential issues. However, when Datadog notifies you of an issue, you often need to log in to other applications to fully assess and resolve it, which slows down mitigation.

Stream logs in the OCSF format to your preferred security vendors or data lakes with Observability Pipelines

Nov 25, 2024 By Micah Kim In Datadog

Today, CISOs and security teams face a rapidly growing volume of logs from a variety of sources, all arriving in different formats. They write and maintain detection rules, build pipelines, and investigate threats across multiple environments and applications. Efficiently maintaining their security posture across multiple products and data formats has become increasingly challenging.

Optimize LLM application performance with Datadog's vLLM integration

Nov 22, 2024 By Curtis Maher In Datadog

vLLM is a high-performance serving framework for large language models (LLMs). It optimizes token generation and resource management to deliver low-latency, scalable performance for AI-driven applications such as chatbots, virtual assistants, and recommendation systems. By efficiently managing concurrent requests and overlapping tasks, vLLM enables organizations to deploy LLMs in demanding environments with speed and efficiency.

Get deeper visibility into your AWS serverless apps with enhanced distributed tracing

Nov 22, 2024 By Sumedha Mehta In Datadog

Serverless or event-driven applications can comprise many different distributed components, including serverless compute services such as AWS Lambda and AWS Fargate for Amazon ECS, as well as managed data streams, data stores, workflow orchestration tools, queues, and more. Having full end-to-end visibility into requests as they propagate across all of these parts of your application is crucial to monitoring performance, locating affected up- or downstream services, and troubleshooting issues.
