Latest Posts

Get granular LLM observability by instrumenting your LLM chains

The proliferation of managed LLM services like OpenAI, Amazon Bedrock, and Anthropic has introduced a wealth of possibilities for generative AI applications. Application engineers are increasingly creating chain-based architectures and using prompt engineering techniques to build LLM applications for their specific use cases.

Integration roundup: Monitoring the health and performance of your container-native CI/CD pipelines

Widespread adoption of containerized infrastructure has been closely followed by an explosion of container-native tools for each layer of the stack, including new solutions for managing CI/CD pipelines in container-based environments, such as the Argo suite, FluxCD, and Tekton. This is because these lightweight solutions make it easier to automate builds, testing, deployments, and more on Kubernetes, as well as other platforms that manage containerized workloads and services.

Reduce alert storms in your microservices architecture with easily scalable techniques

Alert storms occur when your monitoring platform generates an excessive number of alerts simultaneously or in rapid succession. Although numerous factors can cause an alert storm, microservices architectures are especially susceptible to them due to multiple service dependencies, potential failure points, and upstream and downstream service relationships.

Introducing Toto: A state-of-the-art time series foundation model by Datadog

Foundation models, or large AI models, are the driving force behind the advancement of generative AI applications that cover an ever-growing list of use cases, including chatbots, code completion, autonomous agents, image generation, and more. However, when it comes to understanding observability metrics, current large language models (LLMs) are not optimal.

Recapping DASH 2024

DASH 2024 was our biggest event yet! Over two days, thousands from the Datadog community gathered at North Javits in New York City for an impactful experience. The 2024 keynote featured numerous new product launches and updates, but there was much more to enjoy beyond this speech. Attendees got to experience breakout sessions, workshops, certification exams, one-on-one Datadog consultations, and a bustling expo hall.

Optimize PostgreSQL performance with Datadog Database Monitoring

PostgreSQL is a widely used open source relational database that many organizations operate as a core part of their infrastructure stack. Because databases are mission-critical, database-related issues can have outsize downstream impacts on user experience, service performance, and data retention, making it vital to identify and address problems quickly.

Create Golden Paths for your development teams with Datadog App Builder and Workflow Automation

Improving the developer experience is a chief concern for many organizations that must maintain highly complex software architectures and platforms supported by an intricate web of internal processes. Platform engineering for Golden Paths seeks to address this by providing self-service tools, capabilities, and processes to help engineers start new projects in a more standardized, less mistake-prone way.

Unify your OpenTelemetry and Datadog experience with the embedded OTel Collector in the Agent

OpenTelemetry (OTel) is an open source, vendor-neutral observability solution that consists of a suite of components—including APIs, SDKs, and the OTel Collector—that allow teams to monitor their applications and services in a standardized format. OTel defines this data via the OpenTelemetry Protocol (OTLP), a standard for the encoding and transfer of telemetry data that organizations can use to collect, process, and export telemetry and route it to observability backends, such as Datadog.

Monitor, troubleshoot, improve, and secure your LLM applications with Datadog LLM Observability

Organizations across all industries are racing to adopt LLMs and integrate generative AI into their offerings. LLMs have been demonstrably useful for intelligent assistants, AIOps, and natural language query interfaces, among many other use cases. However, running them in production at enterprise scale presents many challenges.