
Latest Posts

How Datadog's Infrastructure team manages internal deployments using the Service Catalog and CI/CD Visibility

Managing the software development lifecycle of your applications is a complex task. Releasing software updates in a large and ever-changing ecosystem requires visibility into the state of your services and insight into how changes to these services impact the reliability, performance, security, and cost of your application. The stages of software delivery are often sharded across multiple tools, each purpose-built for a specific slice of your application lifecycle.

Visualize relationships across your on-premises network with the Device Topology Map

Network engineers need clear visibility into the relationships and dependencies of their network devices so they can quickly troubleshoot when issues arise. But when dealing with potentially thousands of devices in a modern enterprise network, engineers often have to navigate a complex web of interconnected signals to trace the sources and consequences of poor network performance.

Best practices for using DORA metrics to improve software delivery

Software development and delivery require cross-team collaboration and cross-service orchestration, all while ensuring that organizational standards for quality, security, and compliance are consistently met. Without careful monitoring, you risk losing visibility into delivery workflows, which makes it difficult to evaluate how they affect release velocity and stability, developer experience, and application performance.

Monitor your CI/CD modernizations with Datadog CI Pipeline Visibility

As your organization adopts modern technologies and scales its workloads, it’s critical that your CI/CD environment follows suit to maintain smooth development and testing workflows. Adopting modern CI/CD tools (e.g., pipeline runners and testing frameworks) and best practices can increase the agility and resilience of your CI/CD environment and enable your teams to configure new jobs, stages, and tests as business requirements change.

Highlights from Google Cloud Next 2024

Over 30,000 people flocked to Las Vegas to see the latest and greatest from Google Cloud and its partners at Google Cloud Next 2024. As a long-time Google Cloud partner and recipient of two Google Cloud Technology Partner of the Year awards this year, we were there in full force to showcase our unified observability and security solutions and engage with the Google Cloud community.

Save up to 14 percent CPU with continuous profile-guided optimization for Go

We are excited to release our tooling for continuous profile-guided optimization (PGO) for Go. You can now reduce the CPU usage of your Go services by up to 14 percent by adding the following line before the go build step in your CI pipeline: You will also need to supply DD_API_KEY and DD_APP_KEY in your environment. Please check our documentation for more details on setting this up securely.

Manage incidents seamlessly with the Datadog Slack integration

Modern, distributed application architectures pose particular challenges when it comes to coordinating incident management. DevOps teams, SREs, and security teams, often spread across separate locations and time zones and equipped with limited knowledge of each other’s services, must work quickly to collaboratively triage, troubleshoot, and mitigate customer impact.

Aggregate, correlate, and act on alerts faster with AIOps-powered Event Management

Maintaining service availability is a challenge in today’s complex cloud environments. When a critical incident arises, the underlying cause can be buried in a sea of alerts from interconnected services and applications. Central operations teams often face an overload of disparate alerts, causing confusion, delayed incident response, alert fatigue, and redundant resolution efforts. These issues can negatively impact revenue and customer experience, especially during an outage.

Track changes in your containerized infrastructure with Container Image Trends

Datadog’s Container Images view provides key insights into every container image used in your environment, helping you quickly detect and remediate security and performance problems that can affect multiple containers in your distributed system. In addition to having a snapshot of the performance of your container fleet, it’s also critical to understand large-scale trends in security posture and resource utilization over time.

Best practices for monitoring managed ML platforms

Machine learning (ML) platforms such as Amazon SageMaker, Azure Machine Learning, and Google Vertex AI are fully managed services that enable data scientists and engineers to easily build, train, and deploy ML models. Common use cases for ML platforms include natural language processing (NLP) models for text analysis and chatbots, personalized recommendation systems for e-commerce web applications and streaming services, and predictive business analytics.