
Datadog

How Mercado Libre scales its AWS microservices without losing visibility

Learn how Mercado Libre acts more quickly, strategically, and proactively thanks to Datadog’s centralized platform and context-rich alerting. Mercado Libre hosts the largest online commerce and payments ecosystem in Latin America, which means thousands of dollars can be lost if some of its critical applications stop working for even one minute. Senior Technical Manager Juliano Martins and software expert Marcelo Quadros share a few reasons why they chose Datadog as the observability platform for their AWS environment: the power of our infrastructure monitoring solution, extensive range of integrations, strong reputation in the market, and more.

Formalize your organization's best practices with custom Scorecards in Datadog

The Datadog Service Catalog is a centralized hub of information about the performance, reliability, security, efficiency, and ownership of your distributed services. By using the Service Catalog, teams can eliminate knowledge silos and build seamless DevSecOps workflows.

How we manage incidents at Datadog

Incidents put systems and organizations to the test. They pose particular challenges at scale: in complex distributed environments overseen by many different teams, managing incidents requires extensive structure and planning. But incidents, by definition, break structures and foil plans. As a result, they demand carefully orchestrated yet highly flexible forms of response. This post will provide a look into how we manage incidents at Datadog. We’ll cover our entire process.

I've Made a Huge Mistake: Implementing Agile on Infrastructure Teams

Bad planning methods can damage team morale and prevent teams from improving the systems they maintain. In this talk, Sam Handler from Shopify explains how his attempts to fix poor infrastructure planning processes through Agile methods failed. Drawing from this experience, he offers several principles that can help infrastructure teams improve the way they work.

How Uber Freight Powers Intelligent Logistics with Datadog

Thiyagarajan Anandan of Uber Freight shares how he and his team have created a center of excellence for monitoring and DevOps culture. Uber Freight, a division of Uber, delivers an end-to-end enterprise suite of Relational Logistics to advance supply chains and move the world’s goods. With more than 1,000 shippers across $18B in freight under management (FUM), it’s critical for Uber Freight to provide 99.99% uptime for its shippers and customers. Since migrating to the Datadog platform, Uber Freight has for the first time unlocked the full breadth and depth of its systems, significantly decreasing MTTR and MTTD and delivering an improved customer experience.

Plan new architectures and track your cloud footprint with Cloudcraft by Datadog

In a rapidly expanding, highly distributed cloud infrastructure environment, it can be difficult to make decisions about the design and management of cloud architectures. That’s because it’s hard for a single observer to see the full scope when their organization owns thousands of cloud resources distributed across hundreds of accounts. You need broad, complete visibility in order to find underutilized resources and other forms of bloat.

Use Datadog Dynamic Instrumentation to add application logs without redeploying

Modern distributed applications are composed of potentially hundreds of disparate services, all containing code from different internal development teams as well as from third-party libraries and frameworks with limited external visibility. Instrumenting your code is essential for ensuring the operational excellence of all these different services. However, keeping your instrumentation up to date can be challenging when new issues arise outside the scope of your existing logs.

Prioritize and promote service observability best practices with Service Scorecards

The Datadog Service Catalog consolidates knowledge of your organization’s services and shows you information about their performance, reliability, and ownership in a central location. The Service Catalog now includes Service Scorecards, which inform service owners, SREs, and other stakeholders throughout your organization of any gaps in observability or deviations from reliability best practices.

Stream your Google Cloud logs to Datadog with Dataflow

IT environments can produce billions of log events each day from a variety of hosts and applications. Collecting this data can be costly, often resulting in increased network overhead from processing inefficiencies and inconsistent ingestion during major system events. Google Cloud Dataflow is a serverless, fully managed framework that enables you to automate and autoscale data processing.