Datadog

Datadog on Site Reliability Engineering

There are many ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a fast-growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.

Empower engineers to take ownership of Google Cloud costs with Datadog

Google Cloud provides a wide range of services and tools to help engineering teams reduce the complexity of migrating and deploying applications in the cloud. As engineering teams work to improve the performance, reliability, and security of their applications, they also need to be conscious of cloud costs. But engineers often don’t have access to cost data, or they only see cost data in monthly reports.

Filter and correlate logs dynamically using Subqueries

Logs provide valuable information that can help you troubleshoot performance issues, track usage patterns, and conduct security audits. To derive actionable insights from log sources and facilitate thorough investigations, Datadog Log Management provides an easy-to-use query editor that enables you to group logs into patterns with a single click or perform reference table lookups on the fly for in-depth analysis.

Best practices for monitoring software testing in CI/CD

A key challenge of monitoring your CI/CD system is understanding how to optimize your workflows and create best practices that help you minimize pipeline slowdowns and better respond to CI issues. In addition to monitoring CI pipelines and their underlying infrastructure, your organization also needs to cultivate effective relationships between platform and development teams.

Fine-tune observability configurations for all your Azure integrations in one place

Microsoft Azure provides an array of managed services to support many aspects of cloud computing, including application development, workload migration, and data management. To help you monitor the health and performance of these services, Datadog offers integrations with more than 40 Azure services, including Azure Kubernetes Service (AKS), Cosmos DB, and Azure App Services. Each integration provides robust data visualizations, meaningful alerts, and one-click Datadog Agent deployment.

Best practices for end-to-end service ownership with Datadog Service Catalog

To grow your organization effectively, you need to ensure the scalability of your systems. In a broad, distributed architecture, critical processes like incident triage, security response, and large-scale configuration changes can be difficult to execute without a programmatically accessible registry of what’s running in production and who owns it.

Simplify production debugging with Datadog Exception Replay

Debugging errors in production environments can frustrate your team and disrupt your development cycle. Once error tracking detects an exception, you then need to identify which specific line of code or module is responsible for the error. Without access to the inputs and associated states that caused the errors, reproducing them to find the root cause and a solution can be a lengthy and challenging process.

Integration roundup: Monitoring your container-native technologies

Container-native technologies increase the scalability and speed of deployment offered by containerized infrastructure, but they also present new monitoring challenges for organizations that adopt them. For example, because containers are ephemeral and share resources, tracking resource provisioning in container-native tools is essential to ensure consistent application performance.

Analyze multiple user journeys with the Datadog Sankey visualization

Funnels can be powerful tools for analyzing your UX, but figuring out exactly which user journeys you want to study can be challenging. Even if you have an ideal journey in mind, users often take steps you don’t expect. As a result, your funnels—and therefore, your optimization efforts—can easily miss the most influential pages in your application. Indeed, how do you build the best possible funnel when there are thousands of paths users can take after any given page?

Datadog on Data Science

In this episode, we’ll visit the world of predictive analytics and machine learning and uncover how these cutting-edge technologies are transforming the way Datadog monitors and improves its services. We’ll focus our conversation on two key aspects: using advanced statistical methods for proactive monitoring and the strategic implementation of machine learning for algorithm enhancement.