Latest Posts

Flaky tests: their hidden costs and how to address flaky behavior

Flaky tests are bad—this is a fact implicitly understood by developers, platform and DevOps engineers, and SREs alike. When tests flake (i.e., generate conflicting results across test runs, without any changes to the code or test), they can arbitrarily fail builds, requiring developers to re-run the test or the full pipeline. This process can take hours—especially for large or monolithic repositories—and slow down the software delivery cycle.
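To make the definition above concrete, here is a minimal sketch of how flaky tests might be identified from CI history: a test counts as flaky if it has both passed and failed on the same commit, meaning that neither the code nor the test changed between runs. The `RunRecord` type, its field names, and the `find_flaky_tests` helper are hypothetical and used only for illustration; they do not correspond to any particular CI provider or Datadog API.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class RunRecord(NamedTuple):
    """One execution of one test (hypothetical record shape)."""
    test_name: str
    commit_sha: str  # identifies the exact code and test version that ran
    passed: bool


def find_flaky_tests(runs: Iterable[RunRecord]) -> Set[str]:
    """Return names of tests with conflicting results on the same commit."""
    # Collect the distinct outcomes seen for each (test, commit) pair.
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit_sha)].add(run.passed)

    # Both True and False on one commit means the result changed
    # even though the code and the test did not.
    return {
        test_name
        for (test_name, _sha), results in outcomes.items()
        if len(results) == 2
    }


# Illustrative data, not real CI results.
runs = [
    RunRecord("test_checkout", "abc123", True),
    RunRecord("test_checkout", "abc123", False),  # same commit, different result
    RunRecord("test_login", "abc123", True),
    RunRecord("test_login", "def456", False),     # different commit: not flaky by this rule
]

flaky = find_flaky_tests(runs)
```

With the sample data, only `test_checkout` is reported. `test_login` fails on a different commit, so its failure may reflect a genuine regression rather than flakiness.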

Monitor your Azure OpenAI applications with Datadog LLM Observability

Azure OpenAI Service is Microsoft’s fully managed platform for deploying generative AI services powered by OpenAI. Azure OpenAI Service provides access to models including GPT-4o, GPT-4o mini, GPT-4 Turbo with Vision, DALL-E 3, and the Embeddings model series, alongside the enterprise security, governance, and infrastructure capabilities of Azure.

Generate metrics from your high-volume logs with Datadog Observability Pipelines

Logs are a rich source of information, providing you with the minute details you need to troubleshoot a specific issue or perform extensive historical analysis. But with billions of logs being generated from your infrastructure every day, it isn’t practical to sift through them all to derive actionable insights. Firewall, CDN, network activity, and load balancer logs are especially high volume, requiring storage solutions that can be expensive and difficult to scale.

Reduce your AWS Step Functions' error remediation time by redriving executions directly from Datadog

AWS lets customers retry or redrive Step Functions executions, so that failed executions of Standard Workflows can continue from their points of failure with all inputs preserved. For example, if you find broken downstream logic in your code or experience unexpected errors upon execution, you can remediate those errors by fully re-running an execution or by using redrive to continue it.

Gain visibility into your Camunda 8 components with Bordant Technologies' Datadog integration

Camunda 8 is a process orchestration platform that automates and executes business processes at scale. Many organizations orchestrate their business processes using Camunda 8 Self-Managed because it can operate in their preferred public cloud provider, such as AWS, or in a private cloud, like a Kubernetes cluster. However, hosting Camunda 8 while maintaining its health and performance requires complete visibility into your environment, so that you can properly allocate resources and minimize downtime.

How to spot and fix memory leaks in Go

A memory leak is a faulty condition where a program fails to free up memory it no longer needs. If left unaddressed, memory leaks result in ever-increasing memory usage, which in turn can lead to degraded performance, system instability, and application crashes. Most modern programming languages include a built-in mechanism to protect against this problem, with garbage collection being the most common. Go has a garbage collector (GC) that does a very good job of managing memory.

How we used Datadog to save $17.5 million annually

Like most organizations, we are always trying to use our cloud resources as efficiently as possible. To help accomplish this, we encourage individual engineering teams at Datadog to look for opportunities to optimize. They can share their performance wins, big or small, in an internal Slack channel along with visualizations and, often, calculations of the resulting annual cost savings.

Optimize your AWS costs with Cloud Cost Recommendations

Managing your AWS costs is both crucial and complex, and as your AWS environment grows, it becomes harder to know where you can optimize and how to execute the necessary changes. Datadog Cloud Cost Management provides invaluable visibility into your cloud spend that enables you to explore costs and investigate trends that impact your cloud bill.

Operator vs. Helm: Finding the best fit for your Kubernetes applications

Kubernetes operators and Helm charts are both tools used for deploying and managing applications within Kubernetes clusters, but they have different strengths, and it can be difficult to determine which one to use for your application. Helm simplifies the deployment and management of Kubernetes resources using templates and version-controlled packages. It excels in scenarios where repeatable deployments and easy upgrades or rollbacks are needed.

Integration roundup: Understanding email performance with Datadog

Visibility into email health and performance is indispensable to any organization seeking to reach its customers through their inboxes. As they work to curtail spam, internet service providers (ISPs) are redefining the standards of deliverability on an ongoing basis, and organizations often struggle to adapt.