Latest Posts

Detect Java code-level issues with Seagence and Datadog

In Java applications, concurrency issues can be difficult to reproduce and debug. Because work is scheduled nondeterministically across threads, the conditions that led to an error in one execution of the program may not trigger the same issue the next time around. Exceptions that are silently handled (also known as swallowed exceptions) can also be challenging to debug because they typically leave no trace in the logs.
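To make these two failure modes concrete, here is a small, hypothetical Java sketch. It is not Seagence's or Datadog's code, and the class and method names are illustrative. The unsynchronized `racyCount` shows a data race whose outcome can change from run to run, while `parseSwallowed` shows a catch block that hides a failure from the logs, next to a `parseLogged` variant that records the exception with `java.util.logging`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JavaPitfalls {
    private static final Logger LOG = Logger.getLogger(JavaPitfalls.class.getName());

    // Shared counters updated from several threads (hypothetical example state).
    public static int racyCount = 0;                                    // racyCount++ is a non-atomic read-modify-write
    public static final AtomicInteger safeCount = new AtomicInteger(); // increments are atomic

    public static void runIncrements(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    racyCount++;                 // unsynchronized: concurrent updates can be lost
                    safeCount.incrementAndGet(); // atomic: no updates are lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt instead of swallowing it
                return;
            }
        }
    }

    // Swallowed exception: the failure becomes a default value and nothing is logged.
    public static int parseSwallowed(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return -1; // no log line, no rethrow
        }
    }

    // Same fallback, but the exception and its stack trace are recorded.
    public static int parseLogged(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            LOG.log(Level.WARNING, "Could not parse value: " + raw, e);
            return -1;
        }
    }

    public static void main(String[] args) {
        runIncrements(4, 100_000);
        // safeCount is always 400000; racyCount can be lower, by a different amount each run.
        System.out.println("racy=" + racyCount + " safe=" + safeCount.get());
        System.out.println(parseSwallowed("8o8o")); // -1, and nothing is logged
        System.out.println(parseLogged("8o8o"));    // -1, plus a WARNING record with the stack trace
    }
}
```

Both parse methods return the same value to the caller; the only difference is whether anyone can later find out that a failure happened, which is exactly what makes swallowed exceptions hard to diagnose.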

Quickly remediate issues in your Azure applications with Datadog Workflow Automation

Datadog Workflow Automation speeds up incident response and remediation for DevOps, SRE, and security teams by enabling them to automatically run predefined task sequences whenever specific alerts or security signals are triggered. Datadog first released Workflow Automation in 2023 and is now excited to announce a significant expansion of its capabilities with Azure actions, allowing engineers to create automated workflows for their Azure resources for the first time.

Improve your shift-left observability with the Datadog Service Catalog

Your applications are only as strong as your ability to iterate on them. To keep up with their rapidly changing production environments, your teams need reliable CI/CD systems that implement best practices, including build and test automation, flaky test management, and deployment management. By optimizing their CI/CD pipelines, your teams can build their apps more efficiently, deploy them more safely, and catch bugs and security vulnerabilities before they make it to production.

Investigate your log processing with the Datadog Log Pipeline Scanner

Large-scale organizations typically collect and manage millions of logs a day from various services. Within these organizations, many different teams may set up processing pipelines to modify and enrich logs for security monitoring, compliance audits, and DevOps. Datadog Log Pipelines let you ingest logs from your entire stack, parse and enrich them with contextual information, add tags for usage attribution, generate metrics, and quickly identify log anomalies.

Monitor Ray applications and clusters with Datadog

Ray is an open source compute framework that simplifies the scaling of AI and Python workloads for on-premises and cloud clusters. Ray integrates with popular libraries, data stores, and tools within the machine learning (ML) ecosystem, including scikit-learn, PyTorch, and TensorFlow. This gives developers the flexibility to scale complex AI applications without making changes to their existing workflows or AI stack.

Track service provider outages with IsDown and Datadog

When your apps and infrastructure rely on dozens of third-party providers for key functionality, it’s important to closely track their outages. If a service you rely on goes down, you need to move quickly to limit the outage’s impact on your users. IsDown provides a detailed status page aggregator and uptime monitoring for all your third-party dependencies.

Monitor your chaos engineering experiments with Steadybit's offering in the Datadog Marketplace

Steadybit is a software reliability platform that uses chaos engineering and fault injection to help organizations improve the stability and performance of their applications. By letting you simulate turbulent scenarios in a controlled environment, Steadybit helps you identify and mitigate potential system issues to reduce downtime and improve resilience.

A deep dive into CPU requests and limits in Kubernetes

In a previous blog post, we explained how containers’ CPU and memory requests can affect how they are scheduled. We also introduced some of the effects CPU and memory limits can have on applications, assuming that CPU limits were enforced by the Completely Fair Scheduler (CFS) quota. In this post, we dive deeper into how CPU requests and limits work and share some general recommendations for specifying them.
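As a rough illustration of the CFS quota model mentioned above, the sketch below converts Kubernetes CPU values in millicores into cgroup v1 settings. It assumes the kubelet's usual conversion as we understand it (1024 `cpu.shares` per CPU, clamped between 2 and 262144, and a quota of limit × period with a default 100 ms period and a 1 ms minimum). The class and method names are hypothetical, and actual behavior depends on the Kubernetes version, the cgroup version, and kubelet configuration.

```java
public class CpuCgroupMath {
    // Assumed constants, mirroring the kubelet's cgroup v1 conversion.
    static final long SHARES_PER_CPU = 1024;
    static final long MIN_SHARES = 2;
    static final long MAX_SHARES = 262_144;
    static final long DEFAULT_CFS_PERIOD_US = 100_000; // 100 ms
    static final long MIN_QUOTA_US = 1_000;            // 1 ms

    // CPU request (millicores) -> cpu.shares: a relative weight that matters only under contention.
    public static long milliCpuToShares(long milliCpu) {
        if (milliCpu == 0) {
            return MIN_SHARES;
        }
        long shares = milliCpu * SHARES_PER_CPU / 1000;
        return Math.max(MIN_SHARES, Math.min(MAX_SHARES, shares));
    }

    // CPU limit (millicores) -> cpu.cfs_quota_us: CPU time allowed per CFS period.
    // Returns -1 (cgroup v1's "no quota") when no limit is set.
    public static long milliCpuToQuotaUs(long milliCpu, long periodUs) {
        if (milliCpu == 0) {
            return -1;
        }
        long quota = milliCpu * periodUs / 1000;
        return Math.max(MIN_QUOTA_US, quota);
    }

    public static void main(String[] args) {
        long request = 250; // e.g. requests.cpu: 250m
        long limit = 500;   // e.g. limits.cpu: 500m
        long quota = milliCpuToQuotaUs(limit, DEFAULT_CFS_PERIOD_US);
        System.out.println("cpu.shares = " + milliCpuToShares(request)); // 256
        System.out.println("cpu.cfs_quota_us = " + quota);               // 50000
        // With 4 threads running at once, the 50 ms quota is spent after 12.5 ms of
        // wall-clock time; the container is then throttled for the rest of the period.
        int busyThreads = 4;
        double msToExhaust = (quota / 1000.0) / busyThreads;
        System.out.println("quota exhausted after " + msToExhaust + " ms"); // 12.5
    }
}
```

The last calculation shows why CPU limits can hurt multithreaded workloads: four busy threads use up the whole 50 ms quota after 12.5 ms and then wait out the remaining 87.5 ms of the period, which can surface as latency even when average CPU usage looks modest.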

Highlights from AWS re:Invent 2023

Whether or not you made the journey to this year’s re:Invent, it’s easy for great announcements to get lost amid an action-packed week of keynotes, breakouts, expo hall demos, and networking sessions. No need to worry: we’re always happy to be a big part of the re:Invent experience and to share our observations with you.

Introducing CoTerm, your collaborative terminal for pair programming and debugging

For too long, engineers have had to piece together an unwieldy combination of tools to collaboratively debug and resolve incidents while pair programming in real time. These activities normally require developers to work individually through a terminal, but the patchwork solutions that allow teams to work together in terminals all have significant drawbacks.