Latest Posts

Accelerate incident investigations with Log Anomaly Detection

Modern DevOps teams that run dynamic, ephemeral environments (e.g., serverless) often struggle to keep up with the ever-increasing volume of logs, which makes it harder for engineers to troubleshoot incidents effectively. During an incident, the trial-and-error process of finding and confirming which logs are relevant to your investigation can be time-consuming and laborious. This results in employee frustration, degraded performance for customers, and lost revenue.

Troubleshoot faster with improved Datadog Events

Datadog Events provides customers with a data feed about their infrastructure and applications, delivering an up-to-the-minute history of activity such as code deployments, configuration changes, and triggered alerts. Events collects data from Datadog products and over 100 third-party integrations—including Docker, Jenkins, Kubernetes, Sentry, AWS CloudWatch, and Azure Service Health.

Monitor your gRPC APIs with Datadog Synthetic Monitoring

gRPC is an open source Remote Procedure Call (RPC) framework developed by Google and released in 2016. Although gRPC is still relatively new, large organizations are adopting it in increasing numbers to build APIs to connect complex microservice meshes that use disparate languages and frameworks. gRPC-based APIs can process requests up to seven times faster than REST APIs, and they also allow customers to easily implement SSL authentication, load balancing, and tracing via plug-in libraries.

Debug issues and automate remediation with Shoreline and Datadog

Shoreline is an incident response automation service that enables DevOps engineers and site reliability engineers (SREs) to quickly debug and remediate issues at scale and develop automated routines for incident management. Using Shoreline’s proprietary Op language, customers can run debug commands across all their hosts simultaneously and then deploy custom scripts via Actions to trigger automated remediations.

Troubleshoot end-to-end tests with CI Visibility and RUM

Adding automated testing to your CI/CD pipelines can help you ensure that you deploy changes safely. But as you continue to shift left, the number and complexity of tests are likely to increase, making them slower to run and harder to troubleshoot. Datadog CI Visibility can help you track the performance of your CI/CD pipelines and tests—and now you can also use Real User Monitoring (RUM) to monitor end-to-end (E2E) Cypress tests.

Monitor your hybrid mobile applications with Datadog

Hybrid mobile applications allow you to incorporate web-based content into your mobile offerings. By embedding webviews inside your iOS or Android app, you can repurpose existing code to build key mobile functionality, such as authentication processing or ad rendering. While hybrid apps can help streamline your development process, they can also make monitoring your system more complex.

Monitor your AWS Lambda functions' ephemeral storage usage

AWS Lambda is AWS’s solution for highly portable, serverless computing. With Lambda functions, you can deploy and run business logic code without managing the underlying servers. Today, AWS announced that Lambda customers can now provision up to 10 GB of ephemeral storage for each of their functions, making them well-suited for new, data-intensive workloads—including machine learning inference, large media file processing, financial analysis, and more.
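As a rough illustration of what tracking ephemeral storage usage can look like from inside a function, here is a minimal Python sketch. It assumes the function's ephemeral storage is the `/tmp` directory, which is where Lambda mounts it; the `ephemeral_storage_usage` helper, the `handler` function, and the returned field names are hypothetical and are not part of any Datadog or AWS API.

```python
import shutil


def ephemeral_storage_usage(path="/tmp"):
    """Return (used_bytes, total_bytes, percent_used) for the filesystem at path."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.used, usage.total, percent_used


def handler(event, context):
    # Hypothetical Lambda handler: do the function's real work first,
    # then report how much of /tmp is in use.
    used, total, pct = ephemeral_storage_usage()
    print(f"ephemeral storage: {used} of {total} bytes used ({pct:.1f}%)")
    return {
        "used_bytes": used,
        "total_bytes": total,
        "percent_used": round(pct, 1),
    }
```

Logging the figure on each invocation is the simplest option; a real setup would more likely emit it as a metric so that it can be graphed and alerted on.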

Real-time distributed tracing for .NET Lambda functions

In 2020, we released distributed tracing for AWS Lambda functions written in Python, Node.js, and Ruby, providing you with health and performance insights across your serverless applications. Since then, we’ve expanded our support to additional Lambda runtimes such as Java and Go, and are pleased to announce that real-time distributed tracing is now also available for .NET Lambda functions.

How to manage log files using logrotate

Logs are records of system events and activities that provide valuable information used to support a wide range of administrative tasks, from analyzing application performance and debugging system errors to investigating security and compliance issues. Large-scale production environments emit enormous quantities of logs, which makes them more challenging to manage and introduces the risk of losing important data if underlying resources run out of space.

Create and navigate a documentation library with Notebooks

Datadog Notebooks enable your teams to create and manage key reports and documentation as they build out, monitor, and maintain their infrastructure. Notebooks can include both text and graphs of any telemetry data you have collected in Datadog, and they support collaborative editing so that multiple team members can edit and leave comments simultaneously.