Operations | Monitoring | ITSM | DevOps | Cloud

Troubleshoot and resolve Kubernetes issues with AI-powered guided remediation

As teams adopt Kubernetes at greater scale, they face increased complexity in keeping their growing list of workloads and services up and running. Achieving the visibility and context needed to detect and resolve incidents quickly is difficult amid a constant flood of telemetry data and alerts. Furthermore, Kubernetes expertise often remains siloed in DevOps and infrastructure teams.

Monitor the cost of your public sector applications with Datadog Cloud Cost Management

As federal, state, and local government agencies work to modernize their digital infrastructure and applications, managing costs effectively remains a constant challenge. Federal directives like Cloud Smart indicate the need for public sector IT organizations to track and optimize their cloud spends. However, as an organization’s IT environment grows in complexity, it becomes difficult to correlate cost data and extract useful insights.

Troubleshooting RAG-based LLM applications

LLMs like GPT-4, Claude, and Llama are behind popular tools like intelligent assistants, customer service chatbots, natural language query interfaces, and many more. These solutions are incredibly useful, but they are often constrained by the information they were trained on. This often means that LLM applications are limited to providing generic responses that lack proprietary or context-specific knowledge, reducing their usefulness in specialized settings.

This Month in Datadog - October 2024

On the October episode of This Month in Datadog, Jeremy Garcia (VP of Technical Community and Open Source) covers unified Error Tracking, Security Operational Metrics, and a new Datadog Serverless feature for retrying or redriving failed AWS Step Functions executions directly from Datadog. Later in the episode, Shri Subramanian (Group Product Manager) spotlights Datadog LLM Observability’s native integration with Google Gemini. Also featured are our blog posts Operator vs.

Create ServiceNow tickets from Datadog alerts

ServiceNow is a popular IT service management platform for recording, tracking, and managing a company’s enterprise-level IT processes in a single location. In addition to helping you manage your ServiceNow CMDB, Datadog also integrates with ServiceNow IT Operations Management (ITOM) and IT Service Management (ITSM), enabling you to automatically create and manage ServiceNow incidents and events from the Datadog platform.

How we use Scorecards to define and communicate best practices at scale

In modern, distributed applications, shared standards for performance and reliability are key to maintaining a healthy production environment and providing a dependable user experience. But establishing and maintaining these standards at scale can be a challenge: when you have hundreds or thousands of services overseen by a wide range of teams, there are no one-size-fits-all solutions. How do you determine effective best practices in such a complex environment?

Introducing the Datadog Architecture Center

To prevent visibility gaps in your cloud environment, you need to efficiently deploy observability solutions that integrate easily with key technologies in your stack and scale reliably with new applications and migrated workloads. But observability deployments can be complex, often requiring deep and specific knowledge that may not be available within your teams.

Track and troubleshoot MongoDB performance with Datadog Database Monitoring

Many modern applications rely on MongoDB and MongoDB Atlas to manage growing data volumes and to provide flexible schema and data structures. As organizations adopt these and other NoSQL databases, effective monitoring and optimization become critical, especially in distributed environments.

Ensure high service availability with Datadog Service Management

Adopting a cloud-based, distributed architecture may help your organization scale quickly, but it can also add complexity. Correlating telemetry, security signals, and alerts across services often proves difficult, resulting in slower issue remediation. Additionally, when something goes wrong, figuring out who to contact—for example, the on-call responder or the service owner— may become needlessly time-consuming.

Best practices for monitoring cloud costs with Datadog Scorecards

To ensure that your organization’s cloud spend is efficient, you need detailed and granular visibility to understand what comprises your costs, what causes them to change, and how the cloud services and resources you use are enabling your business goals. Extending your visibility and more closely monitoring your cloud costs can position you to successfully adopt FinOps, which provides a framework that can help you maximize the value you get from your cloud spend.