The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Monitor your Anthropic applications with Datadog LLM Observability

Anthropic is an AI research and development company focused on building reliable and safe artificial intelligence systems. Its flagship product is Claude, an advanced language model and conversational AI assistant known for its strong capabilities in natural language processing, reasoning, and task completion. Anthropic places particular emphasis on AI safety and ethics, and organizations across many industries use its models and APIs to build powerful, safe, and performant AI applications.

Event Logs Explained: Your Guide to System Health

Event logs contain critical information, and analyzing them helps organizations detect many kinds of security incidents, from auditing user access and spotting malicious traffic to tracking rule changes on a firewall. By collecting and analyzing event logs systematically, organizations gain insight into their IT environment that helps them maintain operational efficiency and security.

Understanding the Deficiencies of AWS CloudWatch for Cloud Visibility

While CloudWatch offers basic monitoring and log aggregation, it lacks the contextual depth, multi-cloud integration, and cost efficiency required by modern IT operations. In this post, learn how Kentik delivers more detailed insights, faster queries, and more cost-effective coverage across various cloud and on-premises resources.

Is the Internet ready for L4S?

Today, Catchpoint is sharing the results of its Global Explicit Congestion Notification (ECN) Bleaching Rates measurement campaign, which covers the state of ECN bleaching (a network along the path clearing the ECN markings in packet headers) worldwide, from Catchpoint's perspective. ISPs, telecoms, and streaming services, along with anyone else who depends on an ISP, can use these results to determine whether their own network or an upstream network is bleaching ECN markings.

Observe deleted Kubernetes components in Grafana Cloud to boost troubleshooting and resource management

As a site reliability engineer, you need constant vigilance and a keen eye for detail to manage your Kubernetes infrastructure effectively. Part of that effort is being able to see historical data from your pods, nodes, and clusters, even after they have been deleted or recreated. Many SREs rely on kubectl for this, but while it is indispensable for real-time Kubernetes management, it only reports on resources that currently exist, which makes working with historical data difficult.

The CoPE and Other Teams, Part 2: Custom Instrumentation and Telemetry Pipelines

The previous post laid out the basic idea of instrumentation and how OpenTelemetry’s auto-instrumentation can get teams started. However, you can’t rely only on auto-instrumentation. This post will discuss the limitations in more detail and how a CoPE can help teams overcome them.

How to fix network latency with network traffic monitoring tools: Use cases and examples

Seamless network performance is the cornerstone of business success. However, network latency, the delay between sending data and its arrival at the destination, can degrade user experience, reduce productivity, and even cause financial losses. For businesses that want to thrive, identifying and resolving network latency issues is crucial, and network traffic monitoring tools play a central role in doing so.

Navigating IT complexity: Observability vs. monitoring for Australian SMEs' digital transformation

Traditional IT monitoring is holding back the digital transformation of Australian small and medium-sized enterprises (SMEs), and these organizations recognize that in IT operations, observability represents a significant advance over traditional monitoring. Unlike conventional methods that focus primarily on metrics such as uptime and error rates, IT observability provides a comprehensive view of system behavior by integrating logs, metrics, traces, and events.