Operations | Monitoring | ITSM | DevOps | Cloud

Practical Tips on Handling Errors and Exceptions in Python

Have you ever encountered a confusing error message that left you wondering what went wrong in your Python code? You’re not alone. Even the most experienced developers run into exceptions, making it essential to understand how to handle them effectively. While basic syntax errors can be caught early by code editors and debugging tools, more complex issues often arise at runtime, requiring a structured approach to exception handling.
Sponsored Post

The Top 5 Security Logging Best Practices to Follow Now

Security logging is a critical part of modern cybersecurity, providing the foundation for detecting, analyzing, and responding to potential threats. As highlighted by OWASP, security logging and monitoring failures can lead to undetected security breaches. With the average cost of a data breach adding up to $4.45 million, most organizations can't afford to miss a security incident.

Finding UX Friction (...Before It Becomes a Problem)

Make it smooth. Reduce friction. Keep users moving. That’s solid advice. No one enjoys filling out a form with 10 unnecessary fields or dealing with a checkout process that feels like a maze. But you can’t fix friction if you don’t know where it’s happening. Big companies like Amazon, Netflix, and Airbnb don’t just guess where users are struggling. They track the right UX metrics, run experiments, and fine-tune their products constantly.

Coroot v1.9: Kubernetes-Native Database Monitoring Made Easy

From day one, we built Coroot to work beyond just Kubernetes. Many teams still run databases and other stateful services on dedicated VMs or bare-metal servers. But that’s starting to change. More and more teams no longer see Kubernetes as a platform just for stateless apps. Powerful Kubernetes operators now handle day-2 operations like failover, backups, and disaster recovery—making it easier than ever to run databases on Kubernetes. And the number of teams choosing this path keeps growing.

Optimizing Item Search: How Rollbar Engineered Faster, More Capable Search

Searching through error data efficiently is critical for developers using monitoring tools. At Rollbar, we recently completed a significant overhaul of our Item Search backend. The previous system faced performance limitations and constraints on search capabilities. This post details the technical challenges, the architectural changes we implemented, and the resulting performance gains.

Is It Time to Switch Your Network Monitoring Tool? How to Know & Choose the Right Upgrade

A while ago, your company chose a network monitoring tool that worked perfectly — back when most employees worked in the office, networks were centralized, applications ran on-premise, and "the cloud" was just a buzzword.

Simplifying public sector observability with OpenTelemetry and Elastic

Public sector organizations today face unique challenges in maintaining and optimizing their IT infrastructure and prioritizing efficiency and interoperability. With a mix of modern cloud and legacy systems, ensuring consistent performance, reliability, and security is paramount. To effectively observe across these environments, government agencies need observability tools that are open, flexible, and scalable. OpenTelemetry (OTel) is fast becoming a pivotal part of that flexible toolset.

How to prevent performance bottlenecks in Google Compute Engine: CPU spikes, RAM waste, and network overload

Cloud computing is all about efficiency. You need to get the most out of your resources without overspending or causing performance issues. For example, if you’re running virtual machines in Google Compute Engine, you need to size your instances correctly, optimize your workloads, and monitor your network traffic to prevent unexpected failures. However, when resources aren’t properly managed, things can quickly spiral out of control.

Remediate Kubernetes incidents faster using private actions in your apps and workflows

The Datadog Action Catalog provides more than 1,400 actions to help you accelerate remediation across your infrastructure directly within Datadog. With actions, you can use Workflow Automation to configure workflows that automatically address issues as they happen and build custom apps in App Builder that empower anyone in your organization to act when incidents occur.

Enrich your existing Datadog telemetry with custom metadata using Reference Tables

As your applications scale and generate more telemetry, it becomes increasingly difficult to sift through the data and analyze it against cost, business functions, and security measures. Logs, events, and other telemetry on their own may not include enough meaningful context or readable details, leading to slower troubleshooting, inefficient business processes, and higher costs.