Handling persistent storage problems in Kubernetes clusters

Persistent storage is the backbone of stateful applications running in Kubernetes. Whether you are managing databases, logs, or application state, keeping transactional data intact through pod restarts and node failures is a challenge. In this blog, we will discuss the most common persistent storage issues in Kubernetes and how to handle them with practical, real-world solutions.

Monitoring for Kubernetes API server performance lags

The Kubernetes API server is a key component of the control plane. Every interaction, whether deploying applications, scaling workloads, or monitoring system health, depends on the API server. Consider the human body: the brain is the critical organ, and the nerves act as its control system. In the same way, the Kubernetes API server is the nerve center of cluster management.

Troubleshooting Kubernetes deployment failures

Do you feel like you're solving a puzzle when deploying applications in Kubernetes? You are not alone in this! When something goes wrong during application deployment, it becomes all the more crucial to diagnose the issue methodically and get things back on track. This guide walks you through practical steps for troubleshooting deployment failures efficiently.

How to Effectively Monitor Nginx and Prevent Downtime

Nginx is widely known for its high performance and reliability. However, just like any software running in production, it requires continuous monitoring to ensure smooth operation. Issues such as high latency, unexpected crashes, or overwhelming traffic spikes can lead to performance degradation or even complete outages. Therefore, implementing a robust monitoring strategy is crucial to maintaining the health and stability of your Nginx deployment.

Everything You Need to Know About OpenTelemetry Agents

If you’re reading this, chances are you’re already familiar with OpenTelemetry (OTel)—the open-source standard for collecting observability data. But what about OpenTelemetry agents? How do they work, and why do they matter? This guide unpacks everything you need to know about OTel agents—where they fit in your stack, how to set them up, and common pitfalls to watch out for. Let’s get into it.

Getting started with Azure cost dashboards

As an Azure admin, it is critical to keep an eye on how much it costs to run your workloads in the cloud. You also want visibility into any deployed resources that are not contributing to the business but are accumulating cost over time. Using a dedicated Azure plugin, SquaredUp dashboards help you understand your Azure costs across services, resources, locations and apps, so you can keep tabs on how much you're spending and identify opportunities to cut costs.

Improve gaming app performance with Unity support in Datadog RUM

As mobile gaming evolves, players have higher expectations for seamless experiences, real-time interactions, and cross-platform accessibility. Whether you’re developing games for iOS, Android, or another mobile operating system, maintaining and optimizing the performance of your game is critical for player retention. For instance, if a mobile game becomes laggy or begins to drop frames during gameplay, players will grow frustrated and abandon the game altogether.

TCP Checks Now Available in Checkly

Checkly has always helped you monitor your APIs and web services, ensuring they stay fast, reliable, and available. But application reliability doesn’t stop there—databases, message queues, and mail servers all play a crucial role in your infrastructure. To provide full application reliability, we’re expanding into network monitoring with TCP checks. Now, you can monitor critical non-HTTP services directly in Checkly—without adding extra tools to your stack.

Why and How You Should Use Your Learning & Visiting Budget

When I joined Checkly as Junior People Operations Manager, one of the benefits that immediately stood out to me was the Learning & Visiting budget. I found myself wondering—how is this budget actually being used across the company? At the start of the year, many of our team members plan how they’ll use their learning budget—whether to enhance professional skills or pursue self-driven projects. With flexible guidelines, we encourage them to invest in what matters most.

CI/CD at scale: A performance analysis of CircleCI vs GitHub Actions

When evaluating CI/CD platforms, it can be easy to view them as commodities: interchangeable tools that accomplish the same basic tasks. But as development teams scale, small differences in platform performance can compound, significantly impacting development velocity and resource utilization. To better understand these differences, we conducted a head-to-head comparison between CircleCI and GitHub Actions, focusing specifically on performance at enterprise scale.