
TCP Checks Now Available in Checkly

Checkly has always helped you monitor your APIs and web services, ensuring they stay fast, reliable, and available. But application reliability doesn’t stop there—databases, message queues, and mail servers all play a crucial role in your infrastructure. To give you full visibility into application reliability, we’re expanding into network monitoring with TCP checks. Now you can monitor critical non-HTTP services directly in Checkly—without adding extra tools to your stack.
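If you’re curious what a TCP check boils down to, the sketch below opens a raw TCP connection to a host and port and measures how long the connection takes. It uses Node’s built-in net module; the SMTP host and port are placeholders, and this illustrates the idea rather than how Checkly implements its checks.

```ts
import * as net from 'node:net'

// Minimal sketch: confirm a host accepts TCP connections on a given port
// and report how long the connection took. Host and port are placeholders.
function probeTcp(host: string, port: number, timeoutMs = 5000): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = Date.now()
    const socket = net.connect({ host, port })
    socket.setTimeout(timeoutMs)
    socket.once('connect', () => {
      socket.end()
      resolve(Date.now() - start) // connection time in milliseconds
    })
    socket.once('timeout', () => {
      socket.destroy()
      reject(new Error(`Timed out after ${timeoutMs} ms`))
    })
    socket.once('error', reject)
  })
}

probeTcp('smtp.example.com', 25).then(
  (ms) => console.log(`Port reachable in ${ms} ms`),
  (err) => console.error(`Check failed: ${err.message}`),
)
```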

Why and How You Should Use Your Learning & Visiting Budget

When I joined Checkly as Junior People Operations Manager, one of the benefits that immediately stood out to me was the Learning & Visiting budget. I found myself wondering—how is this budget actually being used across the company? At the start of the year, many of our team members plan how they’ll use their learning budget—whether to enhance professional skills or pursue self-driven projects. With flexible guidelines, we encourage them to invest in what matters most.

Shorten your MTTR with Checkly Traces

We all know that Checkly is a ‘secret weapon’ for engineering teams who want to shorten their mean time to detection (MTTD). With Checkly, you can know within minutes if your service is unavailable to users or behaving unexpectedly. In this article we’ll look at how Checkly Traces expands on those benefits, adding insights that help you diagnose root causes and further reduce your mean time to resolution (MTTR) for outages and other incidents.

Networks are everyone's business - TCP Checks for app developers

Checkly is the industry’s best tool for monitoring your production applications. With the power of Playwright, developers can test the systems they’ve built and roll those tests out as production monitors running from multiple geographies on the Checkly platform. Checkly also monitors thousands of API endpoints with complex validation, setup and cleanup scripts, and reliable alerting. So why are we expanding into TCP-based checks?

Optimize MTTD with the right check frequency

Checkly enables engineers to automate the monitoring of their production services. Using the automation framework Playwright, you can run an end-to-end test on a regular cadence to make sure every feature is working for your users. But once you’ve set up your check, whether with Playwright scripting, a Terraform template, or an OpenAPI spec, the question becomes how frequently it should run. Should you be checking every few minutes, or every hour?
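As a concrete example, here is roughly what setting a check’s run frequency looks like with the Checkly CLI’s ApiCheck construct. Treat it as a sketch: the check name, endpoint URL, and locations are made up, and your own project may configure these differently.

```ts
import { ApiCheck, AssertionBuilder } from 'checkly/constructs'

// Sketch of an API check with an explicit run frequency (in minutes).
// The name, URL, and locations below are placeholders.
new ApiCheck('orders-api-check', {
  name: 'Orders API',
  frequency: 5, // run every 5 minutes; raise or lower to trade cost against MTTD
  locations: ['us-east-1', 'eu-west-1'],
  request: {
    url: 'https://api.example.com/orders',
    method: 'GET',
    assertions: [AssertionBuilder.statusCode().equals(200)],
  },
})
```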

Making sure you get a Checkly alert for every detected failure

It’s every ops team’s biggest anxiety: the monitoring system detects a failure, but the notification either isn’t delivered or isn’t noticed by the team. Then you’re stuck waiting for users to complain before anyone knows about the problem. Checkly sends an alert every time the system detects a failure, but how can you be sure you’re getting those alerts, and that they’re going to the right people?
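If you define your monitoring as code, one way to keep alert routing explicit is to declare alert channels and attach them to checks directly. The snippet below uses the Checkly CLI’s EmailAlertChannel construct as a sketch; the address, check name, and URL are illustrative, and your own escalation setup may look different.

```ts
import { ApiCheck, EmailAlertChannel } from 'checkly/constructs'

// Sketch: a dedicated alert channel, attached explicitly to a check so
// failures are routed to a known inbox. Address and check details are placeholders.
const opsEmail = new EmailAlertChannel('ops-email', {
  address: 'ops-alerts@example.com',
})

new ApiCheck('checkout-api-check', {
  name: 'Checkout API',
  request: { url: 'https://api.example.com/checkout', method: 'GET' },
  alertChannels: [opsEmail],
})
```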

Announcing Checkly Traces: Unified Synthetic Monitoring and Distributed Tracing

Until recently, Checkly told you what broke in your app. Now, it can also tell you why it broke. We're excited to announce the general availability of Checkly Traces, a new addition to our synthetic monitoring platform that bridges the gap between frontend monitoring and backend observability. By combining synthetic monitoring with distributed tracing, Checkly Traces empowers development teams to detect, diagnose, and resolve issues faster than ever before.
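Distributed tracing in practice usually means OpenTelemetry, so for context, this is roughly what it takes for a Node.js backend to emit traces over OTLP. The exporter URL and auth header are placeholders for whatever collector or ingest endpoint you point the SDK at, not a documented Checkly endpoint.

```ts
import { NodeSDK } from '@opentelemetry/sdk-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'

// Minimal OpenTelemetry setup: auto-instrument common libraries and export
// spans over OTLP/HTTP. The endpoint URL and token are placeholders.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'https://otel-collector.example.com/v1/traces',
    headers: { authorization: 'Bearer <your-token>' },
  }),
  instrumentations: [getNodeAutoInstrumentations()],
})

sdk.start()
```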

DOES Cache Rule Everything Around Me? - Using Compression for our Prometheus Cache

Checkly is a key part of a professional developer’s workflow, making it easy to know whether your service is up or down and to measure its performance. To fit into almost any development workflow, we also provide Prometheus endpoints so you can use the popular Grafana stack to keep track of your checks’ status. As our large enterprise customers grew their usage, their check performance data grew in parallel, and our endpoint started returning occasional 429 status codes.
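As the title suggests, part of the answer involved compressing cached data. Here is a generic sketch of that technique (not our actual implementation): an in-memory cache that gzips the Prometheus exposition text before storing it, shrinking each entry at the cost of a little CPU on read and write.

```ts
import { gzipSync, gunzipSync } from 'node:zlib'

// Generic sketch: cache gzipped Prometheus exposition text keyed by account.
// Compression shrinks each cached entry; decompression happens on read.
const cache = new Map<string, Buffer>()

export function writeMetrics(accountId: string, expositionText: string): void {
  cache.set(accountId, gzipSync(Buffer.from(expositionText, 'utf8')))
}

export function readMetrics(accountId: string): string | undefined {
  const compressed = cache.get(accountId)
  return compressed ? gunzipSync(compressed).toString('utf8') : undefined
}
```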