IsDown

https://isdown.app/

Lisbon, Portugal

2020

Jun 16, 2026 | By Nuno Tomas

Today I'm sharing some big news. IsDown is joining UptimeRobot. When I started IsDown, the idea was simple. Keeping track of outages across dozens of vendor status pages was painful, and I wanted to make it easy to see, in one place, when the services you depend on go down. Thousands of teams now rely on IsDown to do exactly that. Joining UptimeRobot is the natural next step.

Error Budget in SRE: The Complete Guide (2026)

May 20, 2026 | By Nuno Tomas

An error budget is the acceptable amount of unreliability permitted by your SLO over a defined time window. It is not a target. It is not a stretch goal. It is a hard ceiling that, when breached, should trigger a pre-agreed organizational response: feature freezes, postmortems, or infrastructure investment. The formula is blunt:

Error Budget = 1 - SLO Target
Error Budget (time) = (1 - SLO Target) × Window Duration

For a 30-day window: That last number should make you uncomfortable.
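The two formulas can be sketched in a few lines of Python. This is a minimal illustration, not code from the post: the function names and the SLO targets in the loop are assumptions chosen for the example.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Error Budget = 1 - SLO Target (both expressed as fractions, e.g. 0.999)."""
    return 1.0 - slo_target


def error_budget_minutes(slo_target: float, window_days: float = 30) -> float:
    """Error Budget (time) = (1 - SLO Target) x Window Duration, in minutes."""
    window_minutes = window_days * 24 * 60  # a 30-day window is 43,200 minutes
    return error_budget_fraction(slo_target) * window_minutes


if __name__ == "__main__":
    # Illustrative SLO targets (assumed, not taken from the post).
    for slo in (0.99, 0.999, 0.9999):
        minutes = error_budget_minutes(slo)
        print(f"{slo:.2%} SLO over 30 days -> {minutes:.2f} minutes of budget")
```

Under these assumptions, a 99.9% SLO over 30 days leaves roughly 43.2 minutes of allowed unreliability, and each additional nine cuts that by a factor of ten.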

Cloud Outage History: Six Years of Recurring Failures

May 13, 2026 | By Nuno Tomas

Cloud infrastructure has never been more reliable in theory. In practice, the last six years of cloud outage history have delivered some of the most disruptive incidents on record. Not because cloud providers got worse, but because the systems built on top of them got larger, more interconnected, and more brittle in ways that don't show up until everything breaks at once.

How to Reduce MTTR When Third-Party Services Go Down

May 7, 2026 | By Nuno Tomas

Most MTTR guides assume the problem is in your infra. For modern apps, it often isn't: it's Stripe, AWS, Auth0, or another vendor. Vendor status pages lie by omission. The lag between impact and acknowledgment can stretch to an hour or more. You need two runbooks, proactive vendor monitoring, and graceful degradation baked in before the 3 AM page hits. This post shows you exactly how.

April 2026: IsDown Users Saved 16.5 Hours with Early Outage Detection

May 3, 2026 | By Nuno Tomas

In April 2026, IsDown's early detection system gave users a 3.6-hour head start on a major outage — plenty of time to implement workarounds before the vendor even acknowledged the problem. Across 45 early detections, our users saved a collective 16.5 hours by knowing about outages an average of 22 minutes before official status pages were updated.
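As a quick sanity check, the figures quoted above are consistent with each other: 45 detections at an average lead of 22 minutes add up to the collective 16.5 hours. A minimal sketch, with variable names chosen only for illustration:

```python
# Figures quoted in the April 2026 summary.
early_detections = 45
average_lead_minutes = 22

# Collective time saved = number of detections x average lead time.
total_minutes = early_detections * average_lead_minutes
total_hours = total_minutes / 60

print(f"{total_minutes} minutes = {total_hours} hours")
```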

AWS Outage History: What Engineering Teams Should Learn

Apr 22, 2026 | By Nuno Tomas

If you've been running production workloads on AWS for more than a year, you've felt it: the 3 AM PagerDuty alert, the scramble to check the AWS console, the frantic Slack thread asking, "Is this us or is this AWS?" And then, minutes or hours later, the AWS Service Health Dashboard finally acknowledges what your users have been experiencing all along. It happens because AWS is the backbone of modern infrastructure.

How to Monitor AWS Status: Don't Wait for the Health Dashboard

Apr 7, 2026 | By Nuno Tomas

The AWS Health Dashboard is slow, sometimes broken during major outages, and only tells you what AWS admits is broken. Real SREs layer three monitoring sources: AWS-native tools (CloudWatch, EventBridge), third-party aggregators (IsDown), and internal synthetic checks. Skip the vendor status page as your primary alert source.

March 2026: IsDown Users Saved 10.5 Hours with Early Outage Detection

Apr 6, 2026 | By Nuno Tomas

In March 2026, IsDown users collectively saved 10.5 hours by receiving outage alerts before vendors officially acknowledged problems. The most significant early detection gave users a 2.3-hour head start when the Federal Reserve's FedACH system experienced issues. This data reveals the persistent gap between when users experience problems and when vendors update their status pages.

Multi-Language Status Page Widgets: Customize Widget Messages in Any Language

Mar 11, 2026 | By Nuno Tomas

If your product serves users in multiple regions, your status page widget shouldn't be stuck in English. A customer in São Paulo seeing "All Systems Operational" when they expect "Todos os Sistemas Operacionais" is a small friction, but small frictions compound. It signals that their language isn't a priority, and it adds cognitive load during the exact moment they're checking whether something is broken. Until now, IsDown widgets shipped with hardcoded English messages. That's changed.

AI Systems Status Report - February 2026

Mar 8, 2026 | By Nuno Tomas

This report covers the operational status of major AI systems during February 2026, including Anthropic, Cohere, DeepSeek, Google Gemini, Groq Cloud, OpenAI, Perplexity, Replicate, and xAI. The data includes official incidents reported on vendor status pages and unconfirmed incidents detected through IsDown's monitoring systems.

IsDown is a status page aggregator & outage monitoring tool for all your business-critical dependencies. Monitor all your external services from a single health dashboard. Get instant notifications on outages. All in one place.

It has never been easier to understand the outages in your external cloud services.

Bird's-eye view of all your services' statuses: Check the aggregated status of all your services in one place. No more visiting each status page and managing them individually.
Outage monitoring in real time: We monitor 24 hours a day, 7 days a week and will notify you if there is an incident. No more wasting time trying to figure out why something isn't working.
Alerts in your favorite channels: Get instant notifications in your email, Slack, Teams, or Discord when we detect a service outage. Outage monitoring where you are already doing your work.
Avoid notification clutter: Configure which notifications you want to receive from each service. Filter notifications by service component, so you are notified only when a specific component is affected. You can also choose to receive only notifications of a certain severity.
Easily integrate with your current tools and workflows: Using Zapier or Webhooks, you can easily integrate notifications into your processes. PagerDuty integration is also available.
Dedicated dashboard for each team's services: Create one dashboard for each of your teams. Monitor only the services that each team uses. Each dashboard has its own notification settings.
Prepare for scheduled maintenance: Never again be caught off guard by unexpected maintenance from your services. A feed of upcoming scheduled maintenance is available.
Weekly digest of your services' outages: Every Monday, you'll receive a summary of what happened the previous week as well as the maintenance schedule for the following week.
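
To make the component and severity filtering described in the list above concrete, here is a minimal sketch of how such a rule could be evaluated. It is not IsDown's implementation or API: the `NotificationRule` and `Incident` types, the `should_notify` function, and the two severity levels are all hypothetical names assumed for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical severity ranking; real severity names may differ.
SEVERITY_RANK = {"minor": 1, "major": 2}


@dataclass
class NotificationRule:
    service: str
    # Components to watch; an empty set means "any component".
    components: set[str] = field(default_factory=set)
    # Lowest severity that should trigger a notification.
    min_severity: str = "minor"


@dataclass
class Incident:
    service: str
    component: str
    severity: str


def should_notify(rule: NotificationRule, incident: Incident) -> bool:
    """Return True if the incident passes the rule's service, component, and severity filters."""
    if rule.service != incident.service:
        return False
    if rule.components and incident.component not in rule.components:
        return False
    return SEVERITY_RANK[incident.severity] >= SEVERITY_RANK[rule.min_severity]


# Example: only notify for major incidents on one component of one service.
rule = NotificationRule(service="ExampleVendor", components={"API"}, min_severity="major")
print(should_notify(rule, Incident("ExampleVendor", "API", "major")))        # True
print(should_notify(rule, Incident("ExampleVendor", "Dashboard", "major")))  # False
print(should_notify(rule, Incident("ExampleVendor", "API", "minor")))        # False
```

Treating an empty component set as "match everything" keeps the default rule permissive, so filters only narrow notifications when a team explicitly configures them.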

Easily Monitor Outages In All External Services. From One Place.
