OnlineOrNot

Improving your on-call schedule with runbooks

Incidents are a stressful time for your team: your service isn't working the way you expect, and your customers and stakeholders want to know what's going on. The last thing you want is for your team to improvise everything when responding to incidents. Google's own SRE book has great overall tips for incident management, including the advice to "develop and document your incident management procedures in advance", which this article dives into.

I built my HTTP API docs from scratch

You might be thinking “building HTTP API docs from scratch? in 2024? wtf?”, and you’re probably right. After all, Redoc has been around since 2016, and there are hundreds of “generate beautiful documentation from your OpenAPI spec” startups around, some of which even use AI now. To be honest, I didn’t even know it was possible to do it yourself when I started looking into it.

How to get your first ten customers

It'll soon be the third anniversary of publicly launching OnlineOrNot on Twitter, and I often get asked what I did to get my first paying customers, so I felt like sharing. I assume that when most folks ask this, they're looking for the one thing they can do to finally start getting paying customers. Let me be clear: it's never just one thing.

Ways to avoid losing your domain

Imagine you're sitting in your office, and you start noticing emails coming in asking if you'd like to buy your domain. "Huh, that's weird, I already own that domain," you think to yourself. A few more emails come in, and they're getting past the spam filter, so you decide to double-check your domain manager. Doubt starts creeping into your mind. You start panicking, frantically scroll down to where the domain should be, and... it's gone.

Our lessons from the latest AWS us-east-1 outage

In case you missed it, AWS experienced an outage or "elevated error rates" on their AWS Lambda APIs in the us-east-1 region between 18:52 UTC and 20:15 UTC on June 13, 2023. If this sounds familiar, it's because it's almost a replay of what happened on December 7, 2021, although that outage was significantly more severe and took longer to restore.