Blog

Introducing Snuba: Sentry's New Search Infrastructure

For most of 2018, we worked on an overhaul of our underlying event storage system. We’d like to introduce you to the result of this work — Snuba, the primary storage and query service for event data that powers Sentry in production. Backed by ClickHouse, an open source column-oriented database management system, Snuba is now used for search, graphs, issue detail pages, rule processing queries, and every feature mentioned in our push for greater visibility.

Mattermost 5.11: New remote CLI tool, hackfest winners, free online training, and more

Mattermost 5.11 includes platform improvements that will help your team get more done in less time. Try these new features by downloading Mattermost 5.11 today. Since it includes security updates, upgrading is recommended.

Lambda and Kinesis - beware of hot streams

Back in 2017, I wrote a post titled “3 pro tips for Developers working with Kinesis streams”, in which I explained why you should avoid hot streams with many Lambda subscribers. When you have five or more functions subscribed to a Kinesis stream, you will start to notice lots of ReadProvisionedThroughputExceeded errors in CloudWatch.
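
Why five? A rough back-of-the-envelope sketch follows. It assumes each shard allows about five read calls per second, shared by all standard (non-enhanced-fan-out) consumers, and that each subscribed function polls each shard about once per second. Both figures are assumptions for illustration and should be checked against the current AWS documentation. The function names are hypothetical.

```python
# Assumed figures, not taken from the post; verify against current AWS docs.
SHARD_READ_LIMIT_PER_SEC = 5    # assumed shared read calls/sec per shard
POLLS_PER_SEC_PER_FUNCTION = 1  # assumed polling rate of one subscriber per shard


def read_utilization(subscribers: int) -> float:
    """Fraction of the per-shard read budget consumed by polling alone."""
    return subscribers * POLLS_PER_SEC_PER_FUNCTION / SHARD_READ_LIMIT_PER_SEC


def is_hot(subscribers: int) -> bool:
    """True once polling alone uses the whole assumed read budget."""
    return read_utilization(subscribers) >= 1.0


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, f"{read_utilization(n):.0%}", "hot" if is_hot(n) else "ok")
```

Under these assumptions, the fifth subscriber uses up the whole budget, so any retry or additional reader on the same shard would be throttled.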

Why Cloud Cost Optimization Shouldn't be Finance's Responsibility

If you’re a cloud architect or engineering lead, chances are you’ve had a defensive conversation with finance about the AWS bill. Maybe it looked a little something like this… Unfortunately, this scenario is all too familiar, yet understandable from Finance Frank’s point of view. He’s just trying to do his job, but has zero context into which engineering activities are costing the organization so much (or why these costs are variable on a month-to-month basis).

Stop Your Database From Hating You With This One Weird Trick

Let’s not bury the lede here: we use Observability-Driven Development at Honeycomb to identify and prevent DB load issues. Like every online service, we experience this familiar cycle. This is not a bad thing! It’s a normal thing. Databases are easy to start with and do an excellent job of holding important data.

Single Pane or Single Pain of Glass?

A lot has been written about the ever-elusive “Single Pane of Glass” (or SPOG). From calling it a myth like Bigfoot or the Loch Ness Monster, to reporting that “a centralized, service-centric view into IT environments has become a must-have capability for IT Operations” (2018 Digital Enterprise Journal Study), both opponents and proponents admit that the implementation of a centralized view into IT Ops is a real need, but at the same time, a major operational challenge.

Takeaways From ServiceNow's Knowledge 2019

We had a great time in Las Vegas attending ServiceNow’s Knowledge 2019 conference. We enjoyed everything the city has to offer, while also exploring the latest on IT workflow transformations. Though there are several valuable experiences to report on, I’ll cover just a few takeaways from Knowledge ’19 and how they resonated with the OnPage team.

SLO, SLA, SLI Oh My! Creating them can be easy

Imagine you are driving a car on a freeway. Your speedometer is telling you you’re going 62 mph. But you “gotta go fast”. Faster than the 65 mph speed limit. So you go for it: first 68 mph, then 75 mph, then 80 mph. Then you pass a police officer hiding in a speed trap. To your dismay, they pull you over and give you a ticket. All is not lost: there is a silver lining here.

AKS Cluster Performance: How to Better Operate Kubernetes in Azure

AKS is the managed service from Azure for Kubernetes. When you create an AKS cluster, Azure creates and operates the Kubernetes control plane for you at no cost. As a user, all you do is specify how many worker nodes you’d like, plus other configuration options we’ll see in this post. So, with that in mind, how can you improve AKS cluster performance when Azure manages almost everything?