
TikTok Emerges from Shutdown Without Bytedance's US CDN

Kentik’s Doug Madory looks into this weekend’s 14-hour outage of popular video sharing service TikTok, which was slated to be banned from the US per recent legislation. While TikTok came back, it is notably no longer being served by parent company Bytedance’s US CDN. We delve into the traffic statistics in this blog post.

How to fix the root cause of a failed reliability test

You’re well on your way to becoming more reliable. You’ve added your services, found and fixed some Detected Risks, and run your first set of reliability tests. However, some of your tests returned as “Failed.” Not to worry: this isn’t a reflection of you or your engineering skills but rather an opportunity to learn more about how your systems work and, more importantly, how to make them more resilient.

How to Build a Cloud Strategy That Works for Your Business

As technology advances at lightning speed, more and more businesses are turning to the cloud to boost growth, improve efficiency, and stay ahead of the competition. However, creating a cloud strategy that matches your business goals, budget, and security needs can be tricky. It’s not just about switching to the cloud; it’s about using it wisely to get the most out of it.

Serilog: Configuration, Error Handling & Best Practices

When building modern .NET applications, logging is one of those things you don’t want to get wrong. Serilog steps in as a popular logging framework that has earned its spot as a go-to tool for developers. Why? Because it’s flexible, versatile, and does an awesome job of giving you clear insights into your app's behavior. But what exactly is Serilog?

SLF4J vs Log4j: Key Differences and Choosing the Right One

When building robust, maintainable, and scalable Java applications, logging plays an essential role in debugging, monitoring, and ensuring smooth performance. Two of the most widely used logging frameworks in the Java ecosystem are SLF4J and Log4j. While both serve similar purposes, they offer different approaches and features, making it important to understand their differences before making a choice.

Lumigo Upgrades Kubernetes Operator for More Insights, Exponential Savings, and Simplicity

We’re excited to introduce the enhanced Lumigo Kubernetes Operator, now more powerful than ever. With just a quick installation, you gain comprehensive observability—bringing together logs, metrics, and traces in a single platform to provide deeper insights and faster troubleshooting. The improved Lumigo Kubernetes Operator unlocks cluster-wide visibility by collecting key infrastructure metrics and logs—allowing you to monitor, analyze, and optimize with minimal effort.

AIOps: Prove It!

I’ve read a steadily increasing stream of articles about using AI in SRE, and I have yet to find one that inspires my trust. Each article makes impressive claims about the capabilities of AI and the way it can be applied to SRE tasks, but the vast majority are light on details. AI tools, and especially LLMs, are growing incredibly quickly, and I feel that these tools have a ton of potential.

Custom database query monitoring: Use cases to unlock business-critical insights

Custom database queries are invaluable for businesses seeking actionable insights from their data. Unlike general monitoring tools, these queries deliver a deeper, more tailored view of critical metrics, help identify patterns, detect anomalies, and address specific operational requirements.

New Relic Cost Optimization: 9 Surefire Ways To Cut Your Observability Costs

New Relic has established itself as a top observability platform with full-stack monitoring. Unifying all telemetry data — metrics, events, logs, and traces — into one platform delivers deep performance insights and enables faster troubleshooting without juggling multiple tools. Also, New Relic prioritizes developers with tools like CodeStream, integrating error details and telemetry directly into the IDE.