Operations | Monitoring | ITSM | DevOps | Cloud

Site Reliability Engineering (SRE) 101: Everything You Need to Know | Harness Blog

A single second of latency can cost e-commerce sites millions in revenue, while just minutes of downtime trigger customer churn that takes months to recover. Modern users expect instant responses and seamless experiences, making reliability a competitive feature that directly impacts business outcomes. Site Reliability Engineering treats operations as a software problem rather than a manual discipline. SRE applies engineering principles to achieve measurable reliability through automation.

What's New in InfluxDB 3 Explorer 1.7: Table Management, Data Import, Transforms, and More

InfluxDB 3 Explorer 1.7 is a step forward for anyone who wants to manage their time series data without constantly switching between the UI and a terminal. This release adds table-level schema management, the ability to import data from other InfluxDB instances, and a new Transform Data section to reshape your data, all within the Explorer UI.

Ephemeral Leaks and Automated BGP Route Leak Detection

Many BGP route leaks reported by automated detection systems are actually brief, low-impact artifacts of normal BGP convergence. Doug Madory examines examples from Cloudflare Radar, Routeviews, and Jared Mauch’s long-running leak detector to show how these “ephemeral leaks” arise, why they usually don’t disrupt traffic, and why they still matter for routing security.

AI vs. Hype: Redefining Engineering Excellence with Ron Miller

In this episode of "ShipTalk: Engineering Excellence," host Thomas Dockstader sits down with Ron Miller, editor at Fast Forward, to discuss the real-world impact of AI on software development. They dive deep into the maturity of AI-driven code, the rise of the "citizen developer," and why traditional writing and communication skills are becoming the new must-have for modern engineers.

N+1 Detection in AppSignal's OpenTelemetry Trace Timeline

N+1 query problems are one of the most common, and quietly damaging, performance issues in production applications. One extra query per record feels harmless in development. At scale, it becomes the reason your response times degrade and your database buckles under load. Today, AppSignal adds N+1 detection to its OpenTelemetry support. When we identify the pattern in a trace, we collapse the repetitive spans directly in the timeline, making the problem immediately visible in the trace itself.

Beyond the pull request: why code review is not infrastructure validation

Code review and infrastructure validation are distinct problems. While AI can review syntax, only an active, data-complete environment can validate system-wide state. Upsun provides the unified configuration file needed to turn "looks good to me" into verified production-readiness.