Kafka's 80% Problem

Kafka is too expensive and complex for 80% of users. Most Kafka usage is small data: roughly 60% of clusters run at under 1 MB/s, yet teams pay big-data prices. Diskless (KIP-1150), Tiered, and Iceberg topics give Apache Kafka multiple storage classes, but they are advanced storage primitives: powerful in the hands of seasoned platform teams, yet too complex for beginners.

Enhancing JFrog Internal Operations with Near Zero Downtime Migration

Data migrations have long been a significant source of anxiety for businesses and IT teams alike. The thought of moving critical databases often conjures images of prolonged downtime, service interruptions, and the ever-present risk of data loss. Indeed, statistics show that “90% of businesses experience unexpected downtime during database migrations, leading to significant revenue loss and customer dissatisfaction”.

Automating Expo app build delivery to QA with CircleCI and EAS webhooks

Manually sharing mobile app builds with Quality Assurance (QA) engineers can be a tedious and error-prone process. Developers often find themselves exporting .apk or .ipa files, uploading them to Google Drive or Dropbox, and then pinging the QA team on Slack to announce the upload, all while juggling deadlines and code reviews. This manual process not only slows down feedback cycles but also leaves room for human error, miscommunication, and outdated builds being tested.

How to Deploy Calico Whisker and Goldmane in Manifest Only Setups

If you’re running Calico using manifests, you may have found that enabling the observability features introduced in version 3.30, including Whisker and Goldmane, requires a more hands-on approach. Earlier documentation focused on the Tigera operator, which automates key tasks such as certificate management and secure service configuration. In a manifest-based setup, these responsibilities shift to the user.

Identify recurring issues and reveal their root cause with BigPanda IT Problem Management

For many enterprises, incident response feels like déjà vu. The same issues keep happening over and over, eating up time, draining resources, and wearing down your teams. In fact, 20-40% of IT incidents are typically recurring issues caused by unresolved underlying problems. Teams prioritize speed over permanence, patching symptoms instead of addressing the root cause, and they often lack the context, documentation, or shared knowledge needed to fix issues permanently.

The Best Tools for Synthetic & Infrastructure Monitoring: A Comparative Guide

Both user-side and server-side monitoring are important for making your apps better. Tools that monitor only one side leave gaps in your diagnosis, which can lead to poor user experiences and reliability issues. Here are the top 10 tools you should consider, based on their benefits and coverage.

Closing Visibility Gaps in the Modern Data Center

In today’s high-performance data centers, “all green” dashboards can mask catastrophic issues hiding just beneath the surface. If you’re missing the microbursts, hidden oversubscription, and routing imbalances that are devastating application performance, you’re flying blind. Learn how to close these visibility gaps and shift from reactive firefighting to proactive network intelligence.

Python performance monitoring for Django, Flask, Celery, and more

Here's some excellent news for the Pythonistas in the room: You can now monitor the performance of your Python applications with Honeybadger. Last year, we launched Honeybadger Insights, a new logging and observability tool bundled with Honeybadger. Insights enables you to query your application logs and events to answer performance questions, perform root-cause analyses, and create charts and dashboards to see what's happening in real time.

Telemetry Now Teaser: "Tracking the Red Sea Cable Cuts with Kentik's Cloud Latency Map"

Go behind the scenes of a major internet analysis. When the recent Red Sea cable cuts disrupted global connectivity, Kentik's Director of Internet Analysis, Doug Madory, turned to the Cloud Latency Map to track the fallout in real time. In this clip from the latest Telemetry Now podcast, Doug walks through how he identified the latency spikes and rerouting caused by the damage.