Operations | Monitoring | ITSM | DevOps | Cloud

October 2024

Nearly 90% of our AI Crawler Traffic is From TikTok Parent Bytedance - Lessons Learned

This month, Fortune.com reported that TikTok’s web scraper — known as Bytespider — is aggressively sucking up content to fuel generative AI models. We noticed the same thing when looking at bot management analytics produced by HAProxy Edge — our global network that we ourselves use to serve traffic for haproxy.com. Some of the numbers we are seeing are fairly shocking, so let’s review the traffic sources and where they originate.

Encoding HAProxy logs in machine-readable JSON or CBOR

Standardized logging formats are important for teams that rely on logging for observability, troubleshooting, and workflow integration. Using structured formats simplifies parsing and eliminates the need to interpret fields manually, ensuring consistency across logging formats. This reduces manual work, prevents brittleness from unstructured logs, and simplifies integration between teams that feed logs into a shared aggregation system.