OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces
You're sampling 1% of traces in production. A payment request fails at 3 AM. Logs show an error in order-service, but the full picture isn't there because different services made different sampling decisions. order-service kept the trace; payment-service didn't. So you end up checking logs and timestamps across a few services to piece things together. This happens because the usual probability sampling approach makes a separate choice at each service boundary.