The Definitive Guide to Traffic Replay
Traffic replay is quickly gaining traction as the best way to recreate production scenarios.
Capture and Replay
Put simply, traffic replay is the ability to capture computer network traffic and recreate it in its original form.
Network requests are, by design, ephemeral. Once they leave the wire, the listening application must handle them or they are lost. To a running application, each request has a specific meaning and is handled according to its protocol and content. Traffic capture and replay means storing all requests, regardless of protocol or content, with the intent of playing them back later or in a different environment.
In order to visualize this, let’s think about capture and replay as a DVR (the device that recorded your TV shows before all of the streaming services existed).
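Captured traffic is copied in a format that can be replayed later, or even modified for use in different environments. To be clear, traffic replay is different from session replay, which reconstructs what a user saw and did in the browser rather than the network requests behind it.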
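As a rough illustration of "a format that can be replayed later", the sketch below stores each captured payload as one JSON line, with the raw bytes base64-encoded so that any protocol survives the round trip. The record fields (offset_s, direction, payload_b64) and the function names are assumptions made for this example, not the format of any particular tool.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class CaptureRecord:
    offset_s: float   # seconds since the start of the capture
    direction: str    # "inbound" or "outbound" (labels assumed for this sketch)
    payload: bytes    # raw bytes exactly as seen on the wire


def dump_records(records):
    """Serialize records as JSON lines; payloads are base64 so binary data is safe."""
    lines = []
    for record in records:
        lines.append(json.dumps({
            "offset_s": record.offset_s,
            "direction": record.direction,
            "payload_b64": base64.b64encode(record.payload).decode("ascii"),
        }))
    return "\n".join(lines)


def load_records(text):
    """Inverse of dump_records: rebuild the records, including the original bytes."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the capture file
        item = json.loads(line)
        records.append(CaptureRecord(
            offset_s=item["offset_s"],
            direction=item["direction"],
            payload=base64.b64decode(item["payload_b64"]),
        ))
    return records
```

Keeping the relative offset with each record is what later allows a replay to preserve, compress, or multiply the original timing.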
Use Cases
Use cases for traffic capture and replay generally revolve around inspecting and recreating production scenarios.
Once you start to look, you’ll notice its use in many places. Wireshark has been the packet capture standard for decades, and Tcpreplay is just one of many tools that can replay pcap files. Netflix created Polly.JS, and Facebook records a fraction of production traffic to create better testing environments. Uber created a capture and replay load testing framework for internal use.
Contract Testing
Recording traffic for one version of a service and replaying it against another version can help validate the new version. If the same request does not receive the same response from the new version, the service has either introduced a bug or changed its behavior, and the difference deserves review. This is particularly useful for API integration testing in CI/CD.
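The comparison step can be sketched as a small diff between the recorded response and the one the new version returns during replay. The Response type, the diff_responses function, and the list of headers treated as volatile are all assumptions for illustration; a real check would likely need a longer ignore list and smarter body comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


# Headers expected to differ between runs (an assumption for this sketch).
VOLATILE_HEADERS = {"date", "x-request-id"}


def _stable_headers(headers):
    """Lower-case header names and drop the ones that legitimately vary."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() not in VOLATILE_HEADERS
    }


def diff_responses(recorded, replayed):
    """Return a list of differences; an empty list means the responses match."""
    diffs = []
    if recorded.status != replayed.status:
        diffs.append(f"status: {recorded.status} != {replayed.status}")
    if recorded.body != replayed.body:
        diffs.append("body differs")
    old, new = _stable_headers(recorded.headers), _stable_headers(replayed.headers)
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            diffs.append(f"header {name}: {old.get(name)!r} != {new.get(name)!r}")
    return diffs
```

In a CI job, a non-empty result for any replayed request would fail the build or be surfaced for review.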
Load Testing
Manually scripted load tests have the same issues as manually written unit and integration tests: while useful, they are based on the assumptions of the developer. Load testing with a multiple of real application traffic gives a more realistic view of real-world scenarios.
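One simple way to produce "a multiple of real application traffic" is to send every captured request several times at its original offset, which keeps the shape of the traffic while raising the volume. The sketch below assumes a capture represented as (offset_seconds, request) pairs; multiply_traffic is a name made up for this example.

```python
def multiply_traffic(captured, multiplier):
    """Build a replay schedule that sends each captured request `multiplier` times.

    `captured` is a list of (offset_seconds, request) pairs sorted by offset.
    Copies keep the original offset, so bursts and quiet periods are preserved.
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    schedule = []
    for offset, request in captured:
        for _ in range(multiplier):
            schedule.append((offset, request))
    return schedule
```

An alternative is to divide every offset by the multiplier, which raises the request rate by compressing time instead of duplicating requests; which one is closer to a real traffic spike depends on the application.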
Mocks
Create a mock service (API, database, queue, etc.) from captured traffic for testing or development. Some resources are difficult to use in development, like a payment API or a product store API, where requests may change the internal state of a critical system. Others are resource-intensive and expensive to run for development or testing purposes. Application testing aims to validate application behavior while holding all other resources constant, and mocks are not subject to shared load or side effects.
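At its simplest, a mock built from captured traffic is a lookup table from recorded requests to recorded responses. The sketch below assumes captures reduced to (method, path, status, body) tuples and matches on method and path only; ReplayMock and the sample payment paths are hypothetical, and a real mock would usually also match on query parameters or body.

```python
class ReplayMock:
    """Answer requests from recorded traffic instead of calling the real service."""

    def __init__(self, captured):
        # captured: iterable of (method, path, status, body) tuples.
        # If the same request was recorded twice, the last recording wins.
        self._responses = {
            (method.upper(), path): (status, body)
            for method, path, status, body in captured
        }

    def handle(self, method, path):
        """Return the recorded (status, body), or a 404 for unrecorded requests."""
        return self._responses.get(
            (method.upper(), path),
            (404, "no recorded response"),
        )


# Example capture from a payment API (values invented for illustration).
mock = ReplayMock([
    ("POST", "/v1/charges", 201, '{"id": "ch_1", "status": "succeeded"}'),
    ("GET", "/v1/charges/ch_1", 200, '{"id": "ch_1", "status": "succeeded"}'),
])
```

Because the mock never touches the real payment system, tests can run repeatedly without changing its state.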
Sophistication
Implementations of traffic replay vary based on goals and execution, and there are several levels based on capabilities. Capture is as important as replay because it determines what is possible during the replay phase.
Level 1
Level 1 capture generally operates at network layer 4. Bytes are captured and the same bytes are replayed, so replayed traffic matches captured traffic exactly. There is no need to understand protocol or composition because replay is one-to-one. Introspection is limited to network packets and raw data. This is the traffic DVR in its original form.
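A minimal sketch of level 1 replay over TCP follows: the captured payloads are written to a connection unchanged, with no parsing at all. The function names are assumptions for this example, and the host and port passed to replay_to would be whatever target environment is under test.

```python
import socket


def replay_capture(payloads, conn):
    """Send each captured payload over `conn` byte-for-byte; return bytes sent."""
    sent = 0
    for payload in payloads:
        conn.sendall(payload)  # no inspection or rewriting: replay is one-to-one
        sent += len(payload)
    return sent


def replay_to(host, port, payloads, timeout=5.0):
    """Open a TCP connection to the target and replay the payloads over it."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return replay_capture(payloads, conn)
```

Nothing here knows whether the bytes are HTTP, a database protocol, or something else, which is both the appeal and the limit of this level.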
Level 1 is the 80/20 of traffic replay. Most of the value is achieved with very little effort.
Level 2
Level 2 and above operate at network layer 7. Captured traffic contains metadata about the protocol and contextual request information, which is often necessary to fully inspect the traffic before replay. This level may provide introspection into the URL, headers, and body of an HTTP request. Captured traffic between a client and a SQL database may provide a way to view queries outside of replay. The traffic DVR now knows about the captured content.
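To show what protocol awareness adds, the sketch below turns the raw bytes of a captured HTTP/1.1 request into its method, target, headers, and body. It is a deliberately small parser written for this example (parse_http_request is a made-up name) and ignores details such as chunked encoding and repeated headers.

```python
def parse_http_request(raw):
    """Split raw HTTP/1.1 request bytes into method, target, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "target": target,
        "version": version,
        "headers": headers,
        "body": body,
    }
```

With the request in this form, captured traffic can be searched and filtered by URL or header before any replay takes place.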
Basic replay is much the same as level 1, but level 2 capture is necessary for replay at higher levels. It allows deep inspection of application traffic and behavior that is difficult to get with other methods.
Level 3
Level 3 provides manual request rewriting. Request details may be manually modified. HTTP URLs can be overwritten. Headers can be changed, added, or removed. The traffic DVR supports modification during replay with the help of a provided script or configuration.
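A hand-written rewrite configuration of the kind described above might be applied as follows. The rule keys (host, set_headers, remove_headers) and the request shape are assumptions invented for this sketch rather than the configuration format of any specific tool.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_request(request, rules):
    """Return a copy of `request` with the manual rewrite rules applied.

    `request` is {"url": str, "headers": dict with lower-case names}.
    `rules` may contain "host", "set_headers", and "remove_headers".
    """
    rewritten = dict(request)
    headers = dict(request["headers"])
    if "host" in rules:
        # Point the request at a different environment, keeping path and query.
        parts = urlsplit(request["url"])
        rewritten["url"] = urlunsplit(parts._replace(netloc=rules["host"]))
    for name in rules.get("remove_headers", []):
        headers.pop(name.lower(), None)
    for name, value in rules.get("set_headers", {}).items():
        headers[name.lower()] = value
    rewritten["headers"] = headers
    return rewritten


# Rules a person might write to replay production traffic against a local build.
rules = {
    "host": "localhost:8080",
    "set_headers": {"Authorization": "Bearer test-token"},
    "remove_headers": ["X-Trace"],
}
```

Every rule here has to be written and maintained by a person who knows what the target environment expects, which is the manual effort this level implies.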
Level 3 enables traffic replays which are very close to, if not as good as, production scenarios. The primary drawback of level 3 is the manual effort required to create a valid replay for a running application outside of production.
Level 4
Level 4 is differentiated by an intelligence layer on top of the traffic that automatically detects and rewrites traffic, or creates automated suggestions, without manual intervention. Relationships between IDs and tokens are detected automatically. A matching value from a JWT and a query parameter can be updated and the JWT re-signed. Timestamps that were 5 minutes old during capture can be recognized and updated to be 5 minutes old during replay. The traffic DVR supports automatic, intelligent modification.
Level 4 is the most difficult to achieve but provides the most value to applications during replay. Computers are unforgiving, and reliably recreating production scenarios quickly falls apart when the minor details an application expects are incorrect.
Summary
Traffic capture and replay is an old concept with many uses, but it has historically been overlooked by most organizations due to limited access to sophisticated capture and replay capabilities. Not all traffic replay tools and platforms are equal, and traditionally only the highest-performing organizations have allocated the resources to build reliable and useful tools on top of traffic replay. Modern offerings, like Speedscale, make higher levels of traffic replay sophistication available, so that traffic replay is a viable option for simulating production environments.