DeepCorr – A novel efficient system for flow correlation attacks on Tor
Flow correlation is one of the core techniques used in many deanonymization attacks on Tor. Despite the importance of these attacks, existing flow correlation techniques are ineffective at linking Tor flows at large scale: they either require impractically long flow observations or suffer from high false positive rates.
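To make "flow correlation" concrete, the sketch below scores a candidate pair of flows the way a generic statistical approach might. It takes the first n packets of an ingress flow and an egress flow, converts their timestamps into inter-packet delays, and computes the Pearson correlation of the two delay sequences. This is a hypothetical baseline for illustration only: the function names are made up, and it is not RAPTOR's or any other published system's exact metric. An adversary would compare the score against a threshold to decide whether two flows belong to the same connection.

```python
import math
from typing import Sequence


def inter_packet_delays(timestamps: Sequence[float]) -> list[float]:
    """Gaps (in seconds) between consecutive packet timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the common-length prefix of two sequences."""
    n = min(len(x), len(y))
    if n < 2:
        raise ValueError("need at least two values in each sequence")
    x, y = list(x[:n]), list(y[:n])
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant sequence carries no timing signal
    return cov / (std_x * std_y)


def correlate_flows(ingress_ts: Sequence[float], egress_ts: Sequence[float], n_packets: int) -> float:
    """Score a candidate (ingress, egress) pair using only the first n_packets packets."""
    return pearson(
        inter_packet_delays(ingress_ts[:n_packets]),
        inter_packet_delays(egress_ts[:n_packets]),
    )


# Hypothetical timestamps: the egress flow is the ingress flow shifted by 0.2 s of latency.
ingress = [0.0, 0.10, 0.15, 0.40, 0.45]
egress = [t + 0.2 for t in ingress]
unrelated = [0.0, 0.30, 0.32, 0.33, 0.70]

print(correlate_flows(ingress, egress, n_packets=5))     # close to 1.0
print(correlate_flows(ingress, unrelated, n_packets=5))  # much lower (negative here)
```

With real traffic, network jitter and other distortions blur these delay sequences, which is one plausible reason a fixed metric like this needs long observations to separate true pairs from false ones.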
A recently published paper shows that, by using novel learning mechanisms, flow correlation attacks on Tor can achieve higher accuracy than before. The researchers built a system called DeepCorr that outperforms previous techniques at correlating Tor connections. DeepCorr uses a deep learning architecture to learn a flow correlation function tailored to Tor's anonymity network, in contrast to previous techniques, which rely on generic statistical correlation metrics to correlate Tor flows.
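A learned correlation function needs a fixed-size input for each candidate pair of flows. A minimal sketch, assuming a layout in which the per-direction inter-packet delays and packet sizes of the ingress and egress flow are stacked into an 8 x length matrix, is shown below. The row order, the zero-padding, and the helper names `direction_features` and `pair_matrix` are hypothetical choices for illustration, not DeepCorr's published code, and the deep network that would score this matrix is omitted.

```python
from typing import Sequence

# A packet is (timestamp in seconds, size in bytes, direction), direction being "up" or "down".
Packet = tuple[float, int, str]


def direction_features(flow: Sequence[Packet], direction: str, length: int) -> tuple[list[float], list[float]]:
    """Inter-packet delays and sizes of one direction, truncated or zero-padded to `length`."""
    packets = [(t, size) for t, size, d in flow if d == direction]
    times = [t for t, _ in packets]
    # The first packet has no predecessor, so its delay is recorded as 0.
    delays = ([0.0] + [b - a for a, b in zip(times, times[1:])]) if times else []
    sizes = [float(size) for _, size in packets]

    def fit(values: list[float]) -> list[float]:
        return (values + [0.0] * length)[:length]

    return fit(delays), fit(sizes)


def pair_matrix(ingress: Sequence[Packet], egress: Sequence[Packet], length: int) -> list[list[float]]:
    """Stack the features of a candidate (ingress, egress) pair into an 8 x length matrix.

    Row order: up/ingress delays, up/ingress sizes, up/egress delays, up/egress sizes,
    then the same four rows for the "down" direction.
    """
    rows: list[list[float]] = []
    for direction in ("up", "down"):
        for flow in (ingress, egress):
            delays, sizes = direction_features(flow, direction, length)
            rows.extend([delays, sizes])
    return rows


ingress = [(0.0, 512, "up"), (0.1, 1500, "down"), (0.3, 512, "up")]
egress = [(0.2, 512, "up"), (0.35, 1500, "down")]
matrix = pair_matrix(ingress, egress, length=4)
print(len(matrix), len(matrix[0]))  # 8 4
```

In a learned setup, matrices built from true pairs and from deliberately mismatched pairs would typically be labeled 1 and 0 and used to train a classifier whose output takes the place of a hand-picked statistic.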
The researchers' experiments showed that, with moderate training, DeepCorr correlates Tor connections (and thus breaks their anonymity) with higher accuracy than existing algorithms, and with shorter flow observations. For example, by observing only about 900 packets of each target Tor flow (roughly 900 KB of Tor traffic), DeepCorr achieves a flow correlation accuracy close to 96%, compared with 4% for the earlier RAPTOR system in the exact same setting.
Through experiments on the live Tor network, the developers of DeepCorr demonstrated the system's strong performance at scale. They browsed the top 50,000 Alexa websites over Tor and evaluated DeepCorr's true positive and false positive rates in correlating the ingress and egress segments of the recorded Tor connections. The resulting dataset is the largest publicly released dataset of correlated Tor traffic flows.
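True and false positive rates in this kind of evaluation can be computed from a matrix of correlation scores. The sketch below assumes that ingress flow i and egress flow i come from the same Tor connection (the true pairs on the diagonal) and that every other combination is a false pair; a pair is declared correlated when its score reaches a threshold. The function name and the toy scores are hypothetical.

```python
from typing import Sequence


def tpr_fpr(scores: Sequence[Sequence[float]], threshold: float) -> tuple[float, float]:
    """True/false positive rates for an n x n score matrix whose diagonal holds the true pairs."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two flows to have any false pairs")
    true_positives = sum(1 for i in range(n) if scores[i][i] >= threshold)
    false_positives = sum(
        1 for i in range(n) for j in range(n) if i != j and scores[i][j] >= threshold
    )
    # n true pairs on the diagonal, n * (n - 1) false pairs off it.
    return true_positives / n, false_positives / (n * (n - 1))


# Hypothetical scores for 3 ingress flows (rows) against 3 egress flows (columns).
scores = [
    [0.9, 0.2, 0.6],
    [0.1, 0.8, 0.3],
    [0.4, 0.7, 0.3],
]
print(tpr_fpr(scores, threshold=0.5))  # TPR = 2/3, FPR = 2/6
```

Raising the threshold lowers the false positive rate at the cost of true positives; sweeping it traces a ROC-style trade-off curve.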
The following points highlight DeepCorr's superior performance:
– The researchers trained DeepCorr on a total of 25,000 Tor connections they had collected, whereas in their earlier experiments, which trained other systems, they used only 5,000 connections. Training DeepCorr takes about a day on a single TITAN X GPU, and an adversary needs to re-train it about once a month to maintain its correlation performance.
– DeepCorr can serve as a generic correlation function: its performance is consistent across test datasets of various sizes and with flows routed over different circuits.
– DeepCorr outperforms previous flow correlation algorithms by a considerable margin. More importantly, it can correlate Tor flows using shorter flow observations than other flow correlation systems require.
– DeepCorr's performance improves substantially with longer flow observations and larger training sets.
– DeepCorr correlates flows considerably faster than previous work for the same target accuracy. For example, a single DeepCorr correlation takes 2 ms, while a RAPTOR correlation takes more than 20 ms, when both target 95% accuracy on the same dataset.
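Per-correlation time matters because an adversary matching many flows may need to score a large number of candidate pairs. The arithmetic below assumes, purely for illustration, an exhaustive comparison of every ingress flow against every egress flow (a hypothetical scenario, not the paper's experimental setup) and uses the 2 ms and 20 ms figures quoted above. Since RAPTOR's time is "more than 20 ms", its total is a lower bound.

```python
def all_pairs_cost_seconds(n_ingress: int, n_egress: int, ms_per_pair: float) -> float:
    """Time in seconds to score every (ingress, egress) pair exhaustively."""
    return n_ingress * n_egress * ms_per_pair / 1000.0


DEEPCORR_MS_PER_PAIR = 2.0  # quoted for 95% accuracy
RAPTOR_MS_PER_PAIR = 20.0   # quoted as "more than 20 ms", so RAPTOR totals are lower bounds

n_flows = 1000  # hypothetical number of flows observed on each side
deepcorr_s = all_pairs_cost_seconds(n_flows, n_flows, DEEPCORR_MS_PER_PAIR)
raptor_s = all_pairs_cost_seconds(n_flows, n_flows, RAPTOR_MS_PER_PAIR)
print(f"DeepCorr: {deepcorr_s:.0f} s, RAPTOR: at least {raptor_s:.0f} s")
# DeepCorr: 2000 s, RAPTOR: at least 20000 s
```

Because the number of pairs grows quadratically with the number of flows, the tenfold per-pair gap carries over directly to the total.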
The developers of DeepCorr hope the system will raise the community's awareness of the growing risk of large-scale traffic analysis on Tor communications in light of emerging deep learning algorithms. DeepCorr can be countered by applying traffic obfuscation techniques, such as those deployed by Tor pluggable transports, to all Tor flows. The developers evaluated DeepCorr against each of Tor's currently deployed pluggable transports and found that meek and obfs4-iat0 offer little protection against its flow correlation attacks, while obfs4-iat1 offers better protection. However, none of these obfuscation techniques is currently deployed by public Tor relays, and obfs4-iat1 is used by only a small percentage of Tor bridges. This calls for effective traffic obfuscation techniques that Tor relays can deploy without imposing large bandwidth and performance overheads on Tor communications.
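Obfuscation defenses work by disturbing the features a correlator relies on, mainly packet timing and packet sizes, and each of these knobs has a cost. The toy sketch below illustrates both: it adds a bounded random extra gap before each packet (a latency cost) and rounds packet sizes up to a fixed bucket (a bandwidth cost). It is a hypothetical illustration of the overhead trade-off described above, not the algorithm used by obfs4, meek, or any other pluggable transport; the function names, the 512-byte bucket, and the delay bound are assumptions.

```python
import random
from typing import Optional, Sequence


def add_timing_jitter(delays: Sequence[float], max_extra: float, seed: Optional[int] = None) -> list[float]:
    """Add a uniform random extra gap in [0, max_extra] seconds to each inter-packet delay."""
    rng = random.Random(seed)
    return [d + rng.uniform(0.0, max_extra) for d in delays]


def pad_sizes(sizes: Sequence[int], bucket: int) -> tuple[list[int], float]:
    """Round each packet size up to a multiple of `bucket`; also return the byte overhead ratio."""
    if not sizes or sum(sizes) == 0:
        raise ValueError("need at least one non-empty packet")
    padded = [-(-size // bucket) * bucket for size in sizes]  # ceiling division
    overhead = (sum(padded) - sum(sizes)) / sum(sizes)
    return padded, overhead


delays = [0.10, 0.05, 0.25, 0.05]
jittered = add_timing_jitter(delays, max_extra=0.05, seed=7)
added_latency = sum(jittered) - sum(delays)  # cumulative extra delay by the last packet
print(f"extra delay by last packet: {added_latency:.3f} s")

padded, overhead = pad_sizes([100, 600, 1500], bucket=512)
print(padded)              # [512, 1024, 1536]
print(round(overhead, 3))  # 0.396
```

Larger jitter and coarser buckets hide more of a flow's fingerprint but cost more latency and bandwidth, which is the kind of tension a relay-deployable defense has to resolve.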
Although DeepCorr is presented as a flow correlation attack on the Tor network, it can also be used to correlate flows in other flow correlation applications. To demonstrate this, the authors applied DeepCorr to stepping stone detection and found that it outperforms previous stepping stone detection algorithms in unreliable network settings.