Valhalla (Scam) cryptomarket analysis and user de-anonymization
Darknet marketplaces are cryptomarkets where vendors can sell illicit drugs and services to anyone, anywhere in the world. They are hidden services accessible via the Tor network. Users of these cryptomarkets usually use PGP to communicate with each other and bitcoin as the currency for their transactions.
A recently published thesis performed a quantitative analysis of trades taking place on the Valhalla marketplace (February 2018). The researchers scraped Valhalla’s website for information about listings, vendors, and buyers, which yielded a substantial set of statistics about the nature of trading on this cryptomarket. They also built a tool to search for, analyze, and find public data that can help de-anonymize users of darknet marketplaces on the Tor network. In this article, we summarize the methodology and results of the thesis.
The team of researchers scraped multiple surface web social networks and websites related to bitcoin, including Reddit, Bitcointalk, Twitter, and blockchain.info (now blockchain.com), as well as the Valhalla marketplace itself. All obtained data was stored in a Neo4j database. A tool was created to collect this data, import it into the database, run multiple heuristics, and provide the user with an interface for visualizing the data and metadata of addresses and identities related to darknet marketplaces and bitcoin’s blockchain. The tool can identify the most relevant bitcoin addresses related to drug transactions on darknet marketplaces, as well as the most relevant bitcoin addresses found on the scraped websites. The team created two accounts on the Valhalla marketplace and repeatedly deposited bitcoin into, and withdrew bitcoin from, their wallets. In this way, they obtained multiple bitcoin addresses controlled by Valhalla’s administrators. They then ran the tool and verified how well it could determine whether the bitcoin addresses obtained from these transactions actually belonged to the same identity (i.e. Valhalla’s administrators).
The researchers’ application for analyzing gathered data:
Figure (1) illustrates the application’s architecture, which comprises multiple Python scripts divided into 5 categories:
- Scripts for scraping data on Valhalla market
- Scripts for parsing bitcoin’s blockchain and relevant data from publicly available social networks and blockchain-related forums
- Special script for importing data to a database, creating indexes, and running heuristics
- Web server that handles GUI requests and obtains data from the database
- The web GUI for sending requests to the web server, searching, and obtaining relevant data
Figure (1): Architecture of application
To build a tool for effectively searching and visualizing the blockchain, as well as the web-scraped data, the researchers had to store the blockchain and identity data on their local machine in a way that lets the heuristic analysis run efficiently. The natural representation of transactions between bitcoin addresses is a graph, so they decided to import all the data into Neo4j, one of the most widely used graph databases, which supports common graph algorithms.
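To make the graph representation concrete, here is a minimal in-memory sketch in Python. It is only a stand-in for the Neo4j model, not the researchers’ schema: the class name TxGraph, its methods, and the sample addresses are all hypothetical. Addresses are nodes, and each transaction adds directed edges from its input addresses to its output addresses.

```python
from collections import defaultdict


class TxGraph:
    """Hypothetical in-memory stand-in for an address/transaction graph."""

    def __init__(self):
        # source address -> list of (txid, destination address, amount in BTC)
        self.out_edges = defaultdict(list)

    def add_transaction(self, txid, inputs, outputs):
        """inputs: list of sending addresses; outputs: dict of address -> amount."""
        for src in inputs:
            for dst, amount in outputs.items():
                self.out_edges[src].append((txid, dst, amount))

    def reachable(self, start, max_hops):
        """Addresses reachable from `start` within `max_hops` transactions (BFS)."""
        seen = {start}
        frontier = [start]
        for _ in range(max_hops):
            next_frontier = []
            for addr in frontier:
                for _txid, dst, _amount in self.out_edges.get(addr, []):
                    if dst not in seen:
                        seen.add(dst)
                        next_frontier.append(dst)
            frontier = next_frontier
        seen.discard(start)
        return seen


# Made-up sample data for illustration only.
graph = TxGraph()
graph.add_transaction("tx1", ["A"], {"B": 0.5, "C": 0.1})
graph.add_transaction("tx2", ["B"], {"D": 0.4})
print(sorted(graph.reachable("A", 2)))
```

A graph database answers this kind of "who is connected to whom within a few hops" query natively, which is presumably why a graph store was preferred over a relational one.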
Results and statistics of Valhalla’s cryptomarket:
Scraping and analyzing the data from Valhalla’s marketplace revealed interesting statistics. We present some of the most notable findings here.
The researchers were able to scrape 25,309 product listings, 981 vendors, and 6,381 pieces of buyer feedback. Of the listings, 17,314 (68%) were for drugs; the remainder included ebooks, premium accounts, guns, fake/counterfeit IDs, etc. Vendors had between 1 (90 vendors) and 1,083 (1 vendor) active product listings, with an average of 25.69 listings per vendor.
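As a quick sanity check, the drug share can be recomputed from the two listing counts quoted above. The variable names are ours; only the two counts come from the thesis.

```python
# Counts reported in the thesis summary above.
total_listings = 25_309
drug_listings = 17_314

drug_share = drug_listings / total_listings
other_listings = total_listings - drug_listings

print(f"Drug listings: {drug_share:.1%}")      # about 68%
print(f"Non-drug listings: {other_listings}")
```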
Figure (2) shows a histogram of the number of vendors against their number of active listings. Most vendors have just a few active listings. The vendors shipped their products from 39 distinct countries. Valhalla was originally established as a local Finnish cryptomarket, which might explain why many vendors ship from Finland and why Finns make up a large percentage of the high-revenue vendors.
Figure (2): Histogram for number of active listings per vendor
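A histogram like the one in Figure (2) can be derived from scraped listings in two counting steps. The sketch below assumes each scraped listing carries a vendor identifier; the function name and the sample data are hypothetical and do not come from the thesis.

```python
from collections import Counter


def listings_histogram(listing_vendors):
    """Map 'number of active listings' -> 'number of vendors with that many'.

    listing_vendors: iterable with one vendor id per scraped listing.
    """
    per_vendor = Counter(listing_vendors)   # vendor id -> active listings
    return Counter(per_vendor.values())     # listing count -> vendor count


# Made-up sample: v1 and v4 have 1 listing, v2 has 2, v3 has 3.
sample = ["v1", "v2", "v2", "v3", "v3", "v3", "v4"]
hist = listings_histogram(sample)
for n_listings in sorted(hist):
    print(n_listings, hist[n_listings])
```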
Table (1) shows the categories of product listings, the market share of each, buyers’ feedback, and monthly revenue.
Table (1): Product categories and monthly revenue
The thesis scraped the Valhalla cryptomarket and obtained meaningful statistics about its vendors, product listings, and users. The statistical findings are broadly in line with previously published analyses of darknet marketplaces (cryptomarkets), yet the researchers also found some statistics where Valhalla was an outlier and proposed reasons for this. The researchers scraped the Valhalla marketplace only once, and the buyers’ feedback data covered a time frame of only one month. Continuous monitoring of the Valhalla cryptomarket over longer periods could enrich the statistical description with data on how trends on this marketplace change over time. The application developed by the researchers proved to be fully working and production ready: it finds the most relevant bitcoin addresses, with possible associated identities, for a high percentage of given addresses. The application could be extended in multiple ways.
The researchers used two heuristics that carry the lowest risk of falsely clustering bitcoin addresses belonging to different identities. The heuristics did not cluster all the addresses and transactions within the Valhalla cryptomarket. However, the researchers found that with just dozens of bitcoin deposits and withdrawals between themselves (via the accounts they created for the purpose of the experiment) and the market, they were able to identify 9% of the cryptomarket’s estimated bitcoin flow over any given month.
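This article does not name the two heuristics. As an illustration only, the sketch below implements one widely used low-risk heuristic, common-input ownership: all addresses that appear together as inputs of one transaction are assumed to be controlled by the same entity. Whether this is one of the thesis’s heuristics is our assumption, and the class name, method names, and sample addresses are hypothetical. Clusters are merged with a union-find structure.

```python
class AddressClusters:
    """Union-find over addresses, merged by the common-input heuristic (assumed)."""

    def __init__(self):
        self.parent = {}

    def find(self, addr):
        """Return the cluster representative for addr, registering it if new."""
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[addr] != root:
            self.parent[addr], addr = root, self.parent[addr]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_transaction_inputs(self, input_addresses):
        """Merge all input addresses of one transaction into a single cluster."""
        inputs = list(input_addresses)
        for addr in inputs:
            self.find(addr)
        for other in inputs[1:]:
            self.union(inputs[0], other)

    def same_owner(self, a, b):
        known = a in self.parent and b in self.parent
        return known and self.find(a) == self.find(b)


# Made-up sample: A and B co-spend, then B and C co-spend, D spends alone.
clusters = AddressClusters()
clusters.add_transaction_inputs(["A", "B"])
clusters.add_transaction_inputs(["B", "C"])
clusters.add_transaction_inputs(["D"])
print(clusters.same_owner("A", "C"))
print(clusters.same_owner("A", "D"))
```

In the experiment described above, the known starting points would be the deposit and withdrawal addresses the researchers observed from their own accounts; any cluster containing one of those addresses can then be attributed to the market.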
It is worth mentioning that the researchers might have underestimated the cryptomarket’s cash flow, because they based the estimate solely on feedback left by buyers. They paid less than 100 dollars in fees across these transactions. Repeated deposits and withdrawals over a longer period therefore seem financially feasible and could reveal the majority of Valhalla’s bitcoin addresses and transactions.
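The feedback-based estimate mentioned above can be sketched as follows. The exact method used in the thesis is not described here, so this sketch assumes the simplest version: each piece of feedback counts as one sale at the listing’s advertised price. The function name and the sample data are hypothetical.

```python
def estimate_revenue(feedback_listing_ids, listing_prices):
    """Lower-bound revenue: one feedback = one sale at the listing price (assumed).

    feedback_listing_ids: iterable of listing ids, one per feedback entry.
    listing_prices: dict of listing id -> price in a common currency.
    """
    total = 0.0
    for listing_id in feedback_listing_ids:
        price = listing_prices.get(listing_id)
        if price is not None:  # skip feedback for listings we have no price for
            total += price
    return total


# Made-up sample: feedback for "l9" has no known price and is ignored.
prices = {"l1": 50.0, "l2": 20.0}
feedback = ["l1", "l2", "l2", "l9"]
print(estimate_revenue(feedback, prices))
```

Because not every buyer leaves feedback, an estimate of this kind is a lower bound, which is consistent with the caveat that the cash flow may have been underestimated.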
The researchers’ application could be extended by implementing further heuristics for clustering bitcoin addresses that were proposed in previous research on cryptomarket analysis. These heuristics are based not only on the bitcoin transaction graph, but also on users’ expected behavior and on data obtainable by continuously running one or more bitcoin nodes. They are less reliable, yet they offer more ways to cluster bitcoin addresses that belong to the same owner.
The whole application backend can be operated on a consumer-grade notebook. The backend database is indexed and utilizes memory well, and GUI requests take a maximum of 30 seconds to process. This is a great advantage of the application; however, its setup takes a long time, because bitcoin’s blockchain is currently estimated to hold around 150GB of data, which has to be downloaded, inserted into the database, indexed, and processed by the heuristics. To set the application up on a server and make it remotely accessible to users, different options for storing and retrieving data on the server hardware can be considered, such as adding enough RAM to hold the whole database in memory.