Automated detection of cryptocurrency investment scams at scale
Abstract
The ecosystem of cryptocurrencies has grown and changed significantly since Bitcoin’s inception in 2008 (Nakamoto, 2008). Similarly, the number of people using cryptocurrencies for investment, speculation and payment has risen over the past few years. This expansion, however, has opened opportunities for cybercriminals, leading to an increase in cryptocurrency-related scams. Although extensive research has been carried out on this type of scam, few studies analyse the textual content of online forums and social media to identify cryptocurrency investment scams at scale in an automated manner. This thesis addresses this gap by employing machine learning models to detect cryptocurrency investment scams through the analysis of textual conversations, offering insights into the evolution of scam luring tactics and the monetary impact these fraudulent schemes have on society.
First, in this thesis I use machine learning models for multi-class text classification to identify advertisements of cryptocurrency investment scams. My objective is to develop a highly accurate method for detecting cryptocurrency investment scam advertisements at scale by analysing the text content of posts in online forums and social media. Unlike previous studies that primarily relied on wallet and transaction data for classification, this thesis leverages the text content within forum posts to identify fraudulent schemes. This method provides an alternative scalable solution for early scam detection, potentially improving the effectiveness of existing monitoring systems and broadening the scope of detection in cases where transaction data is limited or unavailable. By testing and comparing several machine learning models, I discover that traditional models such as XGBoost can outperform more advanced deep learning models such as LSTM. However, I also find that the BERT model achieves the best performance without overfitting the training data.
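To make the idea of multi-class text classification concrete, the following is a minimal sketch and not the method used in this thesis. It swaps in a small bag-of-words Naive Bayes classifier written with the Python standard library in place of XGBoost, LSTM or BERT. The class names (`scam_ad`, `discussion`, `marketplace`) and the six training posts are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9%]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def __init__(self):
        self.class_counts = Counter()            # posts seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts; a real training set would be far larger.
texts = [
    "Double your bitcoin in 24 hours guaranteed daily profit",
    "Invest now and earn 5% daily returns guaranteed payout",
    "How do I set up a hardware wallet for cold storage",
    "Which mining pool has the lowest fees this month",
    "Selling used ASIC miner, shipping from Germany",
    "Trading my graphics card for a used ASIC miner",
]
labels = ["scam_ad", "scam_ad", "discussion",
          "discussion", "marketplace", "marketplace"]

model = NaiveBayesTextClassifier().fit(texts, labels)
print(model.predict("Guaranteed daily profit, invest your bitcoin now"))
```

The point of the sketch is that the classifier sees only the wording of a post, so it can be applied where no wallet or transaction data is available.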
Second, I use this model to identify cryptocurrency investment scam advertisements in the Bitcointalk forum and the lures used by cybercriminals to attract victims into these fraudulent schemes. Once these advertisements and their lures are identified, I investigate how their frequency changed over 13 years. The main findings show that the forum’s usage reached its peak in 2018 and has decreased by more than 80% since then. I discover that changes in forum activity appeared to follow Bitcoin price cycles up until 2018. The popularity of the forum has since fallen, probably because users moved to other online platforms. The findings also show that cryptocurrency scam advertisements on Bitcointalk reached a maximum of 85 per day in January 2018. I also find a shift over time in the scam-related keywords used in forum posts. From 2014 to 2018, the keyword “Ponzi” was most frequently mentioned, whereas from 2018 to 2021, “HYIP” became more prevalent. The findings also indicate that between 2015 and 2017, cybercriminals lured victims into these fraudulent schemes with promises of financial rewards. However, between 2018 and 2023, they mainly used authority and distraction tactics as their primary luring techniques. The results show that during the COVID-19 pandemic there was an increase in the number of cryptocurrency scam advertisements identified in the forum.
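The keyword shift described above can be illustrated with a simple count of keyword mentions per year. This is a sketch under stated assumptions: the input format (pairs of year and post text), the function names and the sample posts are hypothetical, and the actual analysis pipeline may differ.

```python
import re
from collections import Counter, defaultdict

KEYWORDS = ("ponzi", "hyip")


def keyword_counts_by_year(posts):
    """Count whole-word keyword mentions per year.

    `posts` is an iterable of (year, text) pairs (assumed format).
    """
    counts = defaultdict(Counter)
    for year, text in posts:
        lowered = text.lower()
        for kw in KEYWORDS:
            counts[year][kw] += len(re.findall(rf"\b{kw}\b", lowered))
    return counts


def dominant_keyword(counts, year):
    """Return the most frequently mentioned keyword in a given year."""
    return counts[year].most_common(1)[0][0]


# Hypothetical sample posts, for illustration only.
posts = [
    (2016, "This looks like a Ponzi. Classic ponzi payout scheme."),
    (2016, "New HYIP launched"),
    (2019, "Another HYIP site, HYIP monitors list it as paying"),
    (2019, "Is this a Ponzi?"),
]

counts = keyword_counts_by_year(posts)
print(dominant_keyword(counts, 2016), dominant_keyword(counts, 2019))
```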
Third, I use the BERT model to identify cryptocurrency scam advertisements across four platforms: Bitcointalk, Reddit, YouTube, and mobile applications advertised on 12 modded app markets. The main findings show that the best-performing model is BERT, which is used to identify 94,821 cryptocurrency scam advertisements across all platforms. Subsequently, I analyse the semantic and syntactic characteristics of all cryptocurrency scam advertisements identified. I also explore differences in the number of cryptocurrency investment scam advertisements found within retirement-related subreddits compared with other subreddits focused on investment. The results show that retirement subreddits contain fewer cryptocurrency investment scam advertisements than other investment-focused subreddits.
Finally, I estimate the monetary losses caused by these scams by analysing the transactions of Bitcoin addresses found within each cryptocurrency scam advertisement and its linked webpages. The estimate, based on 976 Bitcoin addresses, indicates that more than $45 million has been lost, with an average loss of $3,707.35 per transaction.
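As a rough illustration of this step, the sketch below pulls candidate Bitcoin addresses out of advertisement text and summarises a list of incoming payment values. It rests on several assumptions: the regular expression matches only the shape of legacy and bech32 addresses and performs no checksum validation, the function names are hypothetical, and the US dollar amounts are made-up values rather than data from this thesis.

```python
import re

# Shape-only patterns: bech32 ("bc1...") and legacy base58 ("1..." or "3...").
BTC_ADDRESS = re.compile(
    r"\b(?:bc1[ac-hj-np-z02-9]{11,71}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
)


def extract_addresses(text):
    """Return unique candidate addresses in order of first appearance."""
    return list(dict.fromkeys(BTC_ADDRESS.findall(text)))


def summarise_losses(usd_amounts):
    """Return (total, average per transaction) for incoming payments in USD."""
    total = sum(usd_amounts)
    average = total / len(usd_amounts) if usd_amounts else 0.0
    return total, average


ad_text = (
    "Send to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa or "
    "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4 today! "
    "Again: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
)
addresses = extract_addresses(ad_text)

# Hypothetical incoming payments (USD) received by those addresses.
total, average = summarise_losses([1200.0, 350.5, 49.5])
print(addresses, total, average)
```

In practice each candidate address would still need to be validated and its transaction history retrieved from the blockchain before any loss figure could be computed.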
In this thesis, I provide a methodology that can help detect cryptocurrency investment scam advertisements proactively by analysing their text content, a technique that has not been used before. I also shed light on the tactics that cybercriminals employ to lure victims into cryptocurrency investment scams by analysing conversations at scale from several online platforms. The findings can contribute to the development of more effective policies and more robust systems, enhancing the detection of cryptocurrency investment scams and protecting users from these deceptive schemes.