Classification of twitter accounts into automated agents and human users
Publication Date
2017-07-31
Journal Title
Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017
Conference Name
ASONAM '17: Advances in Social Networks Analysis and Mining 2017
ISBN
9781450349932
Publisher
ACM
Pages
489-496
Type
Conference Object
This Version
AM
Citation
Gilani, Z., Kochmar, E., & Crowcroft, J. (2017). Classification of twitter accounts into automated agents and human users. Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017, 489-496. https://doi.org/10.1145/3110025.3110091
Abstract
© 2017 Association for Computing Machinery. Online social networks (OSNs) have seen a remarkable rise in the presence of surreptitious automated accounts. The massive human user base and business-supportive operating model of social networks (such as Twitter) facilitate the creation of automated agents. In this paper, we outline a systematic methodology and train a classifier to categorise Twitter accounts into ‘automated’ and ‘human’ users. To improve classification accuracy, we employ a set of novel steps. First, we divide the dataset into four popularity bands to compensate for differences in types of accounts. Second, we create a large ground truth dataset using human annotations and extract relevant features from raw tweets. To judge the accuracy of the procedure, we calculate agreement among human annotators as well as with a bot detection research tool. We then apply a Random Forests classifier that achieves an accuracy close to human agreement. Finally, we perform tests to measure the efficacy of our results.
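Two of the steps named in the abstract, banding accounts by popularity and measuring agreement between annotators, can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the paper: popularity is approximated by follower count, the three band thresholds are hypothetical, and Cohen's kappa stands in for whichever agreement measure the authors actually used.

```python
from collections import Counter

# Hypothetical band boundaries (follower counts); the abstract only says
# that four popularity bands are used, not where they are cut.
HYPOTHETICAL_THRESHOLDS = (1_000, 100_000, 1_000_000)


def popularity_band(followers, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Return a band index from 0 (least popular) to 3 (most popular)."""
    # Count how many thresholds the account meets or exceeds.
    return sum(followers >= t for t in thresholds)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label lists.

    Cohen's kappa is an assumed choice of agreement statistic.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labelled independently at random
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_1 = ["bot", "human", "bot", "human"]
    annotator_2 = ["bot", "human", "human", "human"]
    print(popularity_band(250_000))
    print(cohen_kappa(annotator_1, annotator_2))
```

Computing agreement separately within each band would show whether annotators disagree more on some kinds of accounts than others, which is one plausible reason for banding the dataset before training a classifier.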
Identifiers
External DOI: https://doi.org/10.1145/3110025.3110091
This record's URL: https://www.repository.cam.ac.uk/handle/1810/298180
Rights
All rights reserved
Licence:
http://www.rioxx.net/licenses/all-rights-reserved