Detecting Trending Terms in Cybersecurity Forum Discussions

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Aycock, Seth 
Buttery, Paula 

Abstract

We present a lightweight method for identifying currently trending terms relative to a known prior of terms, using a weighted log-odds ratio with an informative prior. We apply this method to a dataset of posts from an English-language underground hacking forum spanning over ten years of activity, with posts containing misspellings, orthographic variation, acronyms, and slang. Our statistical approach supports analysis of linguistic change and discussion topics over time, without requiring a topic model to be trained for each time interval under analysis. We evaluate the approach by comparing its results to TF-IDF using the discounted cumulative gain metric with human annotations, finding that our method outperforms TF-IDF on information retrieval.
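
The abstract does not spell out the scoring formula, so the following is only a minimal sketch of one widely used formulation of a weighted log-odds ratio with an informative Dirichlet prior (the z-scored form of Monroe et al., 2008), together with the standard discounted cumulative gain. It may differ from the authors' exact method. The function names, the `prior_scale` and `floor` parameters, and the toy counts are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def weighted_log_odds(target, reference, prior, prior_scale=1.0, floor=0.01):
    """Return a z-score per term: positive means over-represented in `target`.

    target, reference, prior: mappings from term to raw count.
    prior_scale and floor are hypothetical knobs for this sketch.
    """
    vocab = set(target) | set(reference) | set(prior)
    # Pseudo-counts from the prior; the floor keeps every term strictly positive.
    alpha = {w: prior_scale * prior.get(w, 0) + floor for w in vocab}
    alpha_0 = sum(alpha.values())
    n_t = sum(target.values())
    n_r = sum(reference.values())

    scores = {}
    for w in vocab:
        y_t = target.get(w, 0) + alpha[w]      # smoothed count in target
        y_r = reference.get(w, 0) + alpha[w]   # smoothed count in reference
        # Difference of smoothed log-odds between the two corpora.
        delta = (math.log(y_t / (n_t + alpha_0 - y_t))
                 - math.log(y_r / (n_r + alpha_0 - y_r)))
        # Approximate variance of the difference, used to weight the estimate.
        variance = 1.0 / y_t + 1.0 / y_r
        scores[w] = delta / math.sqrt(variance)
    return scores


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


# Toy counts (invented): a recent time window versus all earlier posts.
recent = Counter({"ransomware": 30, "password": 20, "the": 50})
earlier = Counter({"ransomware": 2, "password": 25, "the": 73})
background = recent + earlier  # prior built from the combined corpus

scores = weighted_log_odds(recent, earlier, background)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this sketch the prior is simply the pooled term counts, so frequent function words receive large pseudo-counts and are pulled toward a score of zero unless their usage shifts substantially. A ranked term list can then be scored against human relevance judgements by passing the judgements, in ranked order, to `dcg`.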

Description

Keywords

Journal Title

Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Conference Name

Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Journal ISSN

Volume Title

Publisher

Association for Computational Linguistics

Rights

All rights reserved
Sponsorship
Cambridge Assessment (unknown)
ESRC (ES/T008466/1)
Engineering and Physical Sciences Research Council (2276284)
Cambridge Assessment, University of Cambridge EPSRC Doctoral Training Studentship (Jack Hughes)