Evaluating adversarial attacks against multiple fact verification systems
Publication Date
2020-01-01
Journal Title
EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
ISBN
9781950737901
Publisher
Association for Computational Linguistics
Pages
2944-2953
Type
Conference Object
This Version
VoR
Citation
Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2020). Evaluating adversarial attacks against multiple fact verification systems. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2944-2953. https://doi.org/10.17863/CAM.52938
Abstract
© 2019 Association for Computational Linguistics
Automated fact verification has been progressing owing to advancements in modeling and the availability of large datasets. Due to the nature of the task, it is critical to understand the vulnerabilities of these systems against adversarial instances designed to make them predict incorrectly. We introduce two novel scoring metrics, attack potency and system resilience, which take into account the correctness of the adversarial instances, an aspect often ignored in adversarial evaluations. We consider six fact verification systems from the recent Fact Extraction and VERification (FEVER) challenge: the four best-scoring ones and two baselines. We evaluate adversarial instances generated by a recently proposed state-of-the-art method, a paraphrasing method, and rule-based attacks devised for fact verification. We find that our rule-based attacks have higher potency, and that while the rankings among the top systems changed, they exhibited higher resilience than the baselines.
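As a rough illustration of how correctness-aware metrics of this kind could be formalized (the exact definitions are given in the paper; the notation below is an assumption for illustration only), let A be the set of attacks, S the set of systems, c_a the fraction of correctly constructed instances produced by attack a, and acc_s(a) the accuracy of system s on those correct instances. A potency-style score for an attack and a resilience-style score for a system might then take the form:

\[
\text{Potency}(a) \;=\; \bar{c}_a \cdot \frac{1}{|S|} \sum_{s \in S} \big(1 - \text{acc}_s(a)\big)
\]
\[
\text{Resilience}(s) \;=\; 1 - \frac{1}{|A|} \sum_{a \in A} \bar{c}_a \cdot \big(1 - \text{acc}_s(a)\big)
\]

Weighting by \(\bar{c}_a\) captures the abstract's point that an attack only gets credit for errors induced by well-formed adversarial instances; refer to the paper for the metrics actually used.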
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.52938
This record's URL: https://www.repository.cam.ac.uk/handle/1810/305856