Evidence-based verification and correction of textual claims

Thorne, James

Repository URI

https://www.repository.cam.ac.uk/handle/1810/333449

Repository DOI

https://doi.org/10.17863/CAM.80873

Files

Thesis (2.98 MB)

Type

Thesis

Authors

Thorne, James

Abstract

This thesis considers the task of fact-checking: predicting the veracity of claims made in written or spoken language using evidence. Previous task formulations, however, rest on modelling assumptions that ignore the requirement for systems to retrieve the necessary evidence. Human fact-checkers first find evidence before labelling a claim's veracity. To better model this process, the methodology proposed in this thesis requires automated systems to retrieve evidence from a corpus to justify their veracity predictions. The primary contribution of this thesis is the development and release of FEVER, a large-scale collection of human-written claims annotated with evidence from Wikipedia. Analysis of systems trained on this data highlights challenges in resolving ambiguity and context, and in remaining resilient to imperfect evidence retrieval. To understand the limitations of models trained on datasets such as FEVER, contemporary fact verification systems are further evaluated using adversarial attacks: instances constructed specifically to identify weaknesses and blind spots. Because automated methods for generating adversarial instances introduce errors of their own, this thesis proposes taking the correctness of these instances into account, allowing fairer comparison. The thesis subsequently considers how biases captured in these models can be mitigated with fine-tuning regularised with elastic weight consolidation. Finally, the thesis presents a new extension to the verification task: factual error correction. Rather than only predicting the claim's veracity, systems must also generate a correction for the claim so that it is better supported by evidence, acting as another means to communicate the claim's veracity to an end-user. In contrast to previous work on explainable fact-checking, the proposed method does not require additional data for supervision.

Date

2021-09-01

Advisors

Vlachos, Andreas

Keywords

natural language processing, machine learning, fact verification, evidence, fact-checking, fact checking, nlp

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Sponsorship

This thesis was supported by an Amazon Alexa Graduate Research Fellowship.

Collections

Theses - Computer Science and Technology