Augmenting Multi-modal Question Answering Systems with Retrieval Methods
Abstract
The quest to develop artificial intelligence systems capable of handling intricate tasks has propelled the prominence of deep learning, particularly since 2016, when neural network models became the mainstream approach. With applications ranging from recommender systems to speech recognition, these models have revolutionised many domains. However, challenges persist, especially in incorporating extensive domain-specific knowledge and in mitigating the hallucination inherent in large language models.
This thesis explores the integration of retrieval-augmented generation (RAG) into multi-modal question answering (QA) systems as a solution to these challenges. By leveraging external knowledge sources, RAG improves model accuracy and provides access to domain-specific information. The research unfolds as follows:
Firstly, to leverage external knowledge efficiently and effectively in answering knowledge-intensive, visually grounded questions, we introduce RA-VQA (Retrieval Augmented Visual Question Answering), a framework tailored for knowledge-based visual question answering (KB-VQA). We demonstrate the efficacy of jointly training the retriever and generator models to maximise performance.
Secondly, FVQA (Fact-based Visual Question Answering) 2.0 introduces semi-automatically annotated adversarial samples to address data distribution imbalances and enhance system robustness, showcasing substantial improvements in handling challenging scenarios.
Thirdly, the development of FLMR (Fine-grained Late-interaction Multi-modal Retriever), a state-of-the-art multi-modal retriever, and of its scaled-up version, PreFLMR (Pre-trained FLMR), underscores the significance of late-interaction models in achieving superior multi-modal retrieval performance. We show that the proposed models capture finer-grained interactions between queries and contexts, offering efficient and accurate retrieval across a wide range of multi-modal retrieval tasks.
Finally, the focus pivots to retrieval methods in TableQA, introducing ITR (Inner Table Retriever) for closed-domain scenarios and LI-RAGE (Late Interaction Retrieval Augmented Generation with Explicit Signals) for open-domain TableQA tasks. Both frameworks deliver remarkable performance improvements over existing approaches. We show that incorporating retrieval methods into TableQA substantially pushes the research boundary, offering state-of-the-art question answering performance.
Through meticulous experimentation and innovation, this thesis not only advances the theoretical understanding of multi-modal retrieval-augmented systems but also contributes practical frameworks and datasets that address critical challenges in question answering across diverse domains. As the journey towards effective AI systems continues, these contributions provide a solid foundation for future advances in information retrieval and question answering in multi-modal contexts.