Modelling the usefulness of document collections for query expansion in patient search
Abstract
Medical terminology poses a challenge when searching for patients based on the relevance of their medical records to a given query. Existing work has used query expansion (QE), extracting expansion terms from different document collections to improve the query representation. However, the usefulness of a particular document collection for QE has not been measured or taken into account during retrieval. In this work, we investigate two automatic approaches that measure and leverage the usefulness of each collection when multiple document collections are exploited to improve the query representation. The two approaches are based on resource selection and learning to rank techniques, respectively. We evaluate them using the TREC Medical Records track’s test collection. Our results show the potential of the proposed approaches: they can effectively exploit 14 different document collections, including both domain-specific (e.g. MEDLINE abstracts) and generic (e.g. blogs and webpages) collections, and they significantly outperform existing effective baselines, including the best systems that participated in the TREC Medical Records track. Our analysis shows that the different collections are not equally useful for QE, and that our two approaches can automatically and effectively weight the usefulness of expansion terms extracted from different document collections.
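To make the idea of weighting expansion terms by collection usefulness concrete, the following is a minimal illustrative sketch and not the method described in the paper. It assumes each collection has already produced weighted candidate expansion terms and a non-negative usefulness score (for example from a resource selection step). The function name `combine_expansion_terms`, the collection names, and all numeric values are hypothetical.

```python
from collections import defaultdict


def combine_expansion_terms(terms_by_collection, usefulness, top_k=5):
    """Merge per-collection expansion terms into one ranked list.

    Each term weight is scaled by its collection's share of the total
    usefulness, so terms from more useful collections count for more.
    This is an assumed linear combination, chosen only for illustration.
    """
    total = sum(usefulness.get(name, 0.0) for name in terms_by_collection)
    if total <= 0:
        return []  # no collection is considered useful

    combined = defaultdict(float)
    for name, terms in terms_by_collection.items():
        share = usefulness.get(name, 0.0) / total
        for term, weight in terms.items():
            combined[term] += share * weight

    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical candidate terms and usefulness scores for two collections.
terms_by_collection = {
    "medline": {"myocardial": 0.8, "infarction": 0.6},
    "blogs": {"heart": 0.5, "infarction": 0.2},
}
usefulness = {"medline": 3.0, "blogs": 1.0}

expanded = combine_expansion_terms(terms_by_collection, usefulness)
```

In this toy setting the domain-specific collection receives three quarters of the total weight, so its terms dominate the expanded query, while a term suggested by both collections accumulates weight from each.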