A Study of Retrieval Models for Long Documents and Queries in Information Retrieval

Accepted version
Peer-reviewed

Abstract

Recent research has shown that long documents are unfairly penalised by a number of current retrieval methods. In this paper, we formally analyse two important but distinct reasons for normalising documents with respect to length, namely verbosity and scope, and discuss the practical implications of not normalising accordingly. We review a number of language modelling approaches and a range of recently developed retrieval methods, and show that most do not correctly model both phenomena, thus limiting their retrieval effectiveness in certain situations. Furthermore, the retrieval characteristics of long natural language queries have not traditionally received the same attention as those of short keyword queries. We develop a new discriminative query language modelling approach that demonstrates improved performance on long verbose queries by appropriately weighting salient aspects of the query. We show that, when combined with query expansion, our new approach yields state-of-the-art performance for long verbose queries.

Journal Title

Proceedings of the 25th International Conference on World Wide Web

Conference Name

Proceedings of the 25th International Conference on World Wide Web

Publisher

Association for Computing Machinery (ACM)

Rights and licensing

Except where otherwise noted, this item's license is described as http://www.rioxx.net/licenses/all-rights-reserved