A Study of Retrieval Models for Long Documents and Queries in Information Retrieval

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Cummins, Ronan 

Abstract

Recent research has shown that long documents are unfairly penalised by a number of current retrieval methods. In this paper, we formally analyse two important but distinct reasons for normalising documents with respect to length, namely verbosity and scope, and discuss the practical implications of not normalising accordingly. We review a number of language modelling approaches and a range of recently developed retrieval methods, and show that most do not correctly model both phenomena, thus limiting their retrieval effectiveness in certain situations. Furthermore, the retrieval characteristics of long natural language queries have not traditionally had the same attention as short keyword queries. We develop a new discriminative query language modelling approach that demonstrates improved performance on long verbose queries by appropriately weighting salient aspects of the query. When combined with query expansion, we show that our new approach yields state-of-the-art performance for long verbose queries.

Keywords

4605 Data Management and Data Science, 46 Information and Computing Sciences, 4609 Information Systems

Journal Title

Proceedings of the 25th International Conference on World Wide Web

Conference Name

WWW '16: 25th International World Wide Web Conference

Publisher

International World Wide Web Conferences Steering Committee