Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

Teichmann, Marvin; Araujo, André; Zhu, Menglong; Sim, Jack

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/294927

Repository DOI

https://doi.org/10.17863/CAM.42014

Files

Accepted version (9.71 MB)

Type

Conference Object

Authors

Teichmann, Marvin

Araujo, André

Zhu, Menglong

Sim, Jack

Abstract

Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes 86k images with manually curated boxes from 15k unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data available at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf.

Conference Name

CVPR

Publisher DOI

https://doi.org/10.17863/CAM.42014

Collections

Cambridge University Research Outputs