
Synthetically Supervised Feature Learning for Scene Text Recognition

Accepted version
Peer-reviewed

Type

Conference Object

Change log

Authors

Liu, Y 
Wang, Z 
Jin, H 

Abstract

We address the problem of image feature learning for scene text recognition. The image features in the state-of-the-art methods are learned from large-scale synthetic image datasets. However, most methods only rely on outputs of the synthetic data generation process, namely realistic-looking images, and completely ignore the rest of the process. We propose to leverage the parameters that lead to the output images to improve image feature learning. Specifically, for every output image of the data generation process, we obtain the associated parameters and render another "clean" image that is free of select distortion factors that are applied to the output image. Because of the absence of distortion factors, the clean image tends to be easier to recognize than the original image and can therefore serve as supervision. We design a multi-task network with an encoder-discriminator-generator architecture to guide the features of the original image toward those of the clean image. The experiments show that our method significantly outperforms the state-of-the-art methods on standard scene text recognition benchmarks in the lexicon-free category. Furthermore, we show that without explicit handling, our method works on challenging cases where input images contain severe geometric distortion, such as text on a curved path.
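The sketch below illustrates, in a hedged form, the kind of multi-task training step the abstract describes: an encoder processes the distorted synthetic image, a generator reconstructs the corresponding clean render from the encoder features, and a discriminator pushes the distorted-image features toward the clean-image features. All module definitions, tensor shapes, and loss weights are illustrative assumptions, not the authors' exact architecture, and the text-recognition head and loss are omitted for brevity.

```python
# Hypothetical PyTorch sketch of encoder-discriminator-generator training.
# Shapes, layer choices, and loss weights are assumptions for illustration.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a text image to a feature map used downstream for recognition."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Reconstructs the 'clean' image from encoder features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, f):
        return self.net(f)

class Discriminator(nn.Module):
    """Scores whether a feature map comes from a clean or a distorted image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(128, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )
    def forward(self, f):
        return self.net(f)

enc, gen, disc = Encoder(), Generator(), Discriminator()
recon_loss = nn.L1Loss()
adv_loss = nn.BCEWithLogitsLoss()

def training_step(distorted, clean):
    """One multi-task step: reconstruct the clean image from distorted-image
    features and adversarially align those features with clean-image features.
    A full system would also attach a text-recognition head and loss here."""
    f_dist = enc(distorted)
    # Reconstruction branch: features of the distorted image should be
    # sufficient to reproduce the clean render of the same text.
    l_rec = recon_loss(gen(f_dist), clean)
    # Adversarial branch: encourage distorted-image features to look like
    # clean-image features to the discriminator.
    l_adv = adv_loss(disc(f_dist), torch.ones(distorted.size(0), 1))
    return l_rec + 0.1 * l_adv  # illustrative loss weighting

# Toy usage with random tensors standing in for a synthetic image pair.
distorted = torch.randn(2, 3, 32, 128)
clean = torch.randn(2, 3, 32, 128)
print(training_step(distorted, clean).item())
```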

Description

Keywords

Scene text recognition, Deep learning, Neural networks, Feature learning, Synthetic data, Multi-task learning

Journal Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Conference Name

European Conference on Computer Vision

Journal ISSN

0302-9743 (print)
1611-3349 (electronic)

Volume Title

11209 LNCS

Publisher

Springer International Publishing