ISARSTEP: A BENCHMARK FOR HIGH-LEVEL MATHEMATICAL REASONING

Li, W; Yu, L; Wu, Y; Paulson, LC

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/319494

Repository DOI

https://doi.org/10.17863/CAM.66615

Type

Conference Object

Abstract

A well-defined benchmark is essential for measuring and accelerating research progress of machine learning models. In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover. The dataset has broad coverage of undergraduate and research-level theorems in mathematics and computer science. In our defined task, a model is required to fill in a missing intermediate proposition given the surrounding proof. This task provides a starting point for the long-term goal of having machines generate human-readable proofs automatically. Our experiments and analysis reveal that while the task is challenging, neural models can capture non-trivial mathematical reasoning. We further design a hierarchical transformer that outperforms the transformer baseline.

Journal Title

ICLR 2021 - 9th International Conference on Learning Representations

Conference Name

International Conference on Learning Representations

Publisher

OpenReview.net

Publisher DOI

https://doi.org/10.17863/CAM.66615

Sponsorship

European Research Council (742178)

ERC Advanced Grant ALEXANDRIA (Project GA 742178)

Collections

Cambridge University Research Outputs