Repository logo
 

ISARSTEP: A BENCHMARK FOR HIGH-LEVEL MATHEMATICAL REASONING

Accepted version
Peer-reviewed

Type

Conference Object

Change log

Authors

Li, W 
Yu, L 
Wu, Y 
Paulson, LC 

Abstract

A well-defined benchmark is essential for measuring and accelerating research progress of machine learning models. In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover. The dataset has a broad coverage of undergraduate and research-level mathematical and computer science theorems. In our defined task, a model is required to fill in a missing intermediate proposition given surrounding proofs. This task provides a starting point for the long-term goal of having machines generate human-readable proofs automatically. Our experiments and analysis reveal that while the task is challenging, neural models can capture non-trivial mathematical reasoning. We further design a hierarchical transformer that outperforms the transformer baseline.

Description

Keywords

Journal Title

ICLR 2021 - 9th International Conference on Learning Representations

Conference Name

International Conference on Learning Representations

Journal ISSN

Volume Title

Publisher

OpenReview.net

Rights

All rights reserved
Sponsorship
European Research Council (742178)
ERC Advanced Grant ALEXANDRIA (Project GA 742178)