Language models for verifiable mathematical automation: Interaction, integration, and autoformalization
Repository URI
Repository DOI
Change log
Authors
Abstract
Stronger automation in formal mathematical reasoning provides scalability, trustworthiness, and accessibility: it enables efficient verification of complex proofs, reduces the likelihood of errors in intricate calculations, allows non-experts to engage with mathematical concepts, and potentially facilitates the discovery of novel mathematical insights through machine-driven exploration. Traditionally, automated reasoners have only formal-language repositories at their disposal and cannot use the much larger natural-language corpora, since natural language comprehension is difficult. While controlled natural language interfaces with formal syntax have been designed (e.g., Naproche), they are ultimately too rigid to be widely useful. Large language models, on the other hand, excel at language comprehension and have been exposed to the innumerable mathematical documents humans have accumulated throughout the ages, making them ideal candidates for automation tools. Trained on vast amounts of data, modern language models can be instructed to carry out an impressive range of tasks. This versatility makes language models complementary to traditional reasoners, whose rigidity severely limits wider adoption and easier use. The goal of this thesis is to answer how we can achieve trustworthy automation in mathematical reasoning with language models. To this end, we introduce a novel evaluation of language models to identify the scenarios in mathematical reasoning for which they are suitable. With the insights obtained, we then train them to write formal proofs in those scenarios with assistance from automated theorem provers, and use them to translate mathematical theorems and proofs from LaTeX into formal languages. Concretely: (1) we conduct the first interactive evaluation of language models on informal mathematical reasoning.
The evaluation reveals language models' inherent strengths in memorisation and in handling flexible inputs and outputs, and their weaknesses in complex reasoning and algebraic manipulation. (2) With these strengths and weaknesses in mind, we develop the first approach combining automated theorem provers and language models for proof search: Thor. Thor uses language models for creative proof steps, and calls Sledgehammer (Isabelle's tool with access to many automated theorem provers) to fill in premise-selection details. We also develop Magnushammer, a hammer-like system with a transformer-based relevance filter, and show that Magnushammer outperforms Sledgehammer both on its own and as a plugin for Thor. (3) All existing approaches for proof search operate over either natural or formal language, without bringing together their respective expressiveness and rigour. To remedy this dichotomy, we use language models to perform autoformalization: the process of turning informal mathematical theorems and proofs into formal ones. We validate the usefulness of autoformalization by showing that autoformalized theorems can serve as training data to improve proof search systems. We also take repositories of formal theorems in multiple languages and informalise them into natural language. This results in MMA, a large parallel dataset of informal and formal mathematical theorems; fine-tuning language models on MMA yields a significant boost in autoformalization capability. (4) Finally, we construct Draft, Sketch, and Prove, a methodology that uses language models to translate informal proofs into formal proof sketches, and then calls automated theorem provers to fill in the gaps. This neuro-symbolic method lends informal proofs rigour by translating them into formal ones that can be machine-verified.
The research in this thesis presents a new paradigm for theorem-proving research, in which language models constitute both a scalable engine for automating numerous tasks and an accessible bridge between human users and machines for reasoning.
Description
Date
Advisors
Li, Wenda

