This is an account of a mathematician’s first experiences with the proof assistant (interactive theorem prover) Isabelle/HOL, including a discussion on the rationale behind formalising mathematics and the choice of Isabelle/HOL in particular, some instructions for new users, some technical and conceptual observations focussing on some of the first difficulties encountered, and some thoughts on the use and potential of proof assistants for mathematics.

Writing a mathematical proof could be compared to making a beautiful carpet featuring elaborate patterns: the end result looks impressive, mind-boggling even, but the actual process in fact consists of simple arguments, like small stitches. While formalising

It should be clarified that proof assistants are not the same as computer algebra tools, such as MATLAB or Mathematica: the latter mainly do computations, while the former mainly do

In an online list [

Nowadays, the most popular proof assistants for formalising mathematics are Isabelle, Coq, Lean, Mizar, HOL4, HOL Light, Agda and Metamath, but this article, as the title suggests, focusses on Isabelle/HOL (HOL: Higher-Order Logic) in particular. Isabelle was first developed by Lawrence Paulson and Tobias Nipkow [

Proof assistants are becoming more and more popular among younger mathematicians and students. For example, note the prize-winning project by a group of undergraduate students from Bremen formalising Matiyasevich’s proof of Hilbert’s 10th problem with Isabelle/HOL [

Hoping that this may be of help to students interested in formalisation of mathematics and automated reasoning, I am sharing this report

The plan of this paper is as follows: in Sect.

However one would like to describe mathematical practice, it is certainly not the same as programming (nor should it be reduced to it). As computers and artificial intelligence become more and more integrated into every aspect of our lives, “modernising” mathematical practice in that respect does not sound unexpected, but this is not an issue of modernisation for the sake of it (although for some people this would be a sufficient reason); in fact, formalising mathematics offers many direct benefits. There are different approaches to choosing what material to formalise. These are: (a) formalising the mathematical curriculum, that is, basic material that undergraduate students are usually taught; (b) formalising advanced, famous results, e.g. the aforementioned formalisations of Gödel’s incompleteness theorems, the four-colour theorem and Hilbert’s 10th problem; (c) formalising new research results, either from mainstream research papers or results that could be considered groundbreaking (like the aforementioned proof of the Kepler conjecture). To the above we can add a fourth strategy, which is not quite achievable with the current state of the art but is one of the main goals for the future of the field: (d) discovering new mathematical results through the process of formalising.

There are different expected benefits that motivate each strategy.

An obvious first reason is verification. This applies mostly to (c), since in all other cases the material has been checked by a great number of people over the years and we wouldn’t expect to find mistakes in elementary material. With more advanced and more recent results the probability of finding mistakes of course starts to increase. A recent example in the theory of Gromov-hyperbolic spaces is a mistake found through formalisation in Isabelle by Gouëzel and Shchur [

Another reason, applying mostly to (a) and (b), is that contributing to the (very fast growing) libraries of formal proofs amounts to the creation of a database with huge potential. A “physical” mathematical library consisting of “material” books, or even an online library consisting of pdf files, is more restrictive, whereas here we are dealing with a library written in code: something we can modify, interact with and reuse. More importantly, formalised material can be used to create tools for goal (d). A vision for the future is the creation of an interactive assistant that would provide “brainstorming” tips to research mathematicians in real time, assisting them in the process of discovering (or inventing) a new result.

A third reason, applying to all of (a), (b), (c) (and (d)), is that the process of formalising in itself can help the user gain brand new insights even into already familiar topics. To a large extent this is because of the high level of detail in which a formalised proof must be written, but also because using new tools forces the user to look at familiar material from a new angle.

Last but not least, formalisation can also serve educational purposes (this applies mostly to (a)).

I will not elaborate further on a general discussion of this topic; rather, I restrict myself to mentioning the inspiring papers by Avigad [

As a pure mathematician with some background in logic and proof theory, my own interest in the topic was initially driven not only by a fascination for the emerging culture of re-imagining mathematical practice in the light of new AI developments but also by philosophical questions on the nature of mathematical proofs, e.g. when encountering in my own research work different proofs of similar statements giving completely different computational content [

In particular, regarding my mathematics background, my PhD research [

What is it that makes a “good” proof?

which of course has many possible answers:

a shorter proof;

a more “elegant” proof (which is of course usually subjective);

a simpler proof (consider Hilbert’s 24th problem (1900):

thinking in terms of Reverse Mathematics – a proof in a weaker subsystem of

an interdisciplinary proof (e.g. a geometric proof for an algebraic problem or vice-versa would be considered to give a deeper mathematical insight);

a proof that is easier to reuse, i.e. one that provides some algorithm, technique or intermediate result that can be useful in different contexts too;

a proof giving “better” computational content.

What do we mean by “better” computational content?

a bound of lower complexity?

a bound that is more precise numerically?

a bound that is more “elegant”?

How are the aforementioned proof features related to each other (if at all)?

Could we ever ensure that we get the optimal computational content (from a given proof)?

I joined the ALEXANDRIA project in October 2017, some months after obtaining my PhD in pure mathematics and with no prior formalisation experience in any proof assistant. Although proof mining has trained me in patiently de-constructing proofs into every elementary sub-step and in tidying up proofs after finding their underlying logical structure, which are both necessary skills for anyone interested in formalisation, starting to explore a proof assistant was to me a brand new challenge.

The project ALEXANDRIA:

Isabelle has several major advantages over other proof assistants: first of all, it uses the structured language

The first step is of course to download Isabelle (which is distributed for free under open-source licenses) from the Isabelle website [

Reading the manuals and tutorials can be helpful, but usually learning-by-doing can be more efficient. It would be thus advisable to start by exploring some theories from the library (e.g., after having installed Isabelle, by opening a session and going to

File → Open → isabelle → src → HOL → Analysis

to get to the Analysis Library theory files) and/or download one of the AFP entries (there are detailed instructions on the website [

A theory file starts with theory [the name of the file] followed by imports [the libraries or AFP entries that need to be imported for the purposes of the theory at hand]. The theory is then developed between the keywords begin and end. The file should be saved with the same name that is used right after theory, followed by “.thy”.
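For illustration, a minimal theory file following this structure might look as below (the theory name Scratch and the example lemma are hypothetical; the file would be saved as Scratch.thy):

```isabelle
theory Scratch
  imports Main
begin

(* a trivial example lemma, provable by the simplifier *)
lemma "(a::nat) + 0 = a"
  by simp

end
```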

The user can call the automation tools to look for proofs simply by typing try0 and Sledgehammer. In order to search for already formalised material in the loaded theories of the active session, the user can type find_theorems or find_theorems name: followed by some search word. While looking at an open .thy file, the user can place the cursor on any object and by pressing the Command key and clicking, the jEdit interface takes the user to the theory where the object is defined. The keyword sorry can be used in the place of a missing proof, so that a statement can be temporarily regarded as “proven” and the user may even refer to the result to use it in other proofs and return to the missing proof later. The keyword oops has a similar use, except that it “abandons” a proof typically in the middle; unlike sorry, it cancels the entire proof attempt up to the previous lemma/theorem/proposition/corollary keyword and the claimed statement is not available to be used.
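As a small sketch of these keywords in action (the lemma names are hypothetical):

```isabelle
(* state a fact and postpone its proof with sorry *)
lemma postponed: "(2::nat) + 2 = 4"
  sorry

(* the postponed fact can already be referenced in later proofs *)
lemma "(2::nat) + 2 ≥ 4"
  using postponed by simp

(* search the loaded theories for facts matching a symbolic pattern *)
find_theorems "_ + _ = _ + _"
```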

Looking at examples online from the Library [

Note that while working directly on a formalisation within an active Isabelle session the colours will differ, namely the user will notice that fixed/free variables appear in blue and bound variables appear in green.

Most new users would agree that Isar looks understandable, at least not much less than

A definition of the 7-dimensional real cross product from my AFP entry on Octonions [

A statement attesting that norm equality implies inner product equality (and the converse) together with a proof, from the Analysis/HOL Library [

Let us now look at an example in more detail, comparing the “informal” with the formal proof. Consider the following version of the Cauchy-Schwarz inequality together with a proof. An “informal” proof is given below. Note that we use

The above proof is written in Isabelle (this is from the Analysis/HOL Library [
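For orientation, the statement being formalised is roughly of the following shape (a sketch from memory; the exact lemma name and formulation in the current library may differ):

```isabelle
lemma Cauchy_Schwarz_ineq2:
  "¦inner x y¦ ≤ norm x * norm y"
```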

We continue with a few more examples of well-known statements formulated in Isabelle.

The statement of the Stone-Weierstrass theorem for polynomial functions, from the Analysis/HOL Library [

A version of Zorn’s lemma, together with a proof, from the HOL Library [

The statement and a proof of the Riemann mapping theorem, from the Complex_Analysis/HOL Library [

As in any new programming (or even natural) language, the first challenge for a new user is familiarisation with the syntax and the essential keywords. In fact, I found Isar to be both quite intuitive in terms of structure and easily readable, while the jEdit user interface is very user-friendly. The fact that Isar admits structured proofs is a major advantage. There are, however, certain (mostly syntactic) features that may seem surprising to a new user. Some miscellaneous characteristic examples are:

The standard proof patterns

have "a<b" also have "...<c" finally show "a<c" by auto

and

have "a<b" moreover have "...<c" ultimately show "a<c" by auto.

The syntax does not allow for certain shortcuts: a conjunction of inequalities, for instance, must be spelled out in full as

"a>0∧b>0∧c>0" or "a>0" and "b>0" and "c>0".

The user has to remember to always include type information. Even Arabic numerals are in certain cases regarded as constants of some unknown type unless their type is explicitly stated (e.g. (1::int)/(2::int)^0 = 1 is proven by the proof method simp alone, while for 1/2^0 = 1 automation gives no answer, nor do we get any explanatory error messages). Moreover, for exponentiation, if the exponent is of type real or integer one has to use powr, while for type natural the symbol ^ works. Also, the user often has to convert the type of a variable or constant (of_int, of_real, of_nat), for instance when confronted with division.

The meaning of some keywords, e.g.: where, that, when, at_top, sequentially.

The use of several symbols, e.g.: the meet and join operators for lattices are symbolised by ⊓, ⊔ instead of ∧, ∨ that are normally used in the literature; different kinds of arrows; the absolute value symbol.

Overall, the extremely high level of detail in which proofs must be written.
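The calculational proof patterns and the need for explicit type annotations mentioned above can be illustrated by two small self-contained lemmas (a sketch, assuming an ambient theory importing Main):

```isabelle
(* the also/finally pattern written out in full *)
lemma
  fixes a b c :: real
  assumes "a < b" and "b < c"
  shows "a < c"
proof -
  have "a < b" using assms(1) by simp
  also have "... < c" using assms(2) by simp
  finally show "a < c" .
qed

(* with explicit type annotations, simp succeeds immediately *)
lemma "(1::int) / (2::int)^0 = 1"
  by simp
```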

After syntax, search is of very high significance for the mathematician user, both in terms of finding material by thematic classification, and of finding essential technical lemmas and definitions while in the process of formalising new work. In the latter case, it is extremely helpful that the interface allows for words to function as hyperlinks: as mentioned, by pressing the Command key while clicking on each object, the user is taken to the theory where the object at hand is defined. However, there is room for improvement with respect to search features, from different points of view. In particular:

(1) It is possible to search for theorems only on the basis of symbolic patterns occurring in them or of their names (via find_theorems or find_theorems name: as mentioned) so an unexpected name could create obstacles. Even though find_theorems is helpful most of the time, in some cases it could be limiting. In particular:

As search is done based on pattern-matching, new users especially may sometimes not know the appropriate search words for the notions that they are looking for (e.g. “summable” yields many results while “summability” yields no results; “infimum” and “supremum” yield no results although a lot of related material exists in the libraries, etc.). That is, searching for

find_theorems is case-sensitive (e.g. “borel” gives 510 results while “Borel” gives no results). We are aiming for case-insensitive search.

Search is performed only in the libraries and theories that have been already loaded by the user. Ideally, search would be performed in all the libraries (and the AFP) regardless of what the user has loaded.

It would be useful to differentiate the search for facts based on mathematical objects related to their statements from the search for facts based on mathematical objects related to their proofs and to patterns that may occur in their proofs.

An efficient method of filtering and ranking the search results would also be useful to have.

(3) Efficient search for proof patterns and algorithms is another big challenge to be tackled. To this end, employing tools from machine learning is considered very promising.

Within our project, Yiannos Stathopoulos and I are currently working on SErAPIS (:

The high level of detail that is required when formalising a mathematical proof can render the process time-consuming, so efficient automation is vital. This is one important factor that makes Isabelle/HOL more user-friendly than other proof assistants. In general, Sledgehammer and try0 work remarkably well for very simple statements, while the possibility of counterexample finding with Quickcheck and Nitpick can be very helpful. On many occasions, however, certain elementary examples (e.g. splitting up summations) cannot be solved by automation as one would initially expect. Another observation is that for algebraic expressions where minus is involved, it is often much easier to find proofs by automation when working with type integer than with type natural, since the respective definitions do not share the same algebraic properties. Moreover, in certain cases involving long and complicated expressions (examples involving distributivity, simplification, inequalities with division or multiplicative inverses, comparing inequalities after multiplying by a complicated expression, or comparing complicated expressions involving logarithms or exponents), Sledgehammer times out (and try0 also fails), even though in examples of the same style but with simpler expressions they succeed in finding proofs. It is worth noting that automation is constantly improving, so in a couple of years, if not sooner, more and more complicated examples will most likely become solvable by automation.

As a sidenote regarding automation, a common source of errors for inexperienced users is the keyword sorry, which temporarily regards an unproven statement as proven, so that the user can continue with the formalisation, even use the statement in other proofs, and return later to fill in the gap. This may lead automation to suggest a proof of a statement B using a wrong statement A that had previously been “shown” by sorry. This is a consequence of the principle of explosion in logic (ex falso sequitur quodlibet: from falsehood anything follows). Even though sorry is extremely helpful, new users should be aware of this issue.
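A naive sketch of this pitfall (hypothetical lemma names):

```isabelle
(* a false statement, temporarily "proven" with sorry *)
lemma bad: "(0::nat) = 1"
  sorry

(* by the principle of explosion, anything now follows from it *)
lemma "False"
  using bad by simp
```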

Finally, other issues of secondary importance that the user should be aware of relate to the rigidity of the structure of the formalised fundamental material in the Library. For instance, the definition of an “algebra” in the Library includes associativity, so working with nonassociative algebras is not possible within an existing type class, e.g. see my development of Octonions [

It should be noted that in my experience so far I did not encounter a problem with the fact that Isabelle is based on simple types instead of dependent types. Lawrence Paulson discusses this issue [

For the first project I suggested to formalise a proof that is (1) research-level, (2) using elementary material that is already in the Isabelle Library, (3) not very long or too complicated. I thus opted for an irrationality criterion for infinite series by Hančl [

After a discussion with Anthony Bordg I realised that it can also be meaningful to formalise a theorem even if the proofs of the statements it relies on are not yet fully formalised. This would be achieved by using the theorems on which the theorem at hand depends as assumptions (implemented as a

While formalising Aristotle’s assertoric syllogistic in Isabelle/HOL [

I have also recently formalised some material on amicable numbers [

Another direction that currently interests me is a possible use of formalisation for the benefit of automating pen-and-paper proof mining, which I conjecture could be achieved once machine learning comes into play. I suggest, in particular, that while building extensive libraries of formal mathematical proofs, it would be meaningful to opt for formalising proofs whose computational content is made explicit in the meantime, so that as automation improves and blocks of (sub)proofs get generated automatically, the preserved computational content would get recycled, recombined and would eventually manifest itself in different contexts. To this end, I do not suggest restricting to only constructive proofs, but rather that proof-mined proofs (i.e. possibly non-constructive proofs, but with some explicit computational content) should be preferred where possible [

Using a proof assistant to verify mathematics is somewhat reminiscent of writing down a relative consistency proof as logicians do. (Of course, this is not a problem for formalists.

This reminiscence can be seen on two different levels: (1) From the point of view of the correctness of the core of the system. This can be trusted considering the underlying architecture of the proof system and the small size of the core [

This part is to be read as a caution to new proof assistant users: as already explained, using a proof assistant is not a panacea guaranteeing correctness, since the work may still be prone to human errors which, as the examples below illustrate, may not be detected by the proof assistant. Mathematicians may make mistakes that fall into different categories. Several (extremely naive) examples for each category of mistakes and “mistakes” that may be committed by human mathematicians are given below, and it is shown how Isabelle may not necessarily detect them, as they are essentially “semantic” mistakes that are syntactically correct. Of course, a mathematician would not make the specific elementary mistakes mentioned below, but mistakes of a similar nature could be made in a much more sophisticated context, and, as with the naive mistakes below, they might similarly be overlooked.

This is logically correct, but mathematically undesirable. This is accepted by Isabelle as an instance of the logical axiom

This is again logically correct, but mathematically undesirable and accepted by Isabelle as an instance of

We have seen that superfluous assumptions are permitted; moreover, some assumptions might even contradict each other:

An instance of the above in mathematics would, for example, be:
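A naive sketch in the same spirit (a hypothetical example with a real-valued variable): Isabelle accepts the following, since from contradictory assumptions any conclusion is derivable:

```isabelle
lemma
  assumes "(x::real) > 0" and "x < 0"
  shows "x = 42"
  using assms by linarith
```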

The first examples are instances of assuming a wrong mathematical fact while the last one is an instance of the assumptions being incompatible as in the above subsection. For example, such mistakes (unless of course we are using assumptions that we conjecture to be false so that they yield a proof by contradiction) could occur when an author makes bad use of the literature by using two or more nonequivalent definitions of a mathematical object within the same context, which may easily happen [

This instance of course falls into the aforementioned case of assuming wrong facts, but this demonstrates how imprecise approximations (which a user may be often tempted to make, especially when dealing with numerical computations) in particular can be accepted by Isabelle without any error messages, potentially leading to further wrong numerical results.

Mathematical practice involves trial-and-error (Lakatos [

Examples 6.1.1-6.1.4 should not be considered surprising: we can view them as a manifestation of the fact that Isabelle relies on logical inference [

Examples 6.1.5 and 6.1.6 are not surprising either: for Isabelle to suggest a modification of the assumptions, as a human mathematician would, would require a kind of “Intelligence” that no proof assistant currently features (not yet?). Russell and Norvig [

The conclusion to draw from the above examples is that, after all, the user is the one in control and thus should be careful to use Isabelle (or any other proof assistant) responsibly. It should be stressed that the high value of a proof assistant lies in the fact that it prevents another very common and dangerous source of fallacy in mathematical practice:

Is the way that we are doing mathematics being revolutionised? Most likely, yes.

This is reflected in an important recent milestone: for the first time, a class titled “Computer science support for mathematical research and practice” is included in the new Mathematics Subject Classification announced in early 2020 (MSC 2020, Class 68Vxx). This class includes topics such as computer-assisted proofs, proofs employing automated/interactive theorem provers, and the formalisation of mathematics in connection with theorem provers. Thus, the aforementioned topics are now officially recognised as areas of mathematical research (in addition to computer science research, as was the case until now).

As an anecdote indicative of the current climate I mention that during the panel discussion of the workshop

This revolution in mathematical practice can be regarded as two-fold: The first aspect is associated with our expectation of a certain level of correctness and thus refers more to the benefits of formalisation for the purpose of verification. Barendregt and Wiedijk [

The second aspect of the revolution is associated with the tools used in our day-to-day work. A groundbreaking tool would be an interactive assistant providing working mathematicians with brainstorming in the form of tips on how to prove a statement, or even suggestions for new conjectures. This would be achieved by implementing machine learning tools in an appropriate way, once a substantial amount of mathematics has been formalised. Today we are far from this goal, but even before we approach it, simply typing mathematics in Isar (or any other machine language) instead of using pen and paper is already a big shift in our way of working, which will inevitably influence our way of thinking. We are among the first generations that type much more than we write by hand in our daily lives, and it is conceivable that the strong impulse to write with a pen or a piece of chalk while shaping a brand new thought or trying to understand an idea may disappear in future generations. This is of course an issue related to psychology and cognitive science rather than purely to mathematical practice. As a striking example it should at least be noted that most mathematicians nowadays have no trouble understanding

Aristotle in his Nicomachean Ethics wrote [

The author was supported by the ERC Advanced Grant ALEXANDRIA (Project 742178) funded by the European Research Council and led by Professor Lawrence Paulson at the University of Cambridge, UK. Thanks to Lawrence Paulson for reading and commenting on an earlier version of this paper, to the anonymous reviewers for their useful suggestions and to Lawrence Paulson, Wenda Li, Manuel Eberl, Jeremy Avigad, Kevin Buzzard, Peter Koepke, Anthony Bordg, Edward Ayers and Mohammad Abdulaziz for many interesting discussions. The workshop

I consider the terms “mechanising”/“mechanisation” as perhaps more appropriate than the terms “formalising”/“formalisation” in our context, as the latter may be confused with formalisation in the sense of what was attempted e.g. in Principia Mathematica by Whitehead and Russell [

HOL, Mizar, PVS, Coq, Otter/Ivy, Isabelle/Isar, Alfa/Agda, ACL2, PhoX, IMPS, Metamath, Theorema, Lego, Nuprl, Omega, B method and Minlog.

HOL Light, Isabelle, Metamath, Coq, Mizar, ProofPower, Lean, PVS, nqthm/ACL2, NuPRL/MetaPRL.

This is an updated version of some earlier online notes from July 2019 that I had shared on ResearchGate.

Although the original proofs of the four-colour theorem and the Kepler conjecture were computer-assisted, they were not discovered through the process of formalising, but were formalised later for verification purposes.

Kevin Buzzard gave an instance of this phenomenon regarding notions of equivalence and similarity for groups during his talk at

These can also be found under “Documentation” on [

Isabelle mailing list archive, available on

Here we refer to formalism as the philosophical view that mathematical statements can be considered as statements about the consequences of the manipulation of symbols using established rules and that ontologically mathematics is much more akin to a game than a representation of some objective reality.

Benedikt Löwe gave an analysis of the evolution of the use of
