Functional programming in Gallina

Warning

TCPA’s home page recommends newcomers to start with a tutorial or book. The present lecture notes are such a texbook, one which is meant to be progressive and self-contained. So if you wonder about something, check out the index (top right and bottom right of every page).

It is counter-productive to look for things online by default: doing so will keep you insecure in your knowledge, forever depending on others. Also, most of the time, online things

  • use terms you do not know yet, and/or
  • use shortcuts without explaining what they cut short,

and then what.

Use the lecture notes as a firm point to lift your programming and proving world and to acquire an actionable knowledge in which you feel secure and with which you can move forward safely, independently, powerfully, and beautifully.

That being said, if you find that something is missing in the lecture notes, by all means contact the lecturer: he will diligently add what is missing.

Goal

The goal of this lecture note is to establish a zone of comfort regarding functional programming in Gallina, tCPA’s resident programming language.

Resources

Three windows

Edit the accompanying file:

Under the assumption that (setq proof-three-window-mode-policy 'hybrid) is lying somewhere in one’s .emacs file , one should be editing this Gallina file with 3 windows:

  • on the left, the window corresponding to one’s .v file;
  • on the top right, the goal window which one will use when proving things; and
  • on the bottom right, the response window that one will use when programming and when reading error messages.

We can split the left window any way we like (C-x 2 for example, to have a multiple view of our .v file, and then C-x 0 to eliminate the current window; or C-x 1 to edit our file with the full screen). At any point, typing C-c C-l brings us back to the canonical three windows.

Warning

Occasionally, Proof General might appear as it is confused. Then one should go back to the theorem or lemma one is proving, and continue from there.

TCPA’s toplevel loop

One interacts with tCPA through a series of toplevel forms in the file that is edited in the left window. Each of these toplevel forms is ended with a period:

<toplevel-forms> ::= {<toplevel-form>.}*

<toplevel-form> ::= Compute <expression>
                  | Check <expression>
                  | <definition>

<definition> ::= Definition <identifier> { : <type> }? := <expression>
               | Inductive <identifier> { : <type> }? := ...

<expression> ::= <natural-number>
               | <boolean>
               | <identifier>
               | <expression> <expression>
               | fun <identifier> { : <type> }? => <expression>
               | (<expression>, <expression>)
               | ...

<natural-number> ::= 0 | 1 | 2 | ...

<boolean> ::= true | false

Like OCaml, Gallina provides syntactic sugar in the form of infix notations for arithmetic operations. So for example,

  • for all expressions e1 and e2, the expression e1 + e2 is syntactic sugar for the nested application plus e1 e2, where plus denotes the addition function, which is implemented as fun n1 : nat => fun n2 : nat => ...;
  • likewise, for all expressions e1 and e2, the nested application minus e1 e2 is syntactically sugared as e1 - e2 and the nested application mult e1 e2 is syntactically sugared as e1 * e2, etc.
Pablito: Syntactic sugar?
Bong-sun: It means “short-hand notation”.
Dana: For example, e1 + e2 and plus e1 e2 are parsed into the same abstract-syntax tree.
Alfrothul: So they really mean the same.
Halcyon: The term is due to Peter Landin.

The Check command

Check is given an argument and is used at the toplevel to ask tCPA about the type of this argument.

For example, let us issue the command:

Check true.

The effect of this command is that the *response* window now contains:

true
     : bool

This contents manifests that true has type bool.

Likewise, one can check that 42 has type nat.

Anton: Actually, plus is an alias for Nat.add.
Pablito: How do you know?
Anton: I evaluated Check plus.
Pablito: Ah.
Anton (helpfully): So addition is implemented with a function add.
Pablito: And this function add is defined in the library Nat, I get it.

The Compute command

Compute is given an argument and is used at the toplevel to ask tCPA to compute this argument.

For example, let us issue the command:

Compute (123 + 321).

The effect of this command is that the *response* window now contains:

= 444
: nat

Likewise, one can compute 6 * 9, which actually yields 54 in base 10.

Halcyon: What is this? A hitch-hiker guide?

Example of inductive definition: natural numbers

For example, natural numbers are predefined in base 1 and represented as Peano numbers:

Inductive nat :=
  O : nat
| S : nat -> nat

where “predefined” means “already defined in tCPA’s arithmetic library”.

As in OCaml, in the declaration of an inductive data type, the first constructor may be preceded or not by |:

Inductive nat :=
| O : nat
| S : nat -> nat

The constructors are O (which is nullary) and S (which is unary). By default, natural numbers are read (desugared) and printed (sugared) as the ordinary arabic numbers in base 10. So for example, 0 (the digit) and O (the letter) can be used interchangeably.

Food for thought:

  • Evaluate Check 0. and Check O..

    How do the results of evaluation compare? Why?

  • Evaluate Check 1., Check (S 0)., and Check (S O)..

    How do the results of evaluation compare? Why?

  • Evaluate Check 2., Check (S (S 0))., and Check (S (S O))..

    How do the results of evaluation compare? Why?

Interlude

Alfrothul: So the grammar above can be made more precise.

Anton: Right. Instead of writing:

<natural-number> ::= 0 | 1 | 2 | ...

Alfrothul: ...we can write:

<natural-number> ::= O | S <natural-number>

Anton: Right. A self-referential but finite description of arbitrary large data.

Decomposition of inductive data

Inductive data are decomposed with a match expression that enumerates all the possible constructors and their arguments:

match <expression> with
  <constructor> {<identifier>}* => <expression>
| ...
| <constructor> {<identifier>}* => <expression>
end

Unlike in OCaml, match expressions are terminated with end.

And as in OCaml, in a match expression, the first constructor may be preceded or not by |:

match <expression> with
| <constructor> {<identifier>}* => <expression>
| ...
| <constructor> {<identifier>}* => <expression>
end

So for example, we can define a function that tests whether a given natural number is 0 or not:

Definition zerop_v0 : nat -> bool :=
  fun n =>
    match n with
      O =>
      true
    | S n' =>
      false
    end.

Definition zerop_v1 (n : nat) : bool :=
  match n with
  | O =>
    true
  | S n' =>
    false
  end.

In the S clause, n' is not used, so as in OCaml, we can not declare it and put an underscore instead:

Definition zerop_v2 (n : nat) : bool :=
  match n with
    O =>
    true
  | S _ =>
    false
  end.

Note how in the match-expression, the clauses are in the same order as in the type declaration of nat. Using the same ordering is considered good style.

Recursive definitions

Inductive data are processed by recursive functions:

<definition> ::= Fixpoint <identifier> {({<identifier>}+ { : <type> }?)}+ := <expression>

Example of recursive definition: the addition function

For example, let us consider the addition function over natural numbers, based on the following inductive specification of addition. Given an integer j,

  • base case: adding 0 to j yields j;
  • induction step: given a number i' such that adding it to j yields ih (which is the induction hypothesis), adding S i' to j should yield S ih.

Here is a direct implementation of this inductive specification. This implementation is recursive and the recursive call implements the induction hypothesis:

Definition add_v1 (n j : nat) : nat :=
  let fix visit i :=
    match i with
      O =>
      j
    | S i' =>
      S (visit i')
    end
  in visit n.

The definition of add_v1 is lambda dropped in that it uses a lexical block where visit is declared. This lexical block is used to declare visit. This local declaration is scope sensitive because the definiens uses j, which is declared outside this block.

The definition of add_v1 is not recursive because it contains no call to add_v1. The local definition of visit is recursive because of the recursive call to visit in the conditional branch for positive numbers. So add_v1 is declared with Definition and visit is declared with let fix.

Alternatively, we can implement the addition function with the following lambda-lifted recursive function:

Fixpoint add_v2 (i j : nat) : nat :=
  match i with
    O =>
    j
  | S i' =>
    S (add_v2 i' j)
  end.

The definition of add_v2 is recursive because of the recursive call to add_v2 in the conditional branch for positive numbers. So add_v2 is declared with Fixpoint.

The definition of add_v2 is lambda lifted in that it is a toplevel recursive definition. This definition is scope insensitive in that all the variables it uses are declared as their formal parameters.

In OCaml, it is a good idea to “lambda-drop” one’s programs, i.e., to define them with lexical blocks, as in the definition of add_v1 above, so that the definition of visit is local (i.e., invisible outside the definition of add_v1). In Gallina, it is a better idea to “lambda-lift” one’s programs, i.e., only have toplevel recursive functions, as in the definition of add_v2 above, so that we can refer to their name when reasoning about them.

We say that add_v2 is the “lambda-lifted” counterpart of add_v1 and conversely that add_v1 is the “lambda-dropped” counterpart of add_v2.

Here is another implementation of the addition function:

Fixpoint add_v3 (i j : nat) : nat :=
  match i with
    O =>
    j
  | S i' =>
    add_v3 i' (S j)
  end.

This definition is recursive because of the recursive call to add_v3. It is lambda lifted in that it is a toplevel recursive definition. It is also tail recursive because the recursive call to add_v3 is a tail call.

The .v file that accompanies the present lecture note also features a unit-test function for addition.

Interlude

Na-bong: Question.

Mimer: Yes, Na-bong?

Na-bong: How do we know that these three definitions implement the inductive specification of addition?

Dana: Right. And how do we know these definition are equivalent?

Mimer: Excellent questions! In previous courses, these are things that we tested. And here, we are going to prove them. Stay tuned.

Exercise 01

Go through the standard functions over natural numbers in the accompanying .vs file and program them in Gallina, including unit tests: addition, multiplication, exponentiation, factorial function, Fibonacci function, and evenness / oddness function using single (instead of mutual) recursion. If there is an opportunity to use accumulators, go for it and write an extra definition.

The goal of this exercise is to become familiar with the notation and use of Gallina, so the more definitions you write, the better.

Things to keep in mind:

  • Each (recursive) definition should implement a(n inductive) specification.
  • Since natural numbers have two constructors (namely O and S), the match-expressions in your recursive definitions should have two clauses (one for O and one for S), and only two clauses.

Partial solution for Exercise 01: the multiplication function

Na-bong: Let’s see. Multiplication is iterated addition, right?

Pablito: Iterated addition?

Na-bong: You know, three times four is four plus four plus four, where we add four three times.

Pablito: True that.

Alfrothul: Dibs on the unit-test function:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  (Nat.eqb (candidate 0 0) 0)
  &&
  (Nat.eqb (candidate 0 10) 0)
  &&
  (Nat.eqb (candidate 10 0) 0)
  &&
  (Nat.eqb (candidate 1 10) 10)
  &&
  (Nat.eqb (candidate 10 1) 10)
  &&
  (Nat.eqb (candidate 2 3) 6)
  (* etc. *)
  .

Anton: OK, an intuitive sample of numbers.

Na-bool: Let me also add the intuitive characterization:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  ...
  &&
  (Nat.eqb (candidate 3 4) (4 + 4 + 4))
  (* etc. *)
  .

Alfrothul: Let me add a test for commutativity:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  ...
  &&
  (let x := 3
   in let y := 4
      in Nat.eqb (candidate x y) (candidate y x))
  (* etc. *)
  .

Anton: And another for associativity:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  ...
  &&
  (let x := 3 in
   let y := 4 in
   let z := 5 in
   Nat.eqb (candidate x (candidate y z))
           (candidate (candidate x y) z))
  (* etc. *)
  .

Halcyon: Er... I notice that you guys do not write nested let-expressions consistently?

Alfrothul: Right. I prefer to vertically align let and in so that the nested lexical blocks appear more clearly, look:

let x := 3
in +------------+
   | let y := 4 |
   | in +-----+ |
   |    | ... | |
   |    +-----+ |
   +------------+

Anton: And I prefer to vertically align the keyword let:

let x := 3 in
let y := 4 in
let z := 5 in
...

Anton: Most of the time it doesn’t matter, and that is the case here since the definiens 4 does not refer to x and since the definiens 5 does not refer to x nor y. In OCaml, I would write:

let x = 3 and y = 4 and z = 5 in
...

Alfrothul: Whereas I would write:

let x = 3 and y = 4 and z = 5
in ...

Alfrothul: ...which admittedly is more pedantic. Either that or:

let x = 3
and y = 4
and z = 5
in ...

Alfrothul: ...which is also pedantic.

Dana: How about we test your unit-test function with the resident multiplication?

Halcyon: Let’s:

Compute (test_mult mult).

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Bong-sun: Ah, I notice that mult is an alias for Nat.mul.

Dana: Or Init.Nat.mul, depending on your version of tCPA.

Halcyon: Er... You do?

Dana: We just evaluated Check mult..

Alfrothul: Now let’s specify multiplication inductively, in the manner of the inductive specification of addition just before. Given an integer j,

  • base case: multiplying 0 and j yields 0;
  • induction step: given a number i’ such that multiplying it and j yields ih (which is the induction hypothesis), multiplying S i’ and j should yield j + ih.

Na-bong: Aha. I notice that the intuitive sample of numbers includes an instance of the base case?

Anton: Good point. Let me add an instance of the induction step:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  ...
  &&
  (let j := 10 in
   let i' := 3 in
   Nat.eqb (candidate (S i') j) (j + candidate i' j))
  (* etc. *)
  .

Na-bong: And let me expand the intuitive characterization:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  ...
  &&
  (Nat.eqb (candidate 3 4) (4 + (4 + (4 + 0))))
  (* etc. *)
  .

Anton: Actually, we could also test that multiplication distributes over addition:

Definition test_mult (candidate: nat -> nat -> nat) : bool :=
  ...
  &&
  (let x := 3 in
   let y := 4 in
   let z := 5 in
   Nat.eqb (candidate x (y + z))
           (candidate x y + candidate x z))
  (* etc. *)
  .

Alfrothul (concentrating): Now for implementing the specification... It is parameterized with the second argument of the multiplication, which suggests a lambda-dropped implementation:

Definition mult_v1 (n j : nat) : nat :=
  let fix visit i :=
    match i with
      O =>
      0
    | S i' =>
      let ih := visit i' (* <- the recursive call implements the induction hypothesis *)
      in j + ih
    end
  in visit n.

Dana: Right – visit is declared in a new lexical block and j is declared outside this block. So this block is scope-sensitive.

Bong-sun: And visit is recursive to account for the inductive specification.

Halcyon: Let’s check whether mult_v1 passes the unit tests:

Compute (test_mult mult_v1).

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Anton: Do we need to be pedantic here?

Alfrothul (tolerantly): Maybe not – there is no reason to name the result of the recursive call:

Definition mult_v1' (n j : nat) : nat :=
  let fix visit i :=
    match i with
      O =>
      0
    | S i' =>
      j + visit i'
    end
  in visit n.

Halcyon: Let’s check whether mult_v1' passes the unit tests:

Compute (test_mult mult_v1').

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Dana: Here is a lambda-lifted implementation:

Fixpoint mult_v2 (i j : nat) : nat :=
  match i with
    O =>
    0
  | S i' =>
    j + mult_v2 i' j
  end.

Bong-sun: Right – there is no block structure, just a toplevel recursive definition.

Halcyon: Let’s check whether mult_v2' passes the unit tests:

Compute (test_mult mult_v2).

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Alfrothul: And here is a lambda-dropped implementation that is tail recursive with an accumulator:

Definition mult_v3 (n j : nat) : nat :=
  let fix visit i a :=
    match i with
      O =>
      a
    | S i' =>
      visit i' (j + a)
    end
  in visit n 0.

Dana: Right – the addition happens at tail-call time instead of at return time.

Halcyon: Let’s check whether mult_v3 passes the unit tests:

Compute (test_mult mult_v3).

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Na-bong: How about we lambda-lift mult_v3?

Alfrothul: Be my guest.

Na-bong: Let’s see – visit is scope-sensitive because uses a variable – namely j – that is declared outside its block, so let’s parameterize it with this variable:

Definition mult_v3' (n j : nat) : nat :=
  let fix visit i j a :=
    match i with
      O =>
      a
    | S i' =>
      visit i' j (j + a)
    end
  in visit n j 0.

Halcyon: Ahem. Let’s check whether mult_v3' passes the unit tests:

Compute (test_mult mult_v3').

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Na-bong: Now the block is scope-insensitive, so let’s lift it into a toplevel recursive definition. To manifest that it comes from mult_v3', which we are about to rename mult_v4, let me name it mult_v4_aux:

Fixpoint mult_v4_aux (i j a : nat) : nat :=
  match i with
    O =>
    a
  | S i' =>
    mult_v4_aux i' j (j + a)
  end.

Na-bong (ending): And now the body of mult_v4 is the body of the let fix block, it just tail-calls the auxiliary function:

Definition mult_v4 (n j : nat) : nat :=
  mult_v4_aux n j 0.

Halcyon: Let’s check whether mult_v4 passes the unit tests:

Compute (test_mult mult_v4).

Pablito (helpfully): The *response* buffer reads:

= true
: bool

Halcyon: Good.

Example of inductive definition: binary trees of natural numbers

For example, binary trees of natural numbers with payloads in their leaves are declared as follows:

Inductive binary_tree_nat : Type :=
  Leaf_nat : nat -> binary_tree_nat
| Node_nat : binary_tree_nat -> binary_tree_nat -> binary_tree_nat.

The constructors are Leaf_nat (which is unary) and Node_nat (which is binary).

Example of recursive definition: binary trees of natural numbers

For example, here is how to compute the number of leaves in a given binary tree:

Fixpoint number_of_leaves_v1 (t : binary_tree_nat) : nat :=
  match t with
    Leaf_nat n =>
    1
  | Node_nat t1 t2 =>
    (number_of_leaves_v1 t1) + (number_of_leaves_v1 t2)
  end.

Exercise 02

Go through the standard functions over binary trees in the accompanying .vs file and program them in Gallina, including unit tests: number of nodes, smallest leaf (using <? to compare two natural numbers), weight, height, well-balancedness of a mobile, and mirror. If there is an opportunity to use accumulators, go for it and write an extra definition.

The goal of this exercise is to become familiar with the notation and use of Gallina, so the more definitions you write, the better.

Incidentally, the option type in Gallina is the same as in OCaml in that it features the same constructors Some and None.

Exercise 03

The previous declaration of binary trees of natural numbers had payloads in the leaves. Binary trees of natural numbers with payloads in the nodes are declared as follows:

Inductive binary_tree_nat' : Type :=
  Leaf_nat' : binary_tree_nat'
| Node_nat' : binary_tree_nat' -> nat -> binary_tree_nat' -> binary_tree_nat'.

Implement a function that counts the number of leaves and another that counts the number of nodes in these binary trees.

Exercise 04

Declare a type of binary trees of natural numbers with payloads both in the leaves and in the nodes, and implement a function that counts the number of leaves and another that counts the number of nodes in these binary trees.

Associativity

To visualize whether, for example, + associates implicitly to the left or to the right, check out, e.g., the following pair:

Check (1 + (2 + 3), (1 + 2) + 3).

The *response* window then contains:

(1 + (2 + 3), 1 + 2 + 3)
     : nat * nat

The parentheses indicate that + associates implicitly to the left.

Food for thought

Tuples of course are available. For example, evaluating Check (1, 2, 3). yields:

(1, 2, 3)
     : nat * nat * nat

Internally, tuples are represented as nested pairs, as can be visualized by evaluating Check (1, (2, 3)). and Check ((1, 2), 3).:

  • evaluating Check (1, (2, 3)). yields (1, (2, 3)) : nat * (nat * nat), and
  • evaluating Check ((1, 2), 3). yields (1, 2, 3) : nat * nat * nat,

indicating that tuples are represented as nested pairs that are linked to the left.

In case of confusion, one should simply add parentheses in one’s programs and proofs.

Resources

Version

Added the partial solution for Exercise 01 [20 Jan 2024]

Added the Gallina code for the present lecture note after the live-coding session in class [19 Jan 2024]

Updated the grammar and the narrative [19 Jan 2024]

Created [18 Jan 2024]