The goal of this lecture note is to introduce polymorphic types in OCaml.
OCaml is a statically typed language with (so far) the following grammar of types and expressions:
The corresponding typing rules for lambda-abstraction and applications read as follows:
| FUN | G, (x1 : t1) |- e2 : t2 | |
| G |- fun x1 -> e2 : t1 -> t2 | 
| APP | G |- e0 : t1 -> t2 | G |- e1 : t1 | |
| G |- e0 e1 : t2 | |||
In words:
Assuming typing rules such as
| NOT | G |- e : bool | |
| G |- not e : bool | 
| ADD | G |- e1 : int | G |- e2 : int | |
| G |- e1 + e2 : int | |||
terms such as fun x -> not x and fun x -> fun y -> x + y are monomorphically typed, i.e., their type is completely determined and their context of use is completely fixed. The former is a function expecting a Boolean and returning a Boolean, and the second is a function that expects an integer and returns a function that expects an integer and returns an integer. To wit:
# fun x -> not x;;
- : bool -> bool = <fun>
# fun x -> fun y -> x + y;;
- : int -> int -> int = <fun>
#
These types are composite (they use the type constructor ->) but their ground instances of <type> are one of the atomic types, i.e., int, bool, char, string, or unit:
<type> -->
<type> -> <type> -->    (* the left instance of <type> is the ground type bool *)
bool -> <type> -->      (* the remaining instance of <type> is the ground type bool *)
bool -> bool
<type> -->
<type> -> <type> -->           (* the left instance of <type> is the ground type int *)
int -> <type> -->
int -> <type> -> <type> -->    (* the left instance of <type> is the ground type int *)
int -> int -> <type> -->       (* the remaining instance of <type> is the ground type int *)
int -> int -> int
Given each expression, OCaml has inferred its type. Let us verify that each of these expressions has the type inferred by OCaml, i.e., let us type check these expressions by contructing the corresponding proof tree:
We start at the bottom of the proof tree, stating that in any type environment G, the expression fun x -> not x has type bool -> bool:
G |- fun x -> not x : bool -> bool
There is only one typing rule we can apply, namely FUN, where G is instantiated with G, x is instantiated with x, e is instantiated with not x, t1 is instantiated with bool, and t2 is instantiated with bool:
    (x : bool), G |- not x : bool
FUN ----------------------------------
    G |- fun x -> not x : bool -> bool
There is only one typing rule we can apply, namely NOT, where G is instantiated with G and e is instantiated with x:
    (x : bool), G |- x : bool
NOT -------------------------
    (x : bool), G |- not x : bool
FUN ----------------------------------
    G |- fun x -> not x : bool -> bool
There is only one typing rule we can apply, namely LOOKUP_FOUND, where x is instantiated with x, t is instantiated with bool, G is instantiated with G, x is instantiated with x, and t is instantiated with bool:
LOOKUP_FOUND -------------------------
             (x : bool), G |- x : bool
         NOT -------------------------
             (x : bool), G |- not x : bool
         FUN ----------------------------------
             G |- fun x -> not x : bool -> bool
The proof tree is complete, which proves that in any type environment G, the expression fun x -> not x has type bool -> bool.
We start at the bottom of the proof tree, stating that in any type environment G, the expression fun x -> fun y -> x + y has type int -> int -> int:
G |- fun x -> fun y -> x + y : int -> int -> int
There is only one typing rule we can apply, namely FUN, where G is instantiated with the current type environment, i.e., G, x is instantiated with the formal parameter of the current function abstraction, i.e., x, e is instantiated with the body of the current function abstraction, i.e., fun y -> x + y, t1 is instantiated with the domain of the current type, i.e., int, and t2 is instantiated with the codomain of the current type, i.e., int -> int (since function types implicitly associate to the right):
    (x : int), G |- fun y -> x + y : int -> int
FUN ------------------------------------------------
    G |- fun x -> fun y -> x + y : int -> int -> int
Again, there is only one typing rule we can apply, namely FUN, where G is instantiated with the current type environment, i.e., G, x is instantiated with the formal parameter of the current function abstraction, i.e., y, e is instantiated with the body of the current function abstraction, i.e., x + y, t1 is instantiated with the domain of the current type, i.e., int, and t2 is instantiated with the codomain of the current type, i.e., int:
    (y : int), (x : int), G |- x + y : int
FUN --------------------------------------
    (x : int), G |- fun y -> x + y : int -> int
FUN ------------------------------------------------
    G |- fun x -> fun y -> x + y : int -> int -> int
There is only one typing rule we can apply, namely ADD, where G is instantiated with the current type environment, i.e., (y : int), (x : int), G, e1 is instantiated with the first operand in the current expression, i.e., x, and e2 is instantiated with the second operand in the current expression, i.e., y:
    (y : int), (x : int), G |- x : int    (y : int), (x : int), G |- y : int
ADD ------------------------------------------------------------------------
    (y : int), (x : int), G |- x + y : int
FUN --------------------------------------
    (x : int), G |- fun y -> x + y : int -> int
FUN ------------------------------------------------
    G |- fun x -> fun y -> x + y : int -> int -> int
Considering the left branch, there is only one typing rule we can apply, namely LOOKUP_NOT_FOUND_YET since x and y are distinct, where x' is instantiated with the first identifier in the current environment, i.e., y, t' is instantiated with the type of the first identifier in the current environment, i.e., int, G is instantiated with the rest of the type environment, i.e., (x : int), G, x is instantiated with the identifier to look up, i.e., x, and t is instantiated with the current type, i.e., int:
                     (x : int), G |- x : int
LOOKUP_NOT_FOUND_YET ----------------------------------
                     (y : int), (x : int), G |- x : int    (y : int), (x : int), G |- y : int
                 ADD ------------------------------------------------------------------------
                     (y : int), (x : int), G |- x + y : int
                 FUN --------------------------------------
                     (x : int), G |- fun y -> x + y : int -> int
                 FUN ------------------------------------------------
                     G |- fun x -> fun y -> x + y : int -> int -> int
Still considering the left branch, there is only one typing rule we can apply, namely LOOKUP_FOUND since the variable whose binding starts the environment is the one we are looking for. So LOOKUP_FOUND is applied, where x is instantiated with the first identifier in the current environment, i.e., x, t is instantiated with the type of the first identifier in the current environment, i.e., int, G is instantiated with the rest of the type environment, i.e., G, x is instantiated with the identifier to look up, i.e., x, and t is instantiated with the current type, i.e., int:
        LOOKUP_FOUND -----------------------
                     (x : int), G |- x : int
LOOKUP_NOT_FOUND_YET ----------------------------------
                     (y : int), (x : int), G |- x : int    (y : int), (x : int), G |- y : int
                 ADD ------------------------------------------------------------------------
                     (y : int), (x : int), G |- x + y : int
                 FUN --------------------------------------
                     (x : int), G |- fun y -> x + y : int -> int
                 FUN ------------------------------------------------
                     G |- fun x -> fun y -> x + y : int -> int -> int
Considering the right branch, there is only one typing rule we can apply, namely LOOKUP_FOUND, where x is instantiated with the first identifier in the current environment, i.e., y, t is instantiated with the type of the first identifier in the current environment, i.e., int, G is instantiated with the rest of the type environment, i.e., (x : int), G, x is instantiated with the identifier to look up, i.e., y, and t is instantiated with the current type, i.e., int:
        LOOKUP_FOUND -----------------------
                     (x : int), G |- x : int
LOOKUP_NOT_FOUND_YET ----------------------------------    LOOKUP_FOUND ----------------------------------
                     (y : int), (x : int), G |- x : int                 (y : int), (x : int), G |- y : int
                 ADD -------------------------------------------------------------------------------------
                     (y : int), (x : int), G |- x + y : int
                 FUN --------------------------------------
                     (x : int), G |- fun y -> x + y : int -> int
                 FUN ------------------------------------------------
                     G |- fun x -> fun y -> x + y : int -> int -> int
The proof tree is complete, which proves that in any type environment G, the expression fun x -> fun y -> x + y has type int -> int -> int.
Some expressions, however, have a type that is not completely determined, and to this end, the grammar of types is extended with type variables:
(N.B.: 'a is pronounced “alpha”, 'b is pronounced “beta”, etc.)
A type containing variables is said to be polymorphic: it adapts to its context of use. For example, the identity function is polymorphic:
# fun x -> x;;
- : 'a -> 'a = <fun>
#
The identity function can be variously applied to values of different types:
# (fun x -> x) 33;;
- : int = 33
# (fun x -> x) true;;
- : bool = true
# (fun x -> x) 'a';;
- : char = 'a'
# (fun x -> x) "abc";;
- : string = "abc"
# (fun x -> x) ((33, true), ('a', "abc"));;
- : (int * bool) * (char * string) = ((33, true), ('a', "abc"))
# let identity = fun x -> x;;
val identity : 'a -> 'a = <fun>
# (identity 33, identity "abc");;
- : int * string = (33, "abc")
#
As pointed out in the section about Ground equality, OCaml’s equality predicate is polymorphic:
# (=);;
- : 'a -> 'a -> bool = <fun>
# 1 = 1;;
- : bool = true
# 1 = 2;;
- : bool = false
# true = true;;
- : bool = true
# 'a' = 'b';;
- : bool = false
# "hello" = "world";;
- : bool = false
#
Not only that, but this equality predicate is structural and so it scales to, e.g., tuples:
# (1, ('2', "3")) = (1, ('2', "3"));;
- : bool = true
#
It however does not scale to functions: functions cannot be tested for structural equality. The only thing that can be done with a function is to apply it, and so we can only test their extensional equality: two functions are extensionally equal whenever applying them to the same input yields the same output, for any input.
Vigfus: And there we go again with self-reference.
Halcyon: Self-reference?
Vigfus: Well, yes, we are defining the equality of functions based on the equality of their arguments and the equality of their result.
Halcyon: That indeed looks like a circular argument.
Loki: The problem is only apparent. If two objects are equal, then they are the same as each other. Hence the self-reference.
Brynja: Loki, you must love “Smoke on the water”.
Loki: Well, its opening guitar riff is awesome.
Brynja: But the argument is not circular.
Halcyon: Not circular?
Brynja: The equality at type t1 -> t2 is defined in terms of the equality at type t1 and of the equality at type t2, which are smaller types. And since types are constructed inductively, the argument may repeat if t1 and t2 are compound types (for example, two values of type t3 * t4, i.e., two pairs, are equal whenever their first projections (of type t3) are equal and their second projections (of type t4) are equal), but eventually the argument will stop at a ground type, e.g., int, bool, or unit.
Halcyon: OK, not circular.
Vigfus: But not computable either.
Halcyon: Not computable?
Vigfus: Testing whether two functions are extensionally equal is just that – testing – and we know that testing is incomplete in general.
Mimer: You got it, Vigfus.
Harald: So what is the type of fun x1 -> fun x2 -> x1?
Alfrothul: Easy peasy, and it is a polymorphic function:
# fun x -> fun y -> x;;
- : 'a -> 'b -> 'a = <fun>
#
Brynja: That is easy to prove too, witness the following typing-derivation tree:
        LOOKUP_FOUND ---------------------
                     (x : 'a), G |- x : 'a
LOOKUP_NOT_FOUND_YET -------------------------------
                     (y : 'b), (x : 'a), G |- x : 'a
                 FUN -------------------------------
                     (x : 'a), G |- fun y -> x : 'b -> 'a
                 FUN -----------------------------------------
                     G |- fun x -> fun y -> x : 'a -> 'b -> 'a
Harald: Hum. Is there an expression that has type 'a -> 'b -> 'b?
Alfrothul: I guess we could try constructing a typing-derivation tree?
Brynja: Let’s start with the mystery expression ?:
G |- ? : 'a -> 'b -> 'b
Alfrothul: Well, there is only one typing rule we can apply, namely FUN, and that forces the mystery expression ? to be fun x -> ??, where ?? is a new mystery expression:
    (x : 'a), G |- ?? : 'b -> 'b
FUN ---------------------------------
    G |- fun x -> ?? : 'a -> 'b -> 'b
Harald: Again, there is still only one typing rule we can apply, namely FUN, and that forces the new mystery expression ?? to be fun y -> ???, where ??? is another mystery expression:
    (y : 'b), (x : 'a), G |- ??? : 'b
FUN --------------------------------------
    (x : 'a), G |- fun y -> ??? : 'b -> 'b
FUN -------------------------------------------
    G |- fun x -> fun y -> ??? : 'a -> 'b -> 'b
Brynja: LOOKUP_FOUND comes to the rescue, which forces the other mystery expression ??? to be y:
LOOKUP_FOUND -------------------------------
             (y : 'b), (x : 'a), G |- y : 'b
         FUN ------------------------------------
             (x : 'a), G |- fun y -> y : 'b -> 'b
         FUN -----------------------------------------
             G |- fun x -> fun y -> y : 'a -> 'b -> 'b
Harald: The proof tree is complete, which proves that in any type environment G, the expression fun x -> fun y -> y has type 'a -> 'b -> 'b.
Alfrothul: And so fun x -> fun y -> y is your mystery expression.
Brynja: Is it unique?
Harald: I guess not:
# fun x -> fun y -> (fun z -> z) x;;
- : 'a -> 'b -> 'a = <fun>
#
Harald: Remember our first unit-test function?
Alfrothul: Yes... To simplify, it was something like:
let test_successor candidate =
  (candidate 0 = 1) && (candidate 1 = 2);;
Harald: Well, this function is applied to another function.
Alfrothul: That is indeed true:
# test_successor;;
- : (int -> int) -> bool = <fun>
#
Brynja: Reading the function type as logical implication, I think we should be able to write a function whose type is the modus ponens.
Harald: You mean something like ('a -> 'b) -> 'a -> 'b?
Brynja: Yup. The bottom part of the typing-derivation tree is easy enough to draw:
    (x : 'a), (f : 'a -> 'b), G |- ? : 'b
FUN ------------------------------------------
    (f : 'a -> 'b), G |- fun x -> ? : 'a -> 'b
FUN -------------------------------------------------
    G |- fun f -> fun x -> ? : ('a -> 'b) -> 'a -> 'b
Alfrothul: To obtain an expression of type 'b, we could apply f to something that has type 'a.
Harald: x comes handy:
    (x : 'a), (f : 'a -> 'b), G |- f : 'a -> 'b    (x : 'a), (f : 'a -> 'b), G |- x : 'a
APP ------------------------------------------------------------------------------------
    (x : 'a), (f : 'a -> 'b), G |- f x : 'b
FUN --------------------------------------------
    (f : 'a -> 'b), G |- fun x -> f x : 'a -> 'b
FUN ---------------------------------------------------
    G |- fun f -> fun x -> f x : ('a -> 'b) -> 'a -> 'b
Brynja: And now all we need to do is to look up f and x in the type environment:
        LOOKUP_FOUND ---------------------------------
                     (f : 'a -> 'b), G |- f : 'a -> 'b
LOOKUP_NOT_FOUND_YET -------------------------------------------    LOOKUP_FOUND -------------------------------------
                     (x : 'a), (f : 'a -> 'b), G |- f : 'a -> 'b                 (x : 'a), (f : 'a -> 'b), G |- x : 'a
    APP --------------------------------------------------------------------------------------------------------------
        (x : 'a), (f : 'a -> 'b), G |- f x : 'b
    FUN --------------------------------------------
        (f : 'a -> 'b), G |- fun x -> f x : 'a -> 'b
    FUN ---------------------------------------------------
        G |- fun f -> fun x -> f x : ('a -> 'b) -> 'a -> 'b
Harald: This expression is not unique, we can apply the identity function in it as much as we like.
Brynja: So the function whose type is the modus ponens expects a function and returns a function that expects an argument and applies this function to this argument.
Alfrothul: So it’s an apply function, right? To give it a name:
let apply f x =
 (* apply : ('a -> 'b) -> 'a -> 'b *)
  f x;;
Mimer: Right. This apply function is the computational content of the modus ponens.
Reading the function type as logical implication and pairing as logical conjunction, do these types have a logical content?
Halcyon: I remember my first encounter with polymorphism.
Vigfus: Pray tell.
Halcyon: My Dad asked me how I was doing at school and I told him I was struggling with additions.
Vigfus: OK?
Halcyon: So he tried to help and asked me that if I had 2 oranges and he gave me one more, how many oranges would I have.
Vigfus: And?
Halcyon: And I told him I didn’t know, because at school we had only learned how to add apples.
Vigfus (suspicious): Practicing Dad jokes, are we.
Added the postlude [16 Mar 2020]
Added the interlude about the extensional equality of functions [07 Feb 2020]
Created [06 Feb 2020]