The goal of this lecture note is to describe how to declare new types in OCaml.
OCaml also lets us declare new types for our data, be it
This expressive power is accounted for by the following refined BNF:
Let us study each of these constructs in turn.
Say that we want to define a type to represent names, and that concretely this representation is that of strings. We could keep it in mind and mentally read “name” whenever we come across a string. However, that is not convenient since not all strings represent names. So instead, we can alias strings with the following type declaration:
type name = string;;
This declaration puts us in position to use name as a type. It extends the current type environment with a new binding, just like a let-expression extends the current value environment with a new binding.
For convenience, we can alias the type of Boolean transformers:
type boolean_transformer = bool -> bool;;
Likewise, say that we need to associate names and integers by pairing them:
name * int
For convenience, we can declare this assocation as such:
type association = name * int;;
type association_list = association list;;
More complex examples can be found in Exercise 05 of Week 04:
type triple_of_int_and_char_and_bool = int * char * bool;;
type pair_of_pair_of_int_and_bool_and_of_triple_of_char_and_string_and_char = (int * bool) * (char * string * char);;
To alias a polymorphic type, one prefixes its type declaration with the corresponding type variables. For example:
type 'a singleton = 'a;;
type ('a, 'b) pair = 'a * 'b;;
type ('a, 'b, 'c) triple = 'a * 'b * 'c;;
type 'a association_list = (name * 'a) list;;
Anton: Association lists, eh?Alfrothul: They seem strangely familiar.Halcyon (frowning): As in – déjà vu.Mimer (not leading at all): The environment in a typing judgment, perhaps?Alfrothul (getting it): And in the evaluation mechanism of OCaml.Loki (casually): Polymorphism at work. Again.Anton: The empty list represents the empty environment.Alfrothul: Bindings are represented as associations.Anton: And an environment is extended by consing a new association.Alfrothul: On the left.
Aliasing types can clarify things but OCaml, left to itself, does not use our aliases unless we tell it to. To wit, the type of the swap function, fun (x, y) -> (y, x) is 'a * 'b -> 'b * 'a, not ('a, 'b) pair -> ('b, 'a) pair:
# fun (x, y) -> (y, x);;
- : 'a * 'b -> 'b * 'a = <fun>
#
As for telling OCaml, we can do that by specifying a type (say, t) next to each formal (say, x):
(x : t)
For example:
let pair_swap : ('a, 'b) pair -> ('b, 'a) pair =
fun (a, b) -> (b, a);;
Or alternatively:
let pair_swap ((a, b) : ('a, 'b) pair) : ('b, 'a) pair =
(b, a);;
Likewise:
let neg : boolean_transformer =
fun b -> if b then false else true;;
To tell OCaml which type we mean for our data types, we declare them so that they have data constructors:
type 'a singleton = Singleton of 'a;;
type ('a, 'b) pair = Pair of 'a * 'b;;
type ('a, 'b, 'c) triple = Triple of 'a * 'b * 'c;;
These three declarations have the effect of extending the BNF of types with three new types, and the language of terms with three new constructors:
SINGLETON | G |- e : t | |
G |- Singleton e : t singleton |
PAIR | G |- e1 : t1 | G |- e2 : t2 | |
G |- Pair (e1, e2) : (t1, t2) pair |
TRIPLE | G |- e1 : t1 | G |- e2 : t2 | G |- e3 : t3 | |
G |- Triple (e1, e2, e3) : (t1, t2, t3) triple |
The three type constructors Singleton, Pair, and Triple can now be used:
# Singleton 10;;
- : int singleton = Singleton 10
# Pair (true, "hello");;
- : (bool, string) pair = Pair (true, "hello")
#
And now the swap function can have a meaningful type that involves pairs without any explicit type specification:
# fun (Pair (a, b)) -> Pair (b, a);;
- : ('a, 'b) pair -> ('b, 'a) pair = <fun>
#
Analysis:
The swap function expects a pair constructed using Pair:
# let swap (Pair (a, b)) = Pair (b, a);;
val swap : ('a, 'b) pair -> ('b, 'a) pair = <fun>
# swap (Pair (1, true));;
- : (bool, int) pair = Pair (true, 1)
#
Booleans, being either true or false, provide a prototypical example of a sum type. OCaml being implemented in OCaml, the type for Booleans is initially defined as follows:
type boolean = True | False;;
This declaration extends the current type environment with a new binding and the language with two new data constructors: the type boolean now exists, together with its two constructors, True and False.
Notes:
in actuality, boolean is called bool and its two constructors are literals in OCaml (namely the lexical units true and false);
by lexical convention, types are written in lower case and data constructors are capitalized; and
in the definition of a sum type, the vertical bar before the first constructor is optional. So one could also write:
type boolean =
| True
| False;;
To decide which summand (i.e., which of True or False) one is given, one uses a match-expression. For example, here is an unparsing function for values of type boolean:
let show_boolean b =
(* show_boolean : boolean -> string *)
match b with
| True -> "True"
| False -> "False";;
And here are conversion functions between bool and boolean:
let bool_of_boolean b =
(* bool_of_boolean : boolean -> bool *)
match b with
| True -> true
| False -> false;;
let boolean_of_bool b =
(* boolean_of_bool : bool -> boolean *)
match b with
| true -> True
| false -> False;;
These two conversion functions are inverses of each other, which is another way to say that bool and boolean are isomorphic types.
The match-expression enumerates all the possible constructors, and selects which one it is being given:
# boolean_of_bool true;;
- : boolean = True
# boolean_of_bool false;;
- : boolean = False
# bool_of_boolean True;;
- : bool = true
# bool_of_boolean False;;
- : bool = false
#
Truth be told, a match-expression in OCaml is merely an abbreviation for an if-expression, so boolean_of_bool can be equivalently declared with an if-expression:
let boolean_of_bool' b =
(* boolean_of_bool' : bool -> boolean *)
if b
then True
else False;;
So for example, we can implement two versions of the negation function over values of type boolean – either directly, or indirectly by detouring via the bool type:
let test_boolean_not candidate =
(* test_boolean_not : (boolean -> boolean) -> bool *)
let b0 = (candidate True = False)
and b1 = (candidate False = True)
in b0 && b1;;
let direct_not b =
(* direct_not : boolean -> boolean *)
match b with
| True -> False
| False -> True;;
let indirect_not b =
(* indirect_not : boolean -> boolean *)
boolean_of_bool (not (bool_of_boolean b));;
Both versions pass the unit test:
# test_boolean_not direct_not;;
- : bool = true
# test_boolean_not indirect_not;;
- : bool = true
#
The idea of detouring is explored further in the mini-project about right-to-left linked lists.
In match-expressions, the vertical bar before the first constructor is optional. So each of the following definitions is syntactically correct:
let minimum i j =
(* minimum : int -> int -> bool *)
match i < j with
| true -> i (* with a vertical bar before true *)
| false -> j;;
let maximum i j =
(* maximum : int -> int -> bool *)
match i < j with
true -> j (* without a vertical bar before true *)
| false -> i;;
Anton (grumbling): How many more hairs are we going to split here?Loki (serene): The gods are in the details, Anton.
Often, when comparing two ordered values v1 and v2, we need to distinguish 3 cases:
To this end, we nest two conditional expressions, and the resulting layout is not very pleasing to the eye.
Instead, we can use OCaml’s predefined function compare that, given v1 and v2, returns -1 if v1 < v2, 0 if v1 = v2, or 1 if v1 > v2. Or we can define a sum type that accounts for this comparison:
type comparison = Lt | Eq | Gt;;
let make_comparison v1 v2 =
(* make_comparison : 'a -> 'a -> comparison *)
if v1 < v2 then Lt else if v1 = v2 then Eq else Gt;;
And now the program layout is more pleasing to the eye and therefore readable:
match make_comparison v1 v2 with
| Lt ->
...
| Eq ->
...
| Gt ->
...
Beware of the Turing tar-pitin which everything is possiblebut nothing of interest is easy.—Alan Perlis‘s programming epigram #54
Pablito: How about a concrete example of using this comparison type?
Anton: Good idea, Pablito, good idea. What do you have in mind?
Pablito: Well, nothing. That’s why I am asking for one.
Alfrothul: OK. Say that we want to compare the length of two given lists. If they have the same length, the result is Eq. If the first is shorter than the second, the result is Lt. And if the first is longer than the second, the result is Gt.
Pablito: So, like this:
let test_list_compare_length_int candidate =
let b0 = (candidate [] [] = Eq)
and b1 = (candidate [1] [10] = Eq)
and b2 = (candidate [2; 1] [20; 10] = Eq)
and b3 = (candidate [3; 2; 1] [30; 20; 10] = Eq)
and b4 = (candidate [4; 3; 2; 1] [40; 30; 20; 10] = Eq)
(* ***** *)
and b5a = (candidate [] [10] = Lt)
and b6a = (candidate [2] [20; 10] = Lt)
and b7a = (candidate [3; 2] [30; 20; 10] = Lt)
and b8a = (candidate [4; 3; 2] [40; 30; 20; 10] = Lt)
(* ***** *)
and b5b = (candidate [1] [] = Gt)
and b6b = (candidate [2; 1] [20] = Gt)
and b7b = (candidate [3; 2; 1] [30; 20] = Gt)
and b8b = (candidate [4; 3; 2; 1] [40; 30; 20] = Gt)
in b0 && b1 && b2 && b3 && b4 && b5a && b6a && b7a && b8a && b5b && b6b && b7b && b8b;;
Alfrothul: Yes, like this.
Pablito: So, like that:
let list_compare_length_v0 v1s v2s =
make_comparison (List.length v1s) (List.length v2s);;
Alfrothul: Yes, like that:
# test_list_compare_length_int list_compare_length_v0;;
- : bool = true
#
Pablito: OK.
Anton: In this implementation, both lists are completely traversed.
Pablito: Well, yes, to compute their length.
Anton: But there is no need for that. We could iteratively traverse them together, as in the solution for Exercise 39 in Week 09, and stop as soon as we reach the end of the shortest list. Look:
let list_compare_length_v1 v1s v2s =
let rec visit v1s v2s =
match v1s with
| [] ->
(match v2s with
| [] ->
Eq
| _ :: v2s' ->
Lt)
| _ :: v1s' ->
(match v2s with
| [] ->
Gt
| _ :: v2s' ->
visit v1s' v2s')
in visit v1s v2s;;
Pablito: Just a sec:
# test_list_compare_length_int list_compare_length_v1;;
- : bool = true
#
Pablito: OK. You were saying?
Anton: Well, in the base case, the first list is empty. If the second list is empty too, they have the same length. Otherwise, they don’t, and the first one is shorter.
Pablito: Ah, right, and the rest of the second list is not even traversed.
Anton. That’s the point. And in the induction step, the first list is not empty. If the second list is empty, the first one is longer.
Pablito: Right. And the rest of the first list is not traversed either.
Anton: Otherwise, we use the induction hypothesis on their tails.
So sum types are declared by enumerating its constructors. These constructors can be parameterized. For example, here is the sum type for integers and strings:
type integer_or_string = Integer of int | String of string;;
For example, here is an unparsing function for values of type integer_or_string:
let show_integer_or_string v =
(* show_integer_or_string : integer_or_string -> string *)
match v with
| Integer n -> "Integer " ^ show_int n
| String s -> "String " ^ show_string s;;
Likewise here are two predicates to test whether a value of type integer_or_string is an integer or a string:
let is_integer v =
(* is_integer : integer_or_string -> bool *)
match v with
| Integer n -> true
| String s -> false;;
let is_string v =
(* is_string : integer_or_string -> bool *)
match v with
| Integer n -> false
| String s -> true;;
Sum types can also be polymorphic. To take an example straight out of a French restaurant:
type ('a, 'b) cheese_or_dessert = Cheese of 'a | Dessert of 'b;;
Values of that type are vacuously polymorphic since the type of the other constructor is unspecified:
# Cheese "camembert";;
- : (string, 'a) cheese_or_dessert = Cheese "camembert"
# Dessert "Savarin";;
- : ('a, string) cheese_or_dessert = Dessert "Savarin"
# Dessert "Paris-Brest";;
- : ('a, string) cheese_or_dessert = Dessert "Paris-Brest"
# Cheese "Saint-Félicien";;
- : (string, 'a) cheese_or_dessert = Cheese "Saint-Félicien"
#
Pablito: What about cheesecake?Anton (prudent): The jury is still out on this one.Loki (humming): “You can’t take ev’rything you want...”Mimer (chorusing): ”...in any French restaurant.”
The option type (see Chapter The polymorphic option type) is predefined in OCaml, but still here its definition anyhow, priming the names option, Some and None in order not to shadow OCaml’s names:
type 'a option' = Some' of 'a | None';;
Applying the constructor Some' to a value of type t yields a value of type t option':
# Some' 33;;
- : int option' = Some' 33
# Some' true;;
- : bool option' = Some' true
# Some' "thing or other";;
- : string option' = Some' "thing or other"
# Some' (Some' (Some' "stuttering"));;
- : string option' option' option' = Some' (Some' (Some' "stuttering"))
#
Just as functions can be defined in a recursive fashion (i.e., in terms of themselves), types can be defined in a recursive fashion using self-reference, i.e., the name of the type we are declaring can be used on the right-hand side of the declaration:
Revisiting the section about A sound and complete representation of natural numbers in Week 02:
type nat = Z | S of nat;;
The resource file contains conversion functions from non-negative integers to nat and vice versa as well as a generator of random values of type nat and two unit-test functions verifying that mapping a random non-negative integer to a value of type nat and back yields this non-negative integer and that mapping a random value of type nat to an integer and back yields this value of type nat.
Revisiting the section about The onion, revisited in Week 02:
type onion_bulb = Bud | Layer of onion_bulb;;
As pointed out there, these two types are isomorphic because we can write conversion functions between each of them that are inverses of each other:
let rec nat_of_onion_bulb ob =
match ob with
| Bud ->
Z
| Layer ob' ->
S (nat_of_onion_bulb ob');;
let rec onion_bulb_of_nat n =
match n with
| Z ->
Bud
| S n' ->
Layer (onion_bulb_of_nat n');;
If we were to implement a generator of random values of type nat and a generator of random onion bulbs, we could verify that
The list type is predefined in OCaml, but still here a definition anyhow:
type 'a homemade_list = Nil | Cons of 'a * 'a homemade_list;;
Using these two constructors, we can construct our own lists:
# Nil;;
- : 'a homemade_list = Nil
# Cons (1, Nil);;
- : int homemade_list = Cons (1, Nil)
# Cons (2, Cons (1, Nil));;
- : int homemade_list = Cons (2, Cons (1, Nil))
# Cons ("2", Cons ("1", Nil));;
- : string homemade_list = Cons ("2", Cons ("1", Nil))
#
This home-made type is isomorphic to the type of lists since we can implement two conversion functions that map any list to a home-made list, and any home-made list to a list. These conversion functions are inverses of each other:
let rec list_of_homemade_list vs =
match vs with
| Nil ->
[]
| Cons (v, vs') ->
v :: list_of_homemade_list vs';;
let rec homemade_list_of_list vs =
match vs with
| [] ->
Nil
| v :: vs' ->
Cons (v, homemade_list_of_list vs');;
If we were to implement a generator of random lists and a generator of random home-made lists, we could verify that
This verification is carried out in the accompanying resource file.
Pablito: Struggling here.
Dana: How so?
Pablito: I am trying to express homemade_list_of_list using list_fold_right.
Dana (smiling): A worthy goal.
Pablito: But how do I do that? I mean its definition doesn’t quite match that of list_fold_right, look:
let list_fold_right nil_case cons_case vs =
let rec visit vs =
match vs with
| [] ->
nil_case
| v :: vs' ->
cons_case v (visit vs')
in visit vs;;
Dana: I’m looking.
Pablito: Clearly Nil is the nil case, and fun v ih -> Cons (v, ih) is the cons case, but then what?
Dana: That’s because homemade_list_of_list is implemented as a globally recursive function.
Pablito: OK?
Dana: Implement it with a local recursive function and you should be in business.
Pablito: You mean, like this:
let homemade_list_of_list vs_given =
let rec visit vs =
match vs with
| [] ->
Nil
| v :: vs' ->
Cons (v, visit vs')
in visit vs_given;;
Dana: Yup.
Pablito: These two implementations are equivalent, right?
Dana: Yes. The locally recursive implementation is a bit more friendly for the compiler, that’s all.
Pablito: Now what.
Dana: Oh, it’s simple really – in this definition, the compiler can distinguish between the calls to homemade_list_of_list and the initial call to visit, and therefore it has better control of recursive calls. Here, anybody can call homemade_list_of_list but only visit can call itself. And then the compiler can emit better target code. You’ll see when you implement a compiler.
Pablito: OK, thanks. I guess the same point holds for list_of_homemade_list, should I implement homemade_list_fold_right.
Dana: Yup.
Just as functions can be defined in a mutually recursive fashion (e.g., the even / odd predicates over natural numbers are defined in terms of each other), types can be defined in a mutually recursive fashion using the and keyword:
Revisiting the section about More agricultural vagaries: the case of shallots, in Week 02, here is again the grammar of shallots, a type of onion that can contain more than one bulb inside a layer:
This BNF is implemented as an OCaml data type:
type shallot_bulbs = One of shallot_bulb | More of shallot_bulb * shallot_bulbs
and shallot_bulb = Bud' | Layer' of shallot_bulbs;;
(The names Bud' and Layer' are primed to distinguish them from the data constructors in onion.)
The resource file contains both an unparser and a parser for shallots. (Their respective types are shallot_bulb -> string and string -> shallot_bulb.)
For another example, here is an alternative representation of natural numbers with one successor constructor for even numbers and another for odd numbers:
type even_nat = ZZ | E of odd_nat
and odd_nat = O of even_nat;;
Whereas 2 is represented as S (S Z) with type nat, it is represented as E (O ZZ) with type even_nat, and whereas 3 is represented as S (S (S Z)) with type nat, it is represented as O (E (O ZZ)) with type odd_nat. The parity of the represented number is now conveyed in the type of its representation.
The resource file contains conversion functions from non-negative integers to even_nat (resp. odd_nat) and vice versa as well as two unit-test functions verifying that mapping a random non-negative even integer to a value of type even_nat and back yields this non-negative even integer and that mapping a random non-negative odd integer to a value of type odd_nat and back yields this non-negative odd integer.
The grammars from Week 02 can be implemented as OCaml data types: the grammatical correctness of a sentence corresponds to the type correctness of its OCaml representation. As a corollary, once represented in OCaml, abstract-syntax trees are well-typed and therefore well-formed—unlike some of those from Exercise 12 of Week 02, which were not well-formed.
Let us revisit the section about A BNF for regular expressions and its grammar for regular expressions:
This BNF is implemented with the following data type:
type regexp =
Empty
| Atom of int
| Any
| Seq of regexp * regexp
| Disj of regexp * regexp
| Star of regexp
| Plus of regexp;;
where the non-terminal is represented as the type and its productions as the constructors for this type.
Likewise, and to a first approximation (see The proof is trivial (revisited) in the next chapter), the grammar from The proof is trivial (variations on a theme) in Week 04,
is implemented with the following data types:
type verb =
Verb1
| Verb2;;
type complement1 =
Complement11
| Complement12
| Complement13
| Complement14;;
type complement2 =
Complement21
| Complement22
| Complement23
| Complement24
| Complement25;;
type proof =
Proof of verb * complement1 * complement2;;
where the non-terminals is represented as types and their productions as the constructors for these types.
A proof is unparsed with the following OCaml functions:
let show_verb v =
match v with
| Verb1 ->
"view the problem as"
| Verb2 ->
"biject it to";;
let show_complement1 c1 =
match c1 with
| Complement11 ->
"a continuous complexity class"
| Complement12 ->
"an open DAG"
| Complement13 ->
"a non-degenerate topological space"
| Complement14 ->
"a computable hypergraph";;
let show_complement2 c2 =
match c2 with
| Complement21 ->
"exact monoids"
| Complement22 ->
"skew-symmetric combinatorial games"
| Complement23 ->
"dihedral rings"
| Complement24 ->
"semi-decidable homeomorphisms"
| Complement25 ->
"closed Cayley graphs";;
let show_proof p =
match p with
| Proof (v, c1, c2) ->
"The proof is trivial! Just " ^ show_verb v ^ " " ^ show_complement1 c1 ^ " whose elements are " ^ show_complement2 c2;;
Created [10 Jan 2023]