OCaml is a typed language. Its syntactic units are expressions, and evaluating an expression yields a value that has a type. The goal of this lecture note is to introduce OCaml expressions through their grammar, together with the corresponding grammar of types:
Each expression has a type, and satisfies the following typing judgment:
G |- e : t
In words: in the type environment G, the expression e has type t.
The rest of this lecture note incrementally defines G, e, and t.
The way to interact with OCaml is through a toplevel loop that reads an expression and, if this expression is both grammatically correct and type correct, evaluates it and prints the result, if there is one:
A comment is any sequence of characters that starts with (* and finishes with *). It is skipped by the OCaml lexical analyzer.
Also, comments can be nested, i.e., they can contain other comments.
OCaml features integers:
The corresponding typing rule reads as follows:
INT | where n is an integer | |
G |- n : int |
In words: in any type environment G, an integer n has type int.
For example, the following interactive session illustrates how to play with numbers:
# 0;;
- : int = 0
# 1;; (* an implicitly positive integer *)
- : int = 1
# +1;; (* an explicitly positive integer *)
- : int = 1
# -1;; (* a negative integer *)
- : int = -1
#
By default, i.e., off the shelf, 4611686018427387903 and -4611686018427387904 are respectively the largest and the smallest integers that can be represented in OCaml. (For the cognoscenti, exponentiating 2 with 0 yields 1, exponentiating 2 with 1 yields 2, exponentiating 2 with 2 yields 4, exponentiating 2 with 3 yields 8, exponentiating 2 with 10 yields 1024, and exponentiating 2 with 62 yields 4611686018427387904. So the largest value of type int is 2^62 - 1 and the smallest value of type int is -2^62.)
OCaml also features Booleans:
The corresponding typing rule reads as follows:
BOOL_TRUE | ||
G |- true : bool |
BOOL_FALSE | ||
G |- false : bool |
In words,
For example, the following interactive session illustrates how to play with Booleans:
# true;;
- : bool = true
# false;;
- : bool = false
#
Reminders:
OCaml also features characters:
The corresponding typing rule reads as follows:
CHAR | where c is a character | |
G |- c : char |
In words: in any type environment G, a character c has type char.
For example, the following interactive session illustrates how to play with characters:
# 'a';;
- : char = 'a'
# 'b';;
- : char = 'b'
# 'c';;
- : char = 'c'
#
Harald: So the way to write a character in OCaml is to type a letter between two single quotes?
Alfrothul: I guess not just a letter – any character. Let me try:
# 'A';;
- : char = 'A'
# '3';;
- : char = '3'
#
Harald: So, anything we can type in one keystroke is a character? Let me try:
# ' ';;
- : char = ' '
# '.';;
- : char = '.'
# '(';;
- : char = '('
#
Loki: What about the single quote itself?
Alfrothul: Uh-oh:
# ''';;
Characters 0-1:
''';;
^
Error: Syntax error
#
Mimer: OCaml uses the convention to quote the single quote character, using backslash, as a parsing convention:
# '\'';;
- : char = '\''
#
Brynja: For consistency, the backslash character should be quoted using backslash as well:
# '\\';;
- : char = '\\'
#
Mimer: Yup.
OCaml also features strings:
The corresponding typing rule reads as follows:
STRING | where s is a string | |
G |- s : string |
In words: in any type environment G, a string s has type string.
For example, the following interactive session illustrates how to play with strings:
# "hello world";;
- : string = "hello world"
#
Alfrothul: Let me guess – the way to write a string in OCaml that contains the double quote character is to quote this character with a backslash:
# "\"";;
- : string = "\""
#
Harald: Indeed. And the way to write a string in OCaml that contains the backslash character is to quote it with a backslash:
# "\\";;
- : string = "\\"
#
Loki: What about "\\\\\\"?
Brynja: Logically, it is a string that contains three consecutive occurrences of the backslash character. And I guess that we should be able to write the empty string with two consecutive occurrences of the string delimiter, i.e., the double quote character:
# "";;
- : string = ""
#
Vigfus: Wait – why does OCaml say that the type of "42" is string:
# "42";;
- : string = "42"
#
Vigfus: Shouldn’t it be, I don’t know, int or perhaps even "int" or something?
Alfrothul: Because this is not a pipe, remember?
Harald: "42" is a string that contains two characters that happen to be numerical ones.
Vigfus: So you mean that '4' and '2' don’t have type, I don’t know, int or perhaps even 'int'? OCaml says that they have type char.
Harald: Right. They are numerical characters.
Alfrothul: Also, "int" and 'int' are not grammatically correct types.
OCaml also features conditional expressions:
The corresponding typing rule reads as follows:
IF | G |- e1 : bool | G |- e2 : t | G |- e3 : t | |
G |- if e1 then e2 else e3 : t |
In words: in any type environment G, the expression if e1 then e2 else e3 has type t whenever in that environment, e1 (the test) has type bool and each of e2 (the consequent) and e3 (the alternative) has type t.
Conditional expressions are polymorphic (i.e., can have any type) in that as long as the type of their consequent and their alternative are the same, they yield values of this type, no matter what it is.
For example, the following interactive sessions illustrate how to play with conditional expressions:
a conditional expression can yield an integer:
# if true then 1 else 2;;
- : int = 1
# if false then 1 else 2;;
- : int = 2
#
a conditional expression can yield a string:
# if true then "hello" else "world";;
- : string = "hello"
# if false then "hello" else "world";;
- : string = "world"
#
Conditional expressions can also be nested, e.g., occur as a test or as a consequent in another conditional expression:
# if (if true then true else false) then 'a' else 'b';;
- : char = 'a'
# if (if false then true else false) then 'a' else 'b';;
- : char = 'b'
# if true then (if true then 'c' else 'd') else 'e';;
- : char = 'c'
# if true then (if false then 'c' else 'd') else 'e';;
- : char = 'd'
# if false then (if true then 'c' else 'd') else 'e';;
- : char = 'e'
# if false then (if false then 'c' else 'd') else 'e';;
- : char = 'e'
#
When a conditional expression occurs as a test in another conditional expression, its type must be bool.
Play with
In each case, state what result you expect and verify that OCaml validates your prediction.
OCaml also features pairs:
The corresponding typing rule reads as follows:
PAIR | G |- e1 : t1 | G |- e2 : t2 | |
G |- (e1, e2) : t1 * t2 |
In words: in any type environment G, the expression (e1, e2) has type t1 * t2 whenever in that environment, e1 (the first component) has type t1 and e2 (the second component) has type t2.
Pairs too are polymorphic in that given two components of any type t1 and t2, they yield values of type t1 * t2, no matter what t1 and t2 are.
For example, the following interactive sessions illustrate how to play with pairs:
# (1, 2);;
- : int * int = (1, 2)
# ("hello", "world");;
- : string * string = ("hello", "world")
#
Harald: I guess the two components of a pair can be of distinct types:
# (1, "world");;
- : int * string = (1, "world")
# ("hello", 2);;
- : string * int = ("hello", 2)
#
Alfrothul: Hey, we can even nest pairs:
# ((1, "world"), ("hello", 2));;
- : (int * string) * (string * int) = ((1, "world"), ("hello", 2))
#
Brynja: We can even construct pairs conditionally:
# if true then (true, 22) else (false, 33);;
- : bool * int = (true, 22)
#
Harald: And we can prove that this particular expression has this particular type.
Alfrothul: Why don’t you go ahead and do that.
Harald: First of all, we start at the bottom with what we want to check, namely that in any type environment G, the expression if true then (true, 22) else (false, 33) has type bool * int:
G |- if true then (true, 22) else (false, 33) : bool * int
Alfrothul: Oh. There is only one typing rule we can apply, namely IF, where G is instantiated with G, e1 is instantiated with true, e2 is instantiated with (true, 22), e3 is instantiated with (false, 33), and t is instantiated with bool * int:
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ---------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
Now what. There are three branches to consider.
Harald: Let’s consider the left branch. There is only one typing rule we can apply, namely BOOL_TRUE, since true is a Boolean:
BOOL_TRUE ----------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ---------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
This completes the left branch.
Alfrothul: OK. Let’s consider the branch in the middle. There is only one typing rule we can apply, namely PAIR, where G is instantiated with G, e1 is instantiated with true, e2 is instantiated with 22, t1 is instantiated with bool, and t2 is instantiated with int:
G |- true : bool G |- 22 : int
BOOL_TRUE ---------------- PAIR ---------------------------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF -------------------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
Oh, man. Two new branches.
Brynja (jumping in): Let’s consider the left branch of the branch in the middle. There is only one typing rule we can apply, namely BOOL_TRUE, since true is a Boolean:
BOOL_TRUE ----------------
G |- true : bool G |- 22 : int
BOOL_TRUE ---------------- PAIR ---------------------------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ------------------------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
This completes the left branch of the branch in the middle.
Harald: Let’s consider the right branch of the branch in the middle. There is only one typing rule we can apply, namely INT, since 22 is a integer:
BOOL_TRUE ---------------- INT -------------
G |- true : bool G |- 22 : int
BOOL_TRUE ---------------- PAIR -------------------------------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ----------------------------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
This completes the right branch of the branch in the middle.
Alfrothul: Let’s consider the right branch. Again, there is only one typing rule we can apply, namely PAIR, where G is instantiated with G, e1 is instantiated with false, e2 is instantiated with 33, t1 is instantiated with bool, and t2 is instantiated with int:
BOOL_TRUE ---------------- INT -------------
G |- true : bool G |- 22 : int G |- false : bool G |- 33 : int
BOOL_TRUE ---------------- PAIR ------------------------------------- PAIR ----------------------------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ---------------------------------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
Two new branches. Again.
Harald: Let’s consider the left branch of the right branch. There is only one typing rule we can apply, namely BOOL_FALSE, since false is a Boolean:
BOOL_TRUE ---------------- INT ------------- BOOL_FALSE -----------------
G |- true : bool G |- 22 : int G |- false : bool G |- 33 : int
BOOL_TRUE ---------------- PAIR ------------------------------------- PAIR ----------------------------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ---------------------------------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
This completes the left branch of the right branch.
Alfrothul: Let’s consider the right branch of the right branch. There is only one typing rule we can apply, namely INT, since 33 is a integer:
BOOL_TRUE ---------------- INT ------------- BOOL_FALSE ----------------- INT -------------
G |- true : bool G |- 22 : int G |- false : bool G |- 33 : int
BOOL_TRUE ---------------- PAIR ------------------------------------- PAIR --------------------------------------
G |- true : bool G |- (true, 22) : bool * int G |- (false, 33) : bool * int
IF ---------------------------------------------------------------------------------------------------------------
G |- if true then (true, 22) else (false, 33) : bool * int
This completes the right branch of the right branch.
Harald: So the proof tree is complete, which proves that if true then (true, 22) else (false, 33) has type bool * int in any type environment G.
More generally, OCaml also features tuples, of which pairs are a particular case:
The corresponding typing rule reads as follows:
TUPLE | G |- e1 : t1 | ... | G |- eN : tN | |
G |- (e1, ..., eN) : t1 * ... * tN |
In words: in any type environment G, the expression (e1, ..., eN) has type t1 * ... * tN whenever in that environment, e1 (the first component) has type t1, e2 (the second component) has type t2,..., and eN (the last component) has type tN.
Like pairs, tuples are polymorphic in that given N components of any type t1, ..., and tN, they yield values of type t1 * ... * tN, no matter what t1, ..., and tN are.
For example, the following interactive session illustrates how to play with tuples:
# (1, 2, 3);;
- : int * int * int = (1, 2, 3)
# (1, 2, 3, 4);;
- : int * int * int * int = (1, 2, 3, 4)
# (1, 2, 3, 4, 5);;
- : int * int * int * int * int = (1, 2, 3, 4, 5)
#
Play with tuples containing values of different types, including other tuples.
In each case, state what result you expect and verify that OCaml validates your prediction.
OCaml does not feature a tuple of size 1, but it does feature the empty tuple, which has type unit:
UNIT | ||
G |- () : unit |
In words: in any type environment G, the expression () has type unit.
For example, the following interactive session illustrates how to play with the empty tuple:
# ();;
- : unit = ()
#
Would it make sense for OCaml to feature a tuple of size 1? What do the following expressions evaluate to? And what is their type?
Does this make sense?
Harald: So parentheses in OCaml are either used for tuples...
Alfrothul: ...or to disambiguate nested expressions.
Mimer: You got it.
OCaml also features names (a.k.a. “identifiers” or “variables”), and their type is held in the type environment:
The type environment holds a collection of bindings in which names denote types. Looking up a name x in a type environment G where x denotes t is carried out with the following inference rules:
LOOKUP_FOUND | ||
(x : t), G |- x : t |
LOOKUP_NOT_FOUND_YET | G |- x : t | where x and x' differ |
(x' : t'), G |- x : t |
So for example,
In the last case, the first occurrence of x is said to “shadow” its other occurrences.
Justify the examples above by drawing the corresponding proof trees.
Let’s pick x as a name and yut as a yet-unspecified type, and look up x in the empty environment. The root of the tree reads:
. |- x : yut
We can apply neither LOOKUP_FOUND nor LOOKUP_NOT_FOUND_YET, and therefore we cannot construct a proof tree and specify a type: looking up any name is the empty environment is undefined.
The root of the tree reads:
(x : int), (y : bool), (z : char), . |- x : yut
for some yet-unspecified type yut.
We can (only) apply LOOKUP_FOUND, where x is instantiated with x, t is instantiated with int, G is instantiated with (y : bool), (z : char), ., x is instantiated with x, and t is instantiated with yut:
LOOKUP_FOUND -----------------------------------------------
(x : int), (y : bool), (z : char), . |- x : yut
Since in the application of LOOKUP_FOUND, t is instantiated both with int and with yut, yut is constrained to be int.
The proof tree is complete. Let us draw it again by replacing yut with int:
LOOKUP_FOUND -----------------------------------------------
(x : int), (y : bool), (z : char), . |- x : int
The existence of this tree proves that in the type environment (x : int), (y : bool), (z : char), ., looking up x yields the type int.
The root of the tree reads:
(x : int), (y : bool), (z : char), . |- y : yut
for some yet-unspecified type yut.
We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with x, t' is instantiated with int, G is instantiated with (y : bool), (z : char), ., x is instantiated with y, and t is instantiated with yut:
(y : bool), (z : char), . |- y : yut
LOOKUP_NOT_FOUND_YET -----------------------------------------------
(x : int), (y : bool), (z : char), . |- y : yut
We can (only) apply LOOKUP_FOUND, where x is instantiated with y, t is instantiated with bool, G is instantiated with (z : char), ., x is instantiated with y, and t is instantiated with yut:
LOOKUP_FOUND ------------------------------------
(y : bool), (z : char), . |- y : yut
LOOKUP_NOT_FOUND_YET -----------------------------------------------
(x : int), (y : bool), (z : char), . |- y : yut
Since in the application of LOOKUP_FOUND, t is instantiated both with bool and with yut, yut is constrained to be bool.
The proof tree is complete. Let us draw it again by replacing yut with bool:
LOOKUP_FOUND -------------------------------------
(y : bool), (z : char), . |- y : bool
LOOKUP_NOT_FOUND_YET ------------------------------------------------
(x : int), (y : bool), (z : char), . |- y : bool
The existence of this tree proves that in the type environment (x : int), (y : bool), (z : char), ., looking up y yields the type bool.
The root of the tree reads:
(x : int), (y : bool), (z : char), . |- z : yut
for some yet-unspecified type yut.
We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with x, t' is instantiated with int, G is instantiated with (y : bool), (z : char), ., x is instantiated with z, and t is instantiated with yut:
(y : bool), (z : char), . |- z : yut
LOOKUP_NOT_FOUND_YET -----------------------------------------------
(x : int), (y : bool), (z : char), . |- z : yut
We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with y, t' is instantiated with bool, G is instantiated with (z : char), ., x is instantiated with z, and t is instantiated with yut:
(z : char), . |- z : yut
LOOKUP_NOT_FOUND_YET ------------------------------------
(y : bool), (z : char), . |- z : yut
LOOKUP_NOT_FOUND_YET -----------------------------------------------
(x : int), (y : bool), (z : char), . |- z : yut
We can (only) apply LOOKUP_FOUND, where x is instantiated with z, t is instantiated with char, G is instantiated with ., x is instantiated with z, and t is instantiated with yut:
LOOKUP_FOUND ------------------------
(z : char), . |- z : yut
LOOKUP_NOT_FOUND_YET ------------------------------------
(y : bool), (z : char), . |- z : yut
LOOKUP_NOT_FOUND_YET -----------------------------------------------
(x : int), (y : bool), (z : char), . |- z : yut
Since in the application of LOOKUP_FOUND, t is instantiated both with char and with yut, yut is constrained to be char.
The proof tree is complete. Let us draw it again by replacing yut with char:
LOOKUP_FOUND -------------------------
(z : char), . |- z : char
LOOKUP_NOT_FOUND_YET -------------------------------------
(y : bool), (z : char), . |- z : char
LOOKUP_NOT_FOUND_YET ------------------------------------------------
(x : int), (y : bool), (z : char), . |- z : char
The existence of this tree proves that in the type environment (x : int), (y : bool), (z : char), ., looking up z yields the type char.
The root of the tree reads:
(x : int), (x : bool), (x : char), . |- x : yut
for some yet-unspecified type yut.
We can (only) apply LOOKUP_FOUND, where x is instantiated with x, t is instantiated with int, G is instantiated with (x : bool), (x : char), ., x is instantiated with x, and t is instantiated with yut:
LOOKUP_FOUND -----------------------------------------------
(x : int), (x : bool), (x : char), . |- x : yut
Since in the application of LOOKUP_FOUND, t is instantiated both with int and with yut, yut is constrained to be int.
The proof tree is complete. Let us draw it again by replacing yut with int:
LOOKUP_FOUND -----------------------------------------------
(x : int), (x : bool), (x : char), . |- x : int
The existence of this tree proves that in the type environment (x : int), (x : bool), (x : char), ., looking up x yields the type int.
Likewise, when a syntactically correct and type-correct expression is evaluated, this evaluation takes place in an environment where names denote values.
OCaml features predeclared names such as min_int and max_int that denote the smallest and the largest representable integers:
# min_int;;
- : int = -4611686018427387904
# max_int;;
- : int = 4611686018427387903
#
These names are bound in OCaml’s initial environment.
OCaml provides language support to declare our own variables to name the result of evaluating an expression at the toplevel:
In the subsequent iterations of the toplevel loop, the name declared in a let-expression can be used to refer to the value of its definiens. The following interactive session illustrates how to introduce global names and use them in the ensuing interaction. At every iteration of the toplevel loop, the system prints (from right to left) the resulting value, its type, and its name (if there is one – so far there hasn’t been any) – or an error message:
at first, x does not denote anything:
# x;;
Characters 0-1:
x;;
^
Error: Unbound value x
#
in effect, the current type environment (let’s call it G) and the current (value) environment do not contain any binding for x;
so we declare it to denote a value:
# let x = 1;;
val x : int = 1
#
in effect, the current type environment is now (x : int), G and the current environment binds x to 1;
then we can use it to access the value it denotes:
# x;;
- : int = 1
#
we can even re-declare it to denote another value:
# let x = 100;;
val x : int = 100
#
in effect, the current type environment is now (x : int), (x : int), G and the current environment binds x to 100;
then we can still use it to access the value it denotes:
# x;;
- : int = 100
#
we can also re-declare it to denote a value of another type:
# let x = "world";;
val x : string = "world"
#
in effect, the current type environment is now (x : string), (x : int), (x : int), G and the current environment binds x to "world";
again, we can use this name:
# x;;
- : string = "world"
#
if an error occurs when the definiens is evaluated, no naming takes place:
# let say_what = if 0 then 1 else -1;;
Characters 18-19:
let say_what = if 0 then 1 else -1;;
^
Error: This expression has type int but an expression was expected of type
bool
# say_what;;
Characters 0-8:
say_what;;
^^^^^^^^
Error: Unbound value say_what
#
in effect, the current type environment is still (x : string), (x : int), (x : int), G.
In OCaml, several (distinct) global names can be defined at once:
For example:
# let x = 10 and y = 100;;
val x : int = 10
val y : int = 100
#
This makes it simple, for example, to (re-)declare each of x and y to denote the previous value of the other:
# let x = y and y = x;;
val x : int = 100
val y : int = 10
#
Attempting to declare the same global name twice gives rise to an error:
# let same = true and same = false;;
Characters 20-24:
let same = true and same = false;;
^^^^
Error: Variable same is bound several times in this matching
#
OCaml also features functions and function applications:
The corresponding typing rules read as follows:
FUN | (x : t1), G |- e : t2 | |
G |- fun x -> e : t1 -> t2 |
APP | G |- e0 : t1 -> t2 | G |- e1 : t1 | |
G |- e0 e1 : t2 |
In words:
Like conditional expressions and tuples, these two syntax constructs are polymorphic.
Vocabulary:
OCaml comes with a mass of predeclared functions such as not, which has type bool -> bool and negates its argument, and, e.g., String.length, which is declared in the library String, has type string -> int, and computes the length of any given string.
Here is not at work:
# not;;
- : bool -> bool = <fun>
# not true;;
- : bool = false
# not false;;
- : bool = true
# not (not true);;
- : bool = true
# not (not false);;
- : bool = false
#
Here is String.length at work:
# String.length;;
- : string -> int = <fun>
# String.length "";;
- : int = 0
# String.length "a";;
- : int = 1
# String.length "ab";;
- : int = 2
# String.length "abc";;
- : int = 3
# String.length "\"";;
- : int = 1
# String.length "\\\\\\";;
- : int = 3
#
As one can see, the string "\"" has size 1 and the string "\\\\\\" has size 3.
Alfrothul and Brynja: Ha!
Grammatically, nothing prevents us from writing functions that return functions, and then
applying such functions to a first argument, and
applying the result to a second argument:
# ((fun x -> fun y -> x) 1) 2;;
- : int = 1
# ((fun x -> fun y -> y) 1) 2;;
- : int = 2
#
Actually, function applications associate to the left, so the outermost parentheses are unnecessary:
# (fun x -> fun y -> x) 1 2;;
- : int = 1
# (fun x -> fun y -> y) 1 2;;
- : int = 2
#
Correspondingly, the function type associates to the right, so that, e.g., in int -> (int -> int), the parentheses are unnecessary and this type is written int -> int -> int.
As it turns out, OCaml comes with a plethora of such predeclared functions, as described next.
In practice, it is too verbose to write fun x -> fun y -> ... so this verbosity is alleviated with the short hand fun x y -> ...:
# (fun x -> fun y -> x) 1 10;;
- : int = 1
# (fun x y -> x) 1 10;;
- : int = 1
# (fun x -> fun y -> y) 1 10;;
- : int = 10
# (fun x y -> y) 1 10;;
- : int = 10
#
Take the addition function, for example. OCaml provides the standard (left-associative) infix notation for it:
# 1 + 10;;
- : int = 11
# 1 + 10 + 100;;
- : int = 111
#
However, that is just syntactic sugar, i.e., a notational convenience. The truth of the matter is that behind the syntactic sugar, the addition function is an example of a function that returns another function, as revealed by parenthesizing the infix operator:
# (+);;
- : int -> int -> int = <fun>
# (+) 1;;
- : int -> int = <fun>
# (+) 1 10;;
- : int = 11
#
In other words, there is no need for a typing rule such as
ADD | G |- e1 : int | G |- e2 : int | |
G |- e1 + e2 : int |
because OCaml’s initial type environment binds (+) to int -> int -> int and the parser desugars an expression such as e1 + e2 into (+) e1 e2.
And with the convenience of infix notation, OCaml provides a wealth of such functions:
subtraction:
# (-);;
- : int -> int -> int = <fun>
# 10 - 1;;
- : int = 9
# 1 - 10;;
- : int = -9
#
multiplication:
# 2 * 3;;
- : int = 6
#
quotient:
# (/);;
- : int -> int -> int = <fun>
# 4 / 2;;
- : int = 2
# 5 / 2;;
- : int = 2
# 1 / 0;;
Exception: Division_by_zero.
#
modulo:
# (mod);;
- : int -> int -> int = <fun>
# 12 mod 5;;
- : int = 2
#
minimum of two integers (prefix notation):
# (min 2 5, min 6 3);;
- : int * int = (2, 3)
#
maximum of two integers (prefix notation):
# (max 2 5, max 6 3);;
- : int * int = (5, 6)
#
comparison of two integers:
# (2 < 3, 3 < 3, 3 < 2);;
- : bool * bool * bool = (true, false, false)
#
Boolean conjunction:
# (&&);;
- : bool -> bool -> bool = <fun>
# true && false;;
- : bool = false
#
Boolean disjunction:
# (||);;
- : bool -> bool -> bool = <fun>
# false || false || true;;
- : bool = true
#
string concatenation:
# (^);;
- : string -> string -> string = <fun>
# "hello" ^ " " ^ "world";;
- : string = "hello world"
#
The String library function also features a function index a string:
# String.get;;
- : string -> int -> char = <fun>
# String.get "abc" 0;;
- : char = 'a'
# String.get "abc" 1;;
- : char = 'b'
# String.get "abc" 2;;
- : char = 'c'
# String.get "abc" 3;;
Exception: Invalid_argument "index out of bounds".
# String.get "\"" 0;;
- : char = '"'
# String.get "\\" 0;;
- : char = '\\'
#
Alfrothul and Harald: Yay!
Can you visualize the multiplication function hidden behind the infix notation *, as we did just above for addition (+), subtraction (-), quotient (/), modulo (mod), etc.? For example:
# 1 * 2;;
- : int = 2
# 1 * 2 * 3;;
- : int = 6
# 1 * 2 * 3 * 4;;
- : int = 24
# 1 * 2 * 3 * 4 * 5;;
- : int = 120
#
Loki: No comment.
This exercise is a followup to Exercise 4.
After typing (*);; at the toplevel, type *)33;;. What happens? Why does it happen?
Shouldn’t you write ( * ) instead?
The point of these exercises is the ambiguity of (* and *) as comment delimiters and of (*) as parenthesizing the infix operator *:
(* ) is ambiguous – is it the beginning of a comment or the multiplication function?
OCaml made a choice:
# (*) *);;
Characters 0-3:
(*) *);;
^^^
Warning 1: this is the start of a comment.
#
( *) is ambiguous – is it the end of a comment or the multiplication function?
OCaml made a choice:
# ( *);;
Characters 2-4:
( *);;
^^
Warning 2: this is not the end of a comment.
- : int -> int -> int = <fun>
#
Vigfus (fresh out of solving Exercise 5): See my nice painting:
+-------------------------------------+
| |
| |
| |
| ( *);; |
| |
| |
| |
| This is not the end of a comment. |
+-------------------------------------+
Alfrothul (admirative): Magritte lives.
Harald: You can say that again.
Alfrothul: Er... Sure. Magritte lives.
Brynja: You guys are so literal sometimes.
Loki (on his way out): Say no more.
Alfrothul: Er... Did he mean no, as in no?
Harald: I think so, yes. Yes, he did.
Vigfus: I’d say yes too.
Brynja: Yes, as in yes.
Halcyon (pensive): Being told “Say no more” doesn’t make people say yes any less, it seems.
OCaml also provides a family of equality functions for each of the ground types (integers, Booleans, characters, and strings), each noted with the infix operator =:
# 1 = 1;;
- : bool = true
# 1 = 2;;
- : bool = false
# true = true;;
- : bool = true
# 'a' = 'b';;
- : bool = false
# "hello" = "world";;
- : bool = false
#
Such Boolean-valued functions are often called “predicates”.
Accordingly, the equality predicate comes with a family of typing rules:
equal_int | G |- e1 : int | G |- e2 : int | |
G |- e1 = e2 : bool |
equal_bool | G |- e1 : bool | G |- e2 : bool | |
G |- e1 = e2 : bool |
equal_char | G |- e1 : char | G |- e2 : char | |
G |- e1 = e2 : bool |
equal_string | G |- e1 : string | G |- e2 : string | |
G |- e1 = e2 : bool |
Say that we need to evaluate a series of expressions that only differ in part, e.g., to start enumerating the sequence of odd numbers:
# 2 * 0 + 1;;
- : int = 1
# 2 * 1 + 1;;
- : int = 3
# 2 * 2 + 1;;
- : int = 5
# 2 * 3 + 1;;
- : int = 7
# 2 * 4 + 1;;
- : int = 9
# 2 * 5 + 1;;
- : int = 11
#
The part that differs is 0, 1, 2, 3, 4, and 5. The part that remains the same is the template 2 * ... + 1, often referred to as a delimited context (an expression with a hole) and noted as [2 * [.] + 1] where the outer brackets delimit the outside of the context and the inside brackets mark the hole in the context. Filling this hole with 0, for example, yields the expression 2 * 0 + 1.
OCaml provides language support to represent this context: as a function, noted here fun x -> 2 * x + 1 where the hole is abstracted with an identifier that is declared as the formal parameter of the function. Applying this function is expressed by juxtaposing the function and its argument (a.k.a. its actual parameter):
# (fun x -> 2 * x + 1) 0;;
- : int = 1
# (fun x -> 2 * x + 1) 1;;
- : int = 3
# (fun x -> 2 * x + 1) 2;;
- : int = 5
# (fun x -> 2 * x + 1) 3;;
- : int = 7
# (fun x -> 2 * x + 1) 4;;
- : int = 9
# (fun x -> 2 * x + 1) 5;;
- : int = 11
#
In each of these interactions, x successively denotes 0, 1, ..., and 5 when the expression 2 * x + 1 is evaluated.
This way, in the expression that needs to be repeatedly evaluated (here: 2 * x + 1), the unchanging parts (here, the delimited context [2 * [.] + 1]) are factored away in the body of the function.
Assuming the following typing rule
MUL | G |- e1 : int | G |- e2 : int | |
G |- e1 * e2 : int |
and using the typing rule for addition mentioned above, i.e.,
ADD | G |- e1 : int | G |- e2 : int | |
G |- e1 + e2 : int |
find the type of the expression fun x -> 2 * x + 1, knowing that infix multiplication takes precedence over infix addition, i.e., the expression 2 * x + 1 actually reads (2 * x) + 1.
First of all, let us consult OCaml, just because it’s there and we can:
# fun x -> 2 * x + 1;;
- : int -> int = <fun>
#
Now all we need to do is construct a proof tree to the effect that the term fun x -> 2 * x + 1, or again to be precise, fun x -> (2 * x) + 1 has type int -> int in any type environment G:
Let us start at the root of the tree:
G |- fun x -> (2 * x) + 1 : int -> int
There is only one typing rule we can apply, namely FUN, where G is instantiated with the current type environment, i.e., G, x is instantiated with the formal parameter of the current function abstraction, i.e., x, e is instantiated with the body of the current function abstraction, i.e., (2 * x) + 1, t1 is instantiated with the domain of the current type (namely int -> int), i.e., int, and t2 is instantiated with the codomain of the current type (namely int -> int), i.e., int:
(x : int), G |- (2 * x) + 1 : int
FUN ------------------------------------
G |- fun x -> (2 * x) + 1 : int -> int
There is only one typing rule we can apply, namely ADD, where G is instantiated with the current type environment, i.e., (x : int), G, e1 is instantiated with the first operand in the current expression, i.e., (2 * x), and e2 is instantiated with the second operand in the current expression, i.e., 1:
(x : int), G |- 2 * x : int (x : int), G |- 1 : int
ADD ------------------------------------------------------
(x : int), G |- (2 * x) + 1 : int
FUN ------------------------------------
G |- fun x -> (2 * x) + 1 : int -> int
Let’s consider the left branch. There is only one typing rule we can apply, namely MUL, where G is instantiated with the current type environment, i.e., (x : int), G, e1 is instantiated with the first operand in the current expression, i.e., 2, and e2 is instantiated with the second operand in the current expression, i.e., x:
(x : int), G |- 2 : int (x : int), G |- x : int
MUL --------------------------------------------------
(x : int), G |- 2 * x : int (x : int), G |- 1 : int
ADD -----------------------------------------------------------------------------
(x : int), G |- (2 * x) + 1 : int
FUN ------------------------------------
G |- fun x -> (2 * x) + 1 : int -> int
Let’s consider the left branch of the left branch. There is only one typing rule we can apply, namely INT (where G is instantiated with (x : int), G and n is instantiated with 2), since 2 is an integer:
INT -----------------------
(x : int), G |- 2 : int (x : int), G |- x : int
MUL --------------------------------------------------
(x : int), G |- 2 * x : int (x : int), G |- 1 : int
ADD -----------------------------------------------------------------------------
(x : int), G |- (2 * x) + 1 : int
FUN ------------------------------------
G |- fun x -> (2 * x) + 1 : int -> int
This completes the left branch of the left branch.
Let’s consider the right branch of the left branch. There is only one typing rule we can apply, namely LOOKUP_FOUND, where x is instantiated with the first identifier in the current environment, i.e., x, t is instantiated with the type of the first identifier in the current environment, i.e., int, G is instantiated with the rest of the type environment, i.e., G, x is instantiated with the identifier to look up, i.e., x, and t is instantiated with the current type, i.e., int:
INT ----------------------- LOOKUP_FOUND -----------------------
(x : int), G |- 2 : int (x : int), G |- x : int
MUL ---------------------------------------------------------------
(x : int), G |- 2 * x : int (x : int), G |- 1 : int
ADD ------------------------------------------------------------------------------------------
(x : int), G |- (2 * x) + 1 : int
FUN ------------------------------------
G |- fun x -> (2 * x) + 1 : int -> int
This completes the right branch of the left branch.
Let’s consider the right branch. There is only one typing rule we can apply, namely INT (where G is instantiated with (x : int), G and n is instantiated with 1), since 1 is an integer:
INT ----------------------- LOOKUP_FOUND -----------------------
(x : int), G |- 2 : int (x : int), G |- x : int
MUL --------------------------------------------------------------- INT -----------------------
(x : int), G |- 2 * x : int (x : int), G |- 1 : int
ADD ----------------------------------------------------------------------------------------------
(x : int), G |- (2 * x) + 1 : int
FUN ------------------------------------
G |- fun x -> (2 * x) + 1 : int -> int
This completes the right branch.
The proof tree is complete, which proves that fun x -> 2 * x + 1 has type int -> int in any type environment G.
Under the same conditions as that in Exercise 6, find the type of the expression fun x -> fun y -> 2 * x + y.
Recommendation: first peruse both Exercise 6 and Exercise 18 and their solution.
Draw a proof tree to justify your answer for Item e. of Exercise 8, under conditions similar to that in Exercise 6.
Evaluating a lambda-abstraction yields an anonymous function, which may be practical but not always, since indeed most of the time, functions are named:
# let make_odd_number = fun x -> 2 * x + 1;;
val make_odd_number : int -> int = <fun>
# make_odd_number 0;;
- : int = 1
# make_odd_number 1;;
- : int = 3
# make_odd_number 2;;
- : int = 5
# make_odd_number 3;;
- : int = 7
# make_odd_number 4;;
- : int = 9
# make_odd_number 5;;
- : int = 11
#
Harald: Do you know many programming languages where one can declare an anonymous function?Alfrothul: Er... No.
In practice, it is too verbose to write let f = fun x -> e so this verbosity is alleviated with the short hand let f x = e:
# let make_odd_number x = 2 * x + 1;;
val make_odd_number : int -> int = <fun>
# make_odd_number 6;;
- : int = 13
# let x x = 2 * x + 1;;
val x : int -> int = <fun>
# x 6;;
- : int = 13
# let make_odd_number make_odd_number = 2 * make_odd_number + 1;;
# make_odd_number 6;;
- : int = 13
#
We have met the enemy and [they are] us.—Pogo
The naming style of reusing x and make_odd_number, in the last two definitions, is chiefly used to confuse the enemy.
At any rate, this short hand is only a notation: the name that is declared still denotes a function:
# let make_even_number x = 2 * x;;
val make_even_number : int -> int = <fun>
# make_even_number;;
- : int -> int = <fun>
#
Harald: Does the name of a formal parameter matter, in a lambda-abstraction?
Mimer: Beg pardon?
Harald: Does it matter how we name it?
Alfrothul: For example, does it matter whether we write fun x -> x + 1 or fun y -> y + 1?
Mimer: No, it doesn’t matter. These two lambda-abstractions evaluate to the same function.
Brynja: What do you mean by “the same function”?
Mimer: I mean that the two resulting values behave the same. Applying them to the same input yields the same output, always. So they are undistinguishable.
Harald: You mean they are equal?
Alfrothul: Because somehow OCaml somehow disagrees:
# (fun x -> x + 1) = (fun y -> y + 1);;
Exception: Invalid_argument "equal: functional value".
#
Mimer: Rightly so, since the only way to compare two functions is to test whether applying them to each of their possible input values yields the same output values. The equality predicate denoted by (=) cannot do that. Besides, as seen above, it is only defined for each of the ground types. In fact, ...
Brynja (hastily): We take your word for it, Mimer. One thing at a time.
Suppose we need to define a function that maps a number i not to 2 * i + 1, but to 3 * i + 1, and another that maps a number i to 4 * i + 1. The same post-rationalization as above applies: we have a delimited context [fun i -> [.] * i + 1]] which we can represent as a function:
# let make_multiple_plus_one = fun m -> fun i -> m * i + 1;;
val make_multiple_plus_one : int -> int -> int = <fun>
# make_multiple_plus_one 2 10;;
- : int = 21
# make_multiple_plus_one 3 10;;
- : int = 31
# make_multiple_plus_one 4 10;;
- : int = 41
#
This function returns another function:
# let make_odd_number = make_multiple_plus_one 2;;
val make_odd_number : int -> int = <fun>
#
In practice, it is too verbose to write let bar = fun x y -> ... so this verbosity is alleviated with the short hand let bar x y = .... It is only a notation: bar still denotes a function, and so does the result of applying bar to one argument:
# let bar x y = x + y;;
val bar : int -> int -> int = <fun>
# bar;;
- : int -> int -> int = <fun>
# bar 1;;
- : int -> int = <fun>
#
That said, fully applying bar to two actual parameters “saturates” its two formal parameters and its body x + y is then evaluated in an environment where x denotes 1 and y denotes 10:
# bar 1 10;;
- : int = 11
#
This notation makes it easy to think of functions that return functions as functions that take two arguments, which is fine as long as these functions are fully applied:
# let make_multiple_plus_one m x = m * x + 1;;
val make_multiple_plus_one : int -> int -> int = <fun>
# let make_odd_number x = make_multiple_plus_one 2 x;;
val make_odd_number : int -> int = <fun>
#
On a related note, how would you characterize the sum of a non-negative integer and its successor in terms of this integer?
Vigfus: Related. How related?Brynja: Related to make_odd_number, I think.
Conversely to having functions return functions, we can also apply functions to other functions, which is ideal to write unit tests, as developed in the next chapter.
Added the literal interlude [13 Feb 2020]
Fixed a couple of typos and clarified the narrative about applying typing rules [07 Feb 2020]
Fixed a typo in the the solution of Exercise 6, thanks to Karolina Grzeszkiewicz’s eagle eye [03 Feb 2020]
Fixed a horrendous typo in the FUN rule, thanks to Karolina Grzeszkiewicz’s eagle eye [03 Feb 2020]
Polished the construction of proof trees, and added the solution for Exercise 3 [01 Feb 2020]
Added the complete enumeration of the representable integers and the reminder about M-p and M-n [31 Jan 2020]
Factored out the part about unit tests [08 Jun 2019]
Spelled out code coverage [05 May 2019]
Fixed a consistent typo in the typing derivation of the third interlude, as well as a typo in the statement of Exercise 10, thanks to Yunjeong Lee’s eagle eye [30 Mar 2019]
Fixed a typo at the end of the typing derivation in the third interlude, thanks to Risa Shindo’s eagle eye [19 Mar 2019]
Added the global definition of several variables at once [22 Feb 2019]
Added the analysis of Exercises 4 and 5 and the artistic interlude [20 Feb 2019]
Clarified the statement of Exercise 4 [13 Feb 2019]
Polished the narrative [04 Feb 2019]
Aligned this lecture note with the lecture of Week 03 [03 Feb 2019]
Created [28 Jan 2019]