OCaml: a language of algebraic expressions, with names and functions

OCaml is a statically typed language. Its syntactic units are typed expressions, and evaluating a typed expression yields a value that has this type. The goal of this lecture note is to introduce OCaml expressions through their grammar, together with the corresponding grammar of types as well as type environments:

<type_environment> ::= ...
<type> ::= ...
<expression> ::= ...

Each expression has a type, and satisfies the following typing judgment, where G stands for <type_environment>, e stands for <expression>, and t stands for <type>:

G |- e : t

In words: in the type environment G, the expression e has type t.

The rest of this lecture note incrementally defines expressions, types, and type environments.

The toplevel loop

The way to interact with OCaml is through a toplevel loop that prompts the user with # and reads a ;;-terminated expression. If this expression is both grammatically correct and type correct, it is evaluated and the result of this evaluation, if there is one, is printed:

<toplevel-expression> ::= <expression>

The toplevel loop is also referred to as a “read-eval-print loop” – “REPL” for short.

Comments

A comment is any sequence of characters that starts with (* and finishes with *). It is skipped by the OCaml lexical analyzer.

Also, comments can be nested, i.e., they can contain other comments.

Integers

OCaml features integers:

<type> ::= int
<expression> ::= <integer>
<integer> ::= 0 | -1 | 1 | -2 | 2 | -3 | 3 | ... | -4611686018427387903 | 4611686018427387903 | -4611686018427387904

The corresponding typing rule reads as follows, where n stands for <integer>:

INT 
G |- n : int

In words,

  • in any type environment G, an integer n has type int.

Furthermore,

  • evaluating an integer yields this integer.

For example, the following interactive session illustrates how to play with numbers:

# 0;;
- : int = 0
# 1;;       (* an implicitly positive integer *)
- : int = 1
# +1;;      (* an explicitly positive integer *)
- : int = 1
# -1;;      (* a negative integer *)
- : int = -1
#

In words:

  • the expression 0 has type int and evaluating it yields 0;
  • the expression 1 has type int and evaluating it yields 1;
  • the expression +1 is read as 1; it has type int and evaluating it yields 1; and
  • the expression -1 has type int and evaluating it yields -1.

Food for thought:

  • what about +0?
  • what about -0?

By default, i.e., off the shelf, 4611686018427387903 and -4611686018427387904 are respectively the largest and the smallest integers that can be represented in OCaml. (For the cognoscenti, exponentiating 2 with 0 yields 1, exponentiating 2 with 1 yields 2, exponentiating 2 with 2 yields 4, exponentiating 2 with 3 yields 8, exponentiating 2 with 10 yields 1024, and exponentiating 2 with 62 yields 4611686018427387904. So the largest value of type int is 2^62 - 1 and the smallest value of type int is -2^62.)

Booleans

OCaml also features Booleans:

<type> ::= int | bool
<expression> ::= <integer> | <boolean>
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false

The corresponding typing rules read as follows:

BOOL_TRUE 
G |- true : bool
BOOL_FALSE 
G |- false : bool

In words,

  • in any type environment G, true has type bool; and
  • in any type environment G, false has type bool.

Furthermore,

  • evaluating a Boolean yields this Boolean.

For example, the following interactive session illustrates how to play with Booleans:

# true;;
- : bool = true
# false;;
- : bool = false
#

In words:

  • the expression true has type bool and evaluating it yields true; and
  • the expression false has type bool and evaluating it yields false.

Reminders:

  • M-p and M-n (read: “meta-p” and “meta-n”) let us navigate in the history of what we have typed so far at the top level.
  • The meta key is the option key on a Mac and the Alt key on a PC.

Characters

OCaml also features characters:

<type> ::= int | bool | char
<expression> ::= <integer> | <boolean> | <character>
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes

The corresponding typing rule reads as follows, where c stands for <character>:

CHAR 
G |- c : char

In words,

  • in any type environment G, a character c has type char.

Furthermore,

  • evaluating a character yields this character.

For example, the following interactive session illustrates how to play with characters:

# 'a';;
- : char = 'a'
# 'b';;
- : char = 'b'
# 'c';;
- : char = 'c'
#

In words:

  • the expression 'a' has type char and evaluating it yields 'a';
  • the expression 'b' has type char and evaluating it yields 'b'; and
  • the expression 'c' has type char and evaluating it yields 'c'.

Interlude

Anton: So the way to write a character in OCaml is to type a letter between two single quotes?

Alfrothul: I guess not just a letter – any character. Let me try:

# 'A';;
- : char = 'A'
# '3';;
- : char = '3'
#

Anton: So, anything we can type in one keystroke is a character? Let me try:

# ' ';;
- : char = ' '
# '.';;
- : char = '.'
# '(';;
- : char = '('
#

Loki: What about the single quote itself?

Alfrothul: Uh-oh:

# ''';;
Characters 0-1:
  ''';;
  ^
Error: Syntax error
#

Mimer: There is an ambiguity here—the single quote character as a delimiter to write OCaml characters, and the single quote character as an OCaml character. As a parsing convention, in OCaml, we quote the single quote character using the backslash character:

# '\'';;
- : char = '\''
#

Dana (logical): For consistency, the backslash character should be quoted using backslash as well:

# '\\';;
- : char = '\\'
#

Mimer: Yup.

Strings

OCaml also features strings:

<type> ::= int | bool | char | string
<expression> ::= <integer> | <boolean> | <character> | <string>
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes

The corresponding typing rule reads as follows, where s stands for <string>:

STRING 
G |- s : string

In words,

  • in any type environment G, a string s has type string.

Furthermore,

  • evaluating a string yields this string.

For example, the following interactive session illustrates how to play with strings:

# "hello world";;
- : string = "hello world"
#

In words:

  • the expression "hello world" has type string and evaluating it yields "hello world".

Interlude

Alfrothul: Let me guess – the way to write a string in OCaml that contains the double quote character is to quote this character with a backslash:

# "\"";;
- : string = "\""
#

Anton: Indeed. And the way to write a string in OCaml that contains the backslash character is to quote it with a backslash:

# "\\";;
- : string = "\\"
#

Loki: What about "\\\\\\"?

Dana: Logically, it is a string that contains three consecutive occurrences of the backslash character. And I guess that we should be able to write the empty string with two consecutive occurrences of the string delimiter, i.e., the double quote character:

# "";;
- : string = ""
#

Pablito: Wait – why does OCaml say that the type of "42" is string:

# "42";;
- : string = "42"
#

Pablito: Shouldn’t it be, I don’t know, int or perhaps even "int" or something?

Alfrothul: Because this is not a pipe, remember?

Anton: "42" is a string that contains two characters that happen to be numerical ones.

Pablito: So you mean that '4' and '2' don’t have type, I don’t know, int or perhaps even 'int'? OCaml says that they have type char.

Anton: Right. They are numerical characters.

Alfrothul: Also, "int" and 'int' are not grammatically correct types.

Conditional expressions

OCaml also features conditional expressions (or again: if-expressions):

<type> ::= int | bool | char | string
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression>
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes

The corresponding typing rule reads as follows, where e1, e2, and e3 stand for <expression> and t stands for <type>:

IFG |- e1 : boolG |- e2 : tG |- e3 : t
G |- if e1 then e2 else e3 : t

In words,

  • in any type environment G, the expression if e1 then e2 else e3 has type t whenever in that environment, e1 (the test) has type bool and each of e2 (the consequent) and e3 (the alternative) has type t.

Furthermore,

  • to evaluate if e1 then e2 else e3 in the current environment, we first evaluate e1 in this environment;
    • if evaluating e1 does not terminate, then evaluating if e1 then e2 else e3 does not terminate either;
    • if evaluating e1 raises an error, then evaluating if e1 then e2 else e3 also raises this error;
    • if evaluating e1 in the current environment yields a Boolean, then
      • if this Boolean is true, then evaluating if e1 then e2 else e3 in the current environment reduces to evaluating e2 in this environment, and
      • if this Boolean is false, then evaluating if e1 then e2 else e3 in the current environment reduces to evaluating e3 in this environment.

Conditional expressions are polymorphic (i.e., can have any type) in that as long as the type of their consequent and their alternative are the same, they yield values of this type, no matter what it is.

For example, the following interactive sessions illustrate how to play with conditional expressions:

  • a conditional expression can yield an integer:

    # if true then 1 else 2;;
    - : int = 1
    # if false then 1 else 2;;
    - : int = 2
    #
    

    The corresponding two proof trees read as follows:

       BOOL_TRUE ----------------    INT ------------    INT ------------
                 G |- true : bool        G |- 1 : int        G |- 2 : int
    IF ------------------------------------------------------------------
       G |- if true then 1 else 2 : int
    
    
       BOOL_FALSE -----------------    INT ------------    INT ------------
                  G |- false : bool        G |- 1 : int        G |- 2 : int
    IF --------------------------------------------------------------------
       G |- if false then 1 else 2 : int
    
  • a conditional expression can yield a string:

    # if true then "hello" else "world";;
    - : string = "hello"
    # if false then "hello" else "world";;
    - : string = "world"
    #
    

    The corresponding two proof trees read as follows:

       BOOL_TRUE ----------------    STRING ---------------------    STRING ---------------------
                 G |- true : bool           G |- "hello" : string           G |- "world" : string
    IF ------------------------------------------------------------------------------------------
       G |- if true then "hello" else "world" : string
    
    
       BOOL_FALSE -----------------    STRING ---------------------    STRING ---------------------
                  G |- false : bool           G |- "hello" : string           G |- "world" : string
    IF --------------------------------------------------------------------------------------------
       G |- if false then "hello" else "world" : string
    

Conditional expressions can also be nested, e.g., occur as a test or as a consequent in another conditional expression. For clarity, we add a pair of parentheses around the nested expression:

# if (if true then true else false) then 'a' else 'b';;
- : char = 'a'
# if (if false then true else false) then 'a' else 'b';;
- : char = 'b'
# if true then (if true then 'c' else 'd') else 'e';;
- : char = 'c'
# if true then (if false then 'c' else 'd') else 'e';;
- : char = 'd'
# if false then (if true then 'c' else 'd') else 'e';;
- : char = 'e'
# if false then (if false then 'c' else 'd') else 'e';;
- : char = 'e'
#

When a conditional expression occurs as a test in another conditional expression, its type must be bool, independently of the type of this other conditional expression.

Exercise 02

Play with

  • a conditional expression with a conditional expression as its alternative; and
  • a conditional expression with three conditional expressions, one as its test, one as its consequent, and one as its alternative.

In each case, state what result you expect and verify that OCaml validates your prediction.

Also, draw the proof tree of either of these conditional expressions.

Type errors in OCaml

A key point in Week 02 was the notion of syntax correctness: if an expression is not syntactically (i.e., lexically and grammatically) correct, it is not correct. A key point in Week 03 is the notion of type correctness: if an expression is not type correct, it is not correct.

For a conditional expression to be syntactically correct, its 3 sub-expressions (the test, the consequent, and the alternative) must be syntactically correct. And for it to be type correct, the type of its test must be bool and the types of its consequent and of its alternative must be the same. Otherwise, this expression is not correct. To wit:

# if "true" then 'a' else 'b';;
Characters 3-9:
  if "true" then 'a' else 'b';;
     ^^^^^^
Error: This expression has type string but an expression was expected of type
         bool
# if true then 'a' else -1;;
Characters 22-24:
  if true then 'a' else -1;;
                        ^^
Error: This expression has type int but an expression was expected of type
         char
#

Analysis:

  • in the first interaction, the type of the expression in position of a test is not bool; and
  • in the second interaction, the types of the consequent and of the alternative are not the same.

In both cases, the error message is understandable.

If the expression contains several type errors, the error message is about the first error that is encountered:

# if "true" then 'a' else -1;;
Characters 3-9:
  if "true" then 'a' else -1;;
     ^^^^^^
Error: This expression has type string but an expression was expected of type
         bool
#

Often, though, and that is unfortunate, we don’t understand the error message, which is a cognitive bootstrapping problem because how can we understand what is wrong and why it is wrong when we are still learning what is right?

To wit:

  • In the following interaction, a light-hearted proof rule is expressed as an OCaml expression:

    # if A then why not B else whatever;;
    Characters 3-4:
      if A then why not B else whatever;;
         ^
    Error: The variant type bool has no constructor A
    #
    

    Not knowing about variant types nor about datatype constructors yet, we are not in position to understand this error message. So, patience, until Week 10.

  • In the following interaction, we ask for more trouble with a refinement of light-hearted proof rule:

    # if A then why not B else your guess is as good as mine;;
    Characters 39-41:
      if A then why not B else your guess is as good as mine;;
                                             ^^
    Error: Syntax error
    #
    

    The error message is dang uninformative. It hints that we are incorrectly using as, whatever that is and whatever that means.

  • In the following interaction, we valiantly tell OCaml that we mean your guess is as good as mine to be the alternative of the conditional expression by wrapping it into a pair of parentheses:

    # if A then why not B else (your guess is as good as mine);;
    Characters 40-42:
      if A then why not B else (your guess is as good as mine);;
                                              ^^
    Error: Syntax error: operator expected.
    #
    

    The error message is a bit more specific, but it still refers to something we do not know yet, namely as.

As now can’t reveal the mystery of tomorrow

Stevie Wonder

Pairs

OCaml also features pairs:

<type> ::= int | bool | char | string | <type> * <type>
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression> | (<expression>, <expression>)
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes

The corresponding typing rule reads as follows, where e1 and e2 stand for <expression> and t1 and t2 stand for <type>:

PAIRG |- e1 : t1G |- e2 : t2
G |- (e1, e2) : t1 * t2

In words,

  • in any type environment G, the expression (e1, e2) has type t1 * t2 whenever in that environment, e1 (the first component) has type t1 and e2 (the second component) has type t2.

Furthermore,

  • to evaluate (e1, e2), we first evaluate e1 and e2:
    • if either of these first evaluations does not terminate, then evaluating (e1, e2) does not terminate either;
    • if either of these first evaluations raises an error, then evaluating (e1, e2) also raises this error;
    • if e1 and e2 evaluate to v1 and v2, then evaluating (e1, e2) yields a pair that holds v1 and v2.

Pairs too are polymorphic in that given two components of any type t1 and t2, they yield values of type t1 * t2, no matter what t1 and t2 are.

For example, the following interactive sessions illustrate how to play with pairs:

# (1, 2);;
- : int * int = (1, 2)
# ("hello", "world");;
- : string * string = ("hello", "world")
#

The corresponding proof trees read as follows:

     INT ------------    INT ------------
         G |- 1 : int        G |- 2 : int
PAIR ------------------------------------
     G |- (1, 2) : int * int


     STRING ---------------------    STRING ---------------------
            G |- "hello" : string           G |- "world" : string
PAIR ------------------------------------------------------------
     G |- ("hello", "world") : string * string

Interlude

Anton: I guess the two components of a pair can be of distinct types:

# (1, "world");;
- : int * string = (1, "world")
# ("hello", 2);;
- : string * int = ("hello", 2)
#

Alfrothul: Hey, we can even nest pairs:

# ((1, "world"), ("hello", 2));;
- : (int * string) * (string * int) = ((1, "world"), ("hello", 2))
#

Dana: We can even construct pairs conditionally:

# if true then (true, 22) else (false, 33);;
- : bool * int = (true, 22)
#

Anton: And we can prove that this particular expression has this particular type.

Alfrothul: Why don’t you go ahead and do that.

Anton: First of all, we start at the bottom with what we want to check, namely that in any type environment G, the expression if true then (true, 22) else (false, 33) has type bool * int:

G |- if true then (true, 22) else (false, 33) : bool * int

Alfrothul: Oh. There is only one typing rule we can apply, namely IF, where G is instantiated with G, e1 is instantiated with true, e2 is instantiated with (true, 22), e3 is instantiated with (false, 33), and t is instantiated with bool * int:

   G |- true : bool    G |- (true, 22) : bool * int    G |- (false, 33) : bool * int
IF ---------------------------------------------------------------------------------
   G |- if true then (true, 22) else (false, 33) : bool * int

Now what. There are three branches to consider.

Anton: Let’s consider the left branch. There is only one typing rule we can apply, namely BOOL_TRUE, since true is a Boolean:

BOOL_TRUE ----------------
          G |- true : bool    G |- (true, 22) : bool * int    G |- (false, 33) : bool * int
       IF ---------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

This completes the left branch.

Alfrothul: OK. Let’s consider the branch in the middle. There is only one typing rule we can apply, namely PAIR, where G is instantiated with G, e1 is instantiated with true, e2 is instantiated with 22, t1 is instantiated with bool, and t2 is instantiated with int:

                                   G |- true : bool    G |- 22 : int
BOOL_TRUE ----------------    PAIR ---------------------------------
          G |- true : bool         G |- (true, 22) : bool * int         G |- (false, 33) : bool * int
       IF -------------------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

Oh, man. Two new branches.

Dana (jumping in): Let’s consider the left branch of the branch in the middle. There is only one typing rule we can apply, namely BOOL_TRUE, since true is a Boolean:

                              BOOL_TRUE ----------------
                                        G |- true : bool    G |- 22 : int
BOOL_TRUE ----------------         PAIR ---------------------------------
          G |- true : bool              G |- (true, 22) : bool * int         G |- (false, 33) : bool * int
       IF ------------------------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

This completes the left branch of the branch in the middle.

Anton: Let’s consider the right branch of the branch in the middle. There is only one typing rule we can apply, namely INT, since 22 is a integer:

                              BOOL_TRUE ----------------    INT -------------
                                        G |- true : bool        G |- 22 : int
BOOL_TRUE ----------------         PAIR -------------------------------------
          G |- true : bool              G |- (true, 22) : bool * int             G |- (false, 33) : bool * int
       IF ----------------------------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

This completes the right branch of the branch in the middle.

Alfrothul: Let’s consider the right branch. Again, there is only one typing rule we can apply, namely PAIR, where G is instantiated with G, e1 is instantiated with false, e2 is instantiated with 33, t1 is instantiated with bool, and t2 is instantiated with int:

                              BOOL_TRUE ----------------    INT -------------
                                        G |- true : bool        G |- 22 : int         G |- false : bool    G |- 33 : int
BOOL_TRUE ----------------         PAIR -------------------------------------    PAIR ----------------------------------
          G |- true : bool              G |- (true, 22) : bool * int                  G |- (false, 33) : bool * int
       IF ---------------------------------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

Two new branches. Again.

Anton: Let’s consider the left branch of the right branch. There is only one typing rule we can apply, namely BOOL_FALSE, since false is a Boolean:

                              BOOL_TRUE ----------------    INT -------------    BOOL_FALSE -----------------
                                        G |- true : bool        G |- 22 : int               G |- false : bool    G |- 33 : int
BOOL_TRUE ----------------         PAIR -------------------------------------          PAIR ----------------------------------
          G |- true : bool              G |- (true, 22) : bool * int                        G |- (false, 33) : bool * int
       IF ---------------------------------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

This completes the left branch of the right branch.

Alfrothul: Let’s consider the right branch of the right branch. There is only one typing rule we can apply, namely INT, since 33 is a integer:

                              BOOL_TRUE ----------------    INT -------------    BOOL_FALSE -----------------    INT -------------
                                        G |- true : bool        G |- 22 : int               G |- false : bool        G |- 33 : int
BOOL_TRUE ----------------         PAIR -------------------------------------          PAIR --------------------------------------
          G |- true : bool              G |- (true, 22) : bool * int                        G |- (false, 33) : bool * int
       IF ---------------------------------------------------------------------------------------------------------------
          G |- if true then (true, 22) else (false, 33) : bool * int

This completes the right branch of the right branch.

Anton: So the proof tree is complete, which proves that if true then (true, 22) else (false, 33) has type bool * int in any type environment G.

Exercise 03

  1. What is the type of pairs of integers and Booleans? Exhibit a simple OCaml expression with that type.
  2. What is the type of pairs of integers and pairs of integers and Booleans? Exhibit a simple OCaml expression with that type.
  3. What is the type of pairs of integers and pairs of integers and pairs of integers and Booleans? Exhibit a simple OCaml expression with that type.
  4. What is the type of pairs of Booleans and integers? Exhibit a simple OCaml expression with that type.
  5. What is the type of pairs of pairs of Booleans and integers and integers? Exhibit a simple OCaml expression with that type.
  6. What is the type of pairs of pairs of pairs of Booleans and integers and integers and integers? Exhibit a simple OCaml expression with that type.
  7. What is the type of pairs of pairs of integers and Booleans and of pairs of Booleans and integers? Exhibit a simple OCaml expression with that type.

Partial solution for Exercise 03

  1. As per the grammar of types, the type of pairs of integers and Booleans reads int * bool:

    # (2, true);;
    - : int * bool = (2, true)
    #
    

    The abstract-syntax tree of this type reads:

       *
      / \
    int bool
    

    The abstract-syntax tree of this expression reads:

    ( , )
     / \
    2  true
    
  2. As per the grammar of types, the type of pairs of integers and pairs of integers and Booleans reads int * (int * bool), using parentheses for the nested pair:

    # (0, (-1, false));;
    - : int * (int * bool) = (0, (-1, false))
    #
    

    The abstract-syntax tree of this type reads:

       *
      / \
    int  *
        / \
      int bool
    

    The abstract-syntax tree of this expression reads:

    ( , )
     / \
    0  ( , )
        / \
      -1  false
    
  1. As per the grammar of types, the type of pairs of pairs of Booleans and integers and integers reads (bool * int) * int, using parentheses for the nested pair:

    # ((true, 10), -10);;
    - : (bool * int) * int = ((true, 10), -10)
    #
    

    The abstract-syntax tree of this type reads:

          *
         / \
        *  int
       / \
    bool int
    

    The abstract-syntax tree of this expression reads:

         ( , )
          / \
      ( , ) -10
       / \
    true 10
    

Interlude

Mimer: It’s at moments like this that I wish for more punctuation marks.

Victor Borge (thoughtfully): Phonetic ones, even.

Mimer: Mr. Borge! Thank you so much for coming by!

Victor Borge (pragmatic): Let me give you a hand here.

Exercise 03, with more punctuation marks

  1. What is the type of pairs of integers and Booleans? Then, exhibit a simple OCaml expression with that type.
  2. What is the type of pairs of integers and (pairs of integers and Booleans)? And now, exhibit a simple OCaml expression with that type.
  3. What is the type of pairs of integers and (pairs of integers and (pairs of integers and Booleans))? Then – exhibit a simple OCaml expression with that type.
  4. What is the type of pairs of Booleans and integers? And now: exhibit a simple OCaml expression with that type.
  5. What is the type of pairs of (pairs of Booleans and integers) and integers? And then... Exhibit a simple OCaml expression with that type.
  6. What is the type of pairs of (pairs of (pairs of Booleans and integers) and integers) and integers? Now, now, now; exhibit a simple OCaml expression with that type.
  7. What is the type of pairs of (pairs of integers and Booleans) and of (pairs of Booleans and integers)? Exhibit a simple OCaml expression with that type!

Interlude

Victor Borge: Better?

Mimer: Yes, thanks.

Victor Borge: You are most welcome.

Mimer: But the embellishments about exhibiting simple OCaml expressions?

Victor Borge: Let’s read these questions aloud.

Tuples

More generally, OCaml also features tuples, of which pairs are a particular case:

<type> ::= int | bool | char | string | <type> * ... * <type>
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression> | (<expression>, ..., <expression>)
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes

The corresponding typing rule reads as follows, where e1, ..., and eN stand for <expression> and t1, ..., and tN stand for <type>:

TUPLEG |- e1 : t1...G |- eN : tN
G |- (e1, ..., eN) : t1 * ... * tN

In words,

  • in any type environment G, the expression (e1, ..., eN) has type t1 * ... * tN whenever in that environment, e1 (the first component) has type t1, e2 (the second component) has type t2,..., and eN (the last component) has type tN.

Furthermore,

  • to evaluate (e1, ..., eN), we first evaluate e1, ..., and eN:
    • if either of these first evaluations does not terminate, then evaluating (e1, ..., eN) does not terminate either;
    • if either of these first evaluations raises an error, then evaluating (e1, ..., eN) also raises this error;
    • if e1, ..., and eN respectively evaluate to v1, ... and vN, then evaluating (e1, ..., eN) yields a tuple that holds v1, and vN.

Like pairs, tuples are polymorphic in that given N components of any type t1, ..., and tN, they yield values of type t1 * ... * tN, no matter what t1, ..., and tN are.

For example, the following interactive session illustrates how to play with tuples:

# (1, 2, 3);;
- : int * int * int = (1, 2, 3)
# (1, 2, 3, 4);;
- : int * int * int * int = (1, 2, 3, 4)
# (1, 2, 3, 4, 5);;
- : int * int * int * int * int = (1, 2, 3, 4, 5)
#

Exercise 04

  1. What is the type of triples of integers, Booleans, and characters? Exhibit an OCaml expression with that type.
  2. What is the type of quadruples of integers, Booleans, characters, and strings? Exhibit an OCaml expression with that type.
  3. Write a tuple of strings that contains the 12 months of the year. What is its type?
  4. Write a tuple of strings that contains the 15 months of the Danish year (see the interlude just below). What is its type?
  5. What is the type of triples of pairs of integers? Exhibit an OCaml expression with that type.
  6. Write down the type of a tuple that contains at least one tuple, and exhibit an OCaml expression with that type.
  7. Write down the type of a tuple that contains at least one tuple that contains at least one tuple, and exhibit an OCaml expression with that type.

Interlude

Henrik Nordbrandt (wistfully): Everywhere else in the world, the year has 12 months.

Dana: Yes?

Henrik Nordbrandt: But in Denmark, the year has 15 months.

Dana: 15 months?

Henrik Nordbrandt: Well yes, look: 1, November; 2, December; 3, January; 4, February; 5, March; 6, April; 7, May; 8, June; 9, July; 10, August; 11, September; 12, October; and then November, November, and November.

Dana (kindly): But November has only 30 days.

Henrik Nordbrandt: I know, right?

The empty tuple

OCaml does not feature a tuple of size 1, but it does feature the empty tuple, which has type unit:

<type> ::= int | bool | char | string | <type> * ... * <type> | unit
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression> | ()
UNIT 
G |- () : unit

In words,

  • in any type environment G, the expression () has type unit.

Furthermore,

  • evaluating () yields ().

For example, the following interactive session illustrates how to play with the empty tuple:

# ();;
- : unit = ()
#

Exercise 05

What do the following expressions evaluate to? What is their type? Can they be written more simply?

  • (())
  • ((()))
  • ((), ())
  • ((()), ())
  • ((), (()))
  • ((()), (()))
  • (((()), (())))

Interlude – parentheses in OCaml

Anton: So parentheses in OCaml are either used for tuples...

Alfrothul: ...or to disambiguate nested expressions.

Mimer: You got it.

Dana: And OCaml does not feature a tuple of size 1.

Mimer: That too.

Exercise 06

  1. What is the type of triples of unit values? Exhibit an OCaml expression with that type.
  2. What is the type of pairs of triples of unit values and quadruples of unit values? Exhibit an OCaml expression with that type.

Names and type environments

OCaml also features names (a.k.a. “identifiers” or “variables”), and their type is held in the type environment:

<type_environment> ::= . | (<name> : <type>), <type_environment>
<type> ::= int | bool | char | string | <type> * ... * <type> | unit
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression> | (<expression>, ..., <expression>) | () | <name>
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes
<name> ::= a concatenation of characters starting with a lowercase character

The type environment holds a collection of bindings in which names denote types. It is constructed like the proverbial onion:

  • . denotes the empty type environment; and
  • given a name x, a type t, and a type environment G, (x : t), G denotes a type environment that extends G with a new binding in which x denotes t.

So for example,

  • . denotes the empty type environment that contains no bindings of names to types;
  • (c : char), . extends the previous type environment (i.e., .) with a new binding where c is associated to char: in this extended type environment, c has type char;
  • (b : bool), ((c : char), .) extends the previous type environment (i.e., (c : char), .) with a new binding where b is associated to bool: in this extended type environment, b has type bool and c has type char;
  • (i : int), ((b : bool), ((c : char), .)) extends the previous type environment (i.e., (b : bool), ((c : char), .)) with a new binding where i is associated to int: in this extended type environment, i has type int, b has type bool, and c has type char; and
  • (b : string), ((i : int), ((b : bool), ((c : char), .))) extends the previous type environment (i.e., (i : int), ((b : bool), ((c : char), .))) with a new binding where b is associated to string: in this extended type environment, b has type string, i has type int, and c has type char.

The binding of b in the extended type environment just above, i.e., (b : string), ((i : int), ((b : bool), ((c : char), .))), is said to “shadow” (i.e., to hide) the binding of b in the previous type environment just above, i.e., (i : int), ((b : bool), ((c : char), .)). In the previous type environment, b has type bool but in the extended type environment, b has type string.

The infix environment constructor , associates to the right, and therefore we omit parentheses on the right of ,:

  • (b : bool), (c : char), . is short for (b : bool), ((c : char), .),
  • (i : int), (b : bool), (c : char), . is short for (i : int), ((b : bool), ((c : char), .)), and
  • (b : string), (i : int), (b : bool), (c : char), . is short for (b : string), ((i : int), ((b : bool), ((c : char), .))).

Looking up for a binding x : t in a type environment G is carried out incrementally with the following two typing rules, where G stands for <type_environment>, x and x' stand for <name>, and t and t' stand for <type>:

LOOKUP_FOUND 
(x : t), G |- x : t
LOOKUP_NOT_FOUND_YETG |- x : twhere x and x' differ
(x' : t'), G |- x : t

In words,

  • for any name x, any type t, and any type environment G, x has type t in the type environment (x : t), G; and
  • for any type environment G, any name x, any type t, any name x' that is not the same as x, and any type t', if x has type t in G, then x also has type t in the type environment that extends G with (x' : t').

Put otherwise:

  • x has type t in a type environment that starts with (x : t), unconditionally;
  • x has type t in a type environment that starts with (x' : t') and continues with G whenever x' and x differ and x has type t in G.

We use these lookup rules to look up names in environments. So for example,

  1. looking up any name in the empty environment (i.e., .) is undefined,
  2. looking up i in (i : int), (b : bool), (c : char), . yields int,
  3. looking up b in (i : int), (b : bool), (c : char), . yields bool,
  4. looking up c in (i : int), (b : bool), (c : char), . yields char,
  5. looking up x in (i : int), (b : bool), (c : char), . is undefined,
  6. looking up x in (x : int), (x : bool), (x : char), . yields int.

In the last case, the first occurrence of x shadows its other occurrences.

Exercise 07

Justify the examples above by drawing the corresponding proof trees.

Solution for Exercise 07

  1. Let us pick x as a name and let us look up x in ., the empty environment. The root of the tree reads:

    . |- x : yut
    

    for some yet-unspecified type yut.

    We can apply neither LOOKUP_FOUND nor LOOKUP_NOT_FOUND_YET, and therefore we cannot construct a proof tree and specify a type: looking up any name is the empty environment is undefined.

  2. Let us look up i in (i : int), (b : bool), (c : char), .. The root of the tree reads:

    (i : int), (b : bool), (c : char), . |- i : yut
    

    for some yet-unspecified type yut.

    We can (only) apply LOOKUP_FOUND, where x is instantiated with i, t is instantiated with int, G is instantiated with (b : bool), (c : char), ., x is instantiated with i, and t is instantiated with yut:

    LOOKUP_FOUND -----------------------------------------------
                 (i : int), (b : bool), (c : char), . |- i : yut
    

    Since in the application of LOOKUP_FOUND, t is instantiated both with int and with yut, yut is constrained to be int.

    The proof tree is complete. Let us draw it again by replacing yut with int:

    LOOKUP_FOUND -----------------------------------------------
                 (i : int), (b : bool), (c : char), . |- i : int
    

    The existence of this tree proves that in the type environment (i : int), (b : bool), (c : char), ., looking up i yields the type int.

  3. Let us look up b in (i : int), (b : bool), (c : char), .. The root of the tree reads:

    (i : int), (b : bool), (c : char), . |- b : yut
    

    for some yet-unspecified type yut.

    We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with i, t' is instantiated with int, G is instantiated with (b : bool), (c : char), ., x is instantiated with b, and t is instantiated with yut:

                         (b : bool), (c : char), . |- b : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- b : yut
    

    We can (only) apply LOOKUP_FOUND, where x is instantiated with b, t is instantiated with bool, G is instantiated with (c : char), ., x is instantiated with b, and t is instantiated with yut:

            LOOKUP_FOUND ------------------------------------
                         (b : bool), (c : char), . |- b : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- b : yut
    

    Since in the application of LOOKUP_FOUND, t is instantiated both with bool and with yut, yut is constrained to be bool.

    The proof tree is complete. Let us draw it again by replacing yut with bool:

            LOOKUP_FOUND -------------------------------------
                         (b : bool), (c : char), . |- b : bool
    LOOKUP_NOT_FOUND_YET ------------------------------------------------
                         (i : int), (b : bool), (c : char), . |- b : bool
    

    The existence of this tree proves that in the type environment (i : int), (b : bool), (c : char), ., looking up b yields the type bool.

  4. Let us look up c in (i : int), (b : bool), (c : char), .. The root of the tree reads:

    (i : int), (b : bool), (c : char), . |- c : yut
    

    for some yet-unspecified type yut.

    We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with i, t' is instantiated with int, G is instantiated with (b : bool), (c : char), ., x is instantiated with c, and t is instantiated with yut:

                         (b : bool), (c : char), . |- c : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- c : yut
    

    We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with b, t' is instantiated with bool, G is instantiated with (c : char), ., x is instantiated with c, and t is instantiated with yut:

                         (c : char), . |- c : yut
    LOOKUP_NOT_FOUND_YET ------------------------------------
                         (b : bool), (c : char), . |- c : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- c : yut
    

    We can (only) apply LOOKUP_FOUND, where x is instantiated with c, t is instantiated with char, G is instantiated with ., x is instantiated with c, and t is instantiated with yut:

            LOOKUP_FOUND ------------------------
                         (c : char), . |- c : yut
    LOOKUP_NOT_FOUND_YET ------------------------------------
                         (b : bool), (c : char), . |- c : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- c : yut
    

    Since in the application of LOOKUP_FOUND, t is instantiated both with char and with yut, yut is constrained to be char.

    The proof tree is complete. Let us draw it again by replacing yut with char:

            LOOKUP_FOUND -------------------------
                         (c : char), . |- c : char
    LOOKUP_NOT_FOUND_YET -------------------------------------
                         (b : bool), (c : char), . |- c : char
    LOOKUP_NOT_FOUND_YET ------------------------------------------------
                         (i : int), (b : bool), (c : char), . |- c : char
    

    The existence of this tree proves that in the type environment (i : int), (b : bool), (c : char), ., looking up c yields the type char.

  5. Let us look up x in (i : int), (b : bool), (c : char), .. The root of the tree reads:

    (i : int), (b : bool), (c : char), . |- x : yut
    

    for some yet-unspecified type yut.

    We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with i, t' is instantiated with int, G is instantiated with (b : bool), (c : char), ., x is instantiated with x, and t is instantiated with yut:

                         (b : bool), (c : char), . |- x : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- x : yut
    

    We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with b, t' is instantiated with bool, G is instantiated with (c : char), ., x is instantiated with x, and t is instantiated with yut:

                         (c : char), . |- x : yut
    LOOKUP_NOT_FOUND_YET ------------------------------------
                         (b : bool), (c : char), . |- x : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- x : yut
    

    We can (only) apply LOOKUP_NOT_FOUND_YET, where x' is instantiated with c, t' is instantiated with char, G is instantiated with ., x is instantiated with x, and t is instantiated with yut:

                         . |- x : yut
    LOOKUP_NOT_FOUND_YET ------------------------
                         (c : char), . |- x : yut
    LOOKUP_NOT_FOUND_YET ------------------------------------
                         (b : bool), (c : char), . |- x : yut
    LOOKUP_NOT_FOUND_YET -----------------------------------------------
                         (i : int), (b : bool), (c : char), . |- x : yut
    

    We can apply neither LOOKUP_FOUND nor LOOKUP_NOT_FOUND_YET, and therefore we cannot construct a proof tree and specify a type: looking up for a given name that does not occur in a given environment is undefined.

  6. Let us look up x in (x : int), (x : bool), (x : char), . The root of the tree reads:

    (x : int), (x : bool), (x : char), . |- x : yut
    

    for some yet-unspecified type yut.

    We can (only) apply LOOKUP_FOUND, where x is instantiated with x, t is instantiated with int, G is instantiated with (x : bool), (x : char), ., x is instantiated with x, and t is instantiated with yut:

    LOOKUP_FOUND -----------------------------------------------
                 (x : int), (x : bool), (x : char), . |- x : yut
    

    Since in the application of LOOKUP_FOUND, t is instantiated both with int and with yut, yut is constrained to be int.

    The proof tree is complete. Let us draw it again by replacing yut with int:

    LOOKUP_FOUND -----------------------------------------------
                 (x : int), (x : bool), (x : char), . |- x : int
    

    The existence of this tree proves that in the type environment (x : int), (x : bool), (x : char), ., looking up x yields the type int.

Interlude about environments and onion bulbs

Pablito: Tell me again why an environment is constructed like an onion bulb?

Dana: Sure. The point is that they are constructed alike:

<onion-bulb> ::= bud | (<onion-bulb>)

<environment> ::= . | (<name> : <type>), <environment>

Dana (continuing): An empty environment is like a bud, it contains nothing. And given an onion bulb, we construct a new onion bulb by wrapping a layer around it; here, the layer is represented with a pair of matching parentheses. Likewise, given an environment G, a name x, and a type t, we construct a new environment (x : t), G using the infix constructor , that adds the binding (x : t) in front of G. In this new environment, x has type t.

Pablito: OK...

Dana: Now imagine that on each layer of the onion, there is a name printed.

Pablito: OK?

Dana: Now I give you an onion and I ask you whether your name is printed on one of the layers of this onion. What do you do to answer this question?

Pablito: Well, I suppose it depends whether the onion is a bud or has an outer layer.

Dana: Yes.

Pablito: If it is a bud, then it has no layer and so the answer is no.

Dana: OK.

Pablito: Otherwise it has an outer layer. This layer has a name printed on it. If it is my name, then the answer is yes.

Dana: Indeed.

Pablito: Otherwise I peel out this layer and I repeat the process on this smaller onion, either until I reach a layer with my name printed on it, or until the onion is a bud.

Halcyon (interjecting): But what if the onion is infinite?

Pablito: Infinite?

Halcyon: Well, yes. You keep peeling layers and you will never reach a bud.

Dana: That’s not possible because the onion was constructed.

Pablito: Was constructed?

Dana: Yes. An onion is the result of starting from a bud and adding layers around it, a finite process. To construct an infinite onion, the process would not terminate, and so an infinite onion cannot be constructed.

Pablito: I think I get it—an environment is constructed by successively extending the empty environment with bindings, and names are looked up in an environment by successively checking these bindings.

Dana: Yes.

Pablito: If we reach the empty environment, the name is not bound in the environment, and otherwise we stop at the first occurrence of this name, ignoring the rest of the bindings.

Dana: Yes.

Pablito: So if there are more occurrences of this name in the rest of the bindings, the first occurrence shadows the others.

Dana: Yes.

Pablito: Thank you.

Pile on many more layers
and I’ll be joining you there.

Pink Floyd

Names and (value) environments

Likewise, when a syntactically correct and type-correct expression is evaluated, this evaluation takes place in an environment where names denote (i.e., are bound to) values.

Therefore,

  • evaluating a name in the current (value) environment yields the value denoted by this name in this environment.

Evaluating a name in the current environment is carried out by looking it up in this environment.

Predeclared names

OCaml features names that are predeclared in the initial environment. For example, min_int and max_int exist and denote the smallest and the largest representable integers:

# min_int;;
- : int = -4611686018427387904
# max_int;;
- : int = 4611686018427387903
#

These names are bound in OCaml’s initial environment.

Declaring a global name

OCaml provides language support to declare our own variables to name the result of evaluating expressions at the toplevel and extend the current environment with the binding of these names:

<toplevel-expression> ::= <expression> | let <formal> = <definiens>
<formal> ::= <name>
<definiens> ::= <expression>
<name> ::= a concatenation of characters starting with a lowercase character

When let x = d is evaluated in the current environment, d is evaluated in this environment, yielding the value v of type t if this evaluation completes. The subsequent iteration of the toplevel loop takes place with an extended type environment where x is bound the type t, and an extended value environment where x is bound to v.

The following interactive session illustrates how to introduce global names and use them in the ensuing interactions. At every iteration of the toplevel loop, the system either prints

  • a hyphen if the toplevel expression is an expression,
  • the type of the value obtained by evaluating this expression (prefixed with :), and
  • this value (prefixed with =)

or

  • the newly declared name (prefixed with val) if the toplevel expression is a global let-expression,
  • the type of the value obtained by evaluating the definiens of this global let-expression (prefixed with :), and
  • this value (prefixed with =)

unless evaluating the toplevel expression diverges or ends with an error message.

To wit:

  • at first, x does not denote anything:

    # x;;
    Characters 0-1:
      x;;
      ^
    Error: Unbound value x
    #
    

    in effect, the initial type environment (let’s call it Gt) and the corresponding initial (value) environment (let’s call it Gv) do not contain any binding for x;

  • let us declare the name x of type int denoting the value 1:

    # let x = 1;;
    val x : int = 1
    #
    

    in effect, the current type environment is now (x : int), Gt and the current environment now extends Gv with the binding of x to 1; if we were to define the value environment as we did for the type environment:

    <value_environment> ::= . | (<name> = <value>), <value_environment>

    for a suitable non-terminal <value>, the extension of Gv with the binding of x to 1 would read (x = 1), Gv;

  • then we can use x to access the value it denotes:

    # x;;
    - : int = 1
    #
    
  • we can also re-declare x to denote another value:

    # let x = 100;;
    val x : int = 100
    #
    

    in effect, the current type environment is now (x : int), (x : int), Gt and the current environment is now (x = 100), (x = 1), Gv; this new value environment extends the extension of Gv with the binding of x to 100, which shadows the previous binding of x;

  • we can still use x to access the value it denotes:

    # x;;
    - : int = 100
    #
    
  • we can also re-declare x to denote a value of another type:

    # let x = "world";;
    val x : string = "world"
    #
    

    in effect, the current type environment is now (x : string), (x : int), (x : int), Gt and the current environment is now (x = "world"), (x = 100), (x = 1), Gv; this new value environment extends the extension of the extension of Gv with the binding x to "world", which shadows the previous bindings of x;

  • again, we can use this name:

    # x;;
    - : string = "world"
    #
    
  • if an error occurs when the definiens is evaluated, no naming takes place:

    # say_what;;
    Characters 0-8:
      say_what;;
      ^^^^^^^^
    Error: Unbound value say_what
    # let say_what = if 0 then 1 else -1;;
    Characters 18-19:
      let say_what = if 0 then 1 else -1;;
                        ^
    Error: This expression has type int but an expression was expected of type
             bool
    # say_what;;
    Characters 0-8:
      say_what;;
      ^^^^^^^^
    Error: Unbound value say_what
    #
    

    in effect, the current type environment is still (x : string), (x : int), (x : int), Gt and the current environment has not changed either.

Functions

OCaml also features functions and function applications:

<type_environment> ::= . | (<name> : <type>), <type_environment>
<type> ::= int | bool | char | string | <type> * ... * <type> | unit | <type> -> <type>
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression> | (<expression>, ..., <expression>) | <name> | fun <formal> -> <expression> | <expression> <expression>
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes

where <name> and <formal> were defined in the section about declaring a global name.

The corresponding typing rules read as follows, where G stands for <type_environment>, x1 stands for <name>, e0, e1, and e2 stand for <expression>, and t1 and t2 stand for <type>:

FUN(x1 : t1), G |- e2 : t2
G |- fun x1 -> e2 : t1 -> t2
APPG |- e0 : t1 -> t2G |- e1 : t1
G |- e0 e1 : t2

In words,

  • in any type environment G, the expression fun x1 -> e2 has type t1 -> t2 whenever in that environment extended with the binding of x1 to t1, e2 has type t2; and
  • in any type environment G where the expression e0 has type t1 -> t2 and where e1 has type t1, the expression e0 e1 has type t2.

Furthermore,

  • evaluating fun x1 -> e2 in an environment yields a function; and
  • to evaluate e0 e1 in an environment, we first evaluate e0 and e1 in this environment:
    • if either of these first evaluations does not terminate, then evaluating e0 e1 does not terminate;
    • if either of these first evaluations raises an error, then evaluating e0 e1 also raises this error; and
    • if e0 and e1 evaluate to v0 and v1, then since OCaml is statically typed, v0 denotes a function; evaluating e0 e1 reduces to applying this function to v1.

Like conditional expressions and tuples, these two syntax constructs are polymorphic.

Vocabulary:

  • The expression fun x -> e is often referred to as a lambda-abstraction (or less often as a function abstraction), and accordingly fun is often read as “lambda”. Also, e is often referred to as the body of this lambda-expression while x is said to be its formal parameter.

  • The expression e0 e1 is referred to as a function application (or often just as an application), and e1 as the actual parameter.

  • When a lambda-abstraction such as fun x -> x, the identity function, is applied to an actual parameter such as 33, its body, i.e., x, is evaluated in an environment where x denotes 33, which is the topic of the next sections:

    # (fun x -> x) 33;;
    - : int = 33
    

    That (fun x -> x) 33 has type int is proved using the following proof tree, where Gv denotes any value environment:

    LOOKUP_FOUND ------------------------
                 (x : int), Gv |- x : int
             FUN -----------------------------    INT --------------
                 Gv |- fun x -> x : int -> int        Gv |- 33 : int
             APP ---------------------------------------------------
                 Gv |- (fun x -> x) 33 : int
    

    (See Solution for Exercise 10 below for a step-by-step construction of such a proof tree.)

    Likewise, the lambda-abstraction fun b -> if b then "hello" else "world" has type bool -> string; applying it to true yields "hello" since its body if b then "hello" else "world" is evaluated in an environment where b denotes true, and applying it to false yields "world" since its body if b then "hello" else "world" is evaluated in an environment where b denotes false:

    # fun b -> if b then "hello" else "world";;
    - : bool -> string = <fun>
    # (fun b -> if b then "hello" else "world") true;;
    - : string = "hello"
    # (fun b -> if b then "hello" else "world") false;;
    - : string = "world"
    #
    

    Concretely:

    • that fun b -> if b then "hello" else "world" has type bool -> string is proved using the following proof tree, where Gv denotes any value environment:

      LOOKUP_FOUND --------------------------    STRING ----------------------------------    STRING ----------------------------------
                   (b : bool), Gv |- b : bool           (b : bool), Gv |- "hello" : string           (b : bool), Gv |- "world" : string
                IF ---------------------------------------------------------------------------------------------------------------------
                   (b : bool), Gv |- if b then "hello" else "world" : string
               FUN --------------------------------------------------------------
                   Gv |- fun b -> if b then "hello" else "world" : bool -> string
      
    • that (fun b -> if b then "hello" else "world") true has type string is proved using the following proof tree, where Gv denotes any value environment:

          ...see just above...
      FUN --------------------------------------------------------------    BOOL_TRUE -----------------
          Gv |- fun b -> if b then "hello" else "world" : bool -> string              Gv |- true : bool
      APP ---------------------------------------------------------------------------------------------
          Gv |- (fun b -> if b then "hello" else "world") true : string
      
    • and that (fun b -> if b then "hello" else "world") false has type string is proved using the following proof tree, where Gv denotes any value environment:

          ...see above...
      FUN --------------------------------------------------------------    BOOL_FALSE ------------------
          Gv |- fun b -> if b then "hello" else "world" : bool -> string               Gv |- false : bool
      APP -----------------------------------------------------------------------------------------------
          Gv |- (fun b -> if b then "hello" else "world") false : string
      

    (See Solution for Exercise 10 below for a step-by-step construction of such proof trees.)

Predeclared functions

OCaml comes with a mass of pre-existing functions such as not, which has type bool -> bool and negates its argument. Here is not at work:

# not;;
- : bool -> bool = <fun>
# not true;;
- : bool = false
# not false;;
- : bool = true
# not (not true);;
- : bool = true
# not (not false);;
- : bool = false
#

Many of OCaml’s pre-existing functions are declared separately in libraries. For example, definitions that pertain to strings are declared in the library for strings, String. Take the pre-existing function for computing the length of a string. This function

  • has type string -> int,
  • is declared in the library for strings, and
  • is named length in this library (and String.length outside this library; we will revisit this dot notation when we define our own libraries, later in the semester).

Here is String.length at work:

# String.length;;
- : string -> int = <fun>
# String.length "";;
- : int = 0
# String.length "a";;
- : int = 1
# String.length "ab";;
- : int = 2
# String.length "abc";;
- : int = 3
#

In words, String.length maps a string to its number of characters.

Express interlude

Alfrothul: Let me check something:

# String.length "\"";;
- : int = 1
#

Alfrothul: Yup. One character. The backslash is purely notational.

Dana: Let me check something too:

# String.length "\\\\\\";;
- : int = 3
#

Dana: Right. Three characters. Good.

Functions that return functions

Grammatically, nothing prevents us from writing functions that return functions, and then

  • applying such functions to a first argument, and

  • applying the result to a second argument:

    # ((fun x -> fun y -> x) 1) 2;;
    - : int = 1
    # ((fun x -> fun y -> y) 1) 2;;
    - : int = 2
    #
    

Concretely:

  • that ((fun x -> fun y -> x) 1) 2 has type int is proved with the following proof tree:

            LOOKUP_FOUND ------------------------
                         (x : int), Gv |- x : int
    LOOKUP_NOT_FOUND_YET -----------------------------------
                         (y : int), (x : int), Gv |- x : int
                     FUN ----------------------------------------
                         (x : int), Gv |- fun y -> x : int -> int
                     FUN -----------------------------------------------    INT -------------
                         Gv |- fun x -> fun y -> x : int -> (int -> int)        Gv |- 1 : int
                     APP --------------------------------------------------------------------    INT -------------
                         Gv |- (fun x -> fun y -> x) 1 : int -> int                                  Gv |- 2 : int
                     APP -----------------------------------------------------------------------------------------
                         Gv |- ((fun x -> fun y -> x) 1) 2 : int
    
  • and that ((fun x -> fun y -> y) 1) 2 also has type int is proved with the following proof tree:

    LOOKUP_FOUND -----------------------------------
                 (y : int), (x : int), Gv |- y : int
             FUN ----------------------------------------
                 (x : int), Gv |- fun y -> y : int -> int
             FUN -----------------------------------------------    INT -------------
                 Gv |- fun x -> fun y -> y : int -> (int -> int)        Gv |- 1 : int
             APP --------------------------------------------------------------------    INT -------------
                 Gv |- (fun x -> fun y -> y) 1 : int -> int                                  Gv |- 2 : int
             APP -----------------------------------------------------------------------------------------
                 Gv |- ((fun x -> fun y -> y) 1) 2 : int
    

(See Solution for Exercise 10 below for a step-by-step construction of such proof trees.)

Actually, function applications associate to the left, so the outermost parentheses are unnecessary:

# (fun x -> fun y -> x) 1 2;;
- : int = 1
# (fun x -> fun y -> y) 1 2;;
- : int = 2
#

Correspondingly, the function type associates to the right, so e.g., in int -> (int -> int), the parentheses are unnecessary and this type is written int -> int -> int.

As it turns out, OCaml comes with a plethora of such predeclared functions, as described next.

Syntactic sugar for functions that return functions

In practice, it is too verbose to write fun x -> fun y -> ... so this verbosity is alleviated with the short hand fun x y -> ...:

# (fun x -> fun y -> x) 1 10;;
- : int = 1
# (fun x y -> x) 1 10;;
- : int = 1
# (fun x -> fun y -> y) 1 10;;
- : int = 10
# (fun x y -> y) 1 10;;
- : int = 10
#

Such a short hand is referred to as “syntactic sugar” since Peter Landin.

Predeclared functions that return functions: integer addition

Take the addition function, for example. OCaml provides the standard infix notation for it in the form of a + operator:

# 1 + 10;;
- : int = 11
#

This infix notation is left associative in that in, e.g., (1 + 10) + 100 and 1 + 10 + 100 are parsed into the same abstract-syntax tree:

# (1 + 10) + 100;;
- : int = 111
# 1 + 10 + 100;;
- : int = 111
#

What about right associativity, one might ask:

# 1 + (10 + 100);;
- : int = 111
#

The expression 1 + (10 + 100) is parsed into another abstract-syntax tree. And since addition is associative, both of (1 + 10) + 100 and 1 + (10 + 100) evaluate to the same result.

That being said, infix operators are only syntactic sugar, i.e., a notational convenience. The truth of the matter is that addition is implemented with a function named add which is defined in a library (i.e., a collection of definitions that are grouped separately) named Int. The type of Int.add is int -> int -> int and

  • 1 + 10 is parsed as Int.add 1 10,
  • (1 + 10) + 100 is parsed as Int.add (Int.add 1 10) 100,
  • 1 + 10 + 100 is also parsed as Int.add (Int.add 1 10) 100, and
  • 1 + (10 + 100) is parsed as Int.add 1 (Int.add 10 100).

To wit:

# Int.add 1 10;;
- : int = 11
# Int.add (Int.add 1 10) 100;;
- : int = 111
# Int.add (Int.add 1 10) 100;;
- : int = 111
# Int.add 1 (Int.add 10 100);;
- : int = 111
#

As such, the addition function is an example of a function that returns another function:

# Int.add;;
- : int -> int -> int = <fun>
# Int.add 1;;
- : int -> int = <fun>
# Int.add 1 10;;
- : int = 11
#

More syntactic sugar: for convenience, (+) is read as Int.add:

# (+);;
- : int -> int -> int = <fun>
# (+) 1;;
- : int -> int = <fun>
# (+) 1 10;;
- : int = 11
#

In other words, there is no need for a specialized typing rule such as

ADDG |- e1 : intG |- e2 : int
G |- e1 + e2 : int

because OCaml’s initial type environment binds Int.add to int -> int -> int and the parser desugars an expression such as e1 + e2 into Int.add e1 e2.

Another predeclared function that returns a function: integer subtraction

Regarding the subtraction function, OCaml provides the standard infix notation for it in the form of a - operator:

# 10 - 1;;
- : int = 9
#

This infix notation is also left associative in that in, e.g., (100 - 10) - 1 and 100 - 10 - 1 are parsed into the same abstract-syntax tree:

# 100 - 10 - 1;;
- : int = 89
# (100 - 10) - 1;;
- : int = 89
#

Unlike addition, subtraction is not associative, and therefore not only are (100 - 10) - 1 and 100 - (10 - 1) not parsed into the same abstract-syntax tree, they do not evaluate to the same result:

# 100 - (10 - 1);;
- : int = 91
#
Anton: So syntax and semantics are not the same at all.
Mimer: Yes. Syntax is about how things are written and semantics about what these things mean.
Anton: So the infix operators for addition and subtraction...
Alfrothul: ...are not the same as the addition and the subtraction functions.
Mimer: Yes. Infix operations are desugared into nested applications of library functions.

Subtraction is implemented with a function named sub which is defined in the Int library. The type of Int.sub is int -> int -> int and

  • 10 - 1 is parsed as Int.sub 10 1,
  • (100 - 10) - 1 is parsed as Int.sub (Int.sub 100 10) 1,
  • 100 - 10 - 1 is also parsed as Int.sub (Int.sub 100 10) 1, and
  • 100 - (10 - 1) is parsed as Int.sub 100 (Int.sub 10 1).

More syntactic sugar: for convenience, (-) is read as Int.sub:

# (-);;
- : int -> int -> int = <fun>
#

More predeclared functions that return functions

With the convenience of infix notation, OCaml provides a wealth of functions that return functions:

  • multiplication (Int.mul):

    # 2 * 3;;
    - : int = 6
    #
    

    N.B.:

    • Syntax: the infix operator * is left-associative.
    • Semantics: multiplication is associative (i.e., both left- and right-associative).
  • quotient (Int.div):

    # (/);;
    - : int -> int -> int = <fun>
    # 4 / 2;;
    - : int = 2
    # 5 / 2;;
    - : int = 2
    # 1 / 0;;
    Exception: Division_by_zero.
    #
    

    N.B.:

    • Syntax: the infix operator / is left-associative.
    • Semantics: division is not associative.

Interlude about errors

Alfrothul: Look, an error!

Anton: You mean, the division by 0 just above?

Alfrothul: Yes!

Anton: Well, yes. Dividing any number by 0 is undefined, so it is good that OCaml is not making an exception here.

Loki: Actually, OCaml is taking an exception to that, or so the error message says.

Dana: Guys, you are not seeing this. Most descriptions of how OCaml syntactic constructs are evaluated say “if evaluating blah raises an error” one way or another.

Alfrothul: Right. And this is the first time that we encounter a concrete situation where an error is raised.

Halcyon: Ah, I see. We can now exercise our critical sense and test the narrative:

# if 1 / 0 then "hello" else "world";;
Characters 3-8:
  if 1 / 0 then "hello" else "world";;
     ^^^^^
Error: This expression has type int but an expression was expected of type
         bool
#

Pablito: Hum. This is not the error I was expecting, but an error it is.

Anton: It’s a type error.

Pablito: This error needs to be fixed so that the correct error can take place:

# if 1 / 0 = 0 then "hello" else "world";;
Exception: Division_by_zero.
#

Alfrothul: All in a day’s work. Evaluating the test raised an error, and the whole evaluation stopped.

Halcyon: OK. I guess this matters if there are errors in your programs. Can we proceed now?

Even more predeclared functions that return functions

So, continuing—with the convenience of infix notation, OCaml provides a wealth of functions that return functions:

  • modulo (Int.rem):

    # (mod);;
    - : int -> int -> int = <fun>
    # 12 mod 5;;
    - : int = 2
    #
    

    N.B.:

    • Syntax: the infix operator mod is left-associative.
    • Semantics: the modulo operation is not associative.
  • comparison of two integers:

    # (2 < 3, 3 < 3, 3 < 2);;
    - : bool * bool * bool = (true, false, false)
    # (2 <= 3, 3 <= 3, 3 <= 2);;
    - : bool * bool * bool = (true, true, false)
    # (2 >= 3, 3 >= 3, 3 >= 2);;
    - : bool * bool * bool = (false, true, true)
    # (2 > 3, 3 > 3, 3 > 2);;
    - : bool * bool * bool = (false, false, true)
    #
    

    Food for thought: is comparison associative and does this question make sense?

  • comparison of two characters, using the alphabetic order:

    # ('a' < 'b', 'b' < 'b', 'b' < 'a');;
    - : bool * bool * bool = (true, false, false)
    # ('a' <= 'b', 'b' <= 'b', 'b' <= 'a');;
    - : bool * bool * bool = (true, true, false)
    # ('a' >= 'b', 'b' >= 'b', 'b' >= 'a');;
    - : bool * bool * bool = (false, true, true)
    # ('a' > 'b', 'b' > 'b', 'b' > 'a');;
    - : bool * bool * bool = (false, false, true)
    #
    
  • comparison of two strings, using the lexicographic order of the English dictionary:

    # ("a" < "b", "ab" < "ac", "ab" < "abz");;
    - : bool * bool * bool = (true, true, true)
    #
    
  • comparison of two booleans, where false is smaller than true:

    # (false < true, true < true, true < false);;
    - : bool * bool * bool = (true, false, false)
    # (false <= true, true <= true, true <= false);;
    - : bool * bool * bool = (true, true, false)
    # (false >= true, true >= true, true >= false);;
    - : bool * bool * bool = (false, true, true)
    # (false > true, true > true, true > false);;
    - : bool * bool * bool = (false, false, true)
    #
    
  • boolean conjunction:

    # (&&);;
    - : bool -> bool -> bool = <fun>
    # (true && true, true && false, false && true, false && false);;
    - : bool * bool * bool * bool = (true, false, false, false)
    #
    

    (Reminder: the conjunction of two Booleans is true if both of these Booleans are true; otherwise it is false.)

    N.B.:

    • Syntax: the infix operator && is left-associative.
    • Semantics: boolean conjunction is associative (i.e., both left- and right-associative).
  • boolean disjunction:

    # (||);;
    - : bool -> bool -> bool = <fun>
    # (true || true, true || false, false || true, false || false);;
    - : bool * bool * bool * bool = (true, true, true, false)
    #
    

    (Reminder: the disjunction of two Booleans is false if both of these Booleans are false; otherwise it is true.)

    N.B.:

    • Syntax: the infix operator || is left-associative.
    • Semantics: boolean disjunction is associative (i.e., both left- and right-associative).
  • string concatenation (String.concat):

    # (^);;
    - : string -> string -> string = <fun>
    # "hello" ^ " " ^ "world";;
    - : string = "hello world"
    #
    

    N.B.:

    • Syntax: the infix operator ^ is left-associative.
    • Semantics: string concatenation is associative (i.e., both left- and right-associative).

OCaml also features functions that return the smallest or the greatest of their two arguments:

  • minimum of two integers:

    # (min 2 5, min 6 3);;
    - : int * int = (2, 3)
    #
    
  • maximum of two integers:

    # (max 2 5, max 6 3);;
    - : int * int = (5, 6)
    #
    
  • minimum of two characters:

    # (min 'c' 'f', min 'g' 'd');;
    - : char * char = ('c', 'd')
    #
    
  • maximum of two characters:

    # (max 'c' 'f', max 'g' 'd');;
    - : char * char = ('f', 'g')
    #
    

And the String library function features a function to index a string:

# String.get;;
- : string -> int -> char = <fun>
# String.get "abc" 0;;
- : char = 'a'
# String.get "abc" 1;;
- : char = 'b'
# String.get "abc" 2;;
- : char = 'c'
# String.get "abc" 3;;
Exception: Invalid_argument "index out of bounds".
#

In words: given a string of length n and a non-negative integer i (the “index”) such that 0 <= i < n, String.get returns the i+1th character of the string, i.e., the first character if the index is 0, the second character if the index is 1, etc., reflecting the fact that 0 is the first natural number, 1 is the second natural number, etc.

Interlude about errors

Alfrothul: Look, another error!

Anton: You mean, the indexing out of bounds just above?

Alfrothul: Yes. Since the length of "abc" is 3, its bounds are 0 and 2. More generally, if the length of the given string is n, its bounds are 0 and n-1.

Anton: But what if the length of the given string is 0? Its bounds are 0 and -1?

Alfrothul: -1 doesn’t make sense. Surely OCaml raises an error. Let me try:

# String.get "" 0;;
Exception: Invalid_argument "index out of bounds".
#

Anton: Well, if the string is empty, it contains no characters, and so none can be indexed.

Alfrothul: And the set of non-negative integers between 0 and -1 is empty.

Anton: OK.

Interlude about corner cases

Alfrothul: Now that we are here, why not check something.

Anton: You mean a string with a quoted character inside, right?

Alfrothul: Yup:

# String.get "\"" 0;;
- : char = '"'
# String.get "\\" 0;;
- : char = '\\'
#

Alfrothul and Anton: Yay!

Loki: So you can both make a \ disappear and you can keep it too. Can you make one appear?

Dana: Sure thing:

# String.get "'" 0;;
- : char = '\''
#

Exercise 08

Can you visualize the multiplication function hidden behind the infix notation *, as we did just above for addition (+), subtraction (-), quotient (/), modulo (mod), etc.? For example:

# 1 * 2;;
- : int = 2
# 1 * 2 * 3;;
- : int = 6
# 1 * 2 * 3 * 4;;
- : int = 24
# 1 * 2 * 3 * 4 * 5;;
- : int = 120
#
Loki: No comment.

Exercise 09

This exercise is a followup to Exercise 08.

After typing (*);; at the toplevel, type *)33;;. What happens? Why does it happen?

Shouldn’t you write ( * ) instead?

Analysis of Exercises 08 and 09

The point of these exercises is the ambiguity of (* and *) as comment delimiters and of (*) as parenthesizing the infix operator *:

  1. (* ) is ambiguous – is it the beginning of a comment or the multiplication function?

    The designers of OCaml made a choice:

    # (*) *);;
    Characters 0-3:
      (*) *);;
      ^^^
    Warning 1: this is the start of a comment.
    #
    
  2. ( *) is ambiguous – is it the end of a comment or the multiplication function?

    The designers of OCaml made a choice:

    # ( *);;
    Characters 2-4:
      ( *);;
        ^^
    Warning 2: this is not the end of a comment.
    - : int -> int -> int = <fun>
    #
    

Artistic interlude

Pablito (fresh out of solving Exercise 09): See my nice ASCII art piece?:

+-------------------------------------+
|                                     |
|                                     |
|                                     |
|              ( *);;                 |
|                                     |
|                                     |
|                                     |
|  This is not the end of a comment.  |
+-------------------------------------+

Alfrothul (admirative): Magritte lives.

Anton: You can say that again.

Alfrothul: Er... Sure. Magritte lives.

Dana: You guys are so literal sometimes.

Ground equality

OCaml also provides a family of equality functions for each of the ground types (integers, Booleans, characters, and strings), each noted with the infix operator =:

# 1 = 1;;
- : bool = true
# 1 = 2;;
- : bool = false
# true = true;;
- : bool = true
# 'a' = 'b';;
- : bool = false
# "hello" = "world";;
- : bool = false
#

Such Boolean-valued functions are called “predicates”.

Accordingly, the equality predicate comes with a family of typing rules:

equal_intG |- e1 : intG |- e2 : int
G |- e1 = e2 : bool
equal_boolG |- e1 : boolG |- e2 : bool
G |- e1 = e2 : bool
equal_charG |- e1 : charG |- e2 : char
G |- e1 = e2 : bool
equal_stringG |- e1 : stringG |- e2 : string
G |- e1 = e2 : bool

Negated equality, i.e., fun x y -> not (x = y), is noted with the infix operator <> and comes with a similar family of typing rules:

# 1 <> 1;;
- : bool = false
# 1 <> 2;;
- : bool = true
# true <> true;;
- : bool = false
# 'a' <> 'b';;
- : bool = true
# "hello" <> "world";;
- : bool = true
#

Functions as parameterized expressions (a post-rationalization)

Say that we need to evaluate a series of expressions that only differ in part, e.g., to start enumerating the sequence of odd numbers:

# 2 * 0 + 1;;
- : int = 1
# 2 * 1 + 1;;
- : int = 3
# 2 * 2 + 1;;
- : int = 5
# 2 * 3 + 1;;
- : int = 7
# 2 * 4 + 1;;
- : int = 9
# 2 * 5 + 1;;
- : int = 11
#

The part that differs is 0, 1, 2, 3, 4, and 5. The part that remains the same is the template 2 * ... + 1, often referred to as a delimited context (an expression with a hole) and noted as [2 * [.] + 1] where the outer brackets delimit the outside of the context and the inside brackets mark the hole in the context. Filling this hole with 0, for example, yields the expression 2 * 0 + 1.

OCaml provides language support to represent this context: as a function, noted here fun x -> 2 * x + 1 where the hole is abstracted with an identifier that is declared as the formal parameter of the function. Applying this function is expressed by juxtaposing the function and its argument (a.k.a. its actual parameter):

# (fun x -> 2 * x + 1) 0;;
- : int = 1
# (fun x -> 2 * x + 1) 1;;
- : int = 3
# (fun x -> 2 * x + 1) 2;;
- : int = 5
# (fun x -> 2 * x + 1) 3;;
- : int = 7
# (fun x -> 2 * x + 1) 4;;
- : int = 9
# (fun x -> 2 * x + 1) 5;;
- : int = 11
#

In each of these interactions, x successively denotes 0, 1, ..., and 5 when the expression 2 * x + 1 is evaluated.

This way, in the expression that needs to be repeatedly evaluated (here: 2 * x + 1), the unchanging parts (here, the delimited context [2 * [.] + 1]) are factored away in the body of the function.

Exercise 10

Assuming the following typing rule

MULG |- e1 : intG |- e2 : int
G |- e1 * e2 : int

and using the typing rule for addition mentioned above, i.e.,

ADDG |- e1 : intG |- e2 : int
G |- e1 + e2 : int

find the type of the expression fun x -> 2 * x + 1, knowing that infix multiplication takes precedence over infix addition, i.e., the expression 2 * x + 1 actually reads (2 * x) + 1.

Solution for Exercise 10

First of all, let us consult OCaml, just because it’s there and we can:

# fun x -> 2 * x + 1;;
- : int -> int = <fun>
#

Now all we need to do is construct a proof tree to the effect that the term fun x -> 2 * x + 1, or again to be precise, fun x -> (2 * x) + 1 has type int -> int in any type environment G:

  • Let us start at the root of the tree:

    G |- fun x -> (2 * x) + 1 : int -> int
    
  • There is only one typing rule we can apply, namely FUN, where G is instantiated with the current type environment, i.e., G, x is instantiated with the formal parameter of the current function abstraction, i.e., x, e is instantiated with the body of the current function abstraction, i.e., (2 * x) + 1, t1 is instantiated with the domain of the current type (namely int -> int), i.e., int, and t2 is instantiated with the codomain of the current type (namely int -> int), i.e., int:

        (x : int), G |- (2 * x) + 1 : int
    FUN ------------------------------------
        G |- fun x -> (2 * x) + 1 : int -> int
    
  • There is only one typing rule we can apply, namely ADD, where G is instantiated with the current type environment, i.e., (x : int), G, e1 is instantiated with the first operand in the current expression, i.e., (2 * x), and e2 is instantiated with the second operand in the current expression, i.e., 1:

        (x : int), G |- 2 * x : int    (x : int), G |- 1 : int
    ADD ------------------------------------------------------
        (x : int), G |- (2 * x) + 1 : int
    FUN ------------------------------------
        G |- fun x -> (2 * x) + 1 : int -> int
    
  • Let’s consider the left branch. There is only one typing rule we can apply, namely MUL, where G is instantiated with the current type environment, i.e., (x : int), G, e1 is instantiated with the first operand in the current expression, i.e., 2, and e2 is instantiated with the second operand in the current expression, i.e., x:

        (x : int), G |- 2 : int    (x : int), G |- x : int
    MUL --------------------------------------------------
        (x : int), G |- 2 * x : int                           (x : int), G |- 1 : int
    ADD -----------------------------------------------------------------------------
        (x : int), G |- (2 * x) + 1 : int
    FUN ------------------------------------
        G |- fun x -> (2 * x) + 1 : int -> int
    
  • Let’s consider the left branch of the left branch. There is only one typing rule we can apply, namely INT (where G is instantiated with (x : int), G and n is instantiated with 2), since 2 is an integer:

    INT -----------------------
        (x : int), G |- 2 : int    (x : int), G |- x : int
    MUL --------------------------------------------------
        (x : int), G |- 2 * x : int                           (x : int), G |- 1 : int
    ADD -----------------------------------------------------------------------------
        (x : int), G |- (2 * x) + 1 : int
    FUN ------------------------------------
        G |- fun x -> (2 * x) + 1 : int -> int
    
This completes the left branch of the left branch.
  • Let’s consider the right branch of the left branch. There is only one typing rule we can apply, namely LOOKUP_FOUND, where x is instantiated with the first identifier in the current environment, i.e., x, t is instantiated with the type of the first identifier in the current environment, i.e., int, G is instantiated with the rest of the type environment, i.e., G, x is instantiated with the identifier to look up, i.e., x, and t is instantiated with the current type, i.e., int:

    INT -----------------------    LOOKUP_FOUND -----------------------
        (x : int), G |- 2 : int                 (x : int), G |- x : int
    MUL ---------------------------------------------------------------
        (x : int), G |- 2 * x : int                                        (x : int), G |- 1 : int
    ADD ------------------------------------------------------------------------------------------
        (x : int), G |- (2 * x) + 1 : int
    FUN ------------------------------------
        G |- fun x -> (2 * x) + 1 : int -> int
    
This completes the right branch of the left branch.
  • Let’s consider the right branch. There is only one typing rule we can apply, namely INT (where G is instantiated with (x : int), G and n is instantiated with 1), since 1 is an integer:

    INT -----------------------    LOOKUP_FOUND -----------------------
        (x : int), G |- 2 : int                 (x : int), G |- x : int
    MUL ---------------------------------------------------------------    INT -----------------------
        (x : int), G |- 2 * x : int                                            (x : int), G |- 1 : int
    ADD ----------------------------------------------------------------------------------------------
        (x : int), G |- (2 * x) + 1 : int
    FUN ------------------------------------
        G |- fun x -> (2 * x) + 1 : int -> int
    

    This completes the right branch.

The proof tree is complete, which proves that fun x -> 2 * x + 1 has type int -> int in any type environment G.

A practical note about let-declarations

When declaring a name in OCaml, one can also declare the type of this name:

<toplevel-expression> ::= <expression> | let <formal> : <type> = <definiens>

This feature can be put to good use to write the solution for Exercise 10:

let exercise_10 : int -> int = fun x -> 2 * x + 1;;

To wit:

# let exercise_10 : int -> int = fun x -> 2 * x + 1;;
val exercise_10 : int -> int = <fun>
#

Exercise 11

Under the same conditions as that in Exercise 10, find the type of the expression fun y -> fun x -> 2 * x + y.

Recommendation: first peruse both Exercise 10 and Exercise 22 and their solution.

Exercise 12

  1. Exhibit an OCaml expression that has type int -> int, and briefly justify your answer.
  2. Exhibit an OCaml expression that has type int -> bool, and briefly justify your answer.
  3. Exhibit an OCaml expression that has type bool -> int, and briefly justify your answer.
  4. Exhibit an OCaml expression that has type int -> int -> bool, and briefly justify your answer.
  5. Exhibit an OCaml expression that has type int -> int -> int, and briefly justify your answer. This expression should be simpler than that in Exercise 11.
  6. Exhibit an OCaml expression that has type int -> int -> int * int, and briefly justify your answer.
  7. Exhibit an OCaml expression that has type (int -> int) -> bool, and briefly justify your answer. Have you seen this type elsewhere in the lecture notes for this week?
  8. Exhibit an OCaml expression that has type (bool -> bool) -> bool, and briefly justify your answer. Have you seen this type elsewhere in the lecture notes for this week?
  9. Exhibit an OCaml expression that has type (int -> int -> int -> bool) -> bool, and briefly justify your answer. Have you seen this type elsewhere in the lecture notes for this week?

Here, “briefly justify your answer” means “simply use OCaml for each justification”:

let exercise_12a : int -> int = ...;;

let exercise_12b : int -> bool = ...;;

let exercise_12c : bool -> int = ...;;

etc.

If you can exhibit more than one OCaml expression that has the required type, by all means do it (and briefly justify your new answers and how they differ from your first answer).

Exercise 13

Draw a proof tree to justify your answer for Item e. of Exercise 12, under conditions similar to that in Exercise 10.

Naming functions

Evaluating a lambda-abstraction yields an anonymous function, which may be practical but not always, since indeed most of the time, functions are named:

# let make_odd_number = fun x -> 2 * x + 1;;
val make_odd_number : int -> int = <fun>
# make_odd_number 0;;
- : int = 1
# make_odd_number 1;;
- : int = 3
# make_odd_number 2;;
- : int = 5
# make_odd_number 3;;
- : int = 7
# make_odd_number 4;;
- : int = 9
# make_odd_number 5;;
- : int = 11
#
Anton: Do you know many programming languages where one can declare an anonymous function?
Alfrothul: Er... No.

Syntactic sugar for naming functions

In practice, it is too verbose to write let f = fun x -> e so this verbosity is alleviated with the short hand let f x = e:

# let make_odd_number x = 2 * x + 1;;
val make_odd_number : int -> int = <fun>
# make_odd_number 6;;
- : int = 13
# let x x = 2 * x + 1;;
val x : int -> int = <fun>
# x 6;;
- : int = 13
# let make_odd_number make_odd_number = 2 * make_odd_number + 1;;
# make_odd_number 6;;
- : int = 13
#
We have met the enemy and [they are] us.

Pogo

The naming style of reusing x and make_odd_number, in the last two definitions, is chiefly used to confuse the enemy.

At any rate, this short hand is only a notation: the name that is declared still denotes a function:

# let make_even_number x = 2 * x;;
val make_even_number : int -> int = <fun>
# make_even_number;;
- : int -> int = <fun>
#

Interlude

Anton: Does the name of a formal parameter matter, in a lambda-abstraction?

Mimer: Beg pardon?

Anton: Does it matter how we name it?

Alfrothul: For example, does it matter whether we write fun x -> x + 1 or fun y -> y + 1?

Mimer: No, it doesn’t matter. These two lambda-abstractions evaluate to the same function.

Dana: What do you mean by “the same function”?

Mimer: I mean that the two resulting values behave the same. Applying them to the same input yields the same output, always. So they are undistinguishable.

Anton: You mean they are equal?

Alfrothul: Because somehow OCaml disagrees:

# (fun x -> x + 1) = (fun y -> y + 1);;
Exception: Invalid_argument "equal: functional value".
#

Mimer: Rightly so, since the only way to compare two functions is to test whether applying them to each of their possible input values yields the same output values. The equality predicate denoted by (=) cannot do that. Besides, as seen above, it is only defined for each of the ground types. In fact, ...

Dana: Question.

Mimer: Yes?

Dana: Didn’t you already make this point in the sections about Program equivalence in Week 02 and Program equivalence, revisited in Week 03, i.e., this week?

Mimer: Matter of fact I did. Thanks, Dana.

Functions as parameterized functions (a post-rationalization)

Suppose we need to define a function that maps a number i not to 2 * i + 1, but to 3 * i + 1, and another that maps a number i to 4 * i + 1. The same post-rationalization as above applies: we have a delimited context [fun i -> [.] * i + 1]] which we can represent as a function:

# let make_multiple_plus_one = fun m -> fun i -> m * i + 1;;
val make_multiple_plus_one : int -> int -> int = <fun>
# make_multiple_plus_one 2 10;;
- : int = 21
# make_multiple_plus_one 3 10;;
- : int = 31
# make_multiple_plus_one 4 10;;
- : int = 41
#

This function returns another function:

# let make_odd_number = make_multiple_plus_one 2;;
val make_odd_number : int -> int = <fun>
#

Syntactic sugar for naming functions that return functions

In practice, it is too verbose to write let bar = fun x y -> ... so this verbosity is alleviated with the short hand let bar x y = .... It is only a notation: bar still denotes a function, and so does the result of applying bar to one argument:

# let bar x y = x + y;;
val bar : int -> int -> int = <fun>
# bar;;
- : int -> int -> int = <fun>
# bar 1;;
- : int -> int = <fun>
#

That said, fully applying bar to two actual parameters “saturates” its two formal parameters and its body x + y is then evaluated in an environment where x denotes 1 and y denotes 10:

# bar 1 10;;
- : int = 11
#

This notation makes it easy to think of functions that return functions as functions that take two arguments, which is fine as long as these functions are fully applied:

# let make_multiple_plus_one m x = m * x + 1;;
val make_multiple_plus_one : int -> int -> int = <fun>
# let make_odd_number x = make_multiple_plus_one 2 x;;
val make_odd_number : int -> int = <fun>
#

Food for thought

On a related note, how would you characterize the sum of a non-negative integer and its successor in terms of this integer?

Pablito: Related. How related?
Dana: Related to make_odd_number, I think.

Functions that are applied to other functions

Conversely to having functions return functions, we can also apply functions to other functions, which is ideal to write unit tests, as developed in the next chapter.

Version

Added the practical note about let-declarations [19 Mar 2023]

Introduced negated equality more precisely [19 Mar 2023]

Omitted the mention of != since it negates physical identity, not structural equality [10 Mar 2023]

Added the comparison of two characters, the comparison of two strings, and the comparison of two booleans [12 Feb 2023]

Touched up the refinement [08 Feb 2023]

Refined the description of the interactive session in Section Declaring a global name [07 Feb 2023]

Adjusted the BNFs, thanks to Kira Kutscher’s eagle eye [02 Feb 2023]

Polished the narrative [01 Feb 2023]

Expanded the narrative of the section about declaring a global name [01 Feb 2023]

Moved the section about declaring several global names at once from Week 03 to Week 04 [01 Feb 2023]

Fixed a typo in Exercise 03, with more punctuation marks.g, thanks to Kira Kutscher’s eagle eye [28 Jan 2023]

Adjusted the introduction of the backslash character as a quoting mechanism and added a partial solution for Exercise 03 [28 Jan 2023]

Quantified the variables in the type-inference rules [23 Jan 2023]

Created [10 Jan 2023]

Table Of Contents

Previous topic

Towards OCaml

Next topic

Unit tests