On simplifying OCaml expressions

When writing a program in OCaml (resp. when writing a compiler for OCaml programs), one is tempted to simplify one’s program (resp. source programs). The goal of these exercises is to reflect on which simplifications are valid and which are not. To this end, we introduce the following notion of equivalence: two expressions are observationally equivalent if evaluating them carries out the same computation. So for example, 2 + 3 and 3 + 2 are observationally equivalent because evaluating them induces the same computation:

# 2 + 3;;
- : int = 5
# 3 + 2;;
- : int = 5
#

Ditto for 1 + (10 + 100) and (1 + 10) + 100.

Rationale for this series of exercises

Each OCaml program is inductively assembled using the syntax constructors enumerated in the BNF of OCaml:

<type> ::= int | bool | char | string | <type> * ... * <type> | unit | <type> -> <type>
<type_environment> ::= . | (<name> : <type>), <type_environment>
<formal> ::= <variable> | () | (<formal>{, <formal>}*)
<expression> ::= <integer> | <boolean> | <character> | <string> | if <expression> then <expression> else <expression> | (<expression>, ..., <expression>) | <name> | fun <formal> -> <expression> | <expression> <expression> | let <formal> = <definiens> {and <formal> = <definiens>}+ in <expression> | ...
<integer> ::= 0 | 1 | -1 | 2 | -2 | ...
<boolean> ::= true | false
<character> ::= one character delimited by two single quotes
<string> ::= a series of characters delimited by two double quotes
<name> ::= a concatenation of characters starting with a lowercase character

And evaluating each expression is done based on its top constructor:

  • evaluating a literal (i.e., integer, Boolean, character, string, or unit value) yields this literal
  • evaluating a conditional expression if e1 then e2 then e3 is carried out by first evaluating e1; if evaluating e1 completes and yields a Boolean, then either the consequent e2 or the alternative e3 is evaluated depending on whether this Boolean is true or false
  • evaluating a pair (e1, e2) is carried out by first evaluating e1 and e2; if evaluating both expressions completes and yields two values v1 and v2, then the result is the pair (v1, v2)
  • evaluating a name is carried out by looking it up in the current environment
  • evaluating a lambda-abstraction fun x -> e yields a function
  • evaluating an application e0 e1 is carried out by first evaluating e0 and e1; if evaluating both expressions completes and yields two values v0 and v1, then v0 denotes a function, which is applied to v1
  • etc.

Does it mean that evaluating an expression always completes? Not necessarily:

  • evaluating the expression could raise an error, for example:

    # (1 / 0) = 0;;
    Exception: Division_by_zero.
    #
    

    In this case, the expression 1 / 0, which has type int, does not complete, nor does the expression (1 / 0) = 0, which has type bool.

    And how do we know that the expression (1 / 0) = 0 has type bool? That’s why:

    # fun () -> (1 / 0) = 0;;
    - : unit -> bool = <fun>
    #
    

    Ditto if the expression contains an assertion that fails:

    # let zero_or_one n = let () = assert (0 <= n && n <= 1) in if n = 0 then "zero" else "one";;
    val zero_or_one : int -> string = <fun>
    # zero_or_one 2 = "zero";;
    Exception: Assert_failure ("//toplevel//", 1969, -55).
    #
    

    Evaluating zero_or_one 2 does not complete, nor does evaluating zero_or_one 2 = "zero".

  • evaluating the expression could also diverge, and so never complete; for example:

    # let rec diverge () = diverge () in diverge ();;
      C-c C-cInterrupted.
    #
    

    Evaluating the sub-expression diverge (), in the body of the letrec-expression or in the body of diverge, does not complete.

  • or if a recursive call is not a tail call, it can make the OCaml runtime stack (i.e., the area of memory dedicated to managing nested function calls) overflow:

    # let rec overflow () = not (overflow ()) in overflow ();;
    Stack overflow during evaluation (looping recursion?).
    #
    

    Evaluating the sub-expression overflow (), in the body of the letrec-expression or in the body of overflow, does not complete, nor does evaluating not (overflow ()).

  • or again evaluating an expression could emit traces, e.g., of calls and returns:

    # twice_v1 3;;
    - : int = 6
    # traced_twice_v1 3;;
    twice_v1 3 ->
      visit 3 ->
        visit 2 ->
          visit 1 ->
            visit 0 ->
            visit 0 <- 0
          visit 1 <- 2
        visit 2 <- 4
      visit 3 <- 6
    twice_v1 3 <- 6
    - : int = 6
    #
    

    Evaluating twice_v1 3 and traced_twice_v1 3 both complete, but the latter evaluation can be observed to emit a trace of its recursive calls and return.

Expressions whose evaluation always silently completes (i.e., (1) always yields a value, and (2) performs no observable side effect such as tracing) are said to be pure.

To sum up, just like an OCaml expression is a combination of its parts, evaluating an OCaml expression is a combination of evaluating its parts. If evaluating each of the parts performs no observable side effect and yields a value, then this expression is pure. Otherwise, this expression is impure.

In the exercises below, you are asked whether some OCaml expressions are equivalent – let us call them e1 and e2. To answer this question, you should enumerate each of the possibilities, namely:

  • are both expressions pure?

    then if they always evaluate to the same result, then yes they are equivalent

    otherwise they are not equivalent (and you should give an example)

  • is one pure but the other impure?

    then they are not equivalent (and you should give an example)

  • otherwise, if they are both impure, ask the following questions:

    • does evaluating them print out the same thing and yields the same result?
    • does evaluating each of them raise an error and if so is it the same error?
    • does evaluating each of them diverge?
    • does evaluating each of them make the OCaml stack overflow?

    if evaluating each of e1 and e2 elicits the same behaviour (which you should argue they do), then e1 and e2 are equivalent; otherwise, they are not equivalent and you should give an example

Exercise 1

Do you think that for any OCaml variable b of type bool, the following expressions are observationally equivalent:

  1. b
  2. if b then true else false
  3. b = true

Justify your answer.

Exercise 2

Do you think that for any OCaml expression e of type bool, the following expressions are observationally equivalent:

  1. e
  2. if e then true else false
  3. e = true

Justify your answer.

Exercise 3

Do you think that for any OCaml variable b of type bool, the following expressions are observationally equivalent:

  1. b || not b
  2. true

Justify your answer.

Exercise 4

Do you think that for any OCaml expression e of type bool, the following expressions are observationally equivalent:

  1. e || not e
  2. true

Justify your answer.

Exercise 5

Do you think that for any OCaml expression e, the following expressions are observationally equivalent:

  1. e * 0
  2. 0

Justify your answer.

Exercise 6

Do you think that for any OCaml expression e, the following expressions are observationally equivalent:

  1. e * 0
  2. let x = e in 0

Justify your answer.

Exercise 7

Do you think that for any OCaml expression e, the following expressions are observationally equivalent:

  1. let x = e in 42
  2. 42

Justify your answer.

Exercise 8

Do you think that for any OCaml expression e, the following expressions are observationally equivalent:

  1. let x = e in x
  2. e

Justify your answer.

Exercise 9

Do you think that for any OCaml expressions e1 and e2 of type int, the following expressions are observationally equivalent:

  1. e1 + e2
  2. let x1 = e1 and x2 = e2 in x1 + x2
  3. let x2 = e2 and x1 = e1 in x1 + x2

Justify your answer.

Exercise 10

Reminder: for any OCaml expression e, one can desugar not e into if e then false else true.

For any OCaml expressions e1 and e2 of type bool,

  1. desugar e1 && e2, i.e., OCaml’s Boolean conjunction, into one or several conditional expressions,
  2. desugar e1 || e2, i.e., OCaml’s Boolean disjunction, into one or several conditional expressions,
  3. for any OCaml expressions e3 and e4, desugar if e1 && e2 then e3 else e4 into one or several conditional expressions, and
  4. for any OCaml expressions e3 and e4, desugar if e1 || e2 then e3 else e4 into one or several conditional expressions.

Justify your answers. (For example, e3 and e4 may be arbitrarily large.)

Version

Fixed 4 typos, thanks to Yunjeong Lee’s eagle eye [02 May 2019]

Added the rationale, which is a synthesis of answers to the questions raised in the lecturer’s office [07 Mar 2019]

Fixed the typo if ... then ... then ... into the proper if ... then ... else ... in Exercise 10, Items c and d, thanks to Sviatlana Yasiukova’s eagle eye [04 Mar 2019]

Expanded the introductory paragraph with the notion of observational equivalence [23 Feb 2019]

Created [23 Feb 2019]