Mini-project: The underlying determinism of OCaml

Warning

OCaml is a statically typed programming language, and so any OCaml expression has a type and is type-correct.

OCaml is a deterministic programming language in that it has a fixed processing order, and the goal of this mini-project is to become aware of this processing order.

To this end, here is a family of tracing identity functions:

let an_int n =
 (* an_int : int -> int *)
  let () = Printf.printf "processing %d...\n" n
  in n;;

let a_bool b =
 (* a_bool : bool -> bool *)
  let () = Printf.printf "processing %B...\n" b
  in b;;

let a_char c =
 (* a_char : char -> char *)
  let () = Printf.printf "processing '%c'...\n" c
  in c;;

let a_string s =
 (* a_string : string -> string *)
  let () = Printf.printf "processing \"%s\"...\n" s
  in s;;

let a_unit () =
 (* a_unit : unit -> unit *)
  let () = Printf.printf "processing the unit value...\n"
  in ();;

let a_function f =
 (* a_function : ('a -> 'b) -> 'a -> 'b *)
  let () = Printf.printf "processing a function...\n"
  in fun x -> f x;;

Resources

Question 01

In an addition (+ in infix notation), are the operands (i.e., the arguments of +) evaluated from left to right or from right to left?

Answer to Question 01

Let’s try:

# an_int 1 + an_int 10;;
processing 10...
processing 1...
- : int = 11
#

Since the processing order of OCaml is fixed, this example is enough to conclude that the operands of an addition are evaluated from right to left. Not just for addition, but also for the other arithmetic operations:

# an_int 10 - an_int 1;;
processing 1...
processing 10...
- : int = 9
# an_int 5 * an_int 4;;
processing 4...
processing 5...
- : int = 20
# an_int 5 / an_int 0;;
processing 0...
processing 5...
Exception: Division_by_zero.
#

Interlude

Alfrothul: Does the answer scale to nested operations?

Harald: Nested operations.

Alfrothul: Well, yes, you know, (1 + 10) + 100, for example.

Harald: Ah, right. If so, 100 should be evaluated first.

Alfrothul: Right. And then 10, and then 1.

Harald: Well, let’s try:

# (an_int 1 + an_int 10) + an_int 100;;
processing 100...
processing 10...
processing 1...
- : int = 111
#

Alfrothul: Wait – we should also trace the result of the inner addition:

# an_int (an_int 1 + an_int 10) + an_int 100;;
processing 100...
processing 10...
processing 1...
processing 11...
- : int = 111
#

Harald: Ah, right, then both operands of the outer addition are traced.

Alfrothul: Just to make sure, let me reassociate the additions.

Harald: Reassociate the additions.

Alfrothul: Well, yes, you know, 1 + (10 + 100).

Harald: Ah, right. If so, 100 should be evaluated first, and then 10.

Alfrothul: And then the right operand would be evaluated, and a trace could be emitted containing its result.

Harald: Indeed. And then 1 would be evaluated, and the result would be 111. Let me try:

# an_int 1 + an_int (an_int 10 + an_int 100);;
processing 100...
processing 10...
processing 110...
processing 1...
- : int = 111
#

Alfrothul: Yup. Scientific method, here we come.

Vigfus: So evaluating an_int (an_int 1 + an_int 10) + an_int (an_int 100 + an_int 1000) should first emit processing 1000..., and then processing 100..., and then processing 1100..., and then processing 10..., and then processing 1..., and then processing 11..., and then the result would be 1111, right?

Alfrothul: Why don’t you give it a try.

Vigfus: I also want to try an_int (an_int 1 / an_int 0) + an_int 10.

Halcyon: What about an_int (an_int (an_int 1 / an_int 0) + an_int 10)?

Question 02

In a function application, are the two sub-expressions (namely the one in position of a function, on the left, and the one that is the actual parameter of the function, i.e., the argument, on the right) evaluated from left to right or from right to left?

Subsidiary question:

  • Is this order compatible with the answer to Question 01?

Question 03

When a tuple is constructed, in which order were its components evaluated?

Question 04

Given three expressions, e0, e1, and e2, are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

(fun x1 -> fun x2 -> e0) e1 e2

(fun (x1, x2) -> e0) (e1, e2)

Subsidiary questions:

  • Is your answer compatible with your answers to Question 02 and Question 03?
  • Assuming that e0, x1, and x2 all have type int, what are the types of fun x1 -> fun x2 -> e0 and fun (x1, x2) -> e0? Have we seen these types before?
Harald: I have seen rhetorical questions before...
Alfrothul: ...and the very last question is definitely one.

Question 05

Given two expressions, e0 and e1, are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

(fun x1 -> e0) e1

let x1 = e1 in e0

Question 06

In a (local or global) let-expression declaring several bindings at once, i.e., with and, are the definienses evaluated from left to right or from right to left?

Question 07

Given three expressions, e0, e1, and e2, are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

let x1 = e1 and x2 = e2 in e0

(fun (x1, x2) -> e0) (e1, e2)

Question 08

In a boolean conjunction (&& in infix notation), are the conjuncts (i.e., the arguments of &&, of type bool) evaluated from left to right or from right to left?

To answer this question completely, make sure to consider the 4 possible cases.

Subsidiary questions:

  • Why does this design make sense?
  • Where else could this design make sense?

Pure and impure expressions

A bit of terminology first:

  • An OCaml expression is said to be “impure” if evaluating it
    • incurs a side effect (e.g., a trace),
    • raises an error (e.g., due to a failed assertion, to a division by 0, etc.), or
    • diverges, i.e., seemingly yields no value.
  • If evaluating an OCaml expression incurs no side effects, raises no error, and yields a value, it is said to be “pure”.

So we can sketch a grammar of pure expressions as a sub-grammar of expressions:

n ::= ...a number...

b ::= ...a boolean...

c ::= ...a character...

s ::= ...a string...

x ::= ...a name...

e ::= ...an expression...

v ::= n | b | c | s |
    | fun formal -> e
    | x
    | (v, ..., v)
    | if v then v else v
    | let formal = v and ... and formal = v in v

The non-terminal e is for OCaml expressions in general, and the non-terminal v is for pure expressions. So, to a reasonable approximation, a pure expression (a value) can be

  • a literal (number, Boolean, character, or string),
  • a function abstraction,
  • a variable (since variables denote values),
  • a tuple of values, including the empty tuple,
  • a conditional expression whose test, consequent, and alternative are values, and
  • a let-expression whose definienses and body are values.

We can also classify [operations on ground values that do not raise errors] as pure. So for example, the addition of two values is a value, but not the division of two values because of its potential for dividing by 0.

This classification is both under-approximative and over-approximative because some pure expressions do not fit this grammar (e.g., applying the identity function to a value incurs no side effects, raises no error, and yields a value) and because an expression that fits this grammar of pure expressions can be so big that processing it exhausts the computational resources of the implementation of OCaml.

That said, the following questions assume that we can distinguish between expressions that are pure for sure (written v), and expressions that might be impure (written e).

Loki (ever the helpful one): The technical term is “without loss of generality.”

Question 09

  1. For any pure expression v of type int, would it be valid to simplify v * 0 into 0?
  2. For any potentially impure expression e of type int, would it be valid to simplify e * 0 into 0?
  3. For any pure expression v of type int, would it be valid to simplify v * 1 into v?
  4. For any potentially impure expression e of type int, would it be valid to simplify e * 1 into e?

Briefly justify your answers.

Question 10

Let e1 and e2 be potentially impure expressions.

  1. Are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

    let x1 = e1 and x2 = e2 in (x1, x2);;
    
    let x2 = e2 and x1 = e1 in (x1, x2);;
    
  2. Are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

    let x1 = e1 in let x2 = e2 in (x1, x2);;
    
    let x2 = e2 in let x1 = e1 in (x1, x2);;
    

Briefly justify your answers.

Question 11

Let v1 and v2 be pure expressions.

  1. Are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

    let x1 = v1 and x2 = v2 in (x1, x2);;
    
    let x2 = v2 and x1 = v1 in (x1, x2);;
    
  2. Are the two following expressions equivalent (i.e., does evaluating them carry out the same computation?):

    let x1 = v1 in let x2 = v2 in (x1, x2);;
    
    let x2 = v2 in let x1 = v1 in (x1, x2);;
    

Briefly justify your answers.

Question 12

In mathematics, a function is said to be strict if it uses its argument, and non-strict if it does not. For example, in the syntax of OCaml, fun x -> x is such a strict function and fun _ -> 42 is such a non-strict function. So in mathematics, applying this non-strict function to any argument yields 42, always. Is that the case in OCaml too?

Feel free to revisit Question 02 and Question 05.

Resources

Version

Wordsmithed the narrative in the section about pure and impure expressions [04 Mar 2022]

Tuned a_char to make it print single quotes [03 Mar 2022]

Created [27 Feb 2022]