Mutable data in OCaml

The goal of this lecture note is to introduce mutable data in OCaml:

<type> ::= ... | <type> ref
<expression> ::= ... | ref <expression> | ! <expression> | <expression> := <expression>

References in OCaml

Fun fact: as in mathematics and unlike in imperative languages, variables do not vary in OCaml. There are, however, cases where we need a degree of mutability in the course of a computation, and that is why OCaml also features a type of mutable values: references. A reference is

  • constructed with ref,
  • dereferenced with !, and
  • mutated using the infix operator :=.

Like that of options and lists, the type of a reference is polymorphic. Its name is the same as its constructor, namely ref:

REFG |- e : t
G |- ref e : t ref
DEREFG |- e : t ref
G |- !e : t
MUTATEG |- e1 : t refG |- e2 : t
G |- e1 := e2 : unit

In words:

  • in any type environment G, the expression ref e has type t ref whenever in that environment, the expression e has type t;
  • in any type environment G, the expression !e has type t whenever in that environment, the expression e has type t ref; and
  • in any type environment G, the expression e1 := e2 has type unit whenever in that environment, the expression e1 has type t ref and the expression e2 has type t.

Constructing mutable values

First of all, let us look at the ref constructor:

# ref;;
- : 'a -> 'a ref = <fun>
#

We can see that it is a unary function that, given a value, returns a reference to this value:

  • Let us interactively construct the reference to an integer:

    # ref 33;;
    - : int ref = {contents = 33}
    #
    
  • Let us interactively construct the reference to a Boolean:

    # ref true;;
    - : bool ref = {contents = true}
    #
    
  • Let us interactively construct the reference to a string:

    # ref "hello";;
    - : string ref = {contents = "hello"}
    #
    

We can construct references to any value:

  • for example, pairs of integers:

    # ref (1, 10);;
    - : (int * int) ref = {contents = (1, 10)}
    #
    
  • for example, pairs of other ground values than integers:

    # ref (33, true);;
    - : (int * bool) ref = {contents = (33, true)}
    #
    
  • for example, functions:

    # ref (fun i -> i + 1);;
    - : (int -> int) ref = {contents = <fun>}
    #
    
  • for example, references:

    # ref (ref "hello");;
    - : string ref ref = {contents = {contents = "hello"}}
    #
    

Since references are values, they can occur in other values:

  • for example, here is a pair of references:

    # (ref false, ref 100);;
    - : bool ref * int ref = ({contents = false}, {contents = 100})
    #
    
  • for example, here is a reference to a pair of references:

    # ref (ref false, ref 100);;
    - : (bool ref * int ref) ref =
    {contents = ({contents = false}, {contents = 100})}
    #
    

References can occur in type declarations:

  • for example, here is the type of binary trees with mutable leaves:

    type 'a binary_tree_with_mutable_leaves =
      | Leaf of 'a ref
      | Node of 'a binary_tree_with_mutable_leaves * 'a binary_tree_with_mutable_leaves;;
    
  • for example, here is the type of binary trees with mutable nodes:

    type 'a binary_tree_with_mutable_nodes =
      | Leaf of 'a
      | Node of 'a binary_tree_with_mutable_nodes ref * 'a binary_tree_with_mutable_nodes ref;;
    

Dereferencing mutable values

Given a reference, the ! unary operator yields the value that is referred to.

  • For example, given the reference to an integer, we can obtain this integer by applying ! to it:

    # let r = ref 33 in !r;;
    - : int = 33
    #
    
  • For example, given the reference to the reference to an integer, we can obtain this integer by applying ! to it twice in a row:

    # let r = ref (ref 33) in !(!r);;
    - : int = 33
    #
    
  • For example, here is a function that given a pair of references, returns the pair of the values that are referred to in the given pair:

    # fun (r1, r2) -> (!r1, !r2);;
    - : 'a ref * 'b ref -> 'a * 'b = <fun>
    #
    

Mutating references

The infix operator :=, given a reference to a value of type t and a new value of type t, modifies the reference so that it refers to the new value. The result is of type unit.

For example, in the following expression, a reference to an integer, 1, is locally created and named r:

let r = ref 1              (* <-- creation of a reference *)
in let x = !r              (* <-- dereference *)
   in let () = r := 10 in  (* <-- mutation of the reference *)
      let y = !r           (* <-- dereference *)
      in x + y;;

In the scope of the declaration of r,

  1. the variable x is locally declared to denote the value that is currently referred to by the denotation of r (i.e., 1);
  2. the reference is mutated with another integer (namely 10);
  3. the variable y is locally declared to denote the value that is currently referred to by the denotation of r (i.e., 10); and
  4. the result of adding the denotations of x (i.e., 1) and y (i.e., 10) is returned.

To wit:

# let r = ref 1
  in let x = !r
     in let () = r := 10 in
        let y = !r
        in x + y;;
- : int = 11
#

More atomically, we can

  1. declare a reference:

    # let r = ref 1;;
    val r : int ref = {contents = 1}
    #
    
  2. mutate this reference:

    # r := 10;;
    - : unit = ()
    #
    
  3. dereference this reference:

    # !r;;
    - : int = 10
    #
    

The imperative simulation of String.mapi in terms of String.map

Back in Week 05 (the General interlude (continued) where an updated version of the String module was introduced), this updated module featured an imperative simulation of String.mapi in terms of String.map:

let string_mapi f s =
  let i = ref ~-1
  in String.map (fun c ->
                  let () = i := succ !i in
                  f !i c)
                s;;

Its rationale was that since String.map applies its first argument to each character of its second argument from left to right, we can locally define a reference to an index, and imperatively increment this index each time the first argument is applied. The first argument of string_mapi is then successively applied

  1. to 0 and the first character of the given string,
  2. to 1 and the second character of the given string,
  3. to 2 and the third character of the given string,

...

  1. to n-1 and the last character of the given string, assuming it has length n (represented as n).

Exercise 01

Adjust the imperative simulation just above so that given a string of length n (represented as n), the first argument of the adjusted function is successively applied

  1. to n-1 and the first character of the given string,
  2. to n-2 and the second character of the given string,
  3. to n-3 and the third character of the given string,

...

  1. to 0 and the last character of the given string.

Interlude

Pablito: We can compute the length of a string using String.map and a reference.

Halcyon: We can?

Pablito: Well yes – the mapped function is applied to each character of the given string, so if it increments a counter that was initialized with 0, the resulting counter will end with the number of characters in the string, i.e., its length. Look:

let string_length s =
  let n = ref 0
  in let _ = String.map (fun c -> let () = n := succ !n in c) s
     in !n;;

Halcyon: Just a sec. – yes, this definition passes the unit tests.

Pablito: Thank you.

Thunks by need, revisited

Let us revisit the section about Thunks by need in Week 11, using references rather than the Lazy library.

To map a thunk into a thunk that behaves as in call by need, we need to memoize its result the first time it is forced, and that is where references come handy.

The key idea of delay_need is to map a given thunk to a new thunk that is defined in the lexical scope of a reference initialized to None. The new thunk consults this reference to determine whether the given thunk has been forced yet, and then takes action:

  • if the given thunk has been forced already, a memoized result is in store, and this result is returned; and
  • if the given thunk has not been forced already, the new thunk forces it and names the result memoized_value; it then mutates the reference to Some memoized_value (Some to signify that the thunk has already been forced, and memoized_value to record the result of the first forcing); and finally it returns the result of forcing the thunk, i.e., the memoized value.

In OCaml:

let delay_need thunk =
 (* delay_need : (unit -> 'a) -> unit -> 'a *)
  let memory_cell = ref None
  in fun () -> match !memory_cell with
               | Some memoized_value ->
                  memoized_value
               | None ->
                  let memoized_value = force thunk
                  in (memory_cell := Some memoized_value;
                      memoized_value);;

For example, let us define the need counterparts of the lumberjack thunk and of the ok thunk that emit "I am a lumberjack" and "I am OK" before returning 10 and 1, respectively:

let lumberjack_thunk =
  fun () -> let () = Printf.printf "I am a lumberjack\n"
            in 10;;

let ok_thunk =
  fun () -> let () = Printf.printf "I am OK\n"
            in 1;;

let lumberjack_thunk_need =
  delay_need lumberjack_thunk;;

let ok_thunk_need =
  delay_need ok_thunk;;

Replaying the scenario from Week 11 still yields the answer 42, but it does so by emitting the messages I am a lumberjack and I am OK once, as it should with call by need:

# let a = force lumberjack_thunk_need
  and b = force lumberjack_thunk_need
  and c = force lumberjack_thunk_need
  and d = force lumberjack_thunk_need
  and e = force ok_thunk_need
  and f = force ok_thunk_need
  in a + b + c + d + e + f;;
I am a lumberjack
I am OK
- : int = 42
#

Resources

Version

Created [03 Nov 2022]