The map function over polymorphic lists

The goal of this lecture note is to introduce the polymorphic map function that given a function and a list, applies this function to each of the elements of this list and collects the results in a new list. We had seen an instance of the map function for strings in Week 05 and in the midterm project, String.map.

Anakin (opening really big eyes): Is it possible to use this expressive power?
The Supreme Chancellor: Not with a for loop.

Resources

Warmup #1: mapping the successor function

Our task is to implement an OCaml function that, given a list of integers, maps it to a list of integers containing the successors of all the elements of the given list, in the same order:

let test_list_map_succ candidate =
  (candidate [2; 1; 0] = [3; 2; 1])
  (* etc. *);;

Clearly this function should be specified by induction over its argument ns:

  • base case: ns = []

    the empty list containing no elements, this function should map it to the empty list

  • induction step: ns = n :: ns'

    assuming that the function maps ns' to the corresponding list of successors, which is the induction hypothesis, it should map n :: ns' to the list that starts with succ n and continues with this corresponding list of successors

Based on this specification, we can flesh out the unit-test function:

let test_list_map_succ candidate =
      (* an instance of the base case: *)
  let bc = (candidate [] = [])
      (* a few handpicked lists: *)
  and b1 = (candidate [0] = [1])
  and b2 = (candidate [1; 0] = [2; 1])
  and b3 = (candidate [2; 1; 0] = [3; 2; 1])
  and b4 = (candidate [3; 2; 1; 0] = [4; 3; 2; 1])
      (* an instance of the induction step: *)
  and is = (let n = 4
            and ns' = [3; 2; 1; 0]
            in let ih = candidate ns'
               in candidate (n :: ns') = succ n :: ih)
  (* etc. *)
  in bc && b1 && b2 && b3 && b4 && is;;

Since list_map_succ is specified inductively, it is to be programmed as a structurally recursive function. Here is its skeleton:

let list_map_succ ns =
  let rec visit ns =
    match ns with
    | [] ->
       ...
    | n :: ns' ->
       let ih = visit ns'
       in ...
  in visit ns;;

Rationale: the skeleton of visit follows the inductive structure of lists, i.e., it has a base case (the argument is the empty list) and an induction step (the argument is not the empty list).

Now all we need is to fill the blanks:

  • in the base case, the result is [], and
  • in the induction step, the result is obtained by cons’ing the successor of n to the result of the recursive call.

To wit:

let list_map_succ_v0 ns =
  let rec visit ns =
    match ns with
    | [] ->
       []
    | n :: ns' ->
       let ih = visit ns'
       in succ n :: ih
  in visit ns;;

This implementation is structurally recursive (since it follows the inductive structure of its input) and it passes the unit test:

# test_list_map_succ list_map_succ_v0;;
- : bool = true
#

In the induction step, the let-expression that declares ih, in the definition of visit, is cosmetic and so we can unfold it:

let list_map_succ_v1 ns =
  let rec visit ns =
    match ns with
    | [] ->
       []
    | n :: ns' ->
       (succ n) :: (visit ns')
  in visit ns;;

This simplified implementation is still structurally recursive and it passes the unit test:

# test_list_map_succ list_map_succ_v1;;
- : bool = true
#

Warmup #2: mapping the predecessor function

Our task is to implement an OCaml function that, given a list of integers, maps it to a list of integers containing the predecessors of all the elements of the given list, in the same order:

let test_list_map_pred candidate =
  let b0 = (candidate [3; 2; 1] = [2; 1; 0])
  and b1 = (candidate [2; 1; 0] = [1; 0; ~-1])
  (* etc. *)
  in b0 && b1;;

(See the resource file.)

Warmup #3: mapping the square function

Our task is to implement an OCaml function that, given a list of integers, maps it to a list of integers containing the squares of all the elements of the given list, in the same order:

let test_list_map_square candidate =
  (candidate [3; 2; 1] = [9; 4; 1])
  (* etc. *);;

(See the resource file.)

Warmup #4: mapping the list-singleton function

Our task is to implement an OCaml function that, given a list of integers, maps it to a list of integers containing the singleton list of all the elements of the given list, in the same order:

let test_list_map_singleton_int candidate =
  (candidate [3; 2; 1] = [[3]; [2]; [1]])
  (* etc. *);;

(See the resource file.)

Generalization: the map function for lists

Generalizing, let us write a generic version of the functions above, list_map, that is parameterized by the function to map. This generic function is still specified by induction over its argument ns, given a function f:

  • base case: ns = []

    the empty list containing no elements, this function should map it to the empty list

  • induction step: ns = n :: ns'

    assuming that the function maps ns' to the corresponding list of successors, which is the induction hypothesis, it should map n :: ns' to the list that starts with f n and continues with this corresponding list of successors

Since list_map is specified inductively, it is to be programmed as a structurally recursive function. Here is its skeleton:

let list_map f ns =
  let rec visit ns =
    match ns with
    | [] ->
       ...
    | n :: ns' ->
       let ih = visit ns'
       in ...
  in visit ns;;

Rationale: the skeleton of visit follows the inductive structure of lists, i.e., it has a base case (the argument is the empty list) and an induction step (the argument is not the empty list).

Now all we need is to fill the blanks:

  • in the base case, the result is [], and
  • in the induction step, the result is obtained by cons’ing the result of applying f of n to the result of the recursive call.

To wit:

let list_map_v0 f ns =
 (* list_map : ('a -> 'b) -> 'a list -> 'b list *)
  let rec visit ns =
    match ns with
    | [] ->
       []
    | n :: ns' ->
       let ih = visit ns'
       in f n :: ih
  in visit ns;;

This implementation is structurally recursive (since it follows the inductive structure of its input) and we can instantiate it to implement each of the warmup examples above:

let list_map_succ_v2 = list_map_v0 succ;;

let list_map_pred_v2 = list_map_v0 pred;;

let list_map_square_v2 = list_map_v0 (fun n -> n * n);;

let list_map_singleton_v2 = list_map_v0 (fun n -> [n]);;

All these implementations pass the corresponding unit tests:

# test_list_map_succ list_map_succ_v2;;
- : bool = true
# test_list_map_pred list_map_pred_v2;;
- : bool = true
# test_list_map_square list_map_square_v2;;
- : bool = true
# test_list_map_singleton_int list_map_singleton_v2;;
- : bool = true
#

This generic map function is available in OCaml’s List library:

# test_list_map_succ (List.map succ);;
- : bool = true
# test_list_map_pred (List.map pred);;
- : bool = true
# test_list_map_square (List.map (fun n -> n * n));;
- : bool = true
# test_list_map_singleton_int (List.map (fun n -> [n]);;
- : bool = true
#

A map function implements a homomorphism, i.e., it preserves the structure of its input.

Exercise 5

Exhibit an OCaml expression of type ('a * 'b) list -> ('b * 'a) list.

Exercise 6

Implement an OCaml function that unzips a list of pairs into a pair of lists, as in the zipper device for garnments (pouch, pants, etc.). For example, applying either function to [(1, 'a'); (2, 'b')] should yield ([1; 2], ['a'; 'b']). The first implementation should be structurally recursive over the given list, and the second should use List.map.

Subsidiary question: Implement a predicate that, given a list of pairs and a pair of lists, determines whether unzipping this list of pairs yields this pair of lists.

Hint about Exercise 6

Since the unzipping function is applied to a list of pairs, its domain should be ('a * 'b) list. And since it maps a list of pairs to the corresponding pair of lists, its codomain should be 'a list * 'b list. So all in all, its type should pe ('a * 'b) list -> 'a list * 'b list.

Now more than ever, one should first construct a unit-test function on successive growing lists of pairs:

                                      []

                              [(1, 'a')]

                    [(2, 'b'); (1, 'a')]

          [(3, 'c'); (2, 'b'); (1, 'a')]

[(4, 'd'); (3, 'c'); (2, 'b'); (1, 'a')]

Or again, more tellingly:

                                                []

                                    (1, 'a') :: []

                        (2, 'b') :: (1, 'a') :: []

            (3, 'c') :: (2, 'b') :: (1, 'a') :: []

(4, 'd') :: (3, 'c') :: (2, 'b') :: (1, 'a') :: []

The first line illustrates the base case, and the other lists illustrate the induction step, with the extra point that the tail of each non-empty list is the list written in the previous line. So, in practice, the result in the previous line is the result of the recursive call on the tail of the current list.

Concretely:

  • First the base case:

    let test_unzip_int_char candidate =
      let b0 = (candidate [] = result0)  (* for an appropriate result0 *)
      in b0;;
    

    What is an appropriate expression in the stead of result0? It should have type int list * char list.

  • Then a first instance of the induction step:

    let test_unzip_int_char candidate =
      let b0 = (candidate [] = result0)  (* for an appropriate result0 *)
      and b1 = (candidate [(1, 'a')] = result1)  (* for an appropriate result1 that combines 1, 'a', and result0 *)
      in b0 && b1;;
    

    The test b1 features an instance of the induction step since [(1, 'a')] is syntactic sugar for (1, 'a') :: [].

    What is an appropriate expression in the stead of result1? It should have type int list * char list and combine 1, 'a', and result0.

  • Then a second instance of the induction step:

    let test_unzip_int_char candidate =
      let b0 = (candidate [] = result0)  (* for an appropriate result0 *)
      and b1 = (candidate [(1, 'a')] = result1)  (* for an appropriate result1 that combines 1, 'a', and result0 *)
      and b2 = (candidate [(2, 'b'); (1, 'a')] = result2)  (* for an appropriate result2 that combines 2, 'b', and result1 *)
      in b0 && b1 && b2;;
    

    The test b2 features an instance of the induction step because [(2, 'b'); (1, 'a')] is syntactic sugar for (2, 'b') :: (1, 'a') :: [], i.e., for (2, 'b') :: [(1, 'a')].

    What is an appropriate expression in the stead of result2? It should have type int list * char list and combine 2, 'b', and result1.

etc.

Once one has figured out what to write in the stead of result0, result1, result2, etc., it is simple to implement unzip.

Exercise 7

Consider the two following functions:

let foo f g xs =
 (* foo : ('a -> 'b) -> ('c -> 'a) -> 'c list -> 'b list *)
  List.map f (List.map g xs);;

let bar f g xs =
 (* bar : ('a -> 'b) -> ('c -> 'a) -> 'c list -> 'b list *)
  List.map (fun x -> f (g x)) xs;;
  1. Are the functions denoted by foo and bar equivalent, when they are applied to functions that are pure (i.e., make no side effects) and total (i.e., defined on all their inputs)?
  2. Are the functions denoted by foo and bar equivalent, when they are applied to impure (e.g., tracing) functions or to partial (i.e., not total) functions?

Solution for Exercise 7

First of all, let us analyze foo and bar:

  • given two functions f and g and a list xs, the function denoted by foo first maps g over xs, yielding a new list, and then maps f over this new list, yielding a resulting list
  • given two functions f and g and a list xs, the function denoted by bar maps the composition of f and g over xs, yielding a resulting list.

Thus equipped, here are the answers to the questions:

  1. Let us pick two pure functions: one that increments its argument by 10, and one that increments its argument by 1:

    • by the analysis above, given these two functions and a list, the function denoted by foo

      1. traverses this list, applies the second function (the one that increments its argument by 1) to each element of this list, and constructs the list of the incremented results, which is a new list with the same length as the given list; and
      2. traverses this new list of incremented results, applies the second function (the one that increments its argument by 10) to each element of this new list, and constructs the list of the incremented results, which is the resulting list and has the same length as the new list;

      so all in all, given a list, the function denoted by foo returns a final list with the same length as the given list, but where all the elements are incremented by 1 + 10 = 11:

      # foo (fun j -> j + 10) (fun i -> i + 1) [0; 1; 2; 3; 4; 5];;
      - : int list = [11; 12; 13; 14; 15; 16]
      #
      
    • by the analysis above, given two functions and a list, the function denoted by bar

      1. manufactures a new function that composes the two functions; given an integer, this composite function first increments it by 1, and then increments the result by 10; in other words, this function increments its argument by 11;
      2. traverses the given list, applying the composite function to each element of this list, and constructs the list of the results; this list is the resulting list and has the same length as the given list;

      so all in all, given a list, the function denoted by bar returns a list with the same length as the given list, but where all the elements are incremented by 11:

      # bar (fun j -> j + 10) (fun i -> i + 1) [0; 1; 2; 3; 4; 5];;
      - : int list = [11; 12; 13; 14; 15; 16]
      #
      

    This leads us to conclude that the functions denoted by foo and bar are equivalent when applied to pure functions, because the order in which these pure functions are applied does not matter.

  2. As a corollary, even though they return the same result, the functions denoted by foo and bar are not equivalent when applied to impure functions, because the order in which these impure functions are applied does matter. Their effects are observationally distinct:

    # foo (fun j -> let () = Printf.printf "ten %s ->\n" (show_int j) in j + 10)
          (fun i -> let () = Printf.printf "one %s ->\n" (show_int i) in i + 1)
          [0; 1; 2; 3; 4; 5];;
    one 0 ->
    one 1 ->
    one 2 ->
    one 3 ->
    one 4 ->
    one 5 ->
    ten 1 ->
    ten 2 ->
    ten 3 ->
    ten 4 ->
    ten 5 ->
    ten 6 ->
    - : int list = [11; 12; 13; 14; 15; 16]
    # bar (fun j -> let () = Printf.printf "ten %s ->\n" (show_int j) in j + 10)
          (fun i -> let () = Printf.printf "one %s ->\n" (show_int i) in i + 1)
          [0; 1; 2; 3; 4; 5];;
    one 0 ->
    ten 1 ->
    one 1 ->
    ten 2 ->
    one 2 ->
    ten 3 ->
    one 3 ->
    ten 4 ->
    one 4 ->
    ten 5 ->
    one 5 ->
    ten 6 ->
    - : int list = [11; 12; 13; 14; 15; 16]
    #
    

    Analysis:

    • the function denoted by foo first applies the function that increments its argument by 1 (witness the successive traces of one), and then applies the function that increments its argument by 10 (witness the successive traces of ten), whereas
    • the function denoted by bar interleaves the function that increments its argument by 1 and the function that increments its argument by 10 (witness the interleaved traces of one and ten).

Exercise 8

  1. Implement the list analogue of String.mapi so that:

    # list_mapi (fun n v -> (n, v)) ['a'; 'b'; 'c'];;
    - : (int * char) list = [(0, 'a'); (1, 'b'); (2, 'c')]
    #
    
  2. Use this analogue to implement the polymorphic identity function over lists.

  3. Use this analogue to implement a polymorphic function that maps a list to a matching list of increasing indices. For example, this function should map the list ['a'; 'b'; 'c'] to [0; 1; 2].

  4. Use this analogue to implement a polymorphic function that maps a list to a matching list of decreasing indices. For example, this function should map the list ['a'; 'b'; 'c'] to [2; 1; 0].

The logical counterpart of the map function for lists

In the particular case where the function to map over a list is a predicate (i.e., a Boolean-valued function), we often want to compute not the list of its results but their conjunction or their disjunction:

let list_andmap p vs =
 (* list_andmap : ('a -> bool) -> 'a list -> bool *)
  let rec visit vs =
    match vs with
    | [] ->
       true
    | v :: vs' ->
       p v && visit vs'
  in visit vs;;

let list_ormap p vs =
 (* list_ormap : ('a -> bool) -> 'a list -> bool *)
  let rec visit vs =
    match vs with
    | [] ->
       false
    | v :: vs' ->
       p v || visit vs'
  in visit vs;;

Rationale:

  • list_andmap p [v1; v2; v3; ...; vN] corresponds to p v1 && p v2 && p v3 && ... && p vN && true, and
  • list_ormap p [v1; v2; v3; ...; vN] corresponds to p v1 || p v2 || p v3 || ... || p vN || false.

In this case, it is critical not to name the recursive call to visit so that list_andmap and list_ormap benefit from short-circuit evaluation:

# list_andmap (fun n -> if n = 0 then false else 1/0 = 0) [0; 1; 2; 3];;
- : bool = false
# list_ormap (fun n -> if n = 0 then true else 1/0 = 0) [0; 1; 2; 3];;
- : bool = true
#

Indeed, suppose we named the recursive call to visit, as in the incorrect definition below:

let list_andmap_incorrect p vs =
 (* list_andmap_incorrect : ('a -> bool) -> 'a list -> bool *)
  let rec visit vs =
    match vs with
    | [] ->
       true
    | v :: vs' ->
       let ih = visit vs'
       in p v && ih
  in visit vs;;

Then, if p denotes fun n -> if n = 0 then false else 1/0 = 0, desugaring list_andmap_incorrect p [0; 1] would yield:

let ih = let ih = true
         in p 1 && ih
in p 0 && ih;;

Evaluating this expression would give rise to the error Division_by_zero because p 1 is evaluated first, whereas evaluating p 0 && p 1 && true would yield false because p 0 is evaluated first and OCaml implements short-circuit Boolean evaluation. Indeed evaluating p 0 yields false, and therefore the result of evaluating p 0 && p 1 && true is false, without p 1 && true being evaluated (which would give rise to the error Division_by_zero).

Harald: I am so reminded of the midterm project.
Alfrothul: You and me both, Harald. You and me both.

The case of the last call in list_andmap and list_ormap

The desugared counterparts of list_andmap p [v1; v2; v3; ...; vN] and list_ormap p [v1; v2; v3; ...; vN] read p v1 && p v2 && p v3 && ... && p vN && true and p v1 || p v2 || p v3 || ... || p vN || false, but it would be more intuitive for them to read:

p v1 && p v2 && p v3 && ... && p vN

p v1 || p v2 || p v3 || ... || p vN

without the trailing && true and || false. Not only would it be more intuitive, but it would have the property that the calls p vN would be tail calls, i.e., the last thing that happens when applying list_andmap and list_ormap.

Long story short (but it is the same story as the one about unparsing lists): to define such tail-sensitive versions of list_andmap and list_ormap, one should define them by induction over the tail of their input list when this list is nonempty:

let list_andmap_tail_sensitive p vs =
 (* list_andmap_tail_sensitive : ('a -> bool) -> 'a list -> bool *)
  match vs with
  | [] ->
     true
  | v :: vs' ->
     let rec visit v vs =
       match vs with
       | [] ->
          p v
       | v' :: vs' ->
          p v && visit v' vs'
  in visit v vs';;

let list_ormap_tail_sensitive p vs =
 (* list_ormap_tail_sensitive : ('a -> bool) -> 'a list -> bool *)
  match vs with
  | [] ->
     true
  | v :: vs' ->
     let rec visit v vs =
       match vs with
       | [] ->
          p v
       | v' :: vs' ->
          p v || visit v' vs'
  in visit v vs';;

The point is that when the tail of a list is empty, its head is the last element of this list.

Resources

Version

Fixed an embarrassing typo in the Hint about Exercise 6, thanks to Joseph Goh’s kindly piercing eye [Fri 09 Apr]

Added the Hint about Exercise 6 [24 Mar 2021]

Created [14 Mar 2021]