Unparsing values and formatting strings

The goal of this lecture note is to describe how to map any given OCaml value to a string, with the convention that the content of this string is an OCaml expression such that evaluating the printed representation of this expression yields the given OCaml value. (So yes, in some sense, we want to implement an inverse of the OCaml evaluator for values.)

Resources

Unparsing values

Magritte: Sounds good to me!
Loki: A bit of a pipe dream, though, wouldn’t you say?
Halcyon (keeping a straight face): This is not a dream.

Ostensibly, our goal is to implement an unparsing function (traditionally called show) of type 'a -> string. For each given value, printing the resulting string should yield the representation of an OCaml expression such that evaluating this expression yields the given value. So in short, we want to map a value to the representation of a value.

In actuality, for an OCaml function to have a polymorphic type, it must not touch its argument in a type-specific way, so the goal of implementing show in OCaml is unreachable.

What we can do, however, is to implement a family of type-indexed unparsing functions that map OCaml values to strings. Let us first define them for values of ground type, and then for values that have a compound type, following the structure of types:

t ::= bool | char | string | int | t * t | ...

Unparsing Booleans

Let us implement an unparsing function for boolean values, of which there are only two. We should simply map true to the string "true" and false to the string "false":

let test_show_bool candidate =
  let b0 = (candidate true = "true")
  and b1 = (candidate false = "false")
  in b0 && b1;;

let show_bool b =
 (* show_bool : bool -> string *)
  if b
  then "true"
  else "false";;

This implementation passes the unit test:

# test_show_bool show_bool;;
- : bool = true
#

And our goal is reached:

# Printf.printf "%s\n" (show_bool true);;
true
- : unit = ()
# Printf.printf "%s\n" (show_bool false);;
false
- : unit = ()
#

Interlude about Booleans

Alfrothul: So far, so good.

Harald: Er... Yes?

Alfrothul: I mean that show_bool is equivalent to using Printf.sprintf.

Harald: What’s that now?

Alfrothul: Look:

let show_bool_using_sprintf b =
  Printf.sprintf "%B" b;;

Harald: I see. This implementation certainly passes the unit test:

# test_show_bool show_bool_using_sprintf;;
- : bool = true
#

Unparsing characters

Let us implement an unparsing function for characters. A character is written in OCaml as a single quote, followed by the actual character, followed by a single quote. So we should just manufacture a string containing these 3 characters:

let test_show_char candidate =
  let b0 = (candidate 'a' = "'a'")
  and b1 = (candidate 'B' = "'B'")
  and b2 = (candidate '5' = "'5'")
  and b3 = (candidate ' ' = "' '")
  in b0 && b1 && b2 && b3;;

In practice, we can map the given character to the corresponding string of length 1 using String.make, and prefix and postfix this string with a single quote, using string concatenation:

let show_char c =
 (* show_char : char -> string *)
  "'" ^ String.make 1 c ^ "'";;

This implementation passes the unit test:

# test_show_char show_char;;
- : bool = true
#

And our goal is reached:

# Printf.printf "%s\n" (show_char 'x');;
'x'
- : unit = ()
#
Calvin Russell: Only the characters.
Loki (gravely): I see what you mean.

Interlude about characters

Alfrothul: So far, so good too.

Harald: Right. We can do this too using Printf.sprintf:

let show_char_using_sprintf c =
  Printf.sprintf "'%c'" c;;

Alfrothul: This implementation passes the unit test too:

# test_show_char show_char_using_sprintf;;
- : bool = true
#

Vigfus: What about the single-quote character?

Alfrothul: You mean '\''?

Harald: I don’t know, let’s try:

# show_char_using_sprintf '\'';;
- : string = "'''"
#

Alfrothul: That doesn’t look right:

# Printf.printf "%s\n" (show_char_using_sprintf '\'');;
'''
- : unit = ()
#

Harald: Oops. OCaml won’t be able to read that.

Alfrothul: Right. It can only read '\''.

Harald: The quoting character should have the same problem. Let’s see:

# show_char_using_sprintf '\\';;
- : string = "'\\'"
# Printf.printf "%s\n" (show_char_using_sprintf '\\');;
'\'
- : unit = ()
#

Alfrothul: Same problem indeed. We need to special-case these cases:

let show_char_using_sprintf c =
  if c = '\'' || c = '\\'
  then Printf.sprintf "'\\%c'" c
  else Printf.sprintf "'%c'" c;;

Harald: Harumph. Let me see:

# show_char_using_sprintf '\\';;
- : string = "'\\\\'"
# show_char_using_sprintf '\'';;
- : string = "'\\''"
#

Vigfus: Yikes. What is this?

Magritte: Hi guys. It looks like you need a chairperson, doesn’t it.

Mimer: Mr. Magritte! Thanks for stopping by!

Magritte: You guys are looking at the representation of the representation of these characters. But here, what matters is to look at how this representation of representation is being printed into a representation:

# Printf.printf "%s\n" (show_char_using_sprintf '\'');;
'\''
- : unit = ()
# Printf.printf "%s\n" (show_char_using_sprintf '\\');;
'\\'
- : unit = ()
#

Alfrothul: These two representations look right.

Harald (suspicious): But the same problem should arise in the version of show_char that doesn’t use sprintf, shouldn’t it?

Brynja (jumping in): That is correct. And the first thing to do is to beef up the unit-test function:

let test_show_char candidate =
  let b0 = (candidate 'a' = "'a'")
  and b1 = (candidate 'B' = "'B'")
  and b2 = (candidate '5' = "'5'")
  and b3 = (candidate ' ' = "' '")
  and b4 = (candidate '\'' = "'\\\''")
  and b5 = (candidate '\\' = "'\\\\'")
  in b0 && b1 && b2 && b3 && b4 && b5;;

Harald: I buy that. But show_char?

Alfrothul: Bah, let’s just explicitly test for the special characters:

let show_char c =
 (* show_char : char -> string *)
  "'" ^ (if c = '\\' then "\\\\" else if c = '\'' then "\\\'" else String.make 1 c) ^ "'";;

Brynja: That does the job:

# test_show_char show_char;;
- : bool = true
# Printf.printf "%s\n" (show_char '\'');;
'\''
- : unit = ()
# Printf.printf "%s\n" (show_char '\\');;
'\\'
- : unit = ()
#

Halcyon: Well done, guys. This problem is behind us. Let’s move on.

Unparsing strings, or from the frying pan into the fire

Let us implement an unparsing function for strings. A string is written in OCaml as a double quote, followed by a series of characters, and ended by another double quote:

let test_show_string candidate =
  let b0 = (candidate "" = "\"\"")
  and b1 = (candidate "hello world" = "\"hello world\"")
  in b0 && b1;;

In practice, we prefix and postfix the given string with a double quote, using string concatenation:

let show_string s =
 (* show_string : string -> string *)
  "\"" ^ s ^ "\"";;

This implementation passes the unit test:

# test_show_string show_string;;
- : bool = true
#

And our goal is reached well enough, in that strings are printed as strings, though, ahem, without any quote character if one is needed for one of the characters in the given string:

# Printf.printf "%s\n" (show_string "some string we've got here");;
"some string we've got here"
- : unit = ()
# Printf.printf "%s\n" (show_string "\"\\");;
""\"
- : unit = ()
#

Caveat emptor.

Interlude about strings

Harald: Still doing good here, if we don’t mind any occurrence of a double quote or of a backslash in the middle of a string.

Alfrothul: Right. And we can still do this too using Printf.sprintf:

let show_string_using_sprintf s =
  Printf.sprintf "\"%s\"" s;;

Alfrothul: This implementation passes the unit test too:

# test_show_char show_char_using_sprintf;;
- : bool = true
#

Vigfus: Why are there two quoted double quotes in the result? I mean look:

# show_string "hello";;
- : string = "\"hello\""
#

Harald: Well, right, it is not a string, it is the representation of a string. And since the representation of a string contains 2 double quotes, here they are, quoted with \.

Vigfus: But what about the 2 other double quotes, the ones that are not quoted.

Harald: That’s because the representation itself is a string, and a string starts and ends with double quotes.

Alfrothul: So if we were looking at the representation of a representation of a string, there would be 2 more double quotes. Look:

# show_string (show_string "hello");;
- : string = "\"\"hello\"\""
#

Vigfus: But if we evaluate the resulting string, we get this string:

# "\"\"hello\"\"";;
- : string = "\"\"hello\"\""
#

Harald: Right. That is because we must remove the outer quotes.

Vigfus: But that doesn’t help, look:

# \"\"hello\"\";;

Alfrothul: Hum. Let’s tell OCaml to stop:

    C-c C-cInterrupted.
#

Vigfus: I am so confused.

Magritte: It looks like you guys still need a chairperson?

Mimer: Mr. Magritte! Thanks for coming back!

Magritte: The reason of being for \ in the string is that it is quoting the subsequent ". When removing the outer quotes, this reason of being disappears so we need to remove \ too. Look:

# "\"hello\"";;
- : string = "\"hello\""
#

Vigfus: I am looking.

Magritte: Now the result is the representation of a string, a string that we obtain by removing the two outer double quotes and the \ too, for the same reason. And now we have a string:

# "hello";;
- : string = "hello"
#

Vigfus: And we need all of this meta-level stuff why?

Mimer: At the end of the day, to print OCaml values. Look:

# Printf.printf "%s %s %s\n" (show_char 'c') (show_string "hello") (show_bool true);;
'c' "hello" true
- : unit = ()
#

Harald: But OCaml’s string-formatting facilities already give us that. Look:

# Printf.printf "%c %s %B\n" 'x' "hello" true;;
x hello true
- : unit = ()
#

Alfrothul: Wait. Where are the double quotes around hello?

Harald: Ah, right, I must put them in the string:

# Printf.printf "%c "%s" %B\n" 'x' "hello" true;;
Characters 19-20:
  Printf.printf "%c "%s" %B\n" 'x' "hello" true;;
                     ^
Error: Unbound value %
#

Harald: Now what.

Alfrothul: Huh, we need to quote the double quote since it occurs in a string, namely the first argument of Printf.printf:

# Printf.printf "%c \"%s\" %B\n" 'x' "hello" true;;
x "hello" true
- : unit = ()
#

Vigfus: I think I start to get the point of implementing our own unparsing functions.

The string is a stark data structure and everywhere it is passed there is much duplication of process.
It is a perfect vehicle for hiding information.

Alan Perlis‘s programming epigram #34

Unparsing integers

Let us implement an unparsing function for integers, wrapping negative integers between two cosy parentheses:

let test_show_int candidate =
  let b0 = (candidate 0 = "0")
  and b1 = (candidate 123 = "123")
  and b2 = (candidate (-123) = "(-123)")
  in b0 && b1 && b2;;

As for the implementation proper, let us simply use string_of_int:

let show_int n =
 (* show_int : int -> string *)
  if n < 0
  then "(" ^ string_of_int n ^ ")"
  else string_of_int n;;

This implementation passes the unit test:

# test_show_int show_int;;
- : bool = true;;
#

And our goal is reached:

# Printf.printf "%s\n" (show_int 123);;
123
- : unit = ()
# Printf.printf "%s\n" (show_int (-123));;
(-123)
- : unit = ()
#

Interlude about integers

Alfrothul: It’s a bit futile by now, but still, we can unparse integers using Printf.sprintf:

let show_int_using_sprintf n =
  Printf.sprintf "%i" n;;

Harald: Futile, I don’t know. This implementation does not pass the unit test:

# test_show_int show_int_using_sprintf;;
- : bool = false
#

Alfrothul: Oh, that must be because of the negative integers. Let me see:

# show_int_using_sprintf (-1);;
- : string = "-1"
#

Harald: Right. OK, we are better off with our own unparsing function here too.

Unparsing pairs of Booleans and strings

Moving on to pairs, let us implement an unparsing function, e.g., for pairs of Booleans and strings. We need to construct a string that starts with an opening parenthesis, continues with the unparsed counterpart of a Boolean, a comma and a space, and the unparsed counterpart of a string, and that ends with a closing parenthesis:

let test_show_bool_cross_string candidate =
  let b0 = (candidate (false, "hello") = "(false, \"hello\")")
  and b1 = (candidate (true, "true") = "(true, \"true\")")
  (* etc. *)
  in b0 && b1;;

In practice, we simply call show_bool and show_string in turn:

let show_bool_cross_string (b, s) =
  "(" ^ show_bool b ^ ", " ^ show_string s ^ ")";;

This implementation passes the unit test:

# test_show_bool_cross_string show_bool_cross_string;;
- : bool = true
#

And our goal is reached:

# Printf.printf "%s\n" (show_bool_cross_string (true, "fd"));;
(true, "fd")
- : unit = ()
#

To say it again,

  1. each given value is unparsed into a string,
  2. printing this string renders the text of the given value (modulo quoting characters in strings, which are currently missing).

Exercise 4

  1. Implement a function that unparses a triple containing an integer, a character, and a Boolean. Your function should pass the following unit test:

    let test_show_int_cross_char_cross_bool candidate =
      let b0 = (candidate (123, 'a', true) = "(123, 'a', true)")
      and b1 = (candidate (-123, 'b', false) = "((-123), 'b', false)")
      (* etc. *)
      in b0 && b1;;
    

    You are also asked to add a test to this unit-test function.

  2. Implement a function that unparses a pair containing

    • a pair containing an integer and a Boolean, and
    • a triple containing a character, a string, and a character.

    Your function should pass the following unit test (where op and cp respectively stand for “open parenthesis” and “close parenthesis” in its name):

    let test_show_op_int_cross_bool_cp_cross_op_char_cross_string_cross_char_cp candidate =
      let b0 = (candidate ((10, true), ('a', "hello", 'b')) = "((10, true), ('a', \"hello\", 'b'))")
      (* etc. *)
      in b0;;
    

    You are also asked to add a test to this unit-test function.

  3. Implement (1) a unit-test function for a function that unparses a triple of Booleans, and (2) an unparsing function that passes this unit test.

Unparsing polymorphic pairs

Let us expand our collection of type-indexed unparsing functions that map OCaml values to strings. Our measure of success is that for each given value, the resulting string can be printed into the text of an OCaml expression such that evaluating this expression yields the given value.

So far, we are equipped with show_bool, show_char, show_string, and show_int, and Exercise 4 was a feeble attempt at generalizing to monomorphic pairs and unparse them into a string.

In this section, we face the dragon and embark on unparsing pairs in general. To a first approximation, such an unparsing function is defined as:

let show_pair (v1, v2) =
  "(" ^ (show_??? v1) ^ ", " ^ (show_??? v2) ^ ")";;

But what is show_???? That depends on the type of the two components of the pair – which are not necessarily the same. So let us parameterize show_pair with two unparsing functions, one for each component of the pair:

let show_pair show_yourself1 show_yourself2 (v1, v2) =
 (* show_pair : ('a -> string) -> ('b -> string) -> 'a * 'b -> string *)
  "(" ^ (show_yourself1 v1) ^ ", " ^ (show_yourself2 v2) ^ ")";;

So for example, we can unparse pairs containing an integer and a Boolean by applying show_pair to show_int and show_bool:

# show_pair show_int show_bool;;
- : int * bool -> string = <fun>
# show_pair show_int show_bool (10, true);;
- : string = "(10, true)"
#

For another example, we can unparse pairs containing a Boolean and an integer by applying show_pair to show_bool and show_int:

# show_pair show_bool show_int;;
- : bool * int -> string = <fun>
# show_pair show_bool show_int (false, ~-1);;
- : string = "(false, ~-1)"
#

Thus equipped, we can seamlessly unparse, e.g., pairs of pairs, knowing the type of their components:

# show_pair (show_pair show_int show_bool) (show_pair show_bool show_int);;
- : (int * bool) * (bool * int) -> string = <fun>
# show_pair (show_pair show_int show_bool) (show_pair show_bool show_int) ((10, true), (false, 20));;
- : string = "((10, true), (false, 20))"
#

Unparsing polymorphic triples

To unparse triples, we parameterize the corresponding unparsing function with three unparsing functions, one for each component of the triple:

let show_triple show_yourself1 show_yourself2 show_yourself3 (v1, v2, v3) =
 (* show_triple : ('a -> string) -> ('b -> string) -> ('c -> string) -> 'a * 'b * 'c -> string *)
  "(" ^ (show_yourself1 v1) ^ ", " ^ (show_yourself2 v2) ^ ", " ^ (show_yourself3 v3) ^ ")";;

So for example, we can unparse triples containing an integer, a Boolean, and an integer by applying show_triple to show_int, show_bool and show_int:

# show_triple show_int show_bool show_int;;
- : int * bool * int -> string = <fun>
# show_triple show_int show_bool show_int (10, true, ~-10);;
- : string = "(10, true, (-10))"
#

Interlude

Harald: I am in awe of show_pair and show_triple. They really make OCaml’s polymorphism shine. Did you see their type?

Alfrothul: Yup. And they make it so easy to solve Exercise 4.

Harald: Too right. Look at its Part a.:

let test_show_int_cross_char_cross_bool candidate =
  let b0 = (candidate (123, 'a', true) = "(123, 'a', true)")
  and b1 = (candidate (-123, 'b', false) = "(~-123, 'b', false)")
  (* etc. *)
  in b0 && b1;;

let show_int_cross_char_cross_bool =
  show_triple show_int show_char show_bool;;

And it passes the unit test too:

# test_show_int_cross_char_cross_bool show_int_cross_char_cross_bool;;
- : bool = true
#

Alfrothul: Part b. is simple too:

let test_show_op_int_cross_bool_cross_triple_cp_cross_op_char_cross_string_cross_char_cp candidate =
  let b0 = (candidate ((10, true), ('a', "hello", 'b')) = "((10, true), ('a', \"hello\", 'b'))")
  (* etc. *)
  in b0;;

let show_op_int_cross_bool_cross_triple_cp_cross_op_char_cross_string_cross_char_cp
  show_pair (show_pair show_int show_bool) (show_triple show_char show_string show_char);;

And it passes the unit test as well:

# test_show_op_int_cross_bool_cross_triple_cp_cross_op_char_cross_string_cross_char_cp show_op_int_cross_bool_cp_cross_op_char_cross_string_cross_char_cp;;
- : bool = true
#

Harald: Part c. is the simplest one:

let show_bool_cross_bool_cross_bool =
  show_triple show_bool show_bool show_bool;;

Alfrothul: Huh, and the unit tests?

Harald: Nah.

Exercise 5

Implement a polymorphic function that unparses quadruples, explain it, and illustrate it with a couple of telling examples.

The end game: printing unparsed values

Composing an unparsing function with Printf.printf makes it possible to print the resulting OCaml expressions seamlessly:

# Printf.printf "%s\n" (show_bool true);;
true
- : unit = ()
# Printf.printf "%s\n" (show_char 'c');;
'c'
- : unit = ()
# Printf.printf "%s\n" (show_string "hello world");;
"hello world"
- : unit = ()
# Printf.printf "%s\n" (show_int 123);;
123
- : unit = ()
# Printf.printf "%s\n" (show_int (-123));;
(-123)
- : unit = ()
# Printf.printf "%s\n" (show_pair show_bool show_string (false, "hello"));;
(false, "hello")
- : unit = ()
#

Resources

Version

Fixed a typo in the section about unparsing pais of Booleans and strings, thanks to Syed Muhammad Maaz’s eagle eye [27 Feb 2020]

Touched up the narrative and added the Perlis quote [15 Feb 2020]

Updated [14 Feb 2020]

Streamlined the unit-test functions [08 Jun 2019]

Created [05 Mar 2019]