Unparsing values and formatting strings

The goal of this lecture note is to describe how to map any given OCaml value to a string, with the convention that the content of this string is an OCaml expression such that evaluating the printed representation of this expression yields the given OCaml value. (So yes, in some sense, we want to implement an inverse of the OCaml evaluator for values.)

Resources

Unparsing values

Magritte: Sounds good to me!
Loki: A bit of a pipe dream, though, wouldn’t you say?
Halcyon (keeping a straight face): This is not a dream.

Ostensibly, our goal is to implement an unparsing function (traditionally called show) of type 'a -> string. For each given value, printing the resulting string should yield the representation of an OCaml expression such that evaluating this expression yields the given value. So in short, we want to map a value to the representation of a value.

In actuality, for an OCaml function to have a polymorphic type, it must not touch its argument in a type-specific way, so the goal of implementing show in OCaml is unreachable.

What we can do, however, is to implement a family of type-indexed unparsing functions that map OCaml values to strings. Let us first define them for values of ground type, and then for values that have a compound type, following the structure of types:

t ::= bool | char | string | int | t * t | ...

Unparsing Booleans

Let us implement an unparsing function for boolean values, of which there are only two. We should simply map true to the string "true" and false to the string "false":

let test_show_bool candidate =
  let b0 = (candidate true = "true")
  and b1 = (candidate false = "false")
  in b0 && b1;;

let show_bool b =
 (* show_bool : bool -> string *)
  if b
  then "true"
  else "false";;

This implementation passes the unit test:

# test_show_bool show_bool;;
- : bool = true
#

And our goal is reached:

# Printf.printf "%s\n" (show_bool true);;
true
- : unit = ()
# Printf.printf "%s\n" (show_bool false);;
false
- : unit = ()
#

Interlude about Booleans

Alfrothul: So far, so good.

Anton: Er... Yes?

Alfrothul: I mean that show_bool is equivalent to using Printf.sprintf.

Anton: What’s that now?

Alfrothul: Look:

let show_bool_using_sprintf b =
  Printf.sprintf "%B" b;;

Anton: I see. This implementation certainly passes the unit test:

# test_show_bool show_bool_using_sprintf;;
- : bool = true
#

Unparsing characters

Let us implement an unparsing function for characters. A character is written in OCaml as a single quote, followed by the actual character, followed by a single quote. So we should simply manufacture a string containing these 3 characters:

let test_show_char candidate =
  let b0 = (candidate 'a' = "'a'")
  and b1 = (candidate 'B' = "'B'")
  and b2 = (candidate '5' = "'5'")
  and b3 = (candidate ' ' = "' '")
  in b0 && b1 && b2 && b3;;

In practice, we can map the given character to the corresponding string of length 1 using String.make, and prefix and postfix this string with a single quote, using string concatenation:

let show_char c =
 (* show_char : char -> string *)
  "'" ^ String.make 1 c ^ "'";;

This implementation passes the unit test:

# test_show_char show_char;;
- : bool = true
#

And our goal is reached:

# Printf.printf "%s\n" (show_char 'x');;
'x'
- : unit = ()
#

Interlude about characters

Alfrothul: So far, so good too.

Anton: Right. We can do this too using Printf.sprintf:

let show_char_using_sprintf c =
  Printf.sprintf "'%c'" c;;

Alfrothul: This implementation passes the unit test too:

# test_show_char show_char_using_sprintf;;
- : bool = true
#

Pablito: What about the single-quote character?

Alfrothul: You mean '\''?

Anton: I don’t know, let’s try:

# show_char_using_sprintf '\'';;
- : string = "'''"
#

Darcy Lewis: That doesn’t seem right:

# Printf.printf "%s\n" (show_char_using_sprintf '\'');;
'''
- : unit = ()
#

Anton: Oops. OCaml won’t be able to read that.

Alfrothul: Right. It can only read '\''.

Anton: The quoting character should have the same problem. Let’s see:

# show_char_using_sprintf '\\';;
- : string = "'\\'"
# Printf.printf "%s\n" (show_char_using_sprintf '\\');;
'\'
- : unit = ()
#

Alfrothul: Same problem indeed. We need to special-case these cases:

let show_char_using_sprintf c =
  if c = '\'' || c = '\\'
  then Printf.sprintf "'\\%c'" c
  else Printf.sprintf "'%c'" c;;

Anton: Harrumph. Let me see:

# show_char_using_sprintf '\\';;
- : string = "'\\\\'"
# show_char_using_sprintf '\'';;
- : string = "'\\''"
#

Pablito: Yikes. What is this?

Magritte: Hi guys. It looks like you need a chairperson, doesn’t it.

Mimer: Mr. Magritte! Thanks for stopping by!

Magritte: You guys are looking at the representation of the representation of these characters. But here, what matters is to look at how this representation of representation is being printed into a representation:

# Printf.printf "%s\n" (show_char_using_sprintf '\'');;
'\''
- : unit = ()
# Printf.printf "%s\n" (show_char_using_sprintf '\\');;
'\\'
- : unit = ()
#

Alfrothul: These two representations look right.

Anton (suspicious): But the same problem should arise in the version of show_char that doesn’t use sprintf, shouldn’t it?

Dana (jumping in): That is correct. And the first thing to do is to beef up the unit-test function:

let test_show_char candidate =
  let b0 = (candidate 'a' = "'a'")
  and b1 = (candidate 'B' = "'B'")
  and b2 = (candidate '5' = "'5'")
  and b3 = (candidate ' ' = "' '")
  and b4 = (candidate '\'' = "'\\\''")
  and b5 = (candidate '\\' = "'\\\\'")
  in b0 && b1 && b2 && b3 && b4 && b5;;

Anton: I buy that. But show_char?

Alfrothul: Bah, let’s just explicitly test for the special characters:

let show_char c =
 (* show_char : char -> string *)
  "'" ^ (if c = '\\' then "\\\\" else if c = '\'' then "\\\'" else String.make 1 c) ^ "'";;

Dana: That does the job:

# test_show_char show_char;;
- : bool = true
# Printf.printf "%s\n" (show_char '\'');;
'\''
- : unit = ()
# Printf.printf "%s\n" (show_char '\\');;
'\\'
- : unit = ()
#

Halcyon: Well done, guys. This problem is behind us. Let’s move on.

Dana: Just a sec.

Halcyon: Yes?

Dana: There are 256 characters in OCaml. How does OCaml print the first 32 of them?

Pablito (diligent): Let me check:

# Printf.printf "%s\n" (show_char '\000');;
"^@"
- : unit = ()
#

Pablito: Hovsa.

Mimer (amused): It’s the null character. If you program in C, you will meet it again at the end of files.

Dana: OK. That’s the logic of the ASCII table – the characters are rendered either using the character they represent if this character has a name, or using their ASCII code if this character has no name.

Calvin Russell: Only the characters.

Loki (gravely): I see what you mean.

Dana (resigned): More special-casing:

let show_char c =
 (* show_char : char -> string *)
  "'" ^ (let n = int_of_char c
         in if n < 10
            then "\\00" ^ string_of_int n
            else if n < 32
            then "\\0" ^ string_of_int n
            else if n >= 127
            then "\\" ^ string_of_int n
            else if c = '\\'
            then "\\\\"
            else if c = '\''
            then "\\\'"
            else String.make 1 c) ^ "'";;

Dana: Lemme see:

# Printf.printf "%s\n" (show_char (char_of_int 0));;
'\000'
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 9));;
'\009'
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 10));;
'\010'
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 31));;
'\031'
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 32));;
' '
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 126));;
'~'
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 127));;
'\127'
- : unit = ()
# Printf.printf "%s\n" (show_char (char_of_int 255));;
'\255'
- : unit = ()
#

Dana (mumbling to herself): Godt nok.

Loki: What about the backspace character ('\b'), the tabulation character ('\t'), the newline character ('\n'), and the carriage-return character ('\r')? The gods are in the details, Dana:

# char_of_int 8;;
- : char = '\b'
# char_of_int 9;;
- : char = '\t'
# char_of_int 10;;
- : char = '\n'
# char_of_int 13;;
- : char = '\r'
#

Dana (politely): Feel free to add more conditional expressions. Here is the unit-test function. The name of each test is indexed with the ASCII code that is being tested:

let test_show_char candidate =
  let b_000 = (candidate '\000' = "'\\000'")
  and b_007 = (candidate '\007' = "'\\007'")
  and b_008 = (candidate '\b' = "'\\b'")
  and b_009 = (candidate '\t' = "'\\t'")
  and b_010 = (candidate '\n' = "'\\n'")
  and b_011 = (candidate '\011' = "'\\011'")
  and b_012 = (candidate '\012' = "'\\012'")
  and b_013 = (candidate '\r' = "'\\r'")
  and b_014 = (candidate '\014' = "'\\014'")
  and b_031 = (candidate '\031' = "'\\031'")
  and b_032 = (candidate ' ' = "' '")
  and b_039 = (candidate '\'' = "'\\\''")
  and b_053 = (candidate '5' = "'5'")
  and b_066 = (candidate 'B' = "'B'")
  and b_092 = (candidate '\\' = "'\\\\'")
  and b_097 = (candidate 'a' = "'a'")
  and b_126 = (candidate '~' = "'~'")
  and b_127 = (candidate '\127' = "'\\127'")
  and b_255 = (candidate '\255' = "'\\255'")
  in b_000 && b_007 && b_008 && b_009 && b_010 && b_011 && b_012 && b_013 && b_014 && b_031 && b_032 && b_039 && b_053 && b_066 && b_092 && b_097 && b_126 && b_127 && b_255;;

Halcyon: Sure, we can do that. So that’s it?

Dana: Yes, Halcyon, that’s it.

Halcyon: Good, thanks. As I was saying, well done, guys. This problem is behind us. Let’s move on.

Unparsing strings, or from the frying pan into the fire

Let us implement an unparsing function for strings. A string is written in OCaml as a double quote, followed by a series of characters, and ended by another double quote:

let test_show_string candidate =
  let b0 = (candidate "" = "\"\"")
  and b1 = (candidate "hello world" = "\"hello world\"")
  in b0 && b1;;

In practice, we prefix and postfix the given string with a double quote, using string concatenation:

let show_string s =
 (* show_string : string -> string *)
  "\"" ^ s ^ "\"";;

This implementation passes the unit test:

# test_show_string show_string;;
- : bool = true
#

And our goal is reached well enough, in that strings are printed as strings, though, ahem, without any quote character if one is needed for one of the characters in the given string:

# Printf.printf "%s\n" (show_string "some string we've got here");;
"some string we've got here"
- : unit = ()
# Printf.printf "%s\n" (show_string "\"\\");;
""\"
- : unit = ()
#

Caveat emptor.

Interlude about strings

Anton: Still doing good here, if we don’t mind any occurrence of a double quote or of a backslash in the middle of a string.

Alfrothul: Right. And we can still do this too using Printf.sprintf:

let show_string_using_sprintf s =
  Printf.sprintf "\"%s\"" s;;

Alfrothul: This implementation passes the unit test too:

# test_show_char show_char_using_sprintf;;
- : bool = true
#

Pablito: Why are there two quoted double quotes in the result? I mean look:

# show_string "hello";;
- : string = "\"hello\""
#

Anton: Well, right, it is not a string, it is the representation of a string. And since the representation of a string contains 2 double quotes, here they are, quoted with \.

Pablito: But what about the 2 other double quotes, the ones that are not quoted.

Anton: That’s because the representation itself is a string, and a string starts and ends with double quotes.

Alfrothul: So if we were looking at the representation of a representation of a string, there would be 2 more double quotes. Look:

# show_string (show_string "hello");;
- : string = "\"\"hello\"\""
#

Pablito: But if we evaluate the resulting string, we get this string:

# "\"\"hello\"\"";;
- : string = "\"\"hello\"\""
#

Anton: Right. That is because we must remove the outer quotes.

Pablito: But that doesn’t help, look:

# \"\"hello\"\";;

Alfrothul: Hum. Let’s tell OCaml to stop:

    C-c C-cInterrupted.
#

Pablito: I am so confused.

W. V. Quine: Maybe you need another chairperson?

Mimer: Prof. Quine! Thanks for coming by!

W. V. Quine: The reason of being for \ in the string is that it is quoting the subsequent ". When removing the outer quotes, this reason of being disappears so we need to remove \ too. Look:

# "\"hello\"";;
- : string = "\"hello\""
#

Pablito: I am looking.

W. V. Quine: Now the result is the representation of a string, a string that we obtain by removing the two outer double quotes and the \ too, for the same reason. And now we have a string:

# "hello";;
- : string = "hello"
#

Pablito: And we need all of this meta-level stuff why?

Mimer: At the end of the day, to print OCaml values. Look:

# Printf.printf "%s %s %s\n" (show_char 'c') (show_string "hello") (show_bool true);;
'c' "hello" true
- : unit = ()
#

Anton: But OCaml’s string-formatting facilities already give us that. Look:

# Printf.printf "%c %s %B\n" 'c' "hello" true;;
c hello true
- : unit = ()
#

Alfrothul: Wait. Where are the single quotes around c and the double quotes around hello?

Anton: Ah, right, I must put them in the string:

# Printf.printf "'%c' "%s" %B\n" 'x' "hello" true;;
Characters 19-20:
  Printf.printf "'%c' "%s" %B\n" 'x' "hello" true;;
                       ^
Error: Unbound value %
#

Anton: Now what.

Alfrothul: Huh, we need to quote the double quote since it occurs in a string, namely the first argument of Printf.printf:

# Printf.printf "'%c' \"%s\" %B\n" 'c' "hello" true;;
'c' "hello" true
- : unit = ()
#

Pablito: I think I start to get the point of implementing our own unparsing functions.

Interlude about strings within strings

Han Solo (teasingly): You like me because I’m a scoundrel.
Princess Leia (outraged): What?!?
Han Solo (surprised): What, “What?”?
Luke Skywalker (entering the room): What, “What, “What?”?”?

—Star Wars apocryphal

Dana (unimpressed): OK. So Han Solo is flirting with Princess Leia by playing the “bad boy” card.

Alfrothul: And then he selectively quotes her.

Anton: And then Luke Skywalker is forcing the door open and quotes the quote.

Pablito: And since quotes are represented with strings...

Halcyon (poetically): ...we have a string within a string.

Edgar Allan Poe (tiptoeing in): Might you need a chairperson here?

Mimer: Mr. Poe! Thanks for coming by!

Poe Dameron: What am I doing here?

Dana (pragmatic): The ambiguity could be lifted by making the inner quote a single one, not a double one.

Halcyon: Yes, that is traditional in typography.

Anton: And a good thing too, to lift the ambiguity.

Alfrothul: Good point, Anton – the strings “What, ” and ”?” are not part of what Luke Skywalker means to say.

Halcyon: Which is why, in typography, there is a distinction between opening quotes and closing quotes.

Pablito: But what happens if, say, C-3PO enters the room now and asks what is going on? Or Poe Dameron, for that matter.

Anton: Is there a triple quote in typography?

Alfrothul (cleverly): Well, he could ask “What the what?”.

Pablito: Which is essentially what I was asking with the question about run PE_PE PE, in Exercise 07 from Week 02, if you remember.

Poe Dameron: Er... Me too?

Loki (sagely): What we need is a bootstrap, not a shoestring.

Dana: Thank you very much.

Halcyon: Actually, that’s also what Han Solo said to C-3PO, in the movie.

Executive summary

Mimer: When quoting quotes, make sure to delineate what is quoted and what is meta-quoted.

Loki (teasingly): Can I quote you on that?

Mimer (same same): And how would do that exactly? Pray tell.

Unparsing integers

Let us implement an unparsing function for integers, wrapping negative integers between two cosy parentheses:

let test_show_int candidate =
  let b0 = (candidate 0 = "0")
  and b1 = (candidate 123 = "123")
  and b2 = (candidate (-123) = "(-123)")
  in b0 && b1 && b2;;

As for the implementation proper, let us simply use string_of_int:

let show_int n =
 (* show_int : int -> string *)
  if n < 0
  then "(" ^ string_of_int n ^ ")"
  else string_of_int n;;

This implementation passes the unit test:

# test_show_int show_int;;
- : bool = true;;
#

And our goal is reached:

# Printf.printf "%s\n" (show_int 123);;
123
- : unit = ()
# Printf.printf "%s\n" (show_int (-123));;
(-123)
- : unit = ()
#

Interlude about integers

Alfrothul: It’s a bit futile by now, but still, we can unparse integers using Printf.sprintf:

let show_int_using_sprintf n =
  Printf.sprintf "%i" n;;

Anton: Futile, I don’t know. This implementation does not pass the unit test:

# test_show_int show_int_using_sprintf;;
- : bool = false
#

Alfrothul: Oh, that must be because of the negative integers. Let me see:

# show_int_using_sprintf (-1);;
- : string = "-1"
#

Anton: Right. OK, we are better off with our own unparsing function here too.

Unparsing pairs of Booleans and strings

Moving on to pairs, let us implement an unparsing function, e.g., for pairs of Booleans and strings. We need to construct a string that starts with an opening parenthesis, continues with the unparsed counterpart of a Boolean, a comma and a space, and the unparsed counterpart of a string, and that ends with a closing parenthesis:

let test_show_bool_cross_string candidate =
  let b0 = (candidate (false, "hello") = "(false, \"hello\")")
  and b1 = (candidate (true, "true") = "(true, \"true\")")
  (* etc. *)
  in b0 && b1;;

In practice, we simply call show_bool and show_string in turn:

let show_bool_cross_string (b, s) =
  "(" ^ show_bool b ^ ", " ^ show_string s ^ ")";;

This implementation passes the unit test:

# test_show_bool_cross_string show_bool_cross_string;;
- : bool = true
#

And our goal is reached:

# Printf.printf "%s\n" (show_bool_cross_string (true, "fd"));;
(true, "fd")
- : unit = ()
#

The situation so far

  1. Each given value is unparsed into a string.
  2. Printing this string renders the text of the given value (modulo quoting characters in strings, which are currently missing).

Exercise 05

  1. Implement a function that unparses a triple containing an integer, a character, and a Boolean. Your function should pass the following unit test:

    let test_show_int_cross_char_cross_bool candidate =
      let b0 = (candidate (123, 'a', true) = "(123, 'a', true)")
      and b1 = (candidate (-123, 'b', false) = "((-123), 'b', false)")
      (* etc. *)
      in b0 && b1;;
    

    You are also asked to add a test to this unit-test function.

  2. Implement a function that unparses a pair containing

    • a pair containing an integer and a Boolean, and
    • a triple containing a character, a string, and a character.

    Your function should pass the following unit test (where op and cp respectively stand for “open parenthesis” and “close parenthesis” in its name):

    let test_show_op_int_cross_bool_cp_cross_op_char_cross_string_cross_char_cp candidate =
      let b0 = (candidate ((10, true), ('a', "hello", 'b')) = "((10, true), ('a', \"hello\", 'b'))")
      (* etc. *)
      in b0;;
    

    You are also asked to add a test to this unit-test function.

  3. Implement (1) a unit-test function for a function that unparses a triple of Booleans, and (2) an unparsing function that passes this unit test.

Unparsing polymorphic pairs

Aragorn: Show yourself!

Let us expand our collection of type-indexed unparsing functions that map OCaml values to strings. Our measure of success is that for each given value, the resulting string can be printed into the text of an OCaml expression such that evaluating this expression yields the given value.

So far, we are equipped with show_bool, show_char, show_string, and show_int, and Exercise 05 was a feeble attempt at generalizing to monomorphic pairs and unparse them into a string.

In this section, we face the dragon and embark on unparsing pairs in general. To a first approximation, such an unparsing function is defined as:

let show_pair (v1, v2) =
  "(" ^ (show_??? v1) ^ ", " ^ (show_??? v2) ^ ")";;

But what is show_???? That depends on the type of the two components of the pair – which are not necessarily the same. So let us parameterize show_pair with two unparsing functions, one for each component of the pair:

let show_pair show_v1 show_v2 (v1, v2) =
 (* show_pair : ('a -> string) -> ('b -> string) -> 'a * 'b -> string *)
  "(" ^ (show_v1 v1) ^ ", " ^ (show_v2 v2) ^ ")";;

So for example, we can unparse pairs containing an integer and a Boolean by applying show_pair to show_int and show_bool:

# show_pair show_int show_bool;;
- : int * bool -> string = <fun>
# show_pair show_int show_bool (10, true);;
- : string = "(10, true)"
#

For another example, we can unparse pairs containing a Boolean and an integer by applying show_pair to show_bool and show_int:

# show_pair show_bool show_int;;
- : bool * int -> string = <fun>
# show_pair show_bool show_int (false, ~-1);;
- : string = "(false, (-1))"
#

Thus equipped, we can seamlessly unparse, e.g., pairs of pairs, knowing the type of their components:

# show_pair (show_pair show_int show_bool) (show_pair show_bool show_int);;
- : (int * bool) * (bool * int) -> string = <fun>
# show_pair (show_pair show_int show_bool) (show_pair show_bool show_int) ((10, true), (false, 20));;
- : string = "((10, true), (false, 20))"
#

Unparsing polymorphic triples

To unparse triples, we parameterize the corresponding unparsing function with three unparsing functions, one for each component of the triple:

let show_triple show_v1 show_v2 show_v3 (v1, v2, v3) =
 (* show_triple : ('a -> string) -> ('b -> string) -> ('c -> string) -> 'a * 'b * 'c -> string *)
  "(" ^ (show_v1 v1) ^ ", " ^ (show_v2 v2) ^ ", " ^ (show_v3 v3) ^ ")";;

So for example, we can unparse triples containing an integer, a Boolean, and an integer by applying show_triple to show_int, show_bool and show_int:

# show_triple show_int show_bool show_int;;
- : int * bool * int -> string = <fun>
# show_triple show_int show_bool show_int (10, true, ~-10);;
- : string = "(10, true, (-10))"
#

Interlude

Anton: I am in awe of show_pair and show_triple. They really make OCaml’s polymorphism shine. Did you see their type?

Alfrothul: Yup. And they make it so easy to solve Exercise 05.

Anton: Too right. Look at its Part a.:

let test_show_int_cross_char_cross_bool candidate =
  let b0 = (candidate (123, 'a', true) = "(123, 'a', true)")
  and b1 = (candidate (-123, 'b', false) = "((-123), 'b', false)")
  (* etc. *)
  in b0 && b1;;

let show_int_cross_char_cross_bool =
  show_triple show_int show_char show_bool;;

And it passes the unit test too:

# test_show_int_cross_char_cross_bool show_int_cross_char_cross_bool;;
- : bool = true
#

Alfrothul: Part b. is simple too:

let test_show_op_int_cross_bool_cross_triple_cp_cross_op_char_cross_string_cross_char_cp candidate =
  let b0 = (candidate ((10, true), ('a', "hello", 'b')) = "((10, true), ('a', \"hello\", 'b'))")
  (* etc. *)
  in b0;;

let show_op_int_cross_bool_cross_triple_cp_cross_op_char_cross_string_cross_char_cp
  show_pair (show_pair show_int show_bool) (show_triple show_char show_string show_char);;

And it passes the unit test as well:

# test_show_op_int_cross_bool_cross_triple_cp_cross_op_char_cross_string_cross_char_cp show_op_int_cross_bool_cp_cross_op_char_cross_string_cross_char_cp;;
- : bool = true
#

Anton: Part c. is the simplest one:

let show_bool_cross_bool_cross_bool =
  show_triple show_bool show_bool show_bool;;

Alfrothul: Huh, and the unit tests?

Anton: Nah.

Loki: Way to go, Anton!

Exercise 06

Implement a polymorphic function that unparses quadruples, explain it, and illustrate it with a couple of telling examples.

The end game: printing unparsed values

Composing an unparsing function with Printf.printf makes it possible to print the resulting OCaml expressions seamlessly:

# Printf.printf "%s\n" (show_bool true);;
true
- : unit = ()
# Printf.printf "%s\n" (show_char 'c');;
'c'
- : unit = ()
# Printf.printf "%s\n" (show_string "hello world");;
"hello world"
- : unit = ()
# Printf.printf "%s\n" (show_int 123);;
123
- : unit = ()
# Printf.printf "%s\n" (show_int (-123));;
(-123)
- : unit = ()
# Printf.printf "%s\n" (show_pair show_bool show_string (false, "hello"));;
(false, "hello")
- : unit = ()
#

Resources

Version

Expanded the interlude about unparsing characters [13 Feb 2023]

Created [09 Sep 2022]

«