The goal of this lecture note is to illustrate how to program with strings in OCaml.
So strings are built out of characters and are written between double quotes:
# "hello world";;
- : string = "hello world"
#
They are concatenated with the infix operator ^:
# "ab" ^ "ba";;
- : string = "abba"
# "hello" ^ " " ^ "world";;
- : string = "hello world"
#
And if we want to have a double quote in a string, we quote it with a backslash, i.e., \. This backslash is purely notational, in that it does not exist in the string, which we can verify by computing the length of the said string:
# "\"";;
- : string = "\""
# String.length "\"";;
- : int = 1
#
Likewise if we want to have a backslash in a string – we quote it too:
# "\\";;
- : string = "\\"
#
Besides String.length, a function of type string -> int to compute the length of a string, the string library String also contains, e.g, String.get : string -> int -> char, a function to index a character in a string, starting from 0 all the way up to (and excluding) the length of the string. Indexing a string with a negative integer or with an index that is greater or equal to the length of the string raises an error:
# let a_string_of_length_4 = "aBcD";;
val a_string_of_length_4 : string = "aBcD"
# String.length a_string_of_length_4;;
- : int = 4
# String.get a_string_of_length_4 (-1);;
Exception: Invalid_argument "index out of bounds".
# String.get a_string_of_length_4 0;;
- : char = 'a'
# String.get a_string_of_length_4 1;;
- : char = 'B'
# String.get a_string_of_length_4 2;;
- : char = 'c'
# String.get a_string_of_length_4 3;;
- : char = 'D'
# String.get a_string_of_length_4 4;;
Exception: Invalid_argument "index out of bounds".
# String.get a_string_of_length_4 max_int;;
Exception: Invalid_argument "index out of bounds".
#
Assume that strings are represented (i.e., implemented) as a contiguous area of memory (which they actually are). Such a contiguous area of memory is often called a “buffer”, and the fact that it cannot be indexed outside of its bounds (here: 0, inclusively, and the length of the string, exclusively) means that the implementation of OCaml does not suffer from the buffer overflow problem. Small causes, big effects.
Notationally, if the name s denotes a string and i denotes an integer, s[i] is syntactic sugar (i.e., is unparsed into) String.get s i.
In Exercise 0.1.h, for example, “adding 1 iteratively n times to 0” means that we start at 0, and then we add 1, which gives 1, and then we add 1, which gives 2, and then we add 1, which gives 3, etc., iteratively.
Magritte: I approve this mini-interlude.
Harald: Wow. s and s.
Alfrothul (o _ o): s and s.
Harald: One is the name of the string in OCaml, and the other is the string.
Alfrothul: Ah, right. Names and what these names denote.
Harald: Yes. A useful distinction to make, isn’t it?
Alfrothul: In this context, yes. Let me try: evaluating String.length s amounts to applying the function denoted by String.length to the string denoted by s. How am I doing?
Harald: You are doing good, I believe. How about String.length "aBcD"?
Alfrothul: Same same. Evaluating String.length "aBcD" amounts to applying the function denoted by String.length to the string denoted by "aBcD".
Vigfus: But the string denoted by "aBcD" is "aBcD", since "aBcD" is a literal. Wait. Actually no: one is the syntax of the string – how it’s written, and the other is its semantics – what it means, i.e., an OCaml value of type string. I think I’m getting the hang of this.
Since Version 4.00.0, the OCaml library function for strings features String.map, which has type (char -> char) -> string -> string. To quote its documentation, “String.map f s applies function f in turn to all the characters of s (in increasing index order) and stores the results in a new string that is returned.”
For example, String.map can be used to map a string to another string of the same length but containing only the character 'a', as captured in the following unit-test function:
let test_to_a candidate =
let b0 = (candidate "" = "")
and b1 = (candidate "x" = "a")
and b2 = (candidate "xy" = "aa")
and b3 = (candidate "xyz" = "aaa")
(* etc. *)
in b0 && b1 && b2 && b3;;
To implement to_a, we supply the constant function fun c -> 'a' to String.map and to the given string:
let to_a s =
String.map (fun c -> 'a') s;;
This implementation passes the unit test:
# test_to_a to_a;;
- : bool = true
#
In words, fun c -> 'a' is applied to each character of the given string, and the resulting characters 'a' all occur in the resulting string:
Also, since c is not used in the body of fun c -> 'a', we can make that manifest by writing _ instead:
let to_a s =
String.map (fun _ -> 'a') s;;
Alfrothul: Ah! Now we can write a generator of real random strings.
Harald: A generator of real random strings. We already had a generator of random strings, didn’t we?
Alfrothul: Yes we did, but they all contained the same character.
Harald: The same random character, yes.
Alfrothul: And now we can manufacture random strings containing random characters.
Harald: Pray explain.
Alfrothul: Well, we first generate a string of random length, and then we map each of its characters to a random one using String.map. Look:
let random_string n =
String.map (fun _ -> random_char ()) (String.make (Random.int n) ' ');;
Harald (prudently): I see that your function has type int -> string.
Alfrothul: It does. Given a non-negative integer, it first creates a string of random length containing the space character using String.make, and then it maps each occurrence of the space character into a random character using String.map. Look:
# random_string 10;;
- : string = "(+%- .62"
# random_string 10;;
- : string = " 3+>,"
# random_string 10;;
- : string = ""
# random_string 10;;
- : string = "0?8,."
#
Harald: Harumph.
Keeping in mind that int_of_char and char_of_int respectively map a character into its ASCII code and the ASCII code of a character into this character, define a function that capitalizes a string of lowercase letters and another function that maps a string of uppercase letters to the corresponding strings of lowercase letters:
let test_upper candidate =
let b0 = (candidate "abc" = "ABC")
(* etc. *)
in b0;;
let test_lower candidate =
let b0 = (candidate "ABC" = "abc")
(* etc. *)
in b0;;
Vigfus: We are supposed to extend these unit-test functions with our own tests, right?Mimer: Right.Brynja: Any other piece of advice?Mimer: Read the general interlude below.
(The input of upper is assumed to be a string of lowercase letters and the input of lower is assumed to be a string of uppercase letters.)
Subsidiary questions, keeping in mind that in Emacs, M-b is used to make the cursor jump to the beginning of the latest word (or of the current word), and M-f is used to make the cursor jump to the end of the next word (or of the current word):
Harald: But what if the given string does not exclusively contain lowercase letters?
Alfrothul: Well, we were asserted it does.
As a continuation of Exercise 1, define a function that capitalises an arbitrary string of letters and another function that maps an arbitrary string of letters to the corresponding strings of lowercase letters:
let test_opportunistic_upper candidate =
let b0 = (candidate "abc" = "ABC")
and b1 = (candidate "ABC" = "ABC")
and b2 = (candidate "Abc" = "ABC")
and b3 = (candidate "aBc" = "ABC")
and b4 = (candidate "abC" = "ABC")
(* etc. *)
in b0 && b1 && b2 && b3 && b4;;
let test_opportunistic_lower candidate =
let b0 = (candidate "ABC" = "abc")
and b1 = (candidate "abc" = "abc")
and b2 = (candidate "Abc" = "abc")
and b3 = (candidate "aBc" = "abc")
and b4 = (candidate "abC" = "abc")
(* etc. *)
in b0 && b1 && b2 && b3 && b4;;
(The input of opportunistic_upper is assumed to be a string of letters and the input of opportunistic_lower is assumed to be a string of letters.)
Alfrothul: Fun fact – since Version 4.02.0, the OCaml library function for strings also features a function String.mapi, which has type (int -> char -> char) -> string -> string. To quote its documentation, “String.mapi f s calls f with each character of s and its index (in increasing index order) and stores the results in a new string that is returned.”
Harald: So... We should be able to map any string to a string of the same length, but containing successive digits. Blam.
Alfrothul: Beg pardon?
Harald: Let me sketch a unit-test function:
let test_blam candidate =
let b0 = (candidate "" = "")
and b1 = (candidate "a" = "0")
and b2 = (candidate "ab" = "01")
and b3 = (candidate "abc" = "012")
and b4 = (candidate "abcd" = "0123")
(* etc. *)
in b0 && b1 && b2 && b3 && b4;;
Alfrothul: I see. Pretty straighforward indeed:
let blam s =
String.mapi (fun i _ -> char_of_int (i + 48)) s;;
Harald: i + 48, eh.
Alfrothul: That’s because the ASCII code of 0 is 48:
# int_of_char '0';;
- : int = 48
# char_of_int 48;;
- : char = '0'
# char_of_int (48 + 1);;
- : char = '1'
# char_of_int (48 + 2);;
- : char = '2'
#
Harald: I see. But there is no way I want to remember that the ASCII code of 0 is 48.
Alfrothul: What’s that now?
Harald: I mean we should be more explicit in the definition of blam:
let blam s =
String.mapi (fun i _ -> char_of_int (i + int_of_char '0')) s;;
Alfrothul: Mmmh... Right. Still, be that as it may, blam passes the unit tests too, which was the point:
# test_blam blam;;
- : bool = true
#
Harald: And what if blam is applied to a string that is longer than 10? Let me try:
# String.length "abcdefghijkl";;
- : int = 12
# blam "abcdefghijkl";;
- : string = "0123456789:;"
#
Alfrothul: Well, that’s the ASCII table for you. After the characters 8 and 9, the characters : and ;.
Be a pal and
Could you parameterize blam to yield a cycle that is shorter than 10? For example, when applied to a string and to 6, the result should be the cyclic string "012345012345012345...". Of course a new unit-test function will be needed too.
Subsidiary questions:
Vigfus: I’m sorry but String.mapi is undefined:
# String.mapi;;
Characters 0-11:
String.mapi;;
^^^^^^^^^^^
Error: Unbound value String.mapi
Did you mean map?
#
Brynja: What is your version of OCaml?
Vigfus: When OCaml starts, it displays:
OCaml version 4.01.0
#
Brynja: That’s why – mapi was only added to the strings library since Version 4.02.0.
Vigfus: OK?
Brynja: Can you upgrade your version of OCaml?
Vigfus (with a small voice): No?
Mimer: That’s OK. The accompanying resource file now contains a definition of mapi called Stringy.mapi, as well as a definition of init, called Stringy.init, so just add a y at the end of String in your programs and you are good to go.
Vigfus (clickety clickety click): Yay! I can now generate the 26 letters of the alphabet in a string.
Harald (O_o): Letters? Alphabet? String?
Vigfus: Well, yes. We can map a string of 26 characters to a string that contains all successive letters of the alphabet. Look:
# let lower_case_alphabet =
Stringy.mapi (fun i _ -> char_of_int (i + int_of_char 'a')) (String.make 26 ' ');;
val lower_case_alphabet : string = "abcdefghijklmnopqrstuvwxyz"
#
Mimer: Way to go, Vigfus!
Loki: Wow, Mimer.Mimer: Yes?Loki: That was a quick and dirty definition of Stringy.mapi.Mimer: Did you expect a reference implementation?Loki: Well, it is effective.Mimer: And this is just a side comment.Halcyon: There is nothing outside context, is there.Loki: What about the context of the context? There could be a whole hierarchy there.Jacques Derrida: Do you guys need a chairperson?Mimer: Prof. Derrida, thanks for stopping by!
Can String.map be defined in terms of String.mapi (or Stringy.mapi)?
Vigfus: On second thoughts, though...
Brynja: On second thoughts?
Vigfus: It would be a better idea to use Stringy.init here to generate the 26 letters of the alphabet in a string.
Brynja: Pray tell.
Vigfus (taking a deep breath): Well, since Version 4.02.0, the strings library also contains an init function of type int -> (int -> char) -> string that we can use to initialize a string. Given the length of the desired string and a function of type int -> char, this init function yields a string where each of the characters are the result of applying the given function to each of the successive valid indices in this string. And since Mimer has just made it available as Stringy.init, we can use it here. Look:
# Stringy.init 26 (fun i -> char_of_int (i + int_of_char 'A'));;
- : string = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
#
Brynja: Mimer is right. Way to go, Vigfus!
Alfrothul: Indeed. Well spotted, Vigfus: the generator of random strings too is better implemented with Stringy.init:
let random_string n =
Stringy.init (Random.int n) (fun _ -> random_char ());;
Integers can be converted to strings (in base 10) and vice-versa:
# string_of_int 123;;
- : string = "123"
# int_of_string "123";;
- : int = 123
#
The function converting a string to an integer is a left inverse of the function converting an integer to a string:
# int_of_string (string_of_int 543210);;
- : int = 543210
# int_of_string "hello world";;
Exception: Failure "int_of_string".
#
Negative integers can also be formatted into a string, but on the occasion one must help the OCaml parser by wrapping them between parentheses or prefixing their sign with ~, so that the negative sign is not read as the infix subtraction operator, which gives rise to, ah, unintuitive error messages:
# int_of_string "-321";;
- : int = -321
# string_of_int -123;; (* this expression is read as subtracting 123 to string_of_int *)
Characters 0-13:
string_of_int -123;;
^^^^^^^^^^^^^
Error: This expression has type int -> string
but an expression was expected of type int
# string_of_int (-123);; (* fix: isolate the negative integer between cosy parentheses *)
- : string = "-123"
# string_of_int ~-123;; (* alternative fix: mark the negative sign as such *)
- : string = "-123"
#
To generalize string_of_int, OCaml offers a library function to format strings, Printf.sprintf, that returns a formatted string. To a first approximation, its first argument is the string to print out (so there is nothing to format here):
# Printf.sprintf "Hello world";;
- : string = "Hello world"
#
This argument string is actually a control directive that determines the behavior of Printf.sprintf: if, for example, it contains an occurrence of %i, then Printf.sprintf expects not just the control string, but an integer too – in fact, as many integers as occurrences of %i in the control string. These integers are formatted into part of the resulting string:
# Printf.sprintf "We are in %ind Street." 42;;
- : string = "We are in 42nd Street."
# Printf.sprintf "We stand at the corner of %ind Street and %ith Avenue." 42 5;;
- : string = "We stand at the corner of 42nd Street and 5th Avenue."
#
Here, the control directive %i specifies that an integer is expected. Other control directives exist, e.g., %s for strings and %B for Booleans:
# Printf.sprintf "Is %s %B?\n" "that" true;;
- : string = "Is that true?\n"
#
There is more information in the OCaml manual.
OCaml also offers a library function to format strings and print them out, Printf.printf, that returns the unit-type value. The only difference between Printf.sprintf and Printf.printf is that one returns the formatted string whereas the other prints it out and returns the unit-type value. In other words, the following expressions are equivalent:
To wit:
# Printf.printf "Hello world\n";;
Hello world
- : unit = ()
# Printf.printf "%s" (Printf.sprintf "Hello world\n");;
Hello world
- : unit = ()
# Printf.printf "%s" (Printf.sprintf "%s" (Printf.sprintf "Hello world\n"));;
Hello world
- : unit = ()
# Printf.printf "%s" (Printf.sprintf "%s" (Printf.sprintf "%s" (Printf.sprintf "Hello world\n")));;
Hello world
- : unit = ()
# Printf.printf "We are in %ind Street.\n" 42;;
We are in 42nd Street.
- : unit = ()
# Printf.printf "%s" (Printf.sprintf "We are in %ind Street.\n" 42);;
We are in 42nd Street.
- : unit = ()
# Printf.printf "%s" (Printf.sprintf "%s" (Printf.sprintf "We are in %ind Street.\n" 42));;
We are in 42nd Street.
- : unit = ()
# Printf.printf "%s" (Printf.sprintf "%s" (Printf.sprintf "%s" (Printf.sprintf "We are in %ind Street.\n" 42)));;
We are in 42nd Street.
- : unit = ()
#
Added the food for thought [23 Feb 2020]
Added Stringy.init in the resource file and the ending of the general interlude [22 Feb 2020]
Added Stringy.mapi in the resource file and a continuation of the general interlude in the narrative [21 Feb 2020]
Added “inclusively” in the statement of Exercise 0 [17 Feb 2020]
Added “Put otherwise” in Exercise 0.1.i, Exercise 0.1.j, Exercise 0.1.k, and Exercise 0.1.l to remove any ambiguity [17 Feb 2020]
Added Exercise 0, the mini-interlude, and the microscopic (but helpful) interlude [15 Feb 2020]
Updated and aligned with what came before [14 Feb 2020]
Expanded the description of String.map [08 Jun 2019]
Factored out of the previous lecture note [08 Jun 2019]