Bootstrapping¶

Goal¶

This note is about compiling self-compilers, i.e., compilers that are written in the language they compile.

Bootstrapping an ML compiler¶

Say that we agree with David MacQueen that ML is a wonderful domain-specific language for writing compilers, in that it provides expressive support, which we find excellent, for implementing a compiler. (Note: ML is really a family of programming languages and OCaml is a member of this family.) We also happen to have a brand new computer with an x86 microprocessor, for which somehow the only available compiler is an executable compiler from C to assembly language (A, for short). Pictorially:

_images/ditaa-13002fca5de42f3ab47fd33cd50f65314bf034ee.png

Finally, we have an executable byte-code interpreter (i.e., a virtual machine) for this assembly language. Pictorially:

_images/ditaa-a96b47a876787e79f590c92a67bcde4c9c33373b.png

We want to write an ML compiler for our new computer. Naturally, we wish to write it in ML, not in C.

Everything should be built top-down, except the first time.

—Alan Perlis’s programming epigram #15

The rest of this section describes how to bootstrap our ML compiler so that we can run it on our new computer. (Bootstraps are the straps at the top of cowboy boots: to put on his boots, a cowboy inserts his feet into them, sticks a finger in each strap, and pulls the boots towards him. Metaphorically, if the cowboy pulls vigorously enough, he can lift himself off the ground, a paradox that Robert Heinlein illustrated with time travel in his short stories By His Bootstraps and —All You Zombies— (now a movie).)

Phase 0:

We write, in C, a throw-away compiler from ML to assembly language. This compiler is quick and dirty in that it does not compile very well (i.e., it does not generate good compiled code), and it is not efficient (i.e., it does not run fast). It must, however, be correct:

_images/ditaa-b4ea0e7ddbec8ae7a9aaee60455baf2f8a1449c3.png

We label this quick-and-dirty compiler with the number 0 (zero).

Phase 1:

Using the executable C compiler, we compile the quick-and-dirty compiler into assembly language. The result is a compiler from ML to assembly language written in assembly language:

_images/ditaa-1373094078a16265b623bfe82f9d9c2e57bc4aec.png

This compiled compiler is still quick and dirty: it does not work fast and it generates inefficient code. We label it with the number 1 (one).
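Phases 0 and 1 can be sketched as operations on T-diagrams. The following Python fragment is only an illustrative model, not part of the original note: the `Compiler` record, the `EXECUTABLE_LANGUAGES` set, and the `compile_compiler` function are hypothetical names, and "executable" is assumed to mean "written in x86, or written in A and run through the interpreter for A".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Compiler:
    """A T-diagram (hypothetical model): source, target, implementation language."""
    source: str      # language the compiler accepts
    target: str      # language the compiler produces
    written_in: str  # language the compiler itself is written in


# Assumption: x86 runs natively, and A runs through the interpreter written in x86.
EXECUTABLE_LANGUAGES = {"x86", "A"}


def compile_compiler(tool: Compiler, program: Compiler) -> Compiler:
    """Translate `program` with `tool`: only the implementation language changes."""
    if tool.written_in not in EXECUTABLE_LANGUAGES:
        raise ValueError(f"cannot run a tool written in {tool.written_in}")
    if program.written_in != tool.source:
        raise ValueError(f"tool accepts {tool.source}, not {program.written_in}")
    return replace(program, written_in=tool.target)


c_compiler = Compiler(source="C", target="A", written_in="x86")
compiler_0 = Compiler(source="ML", target="A", written_in="C")  # Phase 0
compiler_1 = compile_compiler(c_compiler, compiler_0)           # Phase 1
print(compiler_1)  # Compiler(source='ML', target='A', written_in='A')
```

In this model, a compiler written in ML cannot be used as a tool until it has been translated into A, which is the reason the bootstrap needs the throw-away compiler in the first place.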

Phase 2:

We write, in ML, a gorgeous compiler from ML to assembly language:

_images/ditaa-2b1db7ebfffe2b2be8e3924a227470dae8a34c0f.png

This gorgeous compiler is written to work fast and to generate efficient code. We label it with the number 2 (two).

Phase 3:

Using the compiled compiler from Phase 1 and the interpreter for A written in x86, we compile the gorgeous compiler:

_images/ditaa-5df455dde270aa599ff9fab4555818b0c46370f6.png

The result is a compiler from ML to assembly language written in assembly language. This compiler generates the same efficient code as the gorgeous compiler. It is, however, not fast, because it was produced by a quick-and-dirty compiler that does not generate fast code. We label it with the number 3 (three).

Phase 4:

Using the new compiler (i.e., the result of Phase 3), we compile the gorgeous compiler:

_images/ditaa-dbe915eed74a6d7848660fbd25f507306e54e8a0.png

The result is a compiler from ML to assembly language written in assembly language. This compiler generates the same efficient code as the gorgeous compiler. It is also fast because it was produced by a good (if slow) compiler. We label it with the number 4 (four).
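The difference between the compilers of Phases 3 and 4 can be made concrete with a small model. This is a deliberately simplified sketch: the `Executable` record and the `compile_gorgeous` function are hypothetical, and the model assumes that how fast an executable runs depends only on the quality of the code emitted by the compiler that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Executable:
    """A compiled compiler (hypothetical model)."""
    label: int
    runs: str   # "slow" or "fast": how quickly this executable itself runs
    emits: str  # "inefficient" or "efficient": quality of the code it generates


def compile_gorgeous(label: int, compiler: Executable) -> Executable:
    """Compile the ML source of the gorgeous compiler with `compiler`."""
    # The result emits efficient code because that is what its source specifies.
    # It runs fast only if the compiler that produced it emits efficient code.
    runs = "fast" if compiler.emits == "efficient" else "slow"
    return Executable(label, runs=runs, emits="efficient")


compiler_1 = Executable(1, runs="slow", emits="inefficient")  # from Phase 1
compiler_3 = compile_gorgeous(3, compiler_1)                  # Phase 3
compiler_4 = compile_gorgeous(4, compiler_3)                  # Phase 4

for compiler in (compiler_1, compiler_3, compiler_4):
    print(compiler)
```

What an executable does is inherited from its source program, whereas how well it does it is inherited from the compiler that produced it.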

Phase 5:

As a verification, if we use the compiler from Phase 4 to compile the gorgeous compiler, we should obtain textually the same compiled code as in Phase 4.

Too much of a good thing can be wonderful.

—Mae West

_images/ditaa-e85b8e0e914c5b3dbd82e3b7564b0c5e1f5323a5.png

Indeed, the two compiled compilers (4 and 4’) are obtained by compiling the same source program (the gorgeous compiler) with the same algorithm (the one that the gorgeous compiler implements). (We assume this algorithm to be deterministic: given two identical source programs, it produces two identical target programs.)
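The fixed-point check of Phase 5 can be illustrated with toy compilers that operate on strings. Everything in this sketch is an assumption made for illustration: compiled code is represented as a tagged string, `naive_codegen` and `optimizing_codegen` stand in for the two compilation algorithms, and `run_compiled_compiler` encodes the correctness assumption that compiled code behaves like the source it came from.

```python
GORGEOUS_SOURCE = "gorgeous compiler (ML source)"


def naive_codegen(source: str) -> str:
    """Toy stand-in for the quick-and-dirty algorithm (compilers 0 and 1)."""
    return f"naive<{source}>"


def optimizing_codegen(source: str) -> str:
    """Toy stand-in for the algorithm that the gorgeous compiler implements."""
    return f"optimized<{source}>"


# What each source program means, i.e., the algorithm it implements.
MEANING = {GORGEOUS_SOURCE: optimizing_codegen}


def run_compiled_compiler(compiled: str, source: str) -> str:
    """Run a compiled compiler on `source`.

    Correct compilation preserves meaning, so compiled code behaves like the
    source it came from, whichever code generator produced it.
    """
    came_from = compiled[compiled.index("<") + 1 : -1]
    return MEANING[came_from](source)


compiler_3 = naive_codegen(GORGEOUS_SOURCE)                            # Phase 3
compiler_4 = run_compiled_compiler(compiler_3, GORGEOUS_SOURCE)        # Phase 4
compiler_4_prime = run_compiled_compiler(compiler_4, GORGEOUS_SOURCE)  # Phase 5

print(compiler_3 == compiler_4)        # False: different code generators
print(compiler_4 == compiler_4_prime)  # True: the fixed point is reached
```

If the two texts differ in practice, then either the compilation algorithm is not deterministic or one of the correctness assumptions does not hold.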

So all in all, the gorgeous ML compiler has been bootstrapped: given

an x86 microprocessor,
a compiler from C to A written in x86,
an interpreter for A written in x86,
a quick-and-dirty compiler from ML to A written in C, and
a gorgeous compiler from ML to A written in ML,

we have obtained a gorgeous compiler from ML to A written in A.

A word about correctness:

we assume that the x86 microprocessor is correct;
we assume that the C compiler (i.e., the compiler from C to A written in x86) is correct;
we assume that the virtual machine (i.e., the interpreter for A written in x86) is correct;
we depend on the quick-and-dirty compiler (Version 0) being correct; and
we depend on the gorgeous compiler (Version 2) being correct.
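These five items are the components that are given rather than built. One way to see this is to record what each compiled compiler was built from and to collect the leaves of that dependency graph. The `BUILT_FROM` table and the `trusted_base` function below are hypothetical names introduced for this sketch.

```python
# Hypothetical provenance table: each built artifact maps to what was used to build it.
BUILT_FROM = {
    "compiler 1": ["compiler 0", "C compiler", "x86 microprocessor"],
    "compiler 3": ["compiler 2", "compiler 1", "interpreter for A", "x86 microprocessor"],
    "compiler 4": ["compiler 2", "compiler 3", "interpreter for A", "x86 microprocessor"],
}


def trusted_base(artifact: str) -> set:
    """Return the given (not built) components on which `artifact` depends."""
    if artifact not in BUILT_FROM:
        return {artifact}  # a leaf: something we wrote by hand or were given
    base = set()
    for dependency in BUILT_FROM[artifact]:
        base |= trusted_base(dependency)
    return base


print(sorted(trusted_base("compiler 4")))
# ['C compiler', 'compiler 0', 'compiler 2', 'interpreter for A', 'x86 microprocessor']
```

The throw-away compiler remains in the result even though it is no longer used after Phase 3: an error in it could have propagated through compilers 1 and 3 into compiler 4.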

If a compiled program behaves incorrectly, it is because:

there is an error in the source program,
there is an error in the gorgeous compiler (Version 2),
there is an error in the quick-and-dirty compiler (Version 0),
there is an error in the C compiler,
there is an error in the virtual machine, or
there is an error in the x86 microprocessor.

Finding this error can take an arbitrarily long time.

Alfrothul: And if there are several errors?

Harald (sighing): Then it takes even longer to find them.

Alfrothul: How do you know there is an error somewhere?

Harald: Well, you have to look for it.

Loki: Or you use the strategy of the three code monkeys.

Harald: The what of the what?

Loki: The strategy of the three code monkeys: you don’t look and you don’t listen.

Harald: You don’t say.

Loki: Exactly.

Alfrothul: Er, guys? There is an exercise now and it is mandatory.

Loki: Attaboy.

Exercise 1¶

Suppose we don’t write a quick-and-dirty compiler from ML to assembly language written in C but a quick-and-dirty interpreter for ML written in C. Can we still bootstrap the gorgeous ML compiler? (In other words, given an x86 microprocessor, a compiler from C to A written in x86, an interpreter for A written in x86, a quick-and-dirty interpreter for ML written in C, and a gorgeous compiler from ML to A written in ML, can we obtain a gorgeous compiler from ML to A written in A?)

If yes, how do we do that?
If no, why?

Version¶

Polished the narrative [28 Jan 2020]

Created [14 Jan 2019]
