Languages and language processors

Goal

This lecture note introduces the ground notions of languages and of language processors.

Programming languages, programs, and data

A programming language is a notation to express computations. A program is something written according to this notation. It can be executed to perform a computation. A computation is some kind of processing operation over some data. Data are representations of information.

Here is an analogy. A natural language (Danish, for example) is a collection of words assembled into sentences. Words are composed of letters, and sentences are composed of words and punctuation marks. There is a common agreement about correct Danish words (spelling) and about correct Danish sentences (grammar). Spelling and grammar pertain to the syntax of Danish. Sentences are then communicated, either orally or in writing, from someone to someone else, to convey a meaning. Meaning pertains to the semantics of Danish. To stay within the analogy, let us only consider written communication in the rest of this paragraph. Improperly spelled words and improperly constructed sentences are misunderstood or not understood at all. Understandable sentences carry information.

T.T.T.

Piet Hein

Here is a comparison. A cooking recipe is a notation that conveys how to cook something. It specifies data (the ingredients, e.g., eggs, butter, salt, pepper, thyme), resources and tools (e.g., a stove and a pan), and an algorithm (a method to operate on the data, e.g., to beat the eggs towards cooking an omelet). To make a dish, a cook can then operate over the ingredients according to the recipe.

Notion of meta-language

Typically, a foreigner learns Danish using a textbook. Say that the foreigner is French. The textbook is likely to be written in French. The foreigner then discovers the Danish language through explanations written in French. For this foreigner, French is the meta-language of Danish: the language to talk about another language.

At school, a native Dane will learn the proper spelling of Danish words and the correct grammar of Danish with a textbook. This textbook is naturally written in Danish. For this Dane, Danish is the meta-language of Danish: they talk about Danish in Danish. This Dane will also learn English at school, also using a textbook. This textbook is of course written in Danish, and therefore, for this Dane, Danish will be the meta-language of English: they will talk about English in Danish.

Interpreters

An interpreter is a program for executing another program.

An interpreter is written in a meta-language (also known as: an implementation language). It is often depicted with an “I-diagram” (“I” for “interpreter”), where

  • S-L stands for the source language (of the interpreted programs) and
  • M-L stands for the meta-language:
_images/ditaa-a95feb256bad9462ac5cf844432040d1a14fbf07.png

Terminology: When interpreters for programs written in assembly language / byte code are implemented in hardware, they are called microprocessors. When they are implemented in software, they are called virtual machines.

Virtual machines are a topic of study in themselves.

  • For example, here are two artifacts (i.e., man-made constructs):

    1. an x86 microprocessor, depicted as follows:

      _images/ditaa-0f1047e36a92bd3eac68923fcae089f6ffd82545.png
    2. a program written in x86, depicted as follows:

      _images/ditaa-56c210c8dd582bf68a05d1a7c7d2ba9ff0b46c9f.png

    We depict the execution of the x86 program by putting the two pictures on top of each other:

    _images/ditaa-d91a005b660c64bc866427a6ca20df6eb0c92aef.png

    The x86 microprocessor executes the program written in x86.

    • Note how x86 is both the source language of the x86 microprocessor and the language in which the program is written.
  • For example, here are three artifacts:

    1. the same x86 microprocessor as above:

      _images/ditaa-0f1047e36a92bd3eac68923fcae089f6ffd82545.png
    2. an interpreter for OCaml written in x86, depicted as follows:

      _images/ditaa-af3cf25034e3e468151787d111bf9a668ec82240.png
    3. a program written in OCaml, depicted as follows:

      _images/ditaa-80d0be43f0e6102113afb555ebfa17f0666cd81b.png

    We depict the execution of the OCaml program by putting the three pictures on top of each other:

    _images/ditaa-24608475801963401c30ef93deb33c30772cb0a6.png

    The x86 microprocessor executes the interpreter for OCaml written in x86, and this interpreter executes the OCaml program.

    • Note how OCaml is both the source language of the interpreter and the language in which the OCaml program is written.
    • Note how x86 is both the source language of the x86 microprocessor and the implementation language of the interpreter.
  • For another example, here are four artifacts:

    1. the same x86 microprocessor as above:

      _images/ditaa-0f1047e36a92bd3eac68923fcae089f6ffd82545.png
    2. the same interpreter for OCaml written in x86 as above:

      _images/ditaa-af3cf25034e3e468151787d111bf9a668ec82240.png
    3. an interpreter for PHP written in OCaml, depicted as follows:

      _images/ditaa-2715ad9af001cb0dd9417e1866b9a2c309897de2.png
    4. a program written in PHP, depicted as follows:

      _images/ditaa-dfa433a0305ed913240f88d7a7ec0df123959a47.png

    We depict the execution of the PHP program by putting the four pictures on top of each other:

    _images/ditaa-646ad49585cbc760c4879649ba19adab52c85172.png

    The x86 microprocessor executes the interpreter for OCaml written in x86, this interpreter executes the interpreter for PHP written in OCaml, and this interpreter executes the PHP program.

    • Note how PHP is both the source language of the interpreter and the language in which the PHP program is written.
    • Note how OCaml is both the source language of the interpreter for OCaml written in x86 and the implementation language of the interpreter for PHP written in OCaml.
    • Note how x86 is both the source language of the x86 microprocessor and the implementation language of the interpreter for OCaml written in x86.

Compilers

A compiler is a program for translating another program from one language (the “source language”) to another (the “target language”). Given a source program (i.e., a program written in the source language), a compiler produces a target program (i.e., a program written in the target language). Compilers preserve the meaning of the programs they translate: the source program (i.e., the input of the compiler) and the target program (i.e., the output of the compiler) mean the same in the sense that executing either does the same as executing the other.

A compiler is written in an implementation language. It is often depicted with a “T-diagram” (“T” for “translator”), where

  • S-L stands for the source language,
  • T-L stands for the target language (of the compiled programs), and
  • I-L stands for the implementation language:
_images/ditaa-7822f155e36df5699f9241f11ae432681e7d373d.png

Compilers are a traditional topic of study in themselves.

  • For example, given (1) an x86 microprocessor, (2) a compiler from OCaml to x86 written in x86, and (3) a OCaml program, one can execute the OCaml program by (a) compiling it from OCaml to x86, resulting in a x86 program, and (b) executing this compiled program with the x86 microprocessor.

    • Here is a picture of Step (a):

      _images/ditaa-7a1f9cd6b034f520d1051981457a5efe0219dda6.png

      The x86 microprocessor executes the compiler from OCaml to x86, and this compiler translates the OCaml program to an x86 program. Note how x86 is both the source language of the x86 microprocessor and the implementation language of the compiler.

    • And here is a picture of Step (b):

      _images/ditaa-9bb0f6da03f2b07ff637fe3b2f41158fd3eb050b.png

    Executing the OCaml program and executing the x86 program do the same. For example, if the OCaml program, when given an integer, computes whether this given integer is prime, then the x86 program, when given an integer, also computes whether this given integer is prime.

  • For another example, given (1) an x86 microprocessor, (2) a compiler from OCaml to x86 written in x86, (3) a compiler from PHP to OCaml written in x86 and (4) a PHP program, one can execute the PHP program by (a) compiling it from PHP to OCaml, resulting in a compiled OCaml program, (b) compiling the resulting OCaml program from OCaml to x86, resulting in a compiled x86 program, and (c) executing the resulting x86 program with the x86 microprocessor.

    • Here is a picture of Step (a):

      _images/ditaa-e31f9f777d77612661d72da964d2293bcb1e25de.png
    • Here is a picture of Step (b):

      _images/ditaa-7a1f9cd6b034f520d1051981457a5efe0219dda6.png
    • And here is a picture of Step (c):

      _images/ditaa-9bb0f6da03f2b07ff637fe3b2f41158fd3eb050b.png

    Executing the PHP program, executing the OCaml program, and executing the x86 program do the same.

    Note how OCaml is both the target language of the first compiler and the source language of the second compiler: the point of this example is that the output program of the first compiler is the input program of the second compiler.

    For brevity, Steps (a) and (b) are often combined as follows, pictorially:

    _images/ditaa-2c9c31ee48276f323902ff9765167889c404a31f.png

Exercise 1

One is given:

  • an x86 microprocessor,
  • an interpreter for Python written in x86,
  • a compiler from Python to x86 written in x86, and
  • a program written in Python.

Questions:

  1. Can one execute the Python program? If yes, describe how to do that.
  2. Can one execute the Python program in more than one way? If yes, describe how to do that.
  3. In how many ways can one execute the Python program?

Practical suggestion: draw your solution with a pencil, take a low-resolution snapshot of it, and include this snapshot in your report.

Combining interpreters and compilers

Interpreters and compilers are often combined to obtain new interpreters and new compilers.

  • For example, given an interpreter for a source language L2 written in a meta-language L1, and an interpreter for the source language L3 written in the meta-language L2, one obtains an interpreter for L3 by interpreting the second interpreter with the first, given a L1 processor. Pictorially:

    _images/ditaa-0a9b3652e315275c974b43ee4631fabf6f358362.png

    The L1 processor executes the interpreter for L2 written in L1, and the interpreter for L2 written in L1 executes the interpreter for L3 written in L2. (Note how L2 is both the source language of the first interpreter and the meta-language of the second interpreter.)

    Macroscopically, the resulting interpreter for L3 looks like it is written in L1:

    _images/ditaa-d8f5979d9009ca0bfe5e94009b68206dbd97aecb.png

    Microscopically, inside this I-diagram, there are the two I-diagrams above:

    _images/ditaa-3759f81652157baec093f1fd123ffef5f81b3b05.png
  • For example, given an interpreter for an implementation language I-L written in a meta-language M-L, and a compiler from a source language S-L to a target language T-L written in the implementation language I-L, one obtains a compiler from S-L to T-L by executing the compiler with the interpreter. Pictorially:

    _images/ditaa-06535cc30aeb66ce3ab3bf3e5d2f46efe3828025.png

    Macroscopically, the resulting compiler from S-l to T-L looks like it is written in M-L. Pictorially:

    _images/ditaa-48a582a724287a96b435d018e65c1e18e1cc17ce.png

    Microscopically, inside this T-diagram, there are the T-diagram and the I-diagram above:

    _images/ditaa-c97ec6ae9ad8e285262b0328f86f0363882c4075.png
  • For example, given an interpreter for Scheme written in C and a compiler from C to, say a language A (e.g., byte code or assembly language), one obtains an interpreter for Scheme written in A by compiling it. Pictorially:

    _images/ditaa-fb57942f36b870823aa58478afe4a1feb91dc621.png

    The input of the compiler is a program written in C, and therefore the output is a program written in A since the compiler translates programs from C to A. The input of the compiler is an interpreter for Scheme, and therefore the output is also an interpreter for Scheme since a compiler preserves the meaning of the programs it translates. Therefore since the input is an interpreter for Scheme written in C, the output is an interpreter for Scheme written in A.

Food for thought:

  • How would one compile a compiler?
  • What would be the result?

(Hint: see Phase 1 in Bootstrapping an ML compiler later on.)

Vocabulary: executing and running

In practice, the terms “executing a program” and “running a program” are used interchangeably. Therefore an interpreter is often also said to be a program for running another program. Likewise, one uses the term “run time” to refer to what happens at execution time, i.e., when a program runs (i.e., when it is executed).

Self-interpreters and self-compilers

When an interpreter is written in (a subset of) its source language, it can interpret (a copy of) itself. Pictorially:

_images/ditaa-43bd4e87ae7b00ffe584b46ea118530a2778de49.png

An interpreter that can interpret (a copy of) itself is called a self-interpreter.

When a compiler is written in (a subset of) its source language, it can compile (a copy of) itself. Pictorially:

_images/ditaa-795f0845c76302cdd0e40e06092510421d78e608.png

The input of the compiler is a program written in L, and the output is a program written in A since the compiler translates programs from L to A. The input of the compiler is a compiler from L to A, and therefore the output is also a compiler from L to A since a compiler preserves the meaning of the programs it translates. Therefore since the input is a compiler from L to A written in L, the output is a compiler from L to A written in A.

A compiler that can compile (a copy of) itself is called a self-compiler. (See Bootstrapping an ML compiler later on.)

Exercise 2

One is given:

  • an x86 microprocessor,
  • an interpreter for Scheme written in C,
  • a compiler from C to x86 written in x86, and
  • a compiler from ML to C written in Scheme.

Can one execute a program written in ML, and if so how?

Hint: draw I- and T-diagrams, and then play with them as you would with LEGO bricks.

Solution for Exercise 2, Version 1: top down, depth first

Can we execute a program written in ML, and if so how?

Rhetorical question: Which language processor do we have for ML?

Answer: We only have one, namely

  • a compiler from ML to C written in Scheme.

If we can’t execute this compiler, the overall answer is negative, so let’s look at that now.

Rhetorical question: Which language processors do we have for Scheme?

Answer: We only have one, namely

  • an interpreter for Scheme written in C.

If we can’t execute this interpreter, the overall answer is negative, so let’s look at that now.

Rhetorical question: Which language processor do we have for C?

Answer: We only have one, namely

  • a compiler from C to x86 written in x86.

If we can’t execute this compiler, the overall answer is negative, so let’s look at that now.

Rhetorical question: Which language processor do we have for x86?

Answer: We only have one, namely

  • an x86 microprocessor.

So we can execute the compiler from C to x86 written in x86, using the x86 microprocessor.

So we can compile the interpreter for Scheme written in C, obtaining an interpreter for Scheme written in x86. Pictorially:

_images/ditaa-a6e3e3aaafee371056050c1fe116749a00bfae57.png

The result is an interpreter for Scheme written in x86.

Rhetorical question, revisited: Which language processors do we have for Scheme?

Answer: We now have two, namely

  • an interpreter for Scheme written in C, and
  • an interpreter for Scheme written in x86.

So we can compile a program written in ML into a program written in C, using the interpreter for Scheme written in x86 and the x86 microprocessor. We can then compile the resulting C program into a program written in x86, using the x86 microprocessor, and we can execute the resulting x86 program, again using the x86 microprocessor.

Pictorially:

_images/ditaa-e796e0955a8b5434b75454fffaacc8c986330bee.png

So overall the answer is yes: we can compile a program written in ML, as depicted just above.

Solution for Exercise 2, Version 2: top down, breadth first

Can we execute a program written in ML, and if so how?

Rhetorical question: Which language processors do we have for ML?

Answer: We only have one, namely

  • a compiler from ML to C written in Scheme.

Assuming we can somehow execute this compiler, the question would then reduce to the following one:

Can we execute a program written in C, and if so how?

Rhetorical question: Which language processors do we have for C?

Answer: We only have one, namely

  • a compiler from C to x86 written in x86.

Assuming we can somehow execute this compiler, the question would then reduce to the following one:

Can we execute a program written in x86, and if so how?

Rhetorical question: Which language processors do we have for x86?

Answer: We only have one, namely

  • an x86 microprocessor.

Therefore we can execute a program written in x86, using the x86 microprocessor.

Therefore we can execute a program written in C, using the compiler from C to x86 and then the x86 microprocessor, provided we can execute the compiler from C to x86 written in x86.

Therefore we can execute a program written in ML, provided we can execute the compiler from C to x86 written in x86 and the compiler from ML to C written in Scheme.

We can execute the compiler from C to x86 written in x86 since we have an x86 microprocessor. Pictorially:

_images/ditaa-d5bef92d7922a92641d9275711424d51623384c6.png

So the problem reduces to executing the compiler from ML to C written in Scheme.

Rhetorical question: Which language processors do we have for Scheme?

Answer: We only have one, namely

  • an interpreter for Scheme written in C.

Assuming we can somehow execute this interpreter, the question would reduce to the following one:

Can we execute a program written in C, and if so how?

We have already answered this question above: using the compiler from C to x86 written in x86 (and the x86 microprocessor to execute it).

So let us use this compiler from C to x86 to compile the interpreter written in C. Pictorially:

_images/ditaa-a6e3e3aaafee371056050c1fe116749a00bfae57.png

The result is an interpreter for Scheme written in x86.

We can now execute the program in ML by:

  1. interpreting the compiler from ML to C written in Scheme (using the interpreter for Scheme written in x86 and the x86 microprocessor) and using it to compile the program written in ML to a program written in C;
  2. executing the compiler from C to x86 (using the x86 microprocessor) and using it to compile the resulting program written in C to a program written in x86; and
  3. executing the resulting program written in x86 (using the x86 microprocessor).

Pictorially:

_images/ditaa-e796e0955a8b5434b75454fffaacc8c986330bee.png

Solution for Exercise 2, Version 3: bottom up

  1. Since we

    • have a compiler from C to x86 written in x86 (given) and
    • have an x86 microprocessor (given),

    we can compile C programs to x86.

  2. Since we

    • can compile C programs to x86 (Point a) and
    • have an x86 microprocessor (given),

    we can execute C programs.

  3. Since we

    • have an interpreter for Scheme written in C (given) and
    • can execute C programs (Point b),

    we can execute Scheme programs.

  4. Since we

    • have a compiler from ML to C written in Scheme (given) and
    • can execute Scheme programs (Point c),

    we can compile ML programs to C.

  5. Since we

    • can compile ML programs to C (Point d) and
    • can execute C programs (Point b),

    we can execute ML programs.

So the answer to the question is yes: we can execute a program written in ML (Point e).

Exercise 3

One is given:

  • an x86 microprocessor,
  • a compiler from Java Byte Code to x86 written in Java Byte Code,
  • an interpreter for Scheme written in Scheme,
  • a compiler from Scheme to Java Byte Code written in x86, and
  • a compiler from Schelog to Scheme written in x86.

Can one execute a program written in Schelog, and if so how?

Solution for Exercise 3, top down

Rhetorical question: Which language processor do we have for Schelog?

Answer: We have

  • a compiler from Schelog to Scheme written in x86.

This compiler can be used because to run it, we need a language processor for x86, and we have one: the x86 microprocessor.

So let us use this compiler from Schelog to Scheme to compile the program written in Schelog. Pictorially:

_images/ditaa-ab0caf8b468c78862fe0b8b753ffe43a9934569d.png

So the problem reduces to processing the resulting Scheme program.

Rhetorical question: Which language processor do we have for Scheme?

Answer: We have

  • an interpreter for Scheme written in Scheme, and
  • a compiler from Scheme to Java Byte Code written in x86.
Murph: Each iteration becomes an attempt to prove its own proof.
It’s recursive. Nonsensical.
Professor Brand: Are you calling my life’s work “nonsense”?

Interstellar

The interpreter cannot be used because to run it, we need a language processor for Scheme, and we do not have one. (This Scheme interpreter is useless because to run it, we need a Scheme interpreter. But to run this Scheme interpreter, we need a Scheme interpreter. And to run this Scheme interpreter, we need a Scheme interpreter – a circular argument. So all in all, we cannot interpret any Scheme program.)

The compiler can be used because to run it, we need a language processor for x86, and we have one: the x86 microprocessor.

So let us use this compiler from Scheme to Java Byte Code to compile the resulting program written in Scheme. Pictorially:

_images/ditaa-db7e04011199940b7ce9391b6b5ac8053b69a650.png

So the problem reduces to processing the resulting byte-code program.

Rhetorical question: Which language processor do we have for Java Byte Code?

Answer: we have

  • a compiler from Java Byte Code to x86 written in Java Byte Code.
Loki the Mischievous: But we do have one:
the compiler from Java Byte Code to x86 written in Java Byte Code.
(Boy, I love those circular arguments.)
Murph: Why? They are nonsensical.

This compiler cannot be used because to run it, we need a language processor for Java Byte Code, and we do not have one.

Therefore, we cannot process the resulting byte-code program.

So all in all, we cannot execute a program written in Schelog.

Solution for Exercise 3, bottom up

  1. Since we

    • have a compiler from Schelog to Scheme written in x86 (given) and
    • have an x86 microprocessor (given),

    we can compile Schelog programs to Scheme. Furthermore that is the only thing we can do with Schelog programs.

  2. Since we

    • have a compiler from Scheme to Java Byte Code written in x86 (given) and
    • have an x86 microprocessor (given),

    we can compile Scheme programs to Java Byte Code. Furthermore that is the only thing we can do with Scheme programs: the Scheme self-interpreter is useless to process programs written in Scheme.

  3. Since we

    • can compile Schelog programs to Scheme (Point a) and
    • can compile Scheme programs to Java Byte Code (Point b)

    we can compile Schelog programs to Java Byte Code.

  4. However, since we

    • cannot interpret Java Byte Code programs and
    • cannot compile Java Byte Code programs either,

    we cannot process the Java Byte Code programs.

So the answer to the question is no: we cannot execute a program written in Schelog (Point d).

Exercise 4

This exercise is a followup to Exercise 3. Under the same conditions:

  1. Can one execute a program written in Scheme?
  2. Can one execute a program written in Java Byte Code?
  3. Can one execute a program written in Java?
  4. Can one execute a program written in JavaScript?

Motivate your answers.

Partial solution for Exercise 4

  1. No.
  2. No.
  3. No.
  4. No.

Exercise 5

One is given:

  • an ARM microprocessor,
  • an interpreter for x86 written in ARM,
  • an interpreter for Perl written in x86,
  • a compiler from ML to C written in Perl,
  • a compiler from C to x86 written in ARM, and
  • a compiler from Prolog to ARM written in ML.

Can one execute a program written in Prolog, and if so how?

Practical suggestion: draw your solution with a pencil, take a low-resolution snapshot of it, and include this snapshot in your report.

Exercise 6

This exercise revisits Exercise 5, replacing the interpreter for x86 written in ARM by a compiler from x86 to ARM written in ARM.

Can one execute a program written in Prolog, and if so how?

Exercise 7

This exercise revisits Exercise 5, replacing the interpreter for x86 written in ARM by a compiler from x86 to ARM written in x86.

Can one execute a program written in Prolog, and if so how?

Version

Emphasized that compilers preserve the meaning of the programs they translate, following a question by Amanda Leong Qi [20 Jan 2020]

Added the practical suggestion in Exercise 1 and Exercise 5 [17 Jan 2020]

Created [14 Jan 2019]