The goal of this term project is to formalize
The capstone of this project is a formal proof of a commuting diagram which says that
give equivalent results. In other words, the input and the output of the compiler are observationally equivalent, i.e., the compiler is semantics preserving, just like in the CompCert project, though of course on a (much) smaller scale.
The target interpreter takes the form of a virtual machine – a stack-based one with a fetch-decode-execute loop.
An arithmetic expression is either a literal, an addition, or a subtraction:
Inductive arithmetic_expression : Type :=
Literal : nat -> arithmetic_expression
| Plus : arithmetic_expression -> arithmetic_expression -> arithmetic_expression
| Minus : arithmetic_expression -> arithmetic_expression -> arithmetic_expression.
The meaning of an arithmetic expression is either a natural number or an error message:
Inductive source_msg : Type :=
Source_msg_numerical_underflow : nat -> source_msg.
Inductive source_expressible_value : Type :=
Source_expressible_nat : nat -> source_expressible_value
| Source_expressible_msg : source_msg -> source_expressible_value.
An arithmetic expression is evaluated using an evaluation function which is specified as follows:
Definition specification_of_evaluate (evaluate : arithmetic_expression -> source_expressible_value) :=
(forall n : nat,
evaluate (Literal n) = Source_expressible_nat n)
/\
((forall (ae1 ae2 : arithmetic_expression)
(n1 n2 : nat),
evaluate ae1 = Source_expressible_nat n1 ->
evaluate ae2 = Source_expressible_nat n2 ->
evaluate (Plus ae1 ae2) = Source_expressible_nat (n1 + n2))
/\
(forall (ae1 ae2 : arithmetic_expression)
(n1 : nat)
(sm2 : source_msg),
evaluate ae1 = Source_expressible_nat n1 ->
evaluate ae2 = Source_expressible_msg sm2 ->
evaluate (Plus ae1 ae2) = Source_expressible_msg sm2)
/\
(forall (ae1 ae2 : arithmetic_expression)
(sm1 : source_msg),
evaluate ae1 = Source_expressible_msg sm1 ->
evaluate (Plus ae1 ae2) = Source_expressible_msg sm1))
/\
((forall (ae1 ae2 : arithmetic_expression)
(n1 n2 : nat),
evaluate ae1 = Source_expressible_nat n1 ->
evaluate ae2 = Source_expressible_nat n2 ->
n1 <? n2 = false ->
evaluate (Minus ae1 ae2) = Source_expressible_nat (n1 - n2))
/\
(forall (ae1 ae2 : arithmetic_expression)
(n1 n2 : nat),
evaluate ae1 = Source_expressible_nat n1 ->
evaluate ae2 = Source_expressible_nat n2 ->
n1 <? n2 = true ->
evaluate (Minus ae1 ae2) = Source_expressible_msg (Source_msg_numerical_underflow (n2 - n1)))
/\
(forall (ae1 ae2 : arithmetic_expression)
(n1 : nat)
(sm2 : source_msg),
evaluate ae1 = Source_expressible_nat n1 ->
evaluate ae2 = Source_expressible_msg sm2 ->
evaluate (Minus ae1 ae2) = Source_expressible_msg sm2)
/\
(forall (ae1 ae2 : arithmetic_expression)
(sm1 : source_msg),
evaluate ae1 = Source_expressible_msg sm1 ->
evaluate (Minus ae1 ae2) = Source_expressible_msg sm1)).
Errors originate in subtractions, since subtracting two natural numbers does not always yield a natural number: for example, 3 - 2 is defined but not 2 - 3.
Based on Chapter Arithmetic expressions with names in Week 12, determine whether the specified evaluation function operates from left to right or from right to left, and then rename specification_of_evaluate to
as appropriate.
Provide an example of an arithmetic expression that gives one result when evaluated from left to right and another when evaluated from right to left.
Prove that the renamed specification is not ambiguous.
Implement a unit-test function – naming it unit_test_for_evaluate_ltr or unit_test_for_evaluate_rtl, as appropriate – that provides code coverage.
Implement a function – naming it evaluate_ltr or evaluate_rtl, as appropriate – that satisfies the renamed specification and verify that it passes the unit tests.
Conclude that the renamed specification is not vacuous.
A source program is an arithmetic expression:
Inductive source_program : Type :=
Source_program : arithmetic_expression -> source_program.
Given a suitable evaluation function, the source interpreter maps a source program to a source expressible value:
Definition specification_of_interpret (interpret : source_program -> source_expressible_value) :=
forall evaluate : arithmetic_expression -> source_expressible_value,
specification_of_evaluate evaluate ->
forall ae : arithmetic_expression,
interpret (Source_program ae) = evaluate ae.
A byte-code instruction is either a push instruction, an add instruction, or a sub instruction:
Inductive byte_code_instruction : Type :=
PUSH : nat -> byte_code_instruction
| ADD : byte_code_instruction
| SUB : byte_code_instruction.
The meaning of a byte-code instruction is the result of decoding it and then executing it. This result is either a data stack or an error message:
Definition data_stack := list nat.
Inductive result_msg : Type :=
Result_msg_numerical_underflow : nat -> result_msg
| Result_msg_stack_underflow : byte_code_instruction -> result_msg.
Inductive result_of_decoding_and_execution : Type :=
OK : data_stack -> result_of_decoding_and_execution
| KO : result_msg -> result_of_decoding_and_execution.
A byte-code instruction is decoded and then executed using a decode-execute function with the following signature:
decode_execute : byte_code_instruction -> data_stack -> result_of_decoding_and_execution
For naming consistency, the decode-execute function should be named decode_execute_ltr or decode_execute_rtl , as appropriate. It is informally specified as follows:
Given a nat n and a data stack ds, evaluating:
decode_execute (PUSH n) ds
should successfully return a stack where n is pushed on ds.
Given a data stack ds that contains at least two nats, evaluating:
decode_execute ADD ds
should
If the data stack does not contain at least two nats, evaluating:
decode_execute ADD ds
should return the result message "Result_msg_stack_underflow ADD".
Given a data stack ds that contains at least two nats, evaluating:
decode_execute SUB ds
should
If the data stack does not contain at least two nats evaluating:
decode_execute SUB ds
should return the result message "Result_msg_stack_underflow SUB".
If the data stack contain at least two nats and if subtracting one nat from the other gives a negative result, evaluating:
decode_execute SUB ds
should return the result message "Result_msg_numerical_underflow n", where n is the same number as in the corresponding source message in the evaluator.
The fourth wall (worried): Huh, just keep it brief, OK?
Dana (to the point): This specification of the decode-execute function is informal. Why is that?
Pablito: True that, it is informal. But it is clear, no?
Dana: Ostensibly, yes.
Anton: In real life, specifications are informal, and most of them are a lot less clear than this one. For one, I am not complaining here. It’s a test of real life.
Dana: All the other specifications are formal.
Anton: But not this one. It’s a taste of real life.
Dana: It probably is.
Pablito: We could formalize it?
Dana: We could.
Anton: That sounds obvious, considering the clarity of the informal description.
Dana: Yes it does.
Anton: Then I’ll pass. I think I know what I am doing here. One has to draw a line.
The virtual machine is implemented with a fetch-decode-execute loop that is specified as follows:
Definition specification_of_fetch_decode_execute_loop (fetch_decode_execute_loop : list byte_code_instruction -> data_stack -> result_of_decoding_and_execution) :=
(forall ds : data_stack,
fetch_decode_execute_loop nil ds = OK ds)
/\
(forall (bci : byte_code_instruction)
(bcis' : list byte_code_instruction)
(ds ds' : data_stack),
decode_execute bci ds = OK ds' ->
fetch_decode_execute_loop (bci :: bcis') ds =
fetch_decode_execute_loop bcis' ds')
/\
(forall (bci : byte_code_instruction)
(bcis' : list byte_code_instruction)
(ds : data_stack)
(rm : result_msg),
decode_execute bci ds = KO rm ->
fetch_decode_execute_loop (bci :: bcis') ds =
KO rm).
For any lists of byte-code instructions bci1s and bci2s, and for any data stack ds, characterize the execution of the concatenation of bci1s and bci2s (i.e., bci1s ++ bci2s) with ds in terms of executing bci1s and of executing bci2s.
Anton: A Eureka lemma now?
The fourth wall (crestfallen): An interlude now? Another one?
Bong-sun: Just a small one, Fourth Wall, just a small one. I believe that arithmetic expressions are to be translated into byte code using list concatenation.
Halcyon: Yes they are – I just peeped forward to Task 6.
Dana: This Eureka lemma should prove handy for the Capstone theorem.
Halcyon: This is all so exciting.
The fourth wall (wiping his forehead): Phew, that was a small one.
A target program is a list of byte-code instructions:
Inductive target_program : Type :=
Target_program : list byte_code_instruction -> target_program.
The meaning of a target program is either a natural number or an error message:
Inductive target_msg : Type :=
Target_msg_numerical_underflow : nat -> target_msg
| Target_msg_stack_underflow : byte_code_instruction -> target_msg
| Target_msg_stack_O : target_msg
| Target_msg_stack_SS : target_msg.
Inductive target_expressible_value : Type :=
Target_expressible_nat : nat -> target_expressible_value
| Target_expressible_msg : target_msg -> target_expressible_value.
Given a suitable fetch-decode-execute loop, the target interpreter maps a target program to a target expressible value:
Definition specification_of_run (run : target_program -> target_expressible_value) :=
forall fetch_decode_execute_loop : list byte_code_instruction -> data_stack -> result_of_decoding_and_execution,
specification_of_fetch_decode_execute_loop fetch_decode_execute_loop ->
(forall (bcis : list byte_code_instruction),
fetch_decode_execute_loop bcis nil = OK nil ->
run (Target_program bcis) = Target_expressible_msg Target_msg_stack_O)
/\
(forall (bcis : list byte_code_instruction)
(n : nat),
fetch_decode_execute_loop bcis nil = OK (n :: nil) ->
run (Target_program bcis) = Target_expressible_nat n)
/\
(forall (bcis : list byte_code_instruction)
(n n' : nat)
(ds'' : data_stack),
fetch_decode_execute_loop bcis nil = OK (n :: n' :: ds'') ->
run (Target_program bcis) = Target_expressible_msg Target_msg_stack_SS)
/\
(forall (bcis : list byte_code_instruction)
(n : nat),
fetch_decode_execute_loop bcis nil = KO (Result_msg_numerical_underflow n) ->
run (Target_program bcis) = Target_expressible_msg (Target_msg_numerical_underflow n))
/\
(forall (bcis : list byte_code_instruction)
(bci : byte_code_instruction),
fetch_decode_execute_loop bcis nil = KO (Result_msg_stack_underflow bci) ->
run (Target_program bcis) = Target_expressible_msg (Target_msg_stack_underflow bci)).
N.B.: For simplicity, we shall use the infix notation ++ for the list-concatenation function that comes with the List library.
An arithmetic expression is translated using a translation function which is specified as follows:
Definition specification_of_translate (translate : arithmetic_expression -> list byte_code_instruction) :=
(forall n : nat,
translate (Literal n) = PUSH n :: nil)
/\
(forall ae1 ae2 : arithmetic_expression,
translate (Plus ae1 ae2) = translate ae1 ++ translate ae2 ++ ADD :: nil)
/\
(forall ae1 ae2 : arithmetic_expression,
translate (Minus ae1 ae2) = translate ae1 ++ translate ae2 ++ SUB :: nil).
Given a suitable translator for arithmetic expressions, the compiler maps a source program to a target program:
Definition specification_of_compile (compile : source_program -> target_program) :=
forall translate : arithmetic_expression -> list byte_code_instruction,
specification_of_translate translate ->
forall ae : arithmetic_expression,
compile (Source_program ae) = Target_program (translate ae).
Implement an alternative compiler using an auxiliary function with an accumulator and that does not use ++ but :: instead, and prove that it satisfies the specification.
Subsidiary question: Are your compiler and your alternative compiler equivalent? How can you tell?
Prove that
give a related result:
Theorem the_commuting_diagram_logically :
forall sp : source_program,
(forall n : nat,
interpret sp = Source_expressible_nat n ->
run (compile sp) = Target_expressible_nat n)
/\
(forall n : nat,
interpret sp = Source_expressible_msg (Source_msg_numerical_underflow n) ->
run (compile sp) = Target_expressible_msg (Target_msg_numerical_underflow n)).
Food for thought: Could the implications be reversed in Theorem the_commuting_diagram_logically?
Bong-sun: Wow.
Dana: Yes?
Bong-sun: As it happens, a source expressible value can be mapped into a target expressible value, look:
Definition target_expressible_value_from_source_expressible_value (sev : source_expressible_value) : target_expressible_value :=
match sev with
Source_expressible_nat n =>
Target_expressible_nat n
| Source_expressible_msg sm =>
Target_expressible_msg (match sm with
Source_msg_numerical_underflow n =>
Target_msg_numerical_underflow n
end)
end.
Dana: Aha. So we can state an equational version of the commuting diagram, look:
Theorem the_commuting_diagram_equationally :
forall sp : source_program,
target_expressible_value_from_source_expressible_value (interpret sp) =
run (compile sp).
Halcyon: Yes!
Alfrothul (surprised): Where is Fourth Wall?
Anton: He is just coming our way.
The fourth wall (facepalming): An unhinged interlude now? Here? Must we? We are in the middle of the term project. Someone needs to get a grip!
Pablito (ebullient): Never mind about that. Aren’t you guys a bit impatient here?
Bong-sun: I am.
Dana: Me too. Let’s check whether the composition of run and compile passes the unit tests for interpret.
Mimer (hovering by): Harrumph.
Alfrothul: That’s actually a good idea, because if the composition of run and compile does not pass the unit tests for interpret, the diagram does not commute.
Halcyon: We are so cool! In an impatient sort of way, but still, very cool. Please carry on.
Anton (pragmatic): Well, we face a type problem, because fun sp => run (compile sp) has type source_program -> target_expressible_value and test_interpret has type (source_program -> source_expressible_value) -> bool. These types are not compatible: We cannot apply test_interpret to fun sp => run (compile sp).
Pablito: Maybe we could convert the resulting value? Ah, bummer, target_expressible_value_from_source_expressible_value has type source_expressible_value -> target_expressible_value. Wrong direction.
Halcyon: Right, we can’t use it.
Bong-sun: How about we hack its converse?
Anton: Er, Dude – I mean Bong-sun – no can’t do, because there are more target error messages than source error messages.
Bong-sun: Right. That’s why I said “hack”.
Anton: OK?
Alfrothul: Well, we know for a fact – I just peeped later in the term project – that if we run the output of the compiler, the extra target error messages can’t occur.
Everybody: Seriously?
The Greek chorus: EVERYBODY PEEPS LATER IN THE TERM PROJECT.
Halcyon: Guys, I just peeped later in the term project, and Alfrothul is right, if we run the output of the compiler, the extra target error messages can’t occur.
Bong-sun: It’s hacking time! Let me see. Since we are implementing the converse of source_expressible_value_from_target_expressible_value, the target expressible nat can be mapped to a source expressible nat and the target numerical underflow can be mapped to a source numerical underflow. Easy money:
Definition source_expressible_value_from_target_expressible_value (tev : target_expressible_value) : source_expressible_value :=
match tev with
Target_expressible_nat n =>
Source_expressible_nat n
| Target_expressible_msg tm =>
Source_expressible_msg (match tm with
Target_msg_numerical_underflow n =>
Source_msg_numerical_underflow n
| _ =>
...
end)
end.
Pablito: But what about the other target errors?
Bong-sun (virtuous): Well, an error is an error, and since there is only one source error, here we go:
Definition source_expressible_value_from_target_expressible_value (tev : target_expressible_value) : source_expressible_value :=
match tev with
Target_expressible_nat n =>
Source_expressible_nat n
| Target_expressible_msg tm =>
Source_expressible_msg (match tm with
Target_msg_numerical_underflow n =>
Source_msg_numerical_underflow n
| _ =>
Source_msg_numerical_underflow ...
end)
end.
Pablito: Er... OK. I guess. But which source numerical underflow? The other target errors can’t occur but they are about the data stack.
Dana (definite): Let’s pick 0:
Definition source_expressible_value_from_target_expressible_value (tev : target_expressible_value) : source_expressible_value :=
match tev with
Target_expressible_nat n =>
Source_expressible_nat n
| Target_expressible_msg tm =>
Source_expressible_msg (match tm with
Target_msg_numerical_underflow n =>
Source_msg_numerical_underflow n
| _ =>
Source_msg_numerical_underflow 0
end)
end.
Anton: 0? But that’s not possible – 0 means that the two arguments of Minus are the same and then there is no numerical underflow. This error is not an error!
Dana: And so 0 is a perfect number to pick, since all the other errors can’t occur.
The fourth wall (who followed the conversation, willy nilly): So if we get Source_msg_numerical_underflow 0, which is impossible, it is because of an error that can’t happen?
Dana (beaming): Precisely.
Mimer: Harrumph. Again.
Pablito: Let me try, let me try:
Lemma testing_the_commuting_diagram_ahead_of_time :
test_interpret (fun sp => source_expressible_value_from_target_expressible_value (run (compile sp))) = true.
Proof.
compute.
reflexivity.
Qed.
Halcyon: And Coq accepts this lemma and this proof! Guys, we are so cool!
Anton (feebly): But what about the impossible errors?
Alfrothul (kindly): They can’t occur.
Anton: So the error that is not one...
Alfrothul: It can’t happen either.
Anton: Harrumph.
Mimer (patting him on the back, somehow): I know, right?
Loki: Ahead of Time testing, eh?
Dana: Yup.
The following verifier symbolically executes a byte-code program to check whether an underflow occurs during execution and whether when the program completes, there is one and one only natural number on top of the stack. The second argument of verify_aux is a natural number that represents the size of the stack:
Fixpoint verify_aux (bcis : list byte_code_instruction) (n : nat) : option nat :=
match bcis with
nil =>
Some n
| bci :: bcis' =>
match bci with
PUSH _ =>
verify_aux bcis' (S n)
| _ =>
match n with
S (S n') =>
verify_aux bcis' (S n')
| _ =>
None
end
end
end.
Definition verify (tp : target_program) : bool :=
let '(Target_program bcis) := tp
in match verify_aux bcis 0 with
Some n =>
match n with
1 =>
true
| _ =>
false
end
| _ =>
false
end.
Prove that the compiler emits code that is accepted by the verifier:
Theorem the_compiler_emits_well_behaved_code :
forall sp : source_program,
verify (compile sp) = true.
Subsidiary question: What is the practical consequence of this theorem?
You are expected, at the very least:
Beyond this absolute minimum, in decreasing importance, it would be good:
Also, feel free to expand the source language and the target language, e.g., with multiplication, quotient, and modulo, and/or with names and an environment, as in Chapter Arithmetic expressions with names in Week 12.
The fourth wall (facepalming with both hands): A postlude now?
Pablito (curious): What are you doing, Anton?
Anton (darkly): Proving a theorem:
Theorem source_numerical_underflows_cannot_be_zero :
forall ae : arithmetic_expression,
not (evaluate ae = Source_expressible_msg (Source_msg_numerical_underflow 0)).
Pablito: Funny, I was going for this one:
Theorem source_numerical_underflows_are_strictly_positive :
forall (ae : arithmetic_expression)
(n : nat),
evaluate ae = Source_expressible_msg (Source_msg_numerical_underflow n) ->
exists n' : nat,
n = S n'.
Anton: Yours works too.
Halcyon: This is so cool. Go for it, guys!
Revised, which included adding the ‘ludes [22 May 2025]
Finalized [14 Apr 2025]
Created [11 Apr 2025]