Matching regular expressions
Week 02 featured the following tagged and fully parenthesized syntactic
representation of regular expressions:
- <regexp> ::= (empty)
| (atom <integer>)
| (any)
| (seq <regexp> <regexp>)
| (disj <regexp> <regexp>)
| (star <regexp>)
| (plus <regexp>)
The goal here is to implement a regular-expression matcher that, given a
regular expression and a proper list of integers, determines whether this
list belongs to the set represented by this regular expression.
Semantics of regular expressions
- (mt) represents the singleton set that contains the empty list
- given an integer n,
(atom n) represents the singleton set that contains the singleton
list (n), wheren n represents n
- (any) represents the singleton set that contains a singleton list
- given re1 that represents a set RE1 and re2 that represents a set RE2,
(seq re1 re2) represents a set that contains the result of (append n1s n2s),
for each n1s that belongs to RE1 and for each n2s that belongs to RE2
- given re1 that represents a set RE1 and re2 that represents a set RE2,
(disj re1 re2) represents the union of RE1 and RE2
- given re that represents a set RE, and keeping in mind that append denotes a variadic procedure,
(star re) represents the set {(append), (append n1s), (append n21s n22s), (append n31s n32s n33s), ...},
where each of n1s, n21s, n22s, n31s, n32s, n33s, etc. represent a list that belongs to RE
- given re that represents a set RE, and keeping in mind that append denotes a variadic procedure,
(plus re) represents the set {(append n1s), (append n21s n22s), (append n31s n32s n33s), ...},
where each of n1s, n21s, n22s, n31s, n32s, n33s, etc. represent a list that belongs to RE
Here are some examples:
- (atom 10)
represents
{(10)}
- (atom 20)
represents
{(20)}
- (seq (atom 10) (atom 20))
represents
{(10 20)}
- (seq (any) (any))
represents
the set of all proper lists of integers of length two
- (disj (atom 10) (atom 20))
represents
{(10), (20)}
- (star (atom 10))
represents
{(), (10), (10 10), (10 10 10), ...}
- (plus (atom 10))
represents
{(10), (10 10), (10 10 10), ...}
- (seq (disj (atom 1) (atom 2)) (disj (atom 10) (atom 20)))
represents
{(1 10), (1 20), (2 10), (2 20)}
The last example illustrates the expressivity of regular expressions.
Implementation
The accompanying .scm file contains:
- an implementation of the predicate and accessor,
- a Version 0 that handles mt, atom, and seq, together with unit tests
- a missing Version 1 that extends Version 0 with any
- a Version 2 that extends Version 0 with disj, together with unit tests
- a variant of Version 2 that counts the number of possible matches
Exercise 01
- Flesh out Version 1, unit tests and all.
- Implement a Version 3 that extends Version 2 with any, unit tests and all.
- Implement a variant of Version 3 that counts the number of possible matches.
Make sure to motivate your new tests and to explain their significance.
Exercise 02
- Implement a Version 4 that expands Version 3 with star.
- Implement a counting version of Version 4.
Make sure to motivate your new tests and to explain their significance.
Exercise 03
- Implement a Version 5 that expands Version 4 with plus.
- Implement a counting version of Version 5.
Make sure to motivate your new tests and to explain their significance.
Version
Created
[08 Nov 2025]