Matching regular expressions¶

Week 02 featured the following tagged and fully parenthesized syntactic representation of regular expressions:

<regexp> ::= (empty) | (atom <integer>) | (any) | (seq <regexp> <regexp>) | (disj <regexp> <regexp>) | (star <regexp>) | (plus <regexp>)

The goal here is to implement a regular-expression matcher that, given a regular expression and a proper list of integers, determines whether this list belongs to the set represented by this regular expression.

Semantics of regular expressions¶

(mt) represents the singleton set that contains the empty list
given an integer n, (atom n) represents the singleton set that contains the singleton list (n), wheren n represents n
(any) represents the singleton set that contains a singleton list
given re1 that represents a set RE1 and re2 that represents a set RE2, (seq re1 re2) represents a set that contains the result of (append n1s n2s), for each n1s that belongs to RE1 and for each n2s that belongs to RE2
given re1 that represents a set RE1 and re2 that represents a set RE2, (disj re1 re2) represents the union of RE1 and RE2
given re that represents a set RE, and keeping in mind that append denotes a variadic procedure, (star re) represents the set {(append), (append n1s), (append n21s n22s), (append n31s n32s n33s), ...}, where each of n1s, n21s, n22s, n31s, n32s, n33s, etc. represent a list that belongs to RE
given re that represents a set RE, and keeping in mind that append denotes a variadic procedure, (plus re) represents the set {(append n1s), (append n21s n22s), (append n31s n32s n33s), ...}, where each of n1s, n21s, n22s, n31s, n32s, n33s, etc. represent a list that belongs to RE

Here are some examples:

(atom 10) represents {(10)}
(atom 20) represents {(20)}
(seq (atom 10) (atom 20)) represents {(10 20)}
(seq (any) (any)) represents the set of all proper lists of integers of length two
(disj (atom 10) (atom 20)) represents {(10), (20)}
(star (atom 10)) represents {(), (10), (10 10), (10 10 10), ...}
(plus (atom 10)) represents {(10), (10 10), (10 10 10), ...}
(seq (disj (atom 1) (atom 2)) (disj (atom 10) (atom 20))) represents {(1 10), (1 20), (2 10), (2 20)}

The last example illustrates the expressivity of regular expressions.

Implementation¶

The accompanying .scm file contains:

an implementation of the predicate and accessor,
a Version 0 that handles mt, atom, and seq, together with unit tests
a missing Version 1 that extends Version 0 with any
a Version 2 that extends Version 0 with disj, together with unit tests
a variant of Version 2 that counts the number of possible matches

Exercise 01¶

Flesh out Version 1, unit tests and all.
Implement a Version 3 that extends Version 2 with any, unit tests and all.
Implement a variant of Version 3 that counts the number of possible matches.

Make sure to motivate your new tests and to explain their significance.

Exercise 02¶

Implement a Version 4 that expands Version 3 with star.
Implement a counting version of Version 4.

Make sure to motivate your new tests and to explain their significance.

Exercise 03¶

Implement a Version 5 that expands Version 4 with plus.
Implement a counting version of Version 5.

Make sure to motivate your new tests and to explain their significance.

Resources¶

The Scheme code for the present lecture note (latest version: 08 Nov 2025).

Version¶

Created [08 Nov 2025]

Matching regular expressions¶

Semantics of regular expressions¶

Implementation¶

Exercise 01¶

Exercise 02¶

Exercise 03¶

Resources¶

Version¶

Table Of Contents

Previous topic

Next topic

Navigation

Matching regular expressions¶

Semantics of regular expressions¶

Implementation¶

Exercise 01¶

Exercise 02¶

Exercise 03¶

Resources¶

Version¶

Table Of Contents

Previous topic

Next topic

Quick search

Navigation