Matching regular expressions

Week 02 featured the following tagged and fully parenthesized syntactic representation of regular expressions:

<regexp> ::= (empty) | (atom <integer>) | (any) | (seq <regexp> <regexp>) | (disj <regexp> <regexp>) | (star <regexp>) | (plus <regexp>)

The goal here is to implement a regular-expression matcher that, given a regular expression and a proper list of integers, determines whether this list belongs to the set represented by this regular expression.

Semantics of regular expressions

  • (mt) represents the singleton set that contains the empty list
  • given an integer n, (atom n) represents the singleton set that contains the singleton list (n), wheren n represents n
  • (any) represents the singleton set that contains a singleton list
  • given re1 that represents a set RE1 and re2 that represents a set RE2, (seq re1 re2) represents a set that contains the result of (append n1s n2s), for each n1s that belongs to RE1 and for each n2s that belongs to RE2
  • given re1 that represents a set RE1 and re2 that represents a set RE2, (disj re1 re2) represents the union of RE1 and RE2
  • given re that represents a set RE, and keeping in mind that append denotes a variadic procedure, (star re) represents the set {(append), (append n1s), (append n21s n22s), (append n31s n32s n33s), ...}, where each of n1s, n21s, n22s, n31s, n32s, n33s, etc. represent a list that belongs to RE
  • given re that represents a set RE, and keeping in mind that append denotes a variadic procedure, (plus re) represents the set {(append n1s), (append n21s n22s), (append n31s n32s n33s), ...}, where each of n1s, n21s, n22s, n31s, n32s, n33s, etc. represent a list that belongs to RE

Here are some examples:

  • (atom 10) represents {(10)}
  • (atom 20) represents {(20)}
  • (seq (atom 10) (atom 20)) represents {(10 20)}
  • (seq (any) (any)) represents the set of all proper lists of integers of length two
  • (disj (atom 10) (atom 20)) represents {(10), (20)}
  • (star (atom 10)) represents {(), (10), (10 10), (10 10 10), ...}
  • (plus (atom 10)) represents {(10), (10 10), (10 10 10), ...}
  • (seq (disj (atom 1) (atom 2)) (disj (atom 10) (atom 20))) represents {(1 10), (1 20), (2 10), (2 20)}

The last example illustrates the expressivity of regular expressions.

Implementation

The accompanying .scm file contains:

  • an implementation of the predicate and accessor,
  • a Version 0 that handles mt, atom, and seq, together with unit tests
  • a missing Version 1 that extends Version 0 with any
  • a Version 2 that extends Version 0 with disj, together with unit tests
  • a variant of Version 2 that counts the number of possible matches

Exercise 01

  1. Flesh out Version 1, unit tests and all.
  2. Implement a Version 3 that extends Version 2 with any, unit tests and all.
  3. Implement a variant of Version 3 that counts the number of possible matches.

Make sure to motivate your new tests and to explain their significance.

Exercise 02

  1. Implement a Version 4 that expands Version 3 with star.
  2. Implement a counting version of Version 4.

Make sure to motivate your new tests and to explain their significance.

Exercise 03

  1. Implement a Version 5 that expands Version 4 with plus.
  2. Implement a counting version of Version 5.

Make sure to motivate your new tests and to explain their significance.

Resources

Version

Created [08 Nov 2025]

Table Of Contents

Previous topic

Lecture Notes, Week 11

Next topic

Exercises for Week 11