This note outlines the meaning of regular expressions.
Here is again the BNF for fully parenthesized regular expressions:
A regular expression represents a set of lists of integers. Matching a regular expression against a given list amounts to checking whether this list belongs to this set:
the constructor empty declares the empty regular expression, which matches the empty list of integers;
for example, matching (empty) and () succeeds, whereas matching (empty) and, e.g., (1 2 3) fails;
indeed, the regular expression (empty) represents the set {()};
the constructor atom declares a specific integer, and this atomic regular expression matches a singleton list that contains this integer;
for example, matching (atom 10) and (10) succeeds, whereas matching (atom 10) and () fails, and so does matching (atom 10) and (10 20);
indeed, the regular expression (atom 10) represents the set {(10)};
the constructor any specifies any integer, and this regular expression matches any singleton list containing an integer;
for example, matching (any) and (10) succeeds, and so does matching (any) and (20), but matching (any) and () fails, and so does matching (any) and (10 20);
indeed, the regular expression (any) represents the set {(n) | n is an integer}, i.e., the set of singleton lists of integers;
the constructor seq declares a sequence of two regular expressions, and matching this sequence matches a list
for example, matching (seq (atom 10) (atom 20)) and (10 20) succeeds because matching (atom 10) and (10) succeeds and matching (atom 20) and (20) also succeeds;
the constructor disj declares the disjunction (sometime called an alternation) of two regular expressions, and this disjunction matches any list that is matched by either regular expression;
for example, matching (disj (atom 10) (atom 20)) and (10) succeeds, and so does matching (disj (atom 10) (atom 20)) and (20);
the constructor star declares the repetition (zero or more times) of a regular expression, and matches a list that contains a (possibly empty) sequence of sublists, each of which is matched by the regular expression;
for example, matching (star (atom 10)) and () succeeds, and so do matching (star (atom 10)) and (10), matching (star (atom 10)) and (10 10), matching (star (atom 10)) and (10 10 10), etc.
the constructor plus declares the repetition (one or more times) of a regular expression, and matches a list that contains a (non-empty) sequence of sublists, each of which is matched by the regular expression;
for example, matching (plus (atom 10)) and () fails, but matching (plus (atom 10)) and (10) succeeds, and so do matching (plus (atom 10)) and (10 10), matching (plus (atom 10)) and (10 10 10), etc.
You can play online with this regular-expression matcher here (ignore the var constructor).
Examples:
Food for thought:
More examples:
Which lists does (seq (disj (atom 1) (empty)) (seq (disj (atom 2) (empty)) (disj (atom 3) (empty)))) match?
In how many ways does (seq (star (atom 10)) (star (atom 10))) match the following lists?
Why?
Which lists do the following regular expressions match?
What does that say about the relation between empty and seq?
Hint: to solve this exercise, use the traditional scientific method, namely
Which lists do the following regular expressions match?
What does that say about seq?
Hint: to solve this exercise, use the traditional scientific method, as in Exercise 12.
Which lists do the following regular expressions match?
What does that say about disj and seq?
Hint: to solve this exercise, use the traditional scientific method, as in Exercise 12 and Exercise 13.
Fixed a typo in the narrative, thanks to Yunjeong Lee’s eagle eye [30 Mar 2019]
Created [21 Jan 2019]