HW2 - Regular Language Processing
- Draw (eg by hand with a pen) NFAs to accept the languages
- number of a's is even and number of b's is multiple of three
(alphabet = {a, b})
- a comma-serparated list of legal C identifiers, with white space
allowed where appropriate. (By a legal C identifier I mean
any sequence of letters, digits, and '_', starting with a letter or an
underscore.)
Write Scheme functions to do each of the following. They should not
use any side effects, should be properly commented and indented, etc.
Your filename should have the extension .scm, for our convenience as
well as your own. (10 points each.)
- Write accepts? which takes a DFA in the form described
below, and a list of symbols representing a string that might be in
the language, and returns true iff the DFA accepts the given string.
- Write nfa-to-dfa which takes an NFA in the below syntax
and transforms it into a DFA, using the naive power-set method.
(There is no need for the resulting DFA to be minimized or optimized
in any fashion.)
- Write re-to-nfa which takes a regular expression in the
below syntax and returns an NFA which accepts the language described
by the given regular expression.
Examples:
(accepts? (nfa-to-dfa (re-to-nfa '(concat foo (* bar) foo)))
'(foo bar bar bar bar foo)) ; returns #t
(accepts? (nfa-to-dfa (re-to-nfa '(concat foo (* bar) foo)))
'(foo bar bar bar bar)) ; returns #f
Representations
All the representations are optimized for ease of manipulation in Scheme.
Symbols in the alphabet (ie in Sigma) can be any Scheme symbol.
DFA
List of four elements. The first is the alphabet (as a list). The
second is a list consisting of the start state. The third is a list
of accepting states. The fourth is a list of list of ``state
descriptors.'' Each state descriptor is a list whose first element is
the name of the state, and whose remaining elements are lists of
targets of transitions and the symbols associated with those
transitions.
For example, the ``number-of-a's minus number-of-b's equals two modulo
three'' DFA from class would be represented as:
((a b) ; Sigma (the alphabet)
(mod0) ; start state
(mod2) ; accepting states
((mod0 (mod1 a) (mod2 b)) ; transitions out of state MOD0
(mod1 (mod2 a) (mod0 b)) ; ... MOD1
(mod2 (mod0 a) (mod1 b)))) ; ... MOD2
There is an implicit ``sink state'' as described in class.
NFA
The representation of NFAs is the same as for DFAs, except that there
might be more than one listed transition out of a given state for a
given symbol, and the list of symbols in a transition can be empty, to
indicate an ``epsilon transition''.
Eg:
((a b c d e f g h i j k l m n o p q r s t u v w x y z)
(s)
(b0c b1c)
((s (b0a b) (b1a b))
(b0a (b0b o))
(b0b (b0c b))
(b0c (b0a b))
(b1a (b1b i))
(b1b (b1c b))
(b1c (b1a b))))
Regular Expression
This is a scheme-ey syntax for REs.
- A symbol is an RE.
- The s-expression (* re) is an RE, denoting the
Kleene closure of re.
- The s-expression (+ re) is an RE, denoting one or
more repetitions of re.
- The s-expression (concat re1 re2 ...) is an
RE, denoting the concatenation of the given re's.
- The s-expression (union re1 re2 ...) is an
RE, denoting the union of the given re's.
Due fifteen minutes before class: 3:45pm, Wed Sep 5.
We will take off as many points for a 23-hour-late assignment as
for a 24-second late one, so please don't skip class to work on it.
Turn in your code by running
~bap/bin/handin your-file.scm
on a regular UNM CS machine. (You should use whatever filename is
appropriate in place of your-file.scm. You can put multiple files on
the command line, or even directories. Directories will have their
entire contents handed in, so please be careful with them, and in
particular be sure to clean out any cruft.)
If you wish, the answers to problem 1 can be drawn by hand and turned
in before the start of class, in the classroom, on Wed Sep 5.
Barak Pearlmutter
<bap@cs.unm.edu>