Regular expressions
A language
in an alphabet is any collection of strings in its symbols.
-
A word is recognized by an automaton if there is
a path from a start state to an accepting state
labeled in order by its symbols.
-
A language is associated to an automaton
if its words are precisely those
recognized by it.
-
Any language is recognized by its tautologous automaton,
that whose nodes are the prefixes of words in that language.
The starting state is the empty string,
and the accepting states are those which are actually
words in the language. The ShortLex tree.
A language is regular if recognized by a finite automaton.
The basic operations:
-
A cat B (A . B): a word of A followed by a word of B
-
A star (A+): 0 or more words of A
-
A plus (A*): 1 or more words of A
-
A alt B (A | B): a word of A or a word of B
-
not A (!A): words with the same symbols but not in A
-
A and B (A & B): words in both A and B
These generate regular expressions.
The regular expression
( |+|-)(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
recognizes decimal integers
expressed conventionally.
-
Regular expressions require only a finite amount of memory
to recognize.
-
The language of balanced parentheses ( ( ( ) ( ) ( ( ) ) ) ( ) ) is not regular,
roughly because recognizing all such expressions requires
a memory of arbitrary size.
|