String matching
with finite automata
Many string-matching algorithms build a finite automaton—a simple
machine for processing information—that scans the text string T for all
occurrences of the pattern P.
Finite automata
A finite automaton M,
is a 5-tuple (Q, q0,A,Ʃ,δ),
Where
-
Q is a finite set of states,
-
is the start state,
-
is a distinguished set of accepting
states,
-
Ʃ is a finite input
alphabet,
-
δ is a function from Q X Ʃ
into Q, called the transition function of M.
The finite automaton
begins in state q0 and reads the characters of its input string one
at a time. If the automaton is in state q and reads input character a, it moves
(“makes a transition”) from state q to state δ(q,a). Whenever its
current state q is a member of A, the machine M has accepted the
string read so far. An input that is not accepted is rejected.
For example, on input
abaaa, including the start state, this automaton enters the sequence of states <
0, 1, 0, 1, 0, 1 >, and so it accepts this input. For input abbaa, it enters
the sequence of states < 0, 1, 0, 0, 1, 0 >, and so it rejects this
input.
In order to specify the
string-matching automaton corresponding to a given pattern P [1. .m], we first
define function σ, called the suffix
function corresponding to P. The function σ
maps Ʃ* to {0, 1, . . . ,m} such
that σ(x) is the length of the longest prefix of P that is also a suffix of x:
We define δ(q,
a) = σ(Pqa)
because to keep track of the longest prefix of the pattern P that has matched
the text string T so far.
With this improved
procedure for computing δ, we can find all
occurrences of a length-m pattern in a length-n text over an alphabet Ʃ
with O(m|Ʃ|) preprocessing time and θ(n) matching time.
No comments:
Post a Comment