Design and Analysis of Algorithms: String matching with finite automata

String matching with finite automata

Many string-matching algorithms build a finite automaton—a simple machine for processing information—that scans the text string T for all occurrences of the pattern P.

Finite automata

A finite automaton M, is a 5-tuple (Q, q₀,A,Ʃ,δ),

Where

- Q is a finite set of states,

- is the start state,

- is a distinguished set of accepting states,

- Ʃ is a finite input alphabet,

- δ is a function from Q X Ʃ into Q, called the transition function of M.

The finite automaton begins in state q₀ and reads the characters of its input string one at a time. If the automaton is in state q and reads input character a, it moves (“makes a transition”) from state q to state δ(q,a). Whenever its current state q is a member of A, the machine M has accepted the string read so far. An input that is not accepted is rejected.

For example, on input abaaa, including the start state, this automaton enters the sequence of states < 0, 1, 0, 1, 0, 1 >, and so it accepts this input. For input abbaa, it enters the sequence of states < 0, 1, 0, 0, 1, 0 >, and so it rejects this input.

In order to specify the string-matching automaton corresponding to a given pattern P [1. .m], we first define function σ, called the suffix function corresponding to P. The function σ maps Ʃ^* to {0, 1, . . . ,m} such that σ(x) is the length of the longest prefix of P that is also a suffix of x:

We define δ(q, a) = σ(P_qa) because to keep track of the longest prefix of the pattern P that has matched the text string T so far.

With this improved procedure for computing δ, we can find all occurrences of a length-m pattern in a length-n text over an alphabet Ʃ with O(m|Ʃ|) preprocessing time and θ(n) matching time.

Design and Analysis of Algorithms

Saturday, 22 February 2020

String matching with finite automata

No comments:

Post a Comment

Updates