***Welcome to ashrafedu.blogspot.com * * * This website is maintained by ASHRAF***

Tuesday, 10 December 2019

The Rabin-Karp algorithm


The Rabin-Karp algorithm


Rabin and Karp proposed a string-matching algorithm that performs well in practice. The Rabin-Karp algorithm uses θ(m) preprocessing time, and its worst-case running time is θ((n-m+1)m). Based on certain assumptions, however, its average-case running time is better.

This algorithm makes use of elementary number-theoretic notions.

Given a pattern P [1. .m], let p denote its corresponding decimal value. Similarly, given a text T[1. . n], let ts denote the decimal value of the length-m sub string T [s +1. . s + m], for s = 0, 1, . . . , n - m.

Now, ts = p if and only if T [s +1. . s +m] = P[1. .m]; thus, s is a valid shift if and only if ts=p.
If p could be computed in time O(m) and all of the ti values in a total of O(n) time, then we could determine all valid shifts s in time O(n) by comparing p with each of the ts values.

The only problem arises when numbers p and ts can be so large that comparisons cannot be said to take constant time. This problem can be solved easily by computing p and ts modulo a suitable number q. Generally q is chosen as a prime so that d.q fits in a computer word, which allows all computation to be done in single precision.

In general, with a d-ary alphabet {0, 1, . . . , d – 1}, choose q so that d.q fits within a computer word.

Working with numbers modulo q has the drawback that p approximately equal to ts(mod q) does not imply that p = ts. This is called Spurious hit. Pattern P[1..m] with T[s + 1..s + m] has to be compared character by character to see if the strings really are identical.



























The inputs to the procedure are the text T , the pattern P, the radix d to use (which is typically taken to be | Ʃ | ), and the prime q to use

RABIN-KARP-MATCHER takes θ(m) preprocessing time, and its matching time is
θ((n - m + 1)m) in the worst case. Rabin-Karp is slow because of multiplications and mod operations, but becomes competitive for long patterns.

No comments:

Post a Comment