
Tuesday 10 December 2019

The Rabin-Karp algorithm


Rabin and Karp proposed a string-matching algorithm that performs well in practice. The Rabin-Karp algorithm uses Θ(m) preprocessing time, and its worst-case running time is Θ((n − m + 1)m). Under reasonable assumptions, however, its average-case running time is much better.

This algorithm makes use of elementary number-theoretic notions.

Given a pattern P[1..m], let p denote its corresponding decimal value. Similarly, given a text T[1..n], let t_s denote the decimal value of the length-m substring T[s+1..s+m], for s = 0, 1, . . . , n − m.

Now, t_s = p if and only if T[s+1..s+m] = P[1..m]; thus, s is a valid shift if and only if t_s = p.
If p could be computed in time O(m) and all of the t_s values in a total of O(n) time, then we could determine all valid shifts s in time O(n) by comparing p with each of the t_s values.

The only problem is that p and t_s may be so large that comparing them cannot be assumed to take constant time. This problem has an easy solution: compute p and the t_s values modulo a suitable modulus q. Each t_{s+1} can then be computed from t_s in constant time, since

t_{s+1} = (d(t_s − T[s+1]·h) + T[s+m+1]) mod q,

where h = d^(m−1) mod q is the value of the digit in the high-order position of an m-digit text window.

In general, with a d-ary alphabet {0, 1, . . . , d − 1}, q is chosen as a prime such that d·q fits within a computer word, which allows all the computations to be performed with single-precision arithmetic.

Working with numbers modulo q has the drawback that t_s ≡ p (mod q) does not imply t_s = p; a shift at which the values agree modulo q but the strings differ is called a spurious hit. For example, with q = 13, both 31415 and 67399 are equivalent to 7 modulo 13, so if p = 31415 the window 67399 produces a spurious hit. Whenever t_s ≡ p (mod q), the pattern P[1..m] must still be compared with T[s+1..s+m] character by character to see whether the strings really are identical.

The inputs to the procedure RABIN-KARP-MATCHER are the text T, the pattern P, the radix d to use (which is typically taken to be |Σ|), and the prime q to use.
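
A minimal Python sketch of the RABIN-KARP-MATCHER idea, assuming the text and pattern are ordinary character strings, with radix d = 256 and q = 101 as an illustrative prime (the function name and defaults are choices made here, not part of the original pseudocode):

    def rabin_karp_matcher(T, P, d=256, q=101):
        n, m = len(T), len(P)
        h = pow(d, m - 1, q)                 # d^(m-1) mod q: weight of the high-order digit
        p = t = 0
        for i in range(m):                   # preprocessing: Theta(m)
            p = (d * p + ord(P[i])) % q      # hash of the pattern
            t = (d * t + ord(T[i])) % q      # hash of the first text window
        for s in range(n - m + 1):           # try each shift s = 0, 1, ..., n - m
            if p == t and T[s:s + m] == P:   # verify every hit to rule out spurious hits
                print("Pattern occurs with shift", s)
            if s < n - m:                    # roll the hash to the next window in O(1)
                t = (d * (t - ord(T[s]) * h) + ord(T[s + m])) % q

    rabin_karp_matcher("acaabc", "aab")      # prints: Pattern occurs with shift 2

The rolling-hash update is what makes the average case fast: each new window hash costs O(1) instead of O(m).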

RABIN-KARP-MATCHER takes Θ(m) preprocessing time, and its matching time is
Θ((n − m + 1)m) in the worst case. Rabin-Karp is slowed by its multiplications and mod operations, but it becomes competitive for long patterns.

The naive string-matching algorithm



The naive algorithm finds all valid shifts using a loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n − m + 1 possible values of s.
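
As a concrete sketch, the whole algorithm is a few lines of Python, using 0-based shifts s = 0, 1, . . . , n − m (the function name is illustrative):

    def naive_string_matcher(T, P):
        n, m = len(T), len(P)
        for s in range(n - m + 1):         # each of the n - m + 1 possible shifts
            if T[s:s + m] == P:            # character-by-character window comparison
                print("Pattern occurs with shift", s)

    naive_string_matcher("acaabc", "aab")  # prints: Pattern occurs with shift 2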

Consider the naive string matcher for the pattern P = aab and the text T = acaabc. The matcher tries four successive alignments, s = 0, 1, 2, 3, comparing corresponding characters at each alignment until it finds a mismatch or matches the entire pattern.

The algorithm finds one occurrence of the pattern, at shift s = 2.
Procedure NAIVE-STRING-MATCHER takes time O((n - m + 1) m), and this bound is tight in the worst case. NAIVE-STRING-MATCHER is not an optimal procedure for this problem.

The naive string-matching algorithm is simple and requires no preprocessing, but it always shifts the window to the right by exactly one position.

String matching



String-matching (or string-searching) algorithms find the places where one or several strings (also called patterns) occur within a larger string (the searched text).

Text-editing programs frequently need to find all occurrences of a pattern in the text. Efficient algorithms for this problem, called string-matching algorithms, can greatly improve the responsiveness of a text-editing program.

The string-matching problem is formalized as follows:

Assume that the text is an array T[1..n] of length n and that the pattern is an array P[1..m] of length m ≤ n. Assume further that the elements of P and T are characters drawn from a finite alphabet Σ. The character arrays P and T are often called strings of characters.

Pattern P occurs with shift s in text T if 0 ≤ s ≤ n − m and T[s+1..s+m] = P[1..m]. If P occurs with shift s in T, then s is called a valid shift; otherwise, s is an invalid shift.

The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.

Monday 9 December 2019

Multiplication of Large Integers


If the conventional algorithm for multiplying two n-digit integers is used, each of the n digits of the first number is multiplied by each of the n digits of the second number, for a total of n^2 digit multiplications.
The divide-and-conquer strategy can be used to design an algorithm with fewer than n2 multiplications.
The basic idea of the algorithm can be demonstrated with two-digit integers, say 23 and 14. These numbers can be represented as follows:

23 = 2 × 10^1 + 3 × 10^0 and 14 = 1 × 10^1 + 4 × 10^0.

Now, multiplying them:

23 × 14 = (2 × 10^1 + 3 × 10^0) × (1 × 10^1 + 4 × 10^0)
        = (2 × 1) × 10^2 + (2 × 4 + 3 × 1) × 10^1 + (3 × 4) × 10^0.
For any pair of two-digit numbers a = a1a0 and b = b1b0, their product c can be computed by the formula

c = a × b = c2 × 10^2 + c1 × 10^1 + c0,

where

c2 = a1 × b1 is the product of the first digits,
c1 = (a1 + a0) × (b1 + b0) − (c2 + c0) is the product of the sum of a's digits and the sum of b's digits, minus the sum of c2 and c0,
c0 = a0 × b0 is the product of the second digits.
The same formula applies to n-digit numbers: splitting each number into halves of n/2 digits gives a = a1 × 10^(n/2) + a0 and b = b1 × 10^(n/2) + b0, where a1 and b1 are the first halves of the digits and a0 and b0 the second halves, and then a × b = c2 × 10^n + c1 × 10^(n/2) + c0 with c2, c1, and c0 defined as above. If n/2 is even, the same method can be applied for computing the products c2, c0, and c1. Thus, if n is a power of 2, a recursive algorithm can be written for computing the product of two n-digit integers. The recursion is stopped when n becomes 1; it can also be stopped when n is small enough to multiply the numbers of that size directly.
Since multiplication of n-digit numbers requires three multiplications of n/2-digit numbers, the recurrence for the number of multiplications M(n) is

M(n) = 3M(n/2) for n > 1, M(1) = 1.

Solving it by backward substitution for n = 2^k gives

M(2^k) = 3M(2^(k−1)) = 3[3M(2^(k−2))] = 3^2 M(2^(k−2)) = . . . = 3^i M(2^(k−i)) = . . . = 3^k M(2^(k−k)) = 3^k.
Since k = log2 n,

M(n) = 3^(log2 n) = n^(log2 3) ≈ n^1.585.
Let A(n) be the number of digit additions and subtractions executed by the above algorithm in multiplying two n-digit decimal integers. Besides the 3A(n/2) operations needed to compute the three products of n/2-digit numbers, the formulas above require five additions and one subtraction, each performed on numbers of O(n) digits. Hence, the recurrence is

A(n) = 3A(n/2) + cn for n > 1, A(1) = 1.
Applying the Master Theorem gives A(n) ∈ Θ(n^(log2 3)), which means that the total number of additions and subtractions has the same asymptotic order of growth as the number of multiplications.
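
A minimal Python sketch of this divide-and-conquer scheme (Karatsuba multiplication), assuming non-negative integers and splitting on decimal digits; the function name and the use of divmod are implementation choices here:

    def karatsuba(a, b):
        if a < 10 or b < 10:                # small enough: multiply directly
            return a * b
        half = max(len(str(a)), len(str(b))) // 2
        a1, a0 = divmod(a, 10 ** half)      # a = a1 * 10^half + a0
        b1, b0 = divmod(b, 10 ** half)      # b = b1 * 10^half + b0
        c2 = karatsuba(a1, b1)              # product of the first halves
        c0 = karatsuba(a0, b0)              # product of the second halves
        c1 = karatsuba(a1 + a0, b1 + b0) - (c2 + c0)
        return c2 * 10 ** (2 * half) + c1 * 10 ** half + c0

    print(karatsuba(23, 14))                # 322

Only three recursive multiplications are made per level, which is exactly where the 3 in M(n) = 3M(n/2) comes from.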




Sunday 8 December 2019

Greatest common divisor (Euclid’s algorithm)


If d is a divisor of a and d is also a divisor of b, then d is a common divisor of a and b.

The greatest common divisor of two integers a and b, not both zero, is the largest of the common divisors of a and b; denote it by gcd(a, b).

Euclid’s algorithm efficiently computes the greatest common divisor of two integers; it relies on the following theorem.
Theorem (GCD recursion theorem): For any nonnegative integer a and any positive integer b, gcd(a, b) = gcd(b, a mod b).

Euclid’s algorithm
Euclid’s Elements (circa 300 B.C.) describes the following gcd algorithm. EUCLID is a recursive procedure based directly on the GCD recursion theorem; the inputs a and b are arbitrary nonnegative integers.
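
A minimal Python sketch of EUCLID, following the recursion theorem directly (the lowercase function name is an adaptation):

    def euclid(a, b):
        if b == 0:                 # gcd(a, 0) = a
            return a
        return euclid(b, a % b)    # GCD recursion theorem

    print(euclid(30, 21))          # 3
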
Example:
Computation of gcd (30, 21).
EUCLID (30, 21) 
= EUCLID (21, 9)
= EUCLID (9, 3)
= EUCLID (3, 0)
= 3.
The running time of Euclid’s algorithm
The worst-case running time of EUCLID can be analyzed as a function of the size of a and b. Assume that a > b ≥ 0. The overall running time of EUCLID is proportional to the number of recursive calls it makes.

The extended form of Euclid’s algorithm
An important property of common divisors is that d | a and d | b together imply d | (a + b) and d | (a − b), where the notation d | a is read “d divides a”.

More generally, d | a and d | b imply d | (ax + by) for any integers x and y.

Euclid’s algorithm can be extended to compute integer coefficients x and y such that d = gcd(a, b) = ax + by. Note that x and y may be zero or negative.

The procedure EXTENDED-EUCLID takes as input a pair of nonnegative integers and returns a triple of the form (d, x, y) that satisfies the equation d = ax + by.
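
A minimal Python sketch of EXTENDED-EUCLID; the return value is the triple (d, x, y):

    def extended_euclid(a, b):
        if b == 0:                         # gcd(a, 0) = a = a*1 + b*0
            return (a, 1, 0)
        d, x1, y1 = extended_euclid(b, a % b)
        # d = b*x1 + (a mod b)*y1 = b*x1 + (a - (a // b)*b)*y1
        #   = a*y1 + b*(x1 - (a // b)*y1), so regroup the coefficients:
        return (d, y1, x1 - (a // b) * y1)

    print(extended_euclid(99, 78))         # (3, -11, 14): 99*(-11) + 78*14 = 3
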
Since the number of recursive calls made in EUCLID equals the number of recursive calls made in EXTENDED-EUCLID, the running times of EUCLID and EXTENDED-EUCLID are the same to within a constant factor: for a > b > 0, the number of recursive calls is O(lg b).

Saturday 7 December 2019

The DFT and FFT




The DFT:

We wish to evaluate a polynomial

A(x) = Σ_{j=0}^{n−1} a_j x^j

of degree-bound n at ω_n^0, ω_n^1, . . . , ω_n^(n−1) (that is, at the n complex nth roots of unity, where ω_n = e^(2πi/n)).

Assume that A is given in coefficient form: a = (a0, a1, . . . , a_(n−1)). Define the results y_k, for k = 0, 1, . . . , n − 1, by

y_k = A(ω_n^k) = Σ_{j=0}^{n−1} a_j ω_n^(kj).

The vector y = (y0, y1, . . . , y_(n−1)) is the Discrete Fourier Transform (DFT) of the coefficient vector a = (a0, a1, . . . , a_(n−1)). It can also be written as y = DFT_n(a).


The FFT:

By using a method known as the Fast Fourier Transform (FFT), which takes advantage of the special properties of the complex roots of unity, DFT_n(a) can be computed in time Θ(n lg n), as opposed to the Θ(n^2) time of the straightforward method.

The FFT method employs a divide-and-conquer strategy, using the even-index and odd-index coefficients of A(x) separately to define two new degree-bound n/2 polynomials A[0](x) and A[1](x):

A[0](x) = a0 + a2·x + a4·x^2 + · · · + a_(n−2)·x^(n/2−1),
A[1](x) = a1 + a3·x + a5·x^2 + · · · + a_(n−1)·x^(n/2−1).
Note:
1. A[0] contains all the even-index coefficients of A (the binary representation of the index ends in 0), and
2. A[1] contains all the odd-index coefficients of A (the binary representation of the index ends in 1).
It follows that

A(x) = A[0](x^2) + x·A[1](x^2),    (equation 1)

so that the problem of evaluating A(x) at ω_n^0, ω_n^1, . . . , ω_n^(n−1) reduces to

1. evaluating the degree-bound n/2 polynomials A[0](x) and A[1](x) at the points (ω_n^0)^2, (ω_n^1)^2, . . . , (ω_n^(n−1))^2, and then

2. combining the results according to equation 1.

By the halving lemma, the list of values (ω_n^0)^2, (ω_n^1)^2, . . . , (ω_n^(n−1))^2 consists not of n distinct values but only of the n/2 complex (n/2)th roots of unity, so the polynomials A[0] and A[1] of degree-bound n/2 are recursively evaluated at the n/2 complex (n/2)th roots of unity. These subproblems have exactly the same form as the original problem, but are half the size. An n-element DFT_n computation is thus divided into two (n/2)-element DFT_(n/2) computations. This decomposition is the basis for the following recursive FFT algorithm, which computes the DFT of an n-element vector a = (a0, a1, . . . , a_(n−1)), where n is a power of 2.
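
A minimal Python sketch of this recursive FFT, assuming n = len(a) is an exact power of 2:

    import cmath

    def recursive_fft(a):
        n = len(a)
        if n == 1:                            # the DFT of one element is the element itself
            return a
        wn = cmath.exp(2j * cmath.pi / n)     # principal nth root of unity
        w = 1
        y0 = recursive_fft(a[0::2])           # even-index coefficients: A[0]
        y1 = recursive_fft(a[1::2])           # odd-index coefficients:  A[1]
        y = [0] * n
        for k in range(n // 2):               # combine via A(x) = A[0](x^2) + x*A[1](x^2)
            y[k] = y0[k] + w * y1[k]
            y[k + n // 2] = y0[k] - w * y1[k] # uses w_n^(k + n/2) = -w_n^k
            w *= wn
        return y

    print(recursive_fft([1, 1, 1, 1]))        # approximately [4, 0, 0, 0]

The second half of the output reuses the same products because ω_n^(k + n/2) = −ω_n^k, so each combine step costs only Θ(n), giving the Θ(n lg n) total.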

Using FFT (Fast Fourier Transform) and its inverse to multiply Polynomials


The linear-time multiplication method for polynomials in point-value form can be used to perform fast polynomial multiplication in coefficient form, by converting a polynomial quickly from coefficient form to point-value form (evaluation) and back again (interpolation).

Using the Fast Fourier Transform (FFT) and its inverse, we can do evaluation and interpolation in time Θ(n log n). The product of two polynomials of degree-bound n can then be computed in time Θ(n log n), with both the input and output in coefficient form.

Given the FFT, we have the following Θ(n log n)-time procedure for multiplying two polynomials A(x) and B(x) of degree-bound n, where the input and output representations are in coefficient form. We assume that n is a power of 2; this requirement can always be met by adding high-order zero coefficients. (A sketch in code follows the list.)
  1. Double degree-bound: Create coefficient representations of A(x) and B(x) as degree-bound 2n polynomials by adding n high-order zero coefficients to each.
  2. Evaluate: Compute point-value representations of A(x) and B(x) of length 2n by applying the FFT of order 2n to each polynomial. These representations contain the values of the two polynomials at the (2n)th roots of unity.
  3. Pointwise multiply: Compute a point-value representation for the polynomial C(x) = A(x)B(x) by multiplying these values together pointwise. This representation contains the value of C(x) at each (2n)th root of unity.
  4. Interpolate: Create the coefficient representation of the polynomial C(x) by applying the FFT to the 2n point-value pairs to compute the inverse DFT.
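
A minimal Python sketch of these four steps, using numpy's fft and ifft as the forward and inverse transforms (numpy is an implementation choice made here; the recursive FFT above would serve equally well):

    import numpy as np

    def poly_multiply(a, b):
        # a, b: coefficient vectors (lowest degree first), equal length n, a power of 2
        n = len(a)
        fa = np.fft.fft(a, 2 * n)              # steps 1-2: pad to degree-bound 2n and
        fb = np.fft.fft(b, 2 * n)              # evaluate at the (2n)th roots of unity
        fc = fa * fb                           # step 3: pointwise multiplication
        return np.round(np.fft.ifft(fc).real).astype(int)   # step 4: interpolate

    print(poly_multiply([1, 1], [-1, 1]))      # (1 + x)(-1 + x) = -1 + x^2 -> [-1  0  1  0]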

Representing polynomials



A polynomial in the variable x over an algebraic field F represents a function A(x) as a formal sum:

A(x) = Σ_{j=0}^{n−1} a_j x^j.

The values a0, a1, . . . , a_(n−1) are called the coefficients of the polynomial.


A polynomial A(x) has degree k if its highest nonzero coefficient is a_k; that is, degree(A) = k.
Any integer strictly greater than the degree of a polynomial is a degree-bound of that polynomial.

Polynomials can be represented using

  • the coefficient representation and
  • the point-value representation.

1. Coefficient representation: A coefficient representation of a polynomial A(x) = Σ_{j=0}^{n−1} a_j x^j of degree-bound n is a vector of coefficients a = (a0, a1, . . . , a_(n−1)).

Example: A(x) = 6x^3 + 7x^2 − 10x + 9; the coefficients of A(x) are (9, −10, 7, 6).

The coefficient representation is convenient for certain operations on polynomials.
The operation of evaluating the polynomial A(x) at a point x0 consists of computing the value A(x0). Evaluation takes time Θ(n) using Horner's rule:

A(x0) = a0 + x0(a1 + x0(a2 + · · · + x0(a_(n−2) + x0·a_(n−1)) · · · )).

Adding two polynomials represented by the coefficient vectors a = (a0, a1, . . . , a_(n−1)) and b = (b0, b1, . . . , b_(n−1)) takes time Θ(n). The sum is the coefficient vector c = (c0, c1, . . . , c_(n−1)), where c_j = a_j + b_j for j = 0, 1, . . . , n − 1.
Example:
A(x) =  6x^3 + 7x^2 − 10x + 9            (9, −10, 7, 6)
B(x) = −2x^3        +  4x − 5            (−5, 4, 0, −2)
----------------------------------------------------------
C(x) =  4x^3 + 7x^2 −  6x + 4            (4, −6, 7, 4)

2. Point-value representation: A point-value representation of a polynomial A(x) of degree-bound n is a set of n point-value pairs {(x0, y0), (x1, y1), . . . , (x_(n−1), y_(n−1))} such that all of the x_k are distinct and y_k = A(x_k) for k = 0, 1, . . . , n − 1.

Example: A(x) = x^3 − 2x + 1. Taking the points x_k = 0, 1, 2, 3 gives the point-value representation {(0, 1), (1, 0), (2, 5), (3, 22)}.

Using Horner's method, n-point evaluation takes time Θ(n^2).
The inverse of evaluation, determining the coefficient form of a polynomial from a point-value representation, is called interpolation.
For any set {(x0, y0), (x1, y1), . . . , (x_(n−1), y_(n−1))} of n point-value pairs such that all the x_k values are distinct, there is a unique polynomial A(x) of degree-bound n such that y_k = A(x_k) for k = 0, 1, . . . , n − 1.
Using Lagrange's formula, interpolation takes time Θ(n^2).
Lagrange's formula:

A(x) = Σ_{k=0}^{n−1} y_k · Π_{j≠k} (x − x_j) / (x_k − x_j).
Example: Using Lagrange's formula, we interpolate the point-value representation {(0, 1), (1, 0), (2, 5), (3, 22)}:

A(x) = 1 · (x − 1)(x − 2)(x − 3) / ((0 − 1)(0 − 2)(0 − 3))
     + 0 · (x − 0)(x − 2)(x − 3) / ((1 − 0)(1 − 2)(1 − 3))
     + 5 · (x − 0)(x − 1)(x − 3) / ((2 − 0)(2 − 1)(2 − 3))
     + 22 · (x − 0)(x − 1)(x − 2) / ((3 − 0)(3 − 1)(3 − 2))
     = x^3 − 2x + 1,

which matches the evaluation example above.
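
A minimal Python sketch of interpolation by Lagrange's formula, taking Θ(n^2) arithmetic operations; it uses the Fraction type (an implementation choice) so that the recovered coefficients come out exactly:

    from fractions import Fraction

    def lagrange_interpolate(points):
        # points: n pairs (x_k, y_k) with distinct x_k; returns coefficients,
        # lowest degree first
        n = len(points)
        coeffs = [Fraction(0)] * n
        for k, (xk, yk) in enumerate(points):
            basis = [Fraction(1)]              # running product of (x - x_j), j != k
            denom = Fraction(1)
            for j, (xj, _) in enumerate(points):
                if j == k:
                    continue
                basis = [Fraction(0)] + basis  # multiply the basis polynomial by x ...
                for i in range(len(basis) - 1):
                    basis[i] -= xj * basis[i + 1]   # ... minus x_j times it
                denom *= xk - xj
            for i, c in enumerate(basis):
                coeffs[i] += yk * c / denom    # add y_k times the k-th basis polynomial
        return coeffs

    print(lagrange_interpolate([(0, 1), (1, 0), (2, 5), (3, 22)]))
    # [Fraction(1, 1), Fraction(-2, 1), Fraction(0, 1), Fraction(1, 1)], i.e. x^3 - 2x + 1
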
In point-value form, addition C(x) = A(x) + B(x) is given by C(x_k) = A(x_k) + B(x_k) for every point x_k: if

A = {(x0, y0), (x1, y1), . . . , (x_(n−1), y_(n−1))} and
B = {(x0, y0'), (x1, y1'), . . . , (x_(n−1), y'_(n−1))},

then

C = {(x0, y0 + y0'), (x1, y1 + y1'), . . . , (x_(n−1), y_(n−1) + y'_(n−1))}.

Note that A and B must be evaluated at the same n points. The time to add two polynomials of degree-bound n in point-value form is Θ(n).
Example: Add C(x) = A(x) + B(x) in point-value form. For instance, taking A(x) = 6x^3 + 7x^2 − 10x + 9 and B(x) = −2x^3 + 4x − 5 from the coefficient-form example above, evaluated at the points x = 0, 1, 2, 3:

A = {(0, 9), (1, 12), (2, 65), (3, 204)}
B = {(0, −5), (1, −3), (2, −13), (3, −47)}
C = {(0, 4), (1, 9), (2, 52), (3, 157)},

which is a point-value representation of C(x) = 4x^3 + 7x^2 − 6x + 4.
In point-value form, multiplication C(x) = A(x)B(x) is given by C(x_k) = A(x_k)B(x_k) for any point x_k. If A and B are of degree-bound n, then C is of degree-bound 2n, so we must start with "extended" point-value forms for A and B consisting of 2n point-value pairs each: if

A = {(x0, y0), (x1, y1), . . . , (x_(2n−1), y_(2n−1))} and
B = {(x0, y0'), (x1, y1'), . . . , (x_(2n−1), y'_(2n−1))},

then

C = {(x0, y0·y0'), (x1, y1·y1'), . . . , (x_(2n−1), y_(2n−1)·y'_(2n−1))}.

The time to multiply two polynomials of degree-bound n in point-value form is Θ(n).
Example: Multiply C(x) = A(x)B(x) in point-value form. For instance, take A(x) = x + 1 and B(x) = x − 1 (each of degree-bound 2) in extended point-value form at the 4 points x = −1, 0, 1, 2:

A = {(−1, 0), (0, 1), (1, 2), (2, 3)}
B = {(−1, −2), (0, −1), (1, 0), (2, 1)}
C = {(−1, 0), (0, −1), (1, 0), (2, 3)},

which is a point-value representation of C(x) = x^2 − 1.
Given two input polynomials in extended point-value form, the time to multiply them and obtain the point-value form of the result is Θ(n), which is much less than the Θ(n^2) time required to multiply polynomials in coefficient form.

In outline, the efficient polynomial-multiplication process converts the coefficient-form representations to point-value form with the FFT, multiplies them pointwise there, and converts the result back to coefficient form with the inverse FFT.