
Tuesday, 26 November 2019

Huffman codes



Huffman codes compress data very effectively: savings of 20% to 90% are typical, depending on the 
characteristics of the data being compressed.

There are many options for how to represent a file of information.

Consider the problem of designing a binary character code in which each character is represented by a unique binary string, called a code word. If fixed-length code words are used, the file generally requires more bits than with variable-length code words chosen according to character frequencies.

Huffman coding uses variable-length code words for data compression. A variable-length code can do considerably better than a fixed-length code by giving frequent characters short code words and infrequent characters long code words. Huffman coding restricts attention to prefix codes: codes in which no code word is also a prefix of some other code word. A prefix code can always achieve the optimal data compression among any character code.
Prefix codes are desirable because they simplify decoding. Since no code word is a prefix of any other, the code word that begins an encoded file is unambiguous.
The first figure shows the tree corresponding to a fixed-length code of three bits per character. Each leaf is labeled with a character and its frequency of occurrence, and each internal node is labeled with the sum of the frequencies of the leaves in its subtree.
The second figure shows the tree corresponding to the optimal prefix code. An optimal code for a file is always represented by a full binary tree.
The binary code word for a character is the path from the root to that character's leaf, where 0 means "go to the left child" and 1 means "go to the right child".

Ex: if a file contains "aabe", then the encoding is 0.0.101.1101, where '.' denotes concatenation. Decoding is also simple: because a prefix code is used, no code word is a prefix of any other code word, so the encoded file can be read unambiguously from the beginning.
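To make the decoding argument concrete, here is a minimal Python sketch of encoding and decoding with a prefix code. The code-word table below is an assumption chosen to be consistent with the "aabe" example above; it is not taken from the original figure.

# Hypothetical code-word table, assumed only for illustration.
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def encode(text):
    # Concatenate the code word of each character.
    return ''.join(code[ch] for ch in text)

def decode(bits):
    # Because no code word is a prefix of another, scan left to right and
    # emit a character as soon as the accumulated bits match a code word.
    inverse = {v: k for k, v in code.items()}
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ''
    return ''.join(out)

print(encode('aabe'))       # 001011101
print(decode('001011101'))  # aabe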

Given a tree T corresponding to a prefix code, the number of bits required to encode a file can be computed easily. For each character c in the alphabet C, let the attribute c.freq denote the frequency of c in the file and let dT(c) denote the depth of c's leaf in the tree (which is also the length of c's code word).
The number of bits required to encode the file (the cost of the tree T) is thus
B(T) = Σ over all c in C of c.freq · dT(c)
Constructing a Huffman code

The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs a sequence of |C| - 1 “merging” operations to create the final tree. 
The algorithm uses a min-priority queue Q, keyed on the freq attribute, to identify the two least-frequent objects to merge together. In pseudocode:

HUFFMAN(C)
1  n = |C|
2  Q = C
3  for i = 1 to n - 1
4      allocate a new node z
5      z.left = x = EXTRACT-MIN(Q)
6      z.right = y = EXTRACT-MIN(Q)
7      z.freq = x.freq + y.freq
8      INSERT(Q, z)
9  return EXTRACT-MIN(Q)

Line 2 initializes the min-priority queue Q with the characters in C. The for loop in lines 3–8 repeatedly extracts the two nodes x and y of lowest frequency from the queue, replacing them in the queue with a new node z representing their merger. The frequency of z is computed as the sum of the frequencies of x and y in line 7. The node z has x as its left child and y as its right child. After n - 1 mergers, line 9 returns the one node left in the queue, which is the root of the code tree.


The total running time of HUFFMAN on a set of n characters is O(n lg n).
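As a runnable illustration of this construction, here is a minimal Python sketch that uses the heapq module as the min-priority queue. The function names, the tuple-based tree representation, and the example frequencies are assumptions made for the sketch, not part of the original text.

import heapq
import itertools

def huffman(freqs):
    # Build a Huffman tree from a dict {character: frequency}.
    # Leaves are (freq, char); internal nodes are (freq, left, right).
    counter = itertools.count()          # tie-breaker so the heap never compares trees
    heap = [(f, next(counter), (f, ch)) for ch, f in freqs.items()]
    heapq.heapify(heap)                  # initialize Q with the characters in C
    while len(heap) > 1:                 # |C| - 1 merging operations
        fx, _, x = heapq.heappop(heap)   # least-frequent node
        fy, _, y = heapq.heappop(heap)   # second least-frequent node
        z = (fx + fy, x, y)              # z.freq = x.freq + y.freq; x left, y right
        heapq.heappush(heap, (fx + fy, next(counter), z))
    return heap[0][2]                    # the one node left is the root of the code tree

def code_words(node, prefix='', table=None):
    # Walk the tree: 0 means "go to left child", 1 means "go to right child".
    if table is None:
        table = {}
    if len(node) == 2:                   # leaf: (freq, char)
        table[node[1]] = prefix or '0'
        return table
    _, left, right = node
    code_words(left, prefix + '0', table)
    code_words(right, prefix + '1', table)
    return table

# Example with illustrative frequencies:
# print(code_words(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5})))

Each EXTRACT-MIN and INSERT on a binary min-heap takes O(lg n) time, so the n - 1 merges take O(n lg n) time overall, matching the running time stated above.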

Example:
Consider a file whose characters occur with the frequencies given in the figure.
Arrange the characters in a min-priority queue in monotonically increasing order of frequency.
Choose the two nodes of minimum frequency, merge them into a new node whose frequency is their sum, and put the new node back into the queue.
Repeat the same process, each time choosing and merging the two minimum-frequency nodes, until only one node, the root of the Huffman tree, remains.

Monday, 25 November 2019

An activity-selection problem



Let S = {a1, a2, . . . , an} be a set of n proposed activities that wish to use a resource which can serve only one activity at a time.

Each activity ai has a start time si and a finish time fi, where 0 ≤ si < fi < ∞. Assume that the activities are sorted in monotonically increasing order of finish time: f1 ≤ f2 ≤ . . . ≤ fn-1 ≤ fn.


In the activity-selection problem, the aim is to select a maximum-size subset of mutually compatible activities. Activities ai and aj are compatible if the intervals [si, fi) and [sj, fj) do not overlap.

The optimal substructure of the activity-selection problem
Let us denote by Sij the set of activities that start after activity ai finishes and that finish before activity aj starts. 
To find a maximum set of mutually compatible activities in Sij, suppose that Aij is such a maximum set and that it includes some activity ak.
Including ak in an optimal solution leaves two sub problems:
i) finding mutually compatible activities in the set Sik (activities that start after activity ai finishes and that finish before activity ak starts) and, 
ii) finding mutually compatible activities in the set Skj (activities that start after activity ak finishes and that finish before activity aj starts).
Let Aik = Aij ∩ Sik and Akj = Aij ∩ Skj , so that Aik contains the activities in Aij that finish before ak starts and Akj contains the activities in Aij that start after ak finishes.
Thus, we have Aij = Aik U {ak} U Akj , and so the maximum-size set Aij of mutually compatible
activities in Sij consists of |Aij | = |Aik| + |Akj | + 1 activities.
If we denote the size of an optimal solution for the set Sij by c[i, j], then we have the recurrence
c[i, j] = c[i, k] + c[k, j] + 1.
Which activity ak an optimal solution for the set Sij includes is not known in advance, so all activities in Sij have to be examined to find which one to choose, giving
c[i, j] = 0                                                if Sij = ∅,
c[i, j] = max over ak in Sij of { c[i, k] + c[k, j] + 1 }  if Sij ≠ ∅.
A recursive algorithm can be developed and memoized, or we can work bottom-up and fill in table entries.
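For illustration, here is a short memoized Python sketch of the c[i, j] recurrence above. The function name and the use of fictitious boundary activities a0 and an+1 are assumptions of the sketch; activities are assumed to be given as start and finish lists sorted by finish time.

from functools import lru_cache

def max_compatible(starts, finishes):
    # Size of a maximum set of mutually compatible activities via the c[i, j] recurrence.
    n = len(starts)
    # Fictitious activities: a0 finishes at time 0, a_{n+1} starts at infinity.
    s = [0.0] + list(starts) + [float('inf')]
    f = [0.0] + list(finishes) + [float('inf')]

    @lru_cache(maxsize=None)
    def c(i, j):
        best = 0
        for k in range(i + 1, j):
            # a_k is in S_ij if it starts after a_i finishes and finishes before a_j starts.
            if f[i] <= s[k] and f[k] <= s[j]:
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)

# Example: print(max_compatible([1, 3, 0, 5, 8, 5], [2, 4, 6, 7, 9, 9]))  # -> 4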

But this approach overlooks another important characteristic of the activity-selection problem: we can use the greedy choice.

Making the greedy choice
The greedy choice for the activity-selection is to choose an activity that leaves the resource available for as many other activities as possible.

The choice is to choose the activity in S with the earliest finish time, since that would leave the resource available for as many of the activities that follow it as possible. Since the activities are sorted in monotonically increasing order by finish time, the greedy choice is activity a1. If greedy choice is made, only one remaining sub problem to solve is finding activities that start after a1 finishes. There is no need to consider activities that finish before a1 starts.
The algorithm works as follows:
1) Select the first activity from the sorted array and put it in a set A.
2) For each remaining activity in the sorted array:
       a) If the start time of the activity is greater than or equal to the finish time of the previously selected activity, then select this activity and put it in the set A.

The procedure GREEDY-ACTIVITY-SELECTOR assumes that the input activities are ordered by monotonically increasing finish time. It collects selected activities into a set A and returns this set when it is done. GREEDY-ACTIVITY-SELECTOR schedules a set of n activities in Θ(n) time.
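Here is a minimal Python sketch of this greedy selection, under the same assumption that the activities are sorted by finish time. Returning 0-based indices rather than a set of activity objects is a choice made only for the sketch.

def greedy_activity_selector(starts, finishes):
    # Assumes activities are sorted by monotonically increasing finish time.
    selected = [0]                      # greedy choice: the first activity to finish
    last_finish = finishes[0]
    for i in range(1, len(starts)):
        # Compatible if it starts no earlier than the previously selected activity finishes.
        if starts[i] >= last_finish:
            selected.append(i)
            last_finish = finishes[i]
    return selected

# Example: greedy_activity_selector([1, 3, 0, 5, 8, 5], [2, 4, 6, 7, 9, 9]) returns [0, 1, 3, 4].

The single pass over the sorted activities is what gives the Θ(n) running time noted above.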

Example:



In the above figure, the highlighted activities a1, a3, a6, a8 form a maximum subset of mutually compatible activities, since each selected activity's start time is greater than or equal to the finish time of the previously selected activity.


Optimal binary search trees

Optimal binary search trees (Dynamic programming)

An optimal binary search tree is a binary search tree in which the nodes are arranged so that the expected search cost (the tree cost) is minimum for the given search probabilities.

Suppose n keys k1, k2, . . . , kn are stored at the internal nodes of a binary search tree, and let K = {k1, k2, . . . , kn}. It is assumed that the keys are given in sorted order, so that k1 < k2 < . . . < kn. For each key ki, we have a probability pi that a search will be for ki. Some searches may be for values not in K, and so we also have n + 1 "dummy keys" d0, d1, d2, . . . , dn representing values not in K. For each dummy key di, we have a probability qi that a search will correspond to di.
The figure shows a binary search tree for a set of n = 5 keys. Each key ki is an internal node, and each dummy key di is a leaf. Every search is either successful (finding some key ki) or unsuccessful (finding some dummy key di), and so we have
Σ (i = 1 to n) pi + Σ (i = 0 to n) qi = 1.
Because we have probabilities of searches for each key and each dummy key, the expected cost of a search in a given binary search tree T can be determined. Taking the cost of a search to be the number of nodes examined, i.e., the depth in T of the node found plus 1, the expected cost of a search in T is
1 + Σ (i = 1 to n) depthT(ki) · pi + Σ (i = 0 to n) depthT(di) · qi.

For a given set of probabilities, a binary search tree whose expected search cost is smallest can be constructed. And such a tree is called an optimal binary search tree.

The number of binary search trees with n nodes is Ω(4^n / n^(3/2)), so an exhaustive search would have to examine an exponential number of binary search trees.

Solving this problem with dynamic programming:

Step 1: The structure of an optimal binary search tree

If an optimal binary search tree T has a subtree T′ containing keys ki, . . . , kj, then this subtree T′ must be optimal as well for the subproblem with keys ki, . . . , kj and dummy keys di-1, . . . , dj.
The optimal substructure is used to show that an optimal solution can be constructed to the problem from optimal solutions to sub problems.

Step 2: A recursive solution

Pick as the subproblem domain finding an optimal binary search tree containing the keys ki, . . . , kj, where i ≥ 1, j ≤ n, and j ≥ i - 1 (when j = i - 1 the subproblem contains no actual keys).
Let e[i, j] be the expected cost of searching an optimal binary search tree containing the keys ki, . . . , kj.
The easy case occurs when j = i - 1. Then we have just the dummy key di-1. The expected search cost is e[i, i – 1] = qi-1.
When j ≥ i , there is a need to select a root kr from ki , . . . ,kj and then make an optimal binary search tree with keys ki , . . . ,kr-1 as its left sub tree and an optimal binary search tree with keys kr+1, . . . ,kj as its right sub tree.
When a subtree becomes a subtree of some node, the depth of every node in it increases by 1, and the expected search cost of the subtree increases by the sum of all the probabilities in it. For a subtree with keys ki, . . . , kj, let us denote this sum of probabilities as
w(i, j) = Σ (l = i to j) pl + Σ (l = i-1 to j) ql.

Thus, if kr is the root of an optimal subtree containing keys ki, . . . , kj, then
e[i, j] = pr + (e[i, r - 1] + w(i, r - 1)) + (e[r + 1, j] + w(r + 1, j)),
which, since w(i, j) = w(i, r - 1) + pr + w(r + 1, j), simplifies to
e[i, j] = e[i, r - 1] + e[r + 1, j] + w(i, j).
Choose the root that gives the lowest expected search cost, giving the final recursive formulation:
e[i, j] = qi-1                                                       if j = i - 1,
e[i, j] = min over i ≤ r ≤ j of { e[i, r - 1] + e[r + 1, j] + w(i, j) }   if i ≤ j.

Step 3: Computing the expected search cost of an optimal binary search tree

The OPTIMAL-BST pseudocode takes as inputs the probabilities p1, . . . , pn and q0, . . . , qn and the size n, and it returns the tables e and root; a sketch of the computation is given below.

The OPTIMAL-BST procedure takes Θ(n^3) time.
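As a runnable illustration of the computation just described, here is a minimal Python sketch. The table names e, w, and root follow the discussion above; the padded lists simulating 1-based indexing and the example probabilities in the comment are assumptions of the sketch.

def optimal_bst(p, q, n):
    # p[1..n]: key probabilities (p[0] unused); q[0..n]: dummy-key probabilities.
    # Returns the tables e (expected search costs) and root (chosen roots).
    INF = float('inf')
    e = [[0.0] * (n + 1) for _ in range(n + 2)]      # e[i][j], 1 <= i <= n+1, 0 <= j <= n
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]                       # empty subtree: only dummy key d_{i-1}
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):                   # number of keys in the subtree
        for i in range(1, n - length + 2):
            j = i + length - 1
            e[i][j] = INF
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):                # try each key k_r as the root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root

# Example with illustrative probabilities (p[0] is a placeholder):
# p = [0, 0.15, 0.10, 0.05, 0.10, 0.20]
# q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]
# e, root = optimal_bst(p, q, 5)
# print(e[1][5])    # expected search cost of an optimal tree

The three nested loops over i, j, and r are what give the Θ(n^3) bound.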

Example:

Greedy Algorithms





A greedy algorithm always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution.
Greedy algorithms do not always yield optimal solutions, but for many problems they do. The greedy method is quite powerful and works well for a wide range of problems.

Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. Greedy algorithms are used for optimization problems. 

An optimization problem can be solved using a greedy approach if it has the following property: at every step, we can make the choice that looks best at the moment and still arrive at an optimal solution of the complete problem.

If a greedy algorithm correctly solves a problem, it generally becomes the method of choice, since greedy algorithms are in general more efficient than other techniques such as dynamic programming. But greedy algorithms cannot always be applied.

Elements of the greedy strategy

Greedy algorithms are designed according to the following sequence of steps:
1. Cast the optimization problem as one in which we make a choice and are left with one sub problem to solve.
2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe.
3. Demonstrate optimal substructure by showing that, having made the greedy choice, what remains is a sub problem with the property that if we combine an optimal solution to the sub problem with the greedy choice we have made, we arrive at an optimal solution to the original problem.

Elements of the greedy strategy are
1.      Greedy-choice property
2.      Optimal substructure

1)      Greedy-choice property: A globally optimal solution can be arrived at by making a locally optimal (greedy) choice. Here is where greedy algorithms differ from dynamic programming. In dynamic programming, we make a choice at each step, but the choice may depend on the solutions to sub problems. In a greedy algorithm, we make whatever choice seems best at the moment and then solve the sub problems arising after the choice is made.

2)      Optimal substructure: A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to sub problems.


Sunday, 24 November 2019

Longest Common Subsequence


In the longest-common-subsequence problem, we are given two sequences X = < x1, x2, . . , xm > and Y = < y1, y2, . . . , yn > and wish to find a maximum-length common subsequence of X and Y.

Given two sequences X and Y , we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y . 

For example, if X = < A, B,C, B,D,A,B > and Y = < B,D,C,A, B,A >, the sequence < B,C,A> is a common subsequence of both X and Y . The sequence < B,C,A > is not a longest common subsequence (LCS) of X and Y , however, since it has length 3 and the sequence < B,C, B,A >, which is also common to both X and Y , has length 4. The sequence < B, C, B, A> is an LCS of X and Y, as is the sequence < B, D, A, B>,since X and Y have no common subsequence of length 5 or greater.


Solving the LCS problem using dynamic programming

Step 1: Characterizing a longest common subsequence
The LCS problem has an optimal-substructure property.
Theorem (Optimal substructure of an LCS)
Let X = < x1, x2, . . . , xm > and Y = <y1, y2, . . . , yn > be sequences, and let Z = < z1,z2, . . . ,zk > be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm implies that Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn implies that Z is an LCS of X and Yn-1.
The way that Theorem characterizes longest common subsequences tells us that an LCS of two sequences contains within it an LCS of prefixes of the two sequences. Thus, the LCS problem has an optimal-substructure property.
Step 2: A recursive solution
Finding an LCS of X = < x1, x2, . . . , xm > and Y = < y1, y2, . . . , yn >. 
If xm = yn, we must find an LCS of Xm-1 and Yn-1. Appending xm = yn to this LCS yields an LCS of X and Y . 
If xm ≠ yn, then we must solve two subproblems: finding an LCS of Xm-1 and Y and finding an LCS of X and   Yn-1. Whichever of these two LCSs is longer is an LCS of X and Y.

To find an LCS of X and Y, we may need to find the LCSs of X and Yn-1 and of Xm-1 and Y. But each of these subproblems has the subsubproblem of finding an LCS of Xm-1 and Yn-1.
Define c[i, j] to be the length of an LCS of the sequences Xi and Yj. The optimal substructure of the LCS problem gives the recursive formula
c[i, j] = 0                                     if i = 0 or j = 0,
c[i, j] = c[i - 1, j - 1] + 1                   if i, j > 0 and xi = yj,
c[i, j] = max(c[i, j - 1], c[i - 1, j])         if i, j > 0 and xi ≠ yj.
Step 3: Computing the length of an LCS
Based on this equation, an exponential-time recursive algorithm can be written to compute the length of an LCS of two sequences. Since the LCS problem has only Θ(mn) distinct subproblems, however, dynamic programming can be used to compute the solutions bottom up.

Procedure LCS-LENGTH takes two sequences X = < x1, x2, . . . , xm > and Y = < y1,y2, . . . ,yn > as inputs. It stores the c[ i, j ] values in a table c [ 0 . . m, 0 . . n]. The procedure also maintains the table b [1 . . m, 1 . . n ] to help us construct an optimal solution. The procedure returns the b and c tables; c [m, n] contains the length of an LCS of X and Y.

Step 4: Constructing an LCS
The b table returned by LCS-LENGTH enables us to quickly construct an LCS of X = < x1, x2, . . . , xm > and Y = < y1, y2, . . . , yn >. We simply begin at b[m, n] and trace through the table by following the arrows. Whenever we encounter a diagonal ("↖") arrow in entry b[i, j], it implies that xi = yj is an element of the LCS that LCS-LENGTH found.

With this method, we encounter the elements of this LCS in reverse order. The following recursive procedure prints out an LCS of X and Y in the proper, forward order. The initial call is PRINT-LCS(b,X,X.length, Y.length).
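As a runnable illustration of LCS-LENGTH and PRINT-LCS as described above, here is a minimal Python sketch; the direction strings 'diag', 'up', and 'left' stand in for the arrows in the b table and are an assumption of the sketch.

def lcs_length(X, Y):
    # Fill c[0..m][0..n] with LCS lengths and b[1..m][1..n] with back-pointers.
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[''] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 'diag'                 # x_i = y_j is part of the LCS
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 'up'
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 'left'
    return c, b

def print_lcs(b, X, i, j):
    # Print an LCS in forward order by following the back-pointers from b[m][n].
    if i == 0 or j == 0:
        return
    if b[i][j] == 'diag':
        print_lcs(b, X, i - 1, j - 1)
        print(X[i - 1], end='')
    elif b[i][j] == 'up':
        print_lcs(b, X, i - 1, j)
    else:
        print_lcs(b, X, i, j - 1)

# Example with the sequences used earlier in this post:
# c, b = lcs_length("ABCBDAB", "BDCABA")
# print(c[7][6])                     # 4
# print_lcs(b, "ABCBDAB", 7, 6)      # BCBA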


Example: