Then I don't need to divide by $\mathbf{100}$ right?

5 votes

A language uses an alphabet of six letters, $\left\{a, b, c, d, e, f\right\}$. The relative frequency of use of each letter of the alphabet in the language is as given below: $$\begin{array}{|c|c|}\hline \text{LETTER} & \text{RELATIVE FREQUENCY OF USE}\\\hline a & 0.19 \\\hline b & 0.05 \\\hline c & 0.17 \\\hline d & 0.08 \\\hline e & 0.40 \\\hline f & 0.11 \\\hline \end{array}$$Design a prefix binary code for the language which would minimize the average length of the encoded words of the language.

9 votes

Best answer

$a-0.19,b-0.05, c-0.17,d-0.08,e-0.40,f-0.11$

Since it is relative frequency we can multiply each with $100$ and result remains the same.

$a-19,b-5, c-17,d-8,e-40,f-11$

For Assigning prefix binary code lets first create the Huffman tree

$(1) \ a-19,{\color{Red} {b-5} },c-17,{\color{Red} {d-8} },e-40,f-11$

$(2) \ a-19,{\color{Red} {(b,d)-13} },c-17,e-40,{\color{Red} {f-11} }$

$(3) \ {\color{Red} {a-19} },(b,d,f)-24,{\color{Red} {c-17} },e-40$

$(4) \ {\color{Red} {(a,c)-36,(b,d,f)-24}},e-40$

$(5) \ {\color{Red} {(a,b,c,d,f)-60,e-40}}$

Now put $0$ on each of the left edges and $1$ on each of the right edges as shown below

$\text{Prefix code}$ = Traverse from the root node to the leaf node and write the symbol $(1 \ or \ 0)$ present on each edge.

- For $a$ the prefix code is $111$ i.e. $3$ bits
- For $b$ the prefix code is $1010$ i.e. $4$ bits
- For $c$ the prefix code is $110$ i.e. $3$ bits
- For $d$ the prefix code is $1011$ i.e. $4$ bits
- For $e$ the prefix code is $0$ i.e. $1$ bits
- For $f$ the prefix code is $100$ i.e. $3$ bits

And the average length of encoded words $=\dfrac{(19\times 3)+(5\times 4)+(17\times 3)+(8\times 4)+(40\times 1)+(11\times 3)}{100}$

$\quad =\dfrac{57+20+51+32+40+33}{100}$

$\quad =\dfrac{233}{100}$

$\quad =2.33$

0

What if I directly take the probability and multiply with the no. of bits??

Then I don't need to divide by $\mathbf{100}$ right?

Then I don't need to divide by $\mathbf{100}$ right?

0

It may be a silly doubt but I sometimes feel that why aren't be dividing by $\mathbf 6$, that is, the number of symbols?

3 votes

Huffman coding is to find prefix of letters,is to compress the message size over ASCII code.

To find length of encoded message find Huffman tree and sum of all internal nodes gives the message length.

First find minimum tress in the forest and remove it add to huffman tree.Add two minimum trees and add in the forest.

Repeat this procedure until forest become single tree.

Length of message=1+0.60+0.24+0.36+0.13

=2.33