28 votes
A message is made up entirely of characters from the set $X=\{P, Q, R, S, T\}$. The table of probabilities for each of the characters is shown below:$$\begin{array}{|c|c|}\hline \textbf{Character}  &  \textbf{Probability } \\\hline  \text{$P$} & \text{$0.22$} \\\hline  \text{$Q$} & \text{$0.34$} \\\hline  \text{$R$} & \text{$0.17$} \\\hline \text{$S$} & \text{$0.19$} \\\hline  \text{$T$} & \text{$0.08$} \\\hline \text{Total} & \text{$1.00$} \\\hline  \end{array}$$If a message of $100$ characters over $X$ is encoded using Huffman coding, then the expected length of the encoded message in bits is ______.

@Bikram sir, what is wrong with this? I am just multiplying by 100.



Your tree is not correct.

After 25 becomes a root node, the nodes at that point are 25, 19, 22, 34 and 100.

Rearranged in increasing order, they become 19, 22, 25, 34, 100.

So 19 and 22 make one subtree, and 25 and 34 make another.

I hope you see where you made the mistake: you forgot to rearrange the nodes once again. After each subtree is made, we need to re-sort all the available nodes.
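To make the re-sorting point concrete, here is a minimal sketch that records the sorted node list after every merge. It assumes the probabilities from the question written as percentages (8, 17, 19, 22, 34); the function name `merge_trace` is only illustrative.

```python
def merge_trace(weights):
    """Return the sorted node list before the first merge and after each merge."""
    nodes = sorted(weights)
    trace = [list(nodes)]
    while len(nodes) > 1:
        # Merge the two smallest nodes, then re-sort everything that is left.
        merged = nodes[0] + nodes[1]
        nodes = sorted(nodes[2:] + [merged])
        trace.append(list(nodes))
    return trace


for step in merge_trace([22, 34, 17, 19, 8]):
    print(step)
```

With these inputs the merged nodes are 25, 41, 59 and finally 100, so the root carries the full weight of 100.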

Why is this question not in the GO book? I downloaded the latest version but did not find it.
This is in the network volume, vol. 3.
The root node should be 100, because the sum over all characters, i.e. P, Q, R, S, T, is 100. It is simply wrong that you are getting 200.

4 Answers

35 votes
Best answer

$X = \{ P, Q, R, S, T\}$

$∴ \text{Expected length of an encoded character}$
$\qquad \qquad= (0.22 \times 2) + (0.34 \times 2) + (0.17 \times 3) + (0.19 \times 2) + (0.08 \times 3) \hspace{0.1cm} \text{ bits }$
$\qquad \qquad= 0.44 + 0.68 + 0.51+ 0.38 + 0.24 \hspace{0.1cm} \text{ bits}$
$ \qquad \qquad= 2.25 \hspace{0.1cm} \text{ bits } $

$\therefore \text{Expected length of an encoded message of $100$ characters in bits} = 100 \times 2.25 = 225$
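The code lengths used above can be checked with a short script. This is a sketch rather than a reference implementation: it assumes the probabilities are scaled to integer percentages (to avoid floating-point noise) and uses Python's `heapq` as the min-heap; `huffman_code_lengths` is a name chosen for illustration.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length in bits} for a dict of symbol -> weight."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, depths1 = heapq.heappop(heap)
        w2, _, depths2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


weights = {"P": 22, "Q": 34, "R": 17, "S": 19, "T": 8}  # percentages
lengths = huffman_code_lengths(weights)
total_bits = sum(weights[s] * lengths[s] for s in weights)
print(lengths)
print(total_bits)  # 225 bits for a 100-character message
```

Because the weights add up to 100, the weighted sum is directly the expected number of bits for 100 characters.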

Your tree structure is different from the best answer. Is that OK?

When constructing a Huffman tree, every step deletes the two smallest elements from the min-heap. The convention is to extract the smallest, then the next smallest, add them, and insert the sum as a new node into the min-heap. Since the smallest element is always extracted first, it always becomes the left sibling of the next-smallest node.

You can see the code here

During Huffman tree construction, the calls look like this:

left = extractMin(minHeap);
right = extractMin(minHeap);
So the tree constructed by "2018" is wrong, as the algorithm cannot make the smallest node the left child in some steps and the right child in others. The answer is not affected, though, because only the heights (depths) of the nodes matter in this question, and the left/right choice does not change them.
The Huffman tree constructed by akash looks right to me.
Why is the expected length not 225/100 = 2.25 bits? "Expected" means we need to calculate the average code length, right?
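A small sketch of that left/right convention, assuming Python's `heapq` stands in for the min-heap and that the left edge is labelled 0 and the right edge 1 (the labelling and the `smaller_on_left` flag are assumptions for illustration, not part of the linked code). It shows that swapping the convention changes the codewords but not their lengths.

```python
import heapq
from itertools import count


def huffman_codes(weights, smaller_on_left=True):
    """Build codewords; internal nodes are (left, right) tuples, leaves are symbols."""
    tiebreak = count()  # unique counter so the heap never compares nodes directly
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, first = heapq.heappop(heap)   # smallest
        w2, _, second = heapq.heappop(heap)  # next smallest
        pair = (first, second) if smaller_on_left else (second, first)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), pair))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left edge
            walk(node[1], prefix + "1")  # right edge
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


weights = {"P": 22, "Q": 34, "R": 17, "S": 19, "T": 8}
left_first = huffman_codes(weights, smaller_on_left=True)
right_first = huffman_codes(weights, smaller_on_left=False)
print(left_first)
print(right_first)
```

Either way, P, Q and S get 2-bit codes and R and T get 3-bit codes, which is all the question depends on.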

The expected length here does not mean the average number of bits required to encode one character; it is the total number of bits required to encode 100 characters.

@amitqy I think you're right, but "expected" generally means average, and when a question is a NAT (numerical answer type) it's difficult to guess what they mean.


I think it would be silly for the length of a 100-character message to be 2.25 bits. Each character needs at least 1 bit to encode, and since the codes differ by their prefixes, the total is always at least 100 bits. If they had asked for the average length required per character, that would be a different context,

hence it would be (weighted external path length / total frequency of all characters).
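Written out, the two quantities being discussed are as follows. The symbols $f_i$ (frequency of character $i$ as a percentage) and $l_i$ (length of its Huffman code in bits) are notation introduced here for illustration:

$$\text{average bits per character} = \frac{\sum_i f_i \, l_i}{\sum_i f_i} = \frac{225}{100} = 2.25, \qquad \text{bits for the $100$-character message} = 100 \times 2.25 = 225$$

The question asks for the second quantity, the expected length of the whole encoded message.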

Thank you, sir.
Your point is important; most of us make this mistake.
35 votes

So the answer is 225.

How are the bits calculated, and how is the tree prepared?
In short:
1. Arrange all characters in increasing order of their frequency.
2. Extract the two with minimum frequency and make them leaf nodes; their parent (the root of that subtree) is obtained by summing both.
3. Rearrange all remaining frequencies, including the newly created root node.
4. Repeat steps 2 and 3 until a single node remains.
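The steps above can be sketched with a min-heap. One useful property of this procedure is that the sum of all merged node weights equals the weighted path length, so the total can be accumulated without building the tree. This is a minimal sketch assuming the frequencies are given as percentages; `total_encoded_bits` is an illustrative name.

```python
import heapq


def total_encoded_bits(frequencies):
    """Sum the weights of all internal nodes created while merging."""
    heap = list(frequencies)
    heapq.heapify(heap)  # step 1: order by frequency
    total = 0
    while len(heap) > 1:
        # Step 2: take the two smallest nodes and sum them into a parent.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        # Step 3: put the parent back so the ordering is restored.
        heapq.heappush(heap, merged)
    return total


print(total_encoded_bits([22, 34, 17, 19, 8]))  # 25 + 41 + 59 + 100 = 225
```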
I think the diagram is not correct; however, the calculation is fine.

0.41 should be left child of 1 and 0.59 will be right child of 1. Apart from that, it looks fine.

@Arjun Sir, Can you please suggest?

I am so sorry... T is 100

This one is correct :-) 




I understood your tree solution .

Please help me understand the difference between average length and average length per character, and how to tell from the question which one we have to find.
@mayank I have the same doubt: if the expected length is asked, then the answer should be 225/100 = 2.25, since "expected" means average value.
@Rupendra @akash

Rupendra's comment should be added in the best answer.

It is a mistake made by 90% of people, including me.
7 votes

To create the Huffman tree, always arrange the nodes in ascending order and merge the first two nodes.

Step 1)

Given: $\begin{bmatrix} P & Q &R &S &T \\ 0.22 &0.34 &0.17 &0.19 &0.08 \end{bmatrix}$ → $\begin{bmatrix} T & R &S &P &Q \\ 0.08 &0.17 &0.19 &0.22 &0.34 \end{bmatrix}$ → $\begin{bmatrix} TR &S &P &Q \\ 0.25 &0.19 &0.22 &0.34 \end{bmatrix}$


Step 2)

$\begin{bmatrix} TR &S &P &Q \\ 0.25 &0.19 &0.22 &0.34 \end{bmatrix}$ → $\begin{bmatrix} S &P&TR &Q \\ 0.19 &0.22 &0.25 &0.34 \end{bmatrix}$ → $\begin{bmatrix} SP&TR &Q \\ 0.41 &0.25 &0.34 \end{bmatrix}$


Step 3)

$\begin{bmatrix} SP&TR &Q \\ 0.41 &0.25 &0.34 \end{bmatrix}$ → $\begin{bmatrix} TR&Q &SP \\ 0.25 &0.34 &0.41 \end{bmatrix}$ → $\begin{bmatrix} TRQ &SP \\ 0.59 &0.41 \end{bmatrix}$


Step 4)

Finally merge these two.

Hence, the Huffman tree would look like this:


The expected length is 2.25 bits per character. Given that there are 100 characters in the message, the expected length of the encoded message is $100 \times 2.25 = 225$ bits.



Detailed and clean <3
3 votes

225 is the right answer, as they have asked for the expected length of the encoded message, and the message contains 100 characters.
