We can reform this problem in another problem like 'n total slots and n total keys and we have to find expected number of colliding pairs in uniform hashing'.

Dark Mode

1,217 views

9 votes

There are $n$ balls $b_1, \dots ,b_n$ and $n$ boxes. Each ball is placed in box chosen independently and uniformly at random. We say that $(b_i, b_j)$ is a $\textit{colliding pair}$ if $i<j$, and $b_i$ and $b_j$ are placed in the same box. What is the expected number of $\textit{colliding pairs}$?

- $\frac{n-1}{2}$
- $0$
- $1$
- $\frac{n}{4}$
- $\begin{pmatrix} n \\ 2 \end{pmatrix}$

7 votes

Best answer

Correct Option: **A**

Let $X_{ij}$ be an indicator random variable, such that

$X_{ij}=\left\{ \begin{array}{rcl} 1 & \text{if collide} \\ 0 & \text{otherwise} \end{array}\right. $

$\therefore$ Probability that $i$ and $j$ both hash to the same slot.

$P_r(X_{i,j})=\frac{1}{n}$

$\implies E[X_{i,j}]=\frac{1}{n}$

Now,

$\text{E[No. of Colliding pairs]}= E\left[\displaystyle\sum_{i=1}^n \sum_{j=i+1}^n X_{i,j}\right]$

$\qquad \qquad \qquad = \displaystyle\sum_{i=1}^n \sum_{j=i+1}^n E[X_{i,j}]$

$\qquad \qquad \qquad =\displaystyle\sum_{i=1}^n \sum_{j=i+1}^n\frac{1}{n}$

$\qquad \qquad \qquad =\dfrac{n(n-1)}{2n}$

$\qquad \qquad \qquad =\dfrac{(n-1)}{2}$

4 votes

The balls b1,b2,....,bn are put in the box uniformly and randomly.

b1 has no chance of collison.

probability of b2 colliding=1/n.

prob of b3 colliding =2/n (j=3; i=1 or 2).

prob of b4 colliding = 3/n.. So on..

.

.

prob of bn colliding = (n-1)/n.

Therefore, expected no of colliding pairs

= 1/n+2/n+3/n+....+(n-1)/n = 1/n*[1+2+3+..+(n-1)] =1/nX[n*(n-1)/2]=(**n-1)/2 (A)**

1 vote

Actually I have a different understanding of the problem.

I felt the other approaches discussed here misses an important point though they arrive at the correct answer.

Here is how I understood the problem and have arrived at the solution.

Lets model the problem as follows:

"ABCD" means b1 is given to box A b2 is given to box B b3 is given to box C and b4 to box D.In this case there is no collision pair.

"AABB" means b1 is given to box A b2 is given to box A b3 is given to box B and b4 to box B.In this case there are two collision pairs.One from Box A and other from Box B.

"AAAA" has 4*3/2 collision pairs

and so on.

So different strings lead to different collision pairs.The point other approaches are missing is that for a string like "AABB" collision pairs can be coming from two different boxes too.

The question asked here is not about the "expected number of collision pairs for a single box" but for the

entire distribuion of n balls across n boxes.In other words if I take a random string what is the expected number of collision pairs that could come from all boxes.

Also if there are r balls in a box then number of collision pairs that could come is r*(r-1)/2=rC2

So E(CP)=Total of Cp from each random distibution/total number of possible distribution

Total number of possible distribution=n^n

Total of Cp from each random distibution is done by choosing 1 alphabet out of n alphabets after which selcting a r positions out of n positions and placing that alphabet.After which the remaining n-1 alpahbets are filled.

Since there are r repetitions of the chosen alphabet so there are r*(r-1)/2 CP with respect to chosen alphabet.

(Observe that the permutation like AABB will be counted twice one wrt A and other wrt to B and in each case 1 will be added making total Cp as 2)

Total of Cp from each random distibution=

$$ (\sum_{i=2}^n(nCr*(n-1)^{(n-r)}*rC2))*n$$

$$(\sum_{i=2}^n(n*(n-1)*n-2Cr-2*(n-1)^{(n-r)}))*n*1/2$$

$$(\sum_{i=2}^n(n-2Cr-2*(n-1)^{(n-r)}))*n*n*(n-1)*1/2$$

$$((n-1+1)^{n-2})*n*n*(n-1)*1/2) [Bionomial theorem]$$

$$ So E(CP)=(n^{(n-2)})/n^n *n*n*(n-1)*1/2 $$

$$ E(CP)=(n-1)/2 $$

I felt the other approaches discussed here misses an important point though they arrive at the correct answer.

Here is how I understood the problem and have arrived at the solution.

Lets model the problem as follows:

"ABCD" means b1 is given to box A b2 is given to box B b3 is given to box C and b4 to box D.In this case there is no collision pair.

"AABB" means b1 is given to box A b2 is given to box A b3 is given to box B and b4 to box B.In this case there are two collision pairs.One from Box A and other from Box B.

"AAAA" has 4*3/2 collision pairs

and so on.

So different strings lead to different collision pairs.The point other approaches are missing is that for a string like "AABB" collision pairs can be coming from two different boxes too.

The question asked here is not about the "expected number of collision pairs for a single box" but for the

entire distribuion of n balls across n boxes.In other words if I take a random string what is the expected number of collision pairs that could come from all boxes.

Also if there are r balls in a box then number of collision pairs that could come is r*(r-1)/2=rC2

So E(CP)=Total of Cp from each random distibution/total number of possible distribution

Total number of possible distribution=n^n

Total of Cp from each random distibution is done by choosing 1 alphabet out of n alphabets after which selcting a r positions out of n positions and placing that alphabet.After which the remaining n-1 alpahbets are filled.

Since there are r repetitions of the chosen alphabet so there are r*(r-1)/2 CP with respect to chosen alphabet.

(Observe that the permutation like AABB will be counted twice one wrt A and other wrt to B and in each case 1 will be added making total Cp as 2)

Total of Cp from each random distibution=

$$ (\sum_{i=2}^n(nCr*(n-1)^{(n-r)}*rC2))*n$$

$$(\sum_{i=2}^n(n*(n-1)*n-2Cr-2*(n-1)^{(n-r)}))*n*1/2$$

$$(\sum_{i=2}^n(n-2Cr-2*(n-1)^{(n-r)}))*n*n*(n-1)*1/2$$

$$((n-1+1)^{n-2})*n*n*(n-1)*1/2) [Bionomial theorem]$$

$$ So E(CP)=(n^{(n-2)})/n^n *n*n*(n-1)*1/2 $$

$$ E(CP)=(n-1)/2 $$