63 views
Which data structure is most efficient to find the top 10 largest items out of 1 million items stored in file? The answer given is Min-heap anybody please explain?
in DS | 63 views
0
What I think is..

Min heap with 1 million nodes contain approx 20 levels. There are approximately half a million Leaf nodes. Highest one must be present at the last level. Second highest must be present in either 2nd last or last level likewise top 10 elements must be present in level 10-20( if items are repeated). This will be the worst case as most no. Of comparisons are done here. So many nodes are involved in this!!

But I think max heap would have been better here. The root node will contain the max element. Extract it and then maxheapify the heap. Then the 2nd highest would appear at the root node. Repeat this process 10 times. Maxheapify takes logn which is approximately 20. So, 10logn=200.

Feel free to suggest if anyone finds something wrong.

Suppose sorted file contains following numbers

0,0,0,0 (many times) , 1,1,1,1.... (many times), 2,2,2,2(many times).... 100,100,100,100 (many times)

Now if we want to find top 10 largest numbers then obviously it lies on last 10 indexes of sorted list.

So by sorted list, we can get the answer in O(1) time.

In Min heap, following operations need to be performed

1) Build min heap of size 10 ---> O(10) - constant

2) n-10 comparisons and each time heapify algorithm in the worst case. --

n-10 comparisons + n-10 times heapify algorithms

O(n-10) + O(n-1*log 10) = O(n)+O(n*log 10) = O(n)
by Active

+1 vote