2k views
Give an optimal algorithm in pseudo-code for sorting a sequence of $n$ numbers which has only $k$ distinct numbers ($k$ is not known a Priori). Give a brief analysis for the time-complexity of your algorithm.

edited | 2k views
+4

Watch it ...

Answer should be counting sort which will take $O(n+k)$ time.

See here:

http://www.geeksforgeeks.org/counting-sort/

by
edited
+8

Counting sort is a sorting technique based on keys between a specific range. It works by counting the number of objects having distinct key values (kind of hashing)."  source GFG

also counting sort needs key values within some range . This assumption is not given in the question !!

But somehow i think O(n+k) is time complexity of correct algorithm .

+1
I think by saying   O(n+k) he means   ...  'k' is the maximum value....  but here already one k is there.. So
counting sort takes O(n+m)....where m is the maximum value.....  but still I do not know whats the answer.
+1
As here k is given in question, i finally go with counting sort with O(n+k) time complexity.

As the definition of counting sort says "Counting sort is an algorithm that takes an array A of n elements in the range {1, 2, ..., k} and sorts the array in O(n + k) time " reference MIT slide .

An extra array ( auxiliary array with k elements) maintain by counting sort .
+1
Yes..Now I agree with the answer....but here in qsn k means something else. So should not a different variable be used like m instead of k? @Bikram sir.
+1
@ashwan

No , here k means different kind of elements among n elements . Also counting sort definition says the same "Counting sort is an algorithm that takes an array A of n elements in the range {1, 2, ..., k} " here range means different type of elements like 1,2,3,..,k  ..k different elements .

+2
@bikram sir,  So this is not a possible input sequence here?     5,100,1000,5,100,10000   where k=4 distinct elements but maximum value=10000
+1
@ahwan

Yes, this is possible sequence.

This question indirectly ask about counting sort..see the definition of counting sort an this question, you will understand, you may read that MIT slide..google it
+1
@Bikram sir.I don't agree with the selected answer.Here k is not the range so how can we use counting sort?

If my array is :- 1,2,3,1000.

Now my k=4 as 4 distinct numbers but maximum number is 1000. K can be 1 or it can be n(total elements),so it is a general case ,array can have 1 distinct element or it can have n distinct elements and it has nothing to do with range.So i will suggest answer as quicksort here as it is the most practical used algorithm
0
@rahul

can u chk my ans once
0
@ahwan

1 to k here index number of the array. That may be array element or maynot. That is not told in question
0
So, complexity should be nlogn
+1

Counting sort operate on k empty lists

but in this question asking about only k elements , then how counting sort will be answer? Ref here

+1
counting sort is applicable when we already know the range of the values of those elements which we have to sort but given question is not giving any information related to it . Then how counting sort is applicable here. anyone please explain.
0

@ankitgupta.1729

last part of counting sort not clear to me. Can u explain me once how do we reducing counting sort array again from $n$ elements to $k$ elements?

Why range is important? If we know the value of $n$ and $k$, we can apply counting sort here. Isit not??

0
can this not be done in o(klogk + n) time, using perfect hashing and any comparison sort algorithm? We hash all the n numbers, storing count of the numbers during hashing. Then sorting only the k distinct numbers using any comparison sort algorithm, after getting the sorted array, then recreating the whole sorted array from the hash using the count stored.
0

@srestha

We need to know the range in counting sort. Reference: https://stackoverflow.com/questions/45489527/what-is-k-in-counting-sort-onk-time-complexity

@Aayush Tripathi

I think we can use your approach if we know the values of these 'k' distinct elements in prior.

As in the question they have given 'k' is not known prior, I assume that the values of these 'k' distinct elements is also not known in prior. I think the best approach here would be to use any comparison based sort that gives complexity as O(n log n).

0

@rohith1001, no we don't need to know the k distinct elements in prior, that's why we are using perfect hashing.

use a hash map with chaining to sort all the numbers in $\mathcal{O}(n)$ time and constant space.
0
What would be the size of hash table...we have not given the value of k
+1
Even if we will use 'n' slots in hashing..... that mod n can send all value to one slot...irrespective of what is the value of k...  So hashing should not be used @Bikram Sir.
0
yes correct. hashing can not be used here.

Size for hash Table is n , it depends upon slots.
O(n) is the answer here .

Here k<= n . worst case k = n possible.
+1
@Bikrm Sir, how it can be done in O(n) ? No comparison based sorting can perform better than O(nlogn)..
Only counting, radix, bucket can do that..
But they are used for special cases .
For example counting can be used when the elements are in small range only.
So answer is O(n+m) where m is the maximum value.
So how it can be done in O(n)?
0
yes,  no comparison based sorting can perform better than O(nlogn).. but here in worst case you have to check all n elements right ? that's why O(n)

and in your answer from where ' m ' comes ? it is not mention in question.

say numbers are in this sequence given 1,3,6,3,6  then n=5 and k =3 so what is 'm 'here ? do you mean m= 6 here ?
0
yes sir, I mean m is the maximum element...
+1 vote

We can write code like this

int distinctValue(int a[ ] , int n)
{
int k=0,value,distinct;
for(i=0 ; i<10 ; i++)
{
if(max<value[i])
max=value[i];
}
for(i=0; i<max; i++)
{
distinct[value[i]]=value[i];
}
for(i=0; i<max; i++)
{
distinct[i]=distinct[i-1]+distinct[i];
}
return 0;
}


So, complexity will be $O\left ( n \right )$

by
edited by
+2
@ sreshta it is not sorted. there is a sequence of n numbers which has only k distinct numbers.

ex: 1,2,1,3,4,2,1,2,3, here n = 9 & k = 4
+1
Answer must be O(n+k) as for any k distinct numbers k always < and equal to ( in worst case ) n .

By using counting sort it gives O(n+k)

But elements are not sorted , we need to make a sorted sequence.
0

@srestha

@ankitgupta.1729

Here k is not the range.so how will the value of k affect the time complexity?

Counting sort is more efficient   when we have single digits(otherwise it uses large space even for small values of n and the value of k in O(n+k) increases.)
For numbers with more than 1 digit we use radix sort..

the maximum value can also be n^3  ,like 01,01,27.So there are k distinct values`.Now the maximum number of digits is log m(base 10).[m is the maximum value in the sequence].

so using radix sort we could actually use  log m(n+constant)=>n log m

now the time complexity depends on the highest number. if m=n^2 then it is O(n logn)
if m=2^n then it is  O(n^2)
if m=n^n then it  is O(n^2 log n)

Then the time complexity depends on the highest value.

So we could actually use mergesort /heapsort which gives O(n log n)every time.

0
Yes modification required here.

Can u tell me Counting sort in algo format, because when I was writing the code getting difficult in code.
0

@Doraemon

More than one digit case , also counting sort works. but it mainly works for repetition of same digits. right??

0

@srestha See this code snippet from CORMEN.

O(n+k) is the time complexity .It takes O(k ) time to calculate the cumulative sum.(steps 7,8,9).

It works for numbers having any number of digits and for any value of n.

But it is not very efficient when we consider numbers where the value of n is less but the value of k is large and the difference between the numbers is large .like {1,4,5,50,3125} here the size of C will be 1000 even though the number of terms is 5...

Then time complexity in the above case is (n+k)here k=n^5 hence the time complexity of counting sort would be  O(n^5).which is large.

More than one digit case , also counting sort works. but it mainly works for repetition of same digits. right??

Counterexample :{2,2,2,2,2,2,2,2,2,10000} here n=10,k=10000
Here 2 is repeated many times and only 1 time we will have 10000 but still, the size of C will be  10000 which is O(n^4)
and the time complexity will be  O(n^4).
Again counting sort works well when the values are near to each other.(or concentrated within a specified range and the range is small).

0
But in the given question it says k distinct numbers..It can be (0,1,2,3,1000) or (0,1,2,3,4). So by just knowing #distinct numbers how can we apply counting sort?