Also can you explain in words what the algorithm is doing and its complexity?

The Gateway to Computer Science Excellence

+2 votes

#include<stdio.h> void kunique(char s[], int k) { int i,j,n,m,u=0; int last=0,current=0,flag=0; int maxlength=0; for(i=0;i<n;i++) { for(j=0;j<i;j++) { if(s[i]==s[j]) //if in the time of 2nd for loop any element of 1st for loop matches continue; // then skipping that iteration and continue with next value of j else{ if(u<(k-1)) //when both charecter is different or unique (u) and u less than maximum if(j<=(i-1)) //window size then increasing u value and for whole j loop iteration flag=1;//if no charecter matches with a[j], only then u will increase else { last=(i-1); //when u>=(k-1) i.e. unique charecter more than window size , then last charecter in the window is set if((current-last)> maxlength)//if current window size is previous window size then we update maxlength maxlength=(current-last); for(m=current;m<last;m++) { if(s[current]=s[m])//here we are updating current pointer i.e. current=m++;//previously current pointer poining to first charecter of "abcabcdeef" //now it is pointing to first charecter of "bcdeef" u--;//number of unique charecter reduces by 1 , as window shifted here } } } if(flag==1) u++; printf("%s",s[current-last]); } int main() { char s[20]="abcabcdeef"; int k=3; kunique(s[],k); return 0; }

0

Good. But whats the continue doing in the code?

Also can you explain in words what the algorithm is doing and its complexity?

Also can you explain in words what the algorithm is doing and its complexity?

+2

You remind a lot of me doing programming or even commenting :)

But it will be good to have a summary of the approach to make a naive user understand what's going. Complexity is correct - it is $O(n^2)$ which is bad for real cases- consider $n = 1,000,000$. If we consider ASCII we can have only $256$ unique characters. Anyway to make the algorithm better? - You can post as a different answer as this is a good working solution.

PS: This is a Google interview question, so answer should be perfect.

But it will be good to have a summary of the approach to make a naive user understand what's going. Complexity is correct - it is $O(n^2)$ which is bad for real cases- consider $n = 1,000,000$. If we consider ASCII we can have only $256$ unique characters. Anyway to make the algorithm better? - You can post as a different answer as this is a good working solution.

PS: This is a Google interview question, so answer should be perfect.

+2 votes

O(kn), if k is constant

=O(n)

Algorithm working->

1. Move in the string from 0th index,. every time next character is encountered then increment index of A[] and add that character and increment the values of B[] (that is counting no. of repeated characters to same index value and distinct character to next index).

2.To find the max span for k unique character. After finding k distinct characters from starting, all values of B array are added to find length of the span and that value is compared with the maximum value returned by remaining array ( i.e. value returned by max_span(i+1,k,j)). Elements which we are considering in the same span have same span_no[] value so that we can add B[] array values for those indexes to find length of the span.

3. Give span_no[] array same value for the indexes which are in same span. now when we move to next span, add B[] values for the indexes which have same span_no[] values and move to next span. But here, add the last element of previous span in the next span ex - "aaaabbaccccc" for 2 unique characters, this string will break to 2 spans "aaaabba" and "accccc" so for next span we will assign span_no[] another (unique) value for all elements in that span.

Taking help of 3 arrays

A[] -> to store all unique characters in an array, array size is in worst case = number of characters in string ( i.e. if all consecutive elements are different if "abcdef" then array size will be 6) if "aaaabba" then 'a' before 'b' is different from 'a' after b so array will store A[0]='a', A[1]='b', A[2]= 'a',

B[] -> number of times a distinct character present in the string , for "aaaabba" this will store -> B[0] =4 ,B[1] = 2,B[2] = 1

span_no[] -> first span possible is "aaaabba",span_no[] will help to tell the elements which are present in current span, so span_no will store -> span_no[0]=0, span_no[1]=0, span_no[2] =0.

int len; // number of unique characters char A[n]; int B[n]; //<- initialize them to 0 int span_no[n], j=0; int max_span(int i,int k, int j) // i = current index in string, k = remaining distinct characters in this span, j= index position for contracted array { if(i==0) { A[j]=string[i]; B[j]++; return max_span(i+1,k-1,j); } else if (i==n) return 0; else if(string[i] == string[i-1]) { A[j]=string[i]; B[j]=+1; return max_span(i+1,k,j); } else //then search for the term which is equal to this character in current span in O(k) { ++j, B[j]+=1; A[j]=string[i]; int temp_i =j-1,found=0; ++j; while((temp_i!=0)&&(span_no[ j ]==span_no[temp_i])) // if that span contains same character earlier also then count this, element in the same span otherwise it will be taken as distinct element in that case if distinct character length is full (i.e. k==0), then add this element in next span { if(A[ j ]==A[temp_i]) found=1; --temp_i; } if(found==1 && k!=0) // if found then add to that span only { span_no[j]=span_no[j-1]; max_span(i+1,k,j); } else if(!found && k==0) //if not found and k is reached 0 then add to next span { temp_j=j-1; add=0; while(span_no[--temp_j]==span_no[j-1]) //count all characters in this span and move to next span { add+=B[temp_j]; } span_no[j-1]+=1; //include last element also in the next span span_no[j]=span_no[j-1]; int u=max_span(i+1,len,j); return max(add , u); } else //if not found and distinct character count is remaining in that span then add to the current span only { span_no[j]=span_no[j-1]; max_span(i+1,k-1,j); } } }

0 votes

Please see here for a similar but simpler one for detailed explanation.

So, here we are interested in a substring which means all the interested characters must be continuous. So, this makes a dynamic programming problem formulation easy. Consider the array index $i$ and let $S[i-1[$ be the maximum length with $k$ unique characters $i-1$. Also let $B[1..k]$ and $C[1..k]$ denote two circular queue of length $k$ containing the last $k$ unique characters and their respective starting indices. Now,

$S[i] = \begin{cases} S[i-1] + 1 \text{if } B \text{ is not full or } S[i] \text{ in } B \\S[i-1] +1 - C[1] \text{ otherwise } \end{cases}$

We also have to do

- For case 1: if(S[i] == S[i-1]) C[k] ++; where $k$ is the index of $S[i]$ in array $C$ and
- For case 2: dequeue(C); dequeue(B); enqueue (B, S[i]), enqueue (C, 1)

0 votes

#include<stdio.h> #define MAX_CHAR 26 /* Here, we have assumed string can only have characters from a-z, i.e. 26 characaters. */ int w[1000]; /* stores max substring length with atmost k unique characters at each element likely to be seen in future. */ int count[MAX_CHAR]; /* keeps count of several characters. */ void kUnique(char[], int, int); /* function that finds max length of substring with k unique characters. */ int main() { char a[1000]; int i, n, k; scanf("%d", &n); /* enter length of string*/ scanf("%s", a); /* enter string */ scanf("%d", &k); /* enter no. of unique characters to be taken under consideration */ kUnique(a, n, k); return 0; } void kUnique(char a[], int n, int k) { int i, j, cur, lastSeen, window, temp, msf; for(i=0; i<n; i++) { if(count[a[i]-'a']==0) { window++; } count[a[i]-'a']++; } if(window<k) { printf("Not enough unique characters"); return; } /* The code segment from line 44-53 will detect if no. of unique characters in the string is less than k. If yes, it will print ERROR msg and return. */ for(i=0; i<n; i++) { for(j=0; j<MAX_CHAR; j++) { /* At beginning of every loop, we set every element of count to 0. */ count[j] = 0; } window = 0; /* At the beginning of each loop , wet window to 0 because we have not seen any substring yet. */ temp = k; /* Also, set temp to 0 because we have not seen any substring yet */ for(j=i; temp>=0 && j<n; j++) { if(count[a[j]-'a']==0) { temp--; if(temp==-1) { break; } else { count[a[j]-'a']++; } } else { count[a[j]-'a']++; } window++; } w[i] = window; } /* Whenever, we see a new character not seen till now, the corresponding value of count will be zero. Then, we will simply decrement temp. Now, if temp becomes -1, then it means that we have seen k unique characters, We then simply break out from the loop. Also, window is used to keep track of length of substring. After coming out of the inner (jth) lop, we set w[i] = window where window is the length of the substring seen just now.*/ msf = w[0]; for(i=1; i<n; i++) { if(w[i]>msf) { msf = w[i]; } } /* At the end, we basically search entire w array to find and print w[i] where w[i] is the max value and denotes the length max substring with k unique characters */ printf("%d", msf); }

**TIME COMPLEXITY**

In worst case, for every *a[i]*, we can search from *a[i,.., n]* to find sub string with *k* unique characters. As a result, the time complexity will be **O(n^2)**.

__SPACE COMPLEXITY__

The space complexity of above program is O(n+26) = **O(n) **as we use an extra array *count* and *w* (to keep track of length of max sub string with *k* unique characters). However, we can remove *w* and simply keep track of max substring length with k unique characters seen till now. As a result, we will then need an extra space of O(26) or O(256) (if we consider that any of 256 ASCII characters may be present in the input string).

- All categories
- General Aptitude 1.9k
- Engineering Mathematics 7.5k
- Digital Logic 2.9k
- Programming and DS 4.9k
- Algorithms 4.4k
- Theory of Computation 6.2k
- Compiler Design 2.1k
- Databases 4.1k
- CO and Architecture 3.4k
- Computer Networks 4.2k
- Non GATE 1.4k
- Others 1.4k
- Admissions 595
- Exam Queries 573
- Tier 1 Placement Questions 23
- Job Queries 72
- Projects 18

50,737 questions

57,382 answers

198,530 comments

105,323 users