590 views
Given an input string of length $n$, find the maximum length of the substring containing maximum $k$ unique characters. For example, for "abbcdaadcd" and $k=2$ answer will be 4 for the substring "daad".
| 590 views

#include<stdio.h>

void kunique(char s[], int k)
{
int i,j,n,m,u=0;
int last=0,current=0,flag=0;
int maxlength=0;
for(i=0;i<n;i++)

{
for(j=0;j<i;j++)
{
if(s[i]==s[j]) //if in the time of 2nd for loop any element of 1st for loop matches
continue;   // then skipping that iteration and continue with next value of j
else{
if(u<(k-1)) //when both charecter is different or unique (u) and u less than maximum
if(j<=(i-1)) //window size then increasing u value and for whole j loop iteration
flag=1;//if no charecter matches with a[j], only then u will increase
else
{
last=(i-1); //when u>=(k-1) i.e. unique charecter more than window size , then last charecter in the window is set

if((current-last)> maxlength)//if current window size is previous window size then we update maxlength
maxlength=(current-last);

for(m=current;m<last;m++)
{
if(s[current]=s[m])//here we are updating current pointer i.e.
current=m++;//previously current pointer poining to first charecter of "abcabcdeef"
//now it is pointing to first charecter of "bcdeef"

u--;//number of unique charecter reduces by 1 , as window shifted here

}
}
}
if(flag==1)
u++;
printf("%s",s[current-last]);
}
int main()
{
char s[20]="abcabcdeef";
int k=3;
kunique(s[],k);
return 0;
}
by Veteran (119k points)
edited by
0
Good. But whats the continue doing in the code?

Also can you explain in words what the algorithm is doing and its complexity?
+1

Complexity will be depend on window size rt?

then it will depend on i and j loop So, O(n2)

+2
You remind a lot of me doing programming or even commenting :)

But it will be good to have a summary of the approach to make a naive user understand what's going. Complexity is correct - it is $O(n^2)$ which is bad for real cases- consider $n = 1,000,000$. If we consider ASCII we can have only $256$ unique characters. Anyway to make the algorithm better? - You can post as a different answer as this is a good working solution.

PS: This is a Google interview question, so answer should be perfect.
+1
when you write a summary of your approach in English - you should find it easier to optimize it.

O(kn), if k is constant

=O(n)

Algorithm working->

1. Move in the string from 0th index,. every time next character is encountered then increment index of A[] and add that character and increment the values of B[] (that is counting no. of repeated characters to same index value and distinct character to next index).

2.To find the max span for k unique character. After finding k distinct characters from starting, all values of B array are added to find length of the span and that value is compared with the maximum value returned by remaining array ( i.e. value returned by max_span(i+1,k,j)). Elements which we are considering in the same span have same span_no[] value so that we can add B[] array values for those indexes to find length of the span.

3. Give span_no[] array same value for the indexes which are in same span. now when we move to next span, add B[] values for the indexes which have same span_no[] values and move to next span. But here, add the last element of previous span in the next span ex - "aaaabbaccccc" for 2 unique characters, this string will break to 2 spans "aaaabba" and "accccc" so for next span we will assign span_no[] another (unique) value for all elements in that span.

Taking help of 3 arrays

A[] -> to store all unique characters in an array, array size is in worst case = number of characters in string ( i.e. if all consecutive elements are different if "abcdef" then array size will be 6)  if  "aaaabba" then  'a' before 'b' is different from 'a' after b so array will store  A[0]='a', A[1]='b', A[2]= 'a',

B[] -> number of times a distinct character present in the string , for "aaaabba" this will store -> B[0] =4 ,B[1] = 2,B[2] = 1

span_no[] ->   first span possible is "aaaabba",span_no[] will help to tell the elements which are present in current span, so span_no will store -> span_no[0]=0, span_no[1]=0, span_no[2] =0.

int len;            // number of unique characters

char A[n];

int B[n]; //<- initialize them to 0

int span_no[n], j=0;

int max_span(int i,int k, int j)    // i = current index  in string,  k = remaining distinct characters in this span, j= index position for contracted array
{
if(i==0)
{
A[j]=string[i];
B[j]++;
return max_span(i+1,k-1,j);
}
else if (i==n)
return 0;
else if(string[i] == string[i-1])
{
A[j]=string[i];      B[j]=+1;
return max_span(i+1,k,j);
}
else              //then search for the term which is equal to this character in current span in O(k)
{
++j, B[j]+=1; A[j]=string[i];
int temp_i =j-1,found=0; ++j;
while((temp_i!=0)&&(span_no[ j ]==span_no[temp_i]))            // if that span contains same character earlier also then count this, element in the same span otherwise it will be taken as distinct element in that case if distinct character length is full (i.e. k==0), then add this element in next span
{
if(A[ j ]==A[temp_i])
found=1;  --temp_i;
}
if(found==1 && k!=0)                                                        // if found then add to that span only
{
span_no[j]=span_no[j-1];
max_span(i+1,k,j);
}
else if(!found && k==0)                                            //if not found and k is reached 0 then add to next span
{
while(span_no[--temp_j]==span_no[j-1])    //count all characters in this span and move to next span
{
}
span_no[j-1]+=1;                                //include last element also in the next span
span_no[j]=span_no[j-1];
int u=max_span(i+1,len,j);
}
else                                            //if not found and distinct character count is remaining in that span then add to the current span only
{
span_no[j]=span_no[j-1];
max_span(i+1,k-1,j);
}
}
}
by Junior (501 points)
edited
+1
@Arjun sir... Done
+1
That's awesome. I got your idea of array splitting. Let me give my solution and then we can see in detail. Are you giving GATE 2017?
+1
Yes sir, I am preparing for Gate 2017.
0
okay. But I could not verify your solution - still looks complex :O You have verified the correctness?

Please see here for a similar but simpler one for detailed explanation.

So, here we are interested in a substring which means all the interested characters must be continuous. So, this makes a dynamic programming problem formulation easy. Consider the array index $i$ and let $S[i-1[$ be the maximum length with $k$ unique characters  $i-1$. Also let $B[1..k]$ and $C[1..k]$ denote two circular queue of length $k$ containing the last $k$ unique characters and their respective starting indices. Now,

$S[i] = \begin{cases} S[i-1] + 1 \text{if } B \text{ is not full or } S[i] \text{ in } B \\S[i-1] +1 - C[1] \text{ otherwise } \end{cases}$

We also have to do

1. For case 1: if(S[i] == S[i-1]) C[k] ++; where $k$ is the index of $S[i]$ in array $C$ and
2. For case 2: dequeue(C); dequeue(B); enqueue (B, S[i]), enqueue (C, 1)
by Veteran (431k points)
#include<stdio.h>
#define MAX_CHAR 26 /* Here, we have assumed string can only have characters from a-z, i.e. 26 characaters. */

int w[1000]; /* stores max substring length with atmost k unique characters at each element likely to be seen in future. */
int count[MAX_CHAR]; /* keeps count of several characters. */
void kUnique(char[], int, int); /* function that finds max length of substring with k unique characters. */

int main() {
char a[1000];
int i, n, k;
scanf("%d", &n); /* enter length of string*/
scanf("%s", a); /* enter string */
scanf("%d", &k); /* enter no. of unique characters to be taken under consideration */
kUnique(a, n, k);
return 0;
}

void kUnique(char a[], int n, int k) {
int i, j, cur, lastSeen, window, temp, msf;

for(i=0; i<n; i++) {
if(count[a[i]-'a']==0) {
window++;
}
count[a[i]-'a']++;
}
if(window<k) {
printf("Not enough unique characters");
return;
}
/* The code segment from line 44-53 will detect if no. of unique characters in the string is less than k. If yes, it will print ERROR msg and return. */

for(i=0; i<n; i++) {
for(j=0; j<MAX_CHAR; j++) { /* At beginning of every loop, we set every element of count to 0. */
count[j] = 0;
}
window = 0; /* At the beginning of each loop , wet window to 0 because we have not seen any substring yet. */
temp = k; /* Also, set temp to 0 because we have not seen any substring yet */
for(j=i; temp>=0 && j<n; j++) {
if(count[a[j]-'a']==0) {
temp--;
if(temp==-1) {
break;
} else {
count[a[j]-'a']++;
}
} else {
count[a[j]-'a']++;
}
window++;
}
w[i] = window;
}
/* Whenever, we see a new character not seen till now, the corresponding value of count will be zero. Then, we will simply decrement temp.
Now, if temp becomes -1, then it means that we have seen k unique characters, We then simply break out from the loop. Also, window is used to keep track of length of substring.
After coming out of the inner (jth) lop, we set w[i] = window where window is the length of the substring seen just now.*/

msf = w[0];
for(i=1; i<n; i++) {
if(w[i]>msf) {
msf = w[i];
}
}
/* At the end, we basically search entire w array to find and print w[i]  where w[i] is the max value and denotes the length max substring with k unique characters */
printf("%d", msf);

}

TIME COMPLEXITY

In worst case, for every a[i], we can search from a[i,.., n] to find sub string  with k unique characters. As a result, the time complexity will be O(n^2).

SPACE COMPLEXITY

The space complexity of above program is O(n+26) = O(n) as we use an extra array count and w (to keep track of length of max sub string with k unique characters). However, we can remove w and simply keep track of  max substring length with k unique characters seen till now. As a result, we will then need an extra space of O(26) or O(256) (if we consider that any of 256 ASCII characters may be present in the input string).

by Junior (839 points)
edited

+1 vote