Answer to first query :
The extra bits go into the tag field: the set-index and block-offset bits are unaffected, since the block size and the cache size (and hence the number of sets) remain the same.
Hence the tag field gains the 2 excess bits implied by the specification in the question.
Answer to second query :
Here, number of words in 1 block = Block size / Word size
= 64 B / 4 B = 16
Number of cache lines = Cache size / Block size
= 4 KB / 64 B
= 64
Number of sets hence = 64 / 4 = 16 sets
Also, number of words that can be accommodated in the cache when it is full = 4 K / 4 = 1024
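The parameter arithmetic above can be sketched as a few lines of Python (variable names are my own, sizes in bytes):

```python
# Cache parameters from the question: 4 KB, 4-way set associative,
# 64 B blocks, 4 B words.
BLOCK_SIZE, WORD_SIZE, CACHE_SIZE, ASSOC = 64, 4, 4 * 1024, 4

words_per_block = BLOCK_SIZE // WORD_SIZE   # 64 B / 4 B  = 16
num_lines       = CACHE_SIZE // BLOCK_SIZE  # 4 KB / 64 B = 64
num_sets        = num_lines // ASSOC        # 64 / 4      = 16
words_in_cache  = CACHE_SIZE // WORD_SIZE   # 4 KB / 4 B  = 1024
print(words_per_block, num_lines, num_sets, words_in_cache)  # 16 64 16 1024
```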
Now we trace the access sequence to determine whether each word access results in a hit or a miss:
a) For word numbers 0 - 15, block number = 0
Thus set number = block number mod number of sets = 0 mod 16 = 0
b) For word numbers 16 - 31, block number = 1
Thus set number = 1
Continuing in the same manner up to word number 1023 (at which point the cache is full), the block numbers contained in each set are:
Set number 0 : 0 16 32 48
Set number 1 : 1 17 33 49
Set number 2 : 2 18 34 50
and so on till .....
Set number 15 : 15 31 47 63
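The block-to-set mapping listed above is just set = block mod 16, with 4 blocks fitting per set; a minimal sketch to reproduce the table:

```python
# Which of the first 64 blocks land in which of the 16 sets
# (set index = block number mod number of sets).
NUM_SETS = 16
for s in range(NUM_SETS):
    blocks = [b for b in range(64) if b % NUM_SETS == s]
    print(f"Set {s:2d}: {blocks}")
# Set  0: [0, 16, 32, 48]
# Set  1: [1, 17, 33, 49]
# ...
# Set 15: [15, 31, 47, 63]
```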
Now when word numbers 1024 - 1039 arrive, i.e. block number 64, the cache is full, so a capacity miss occurs and LRU replacement comes into the picture. Block number 0 in set 0 is the least recently used, so it is replaced. Set 0 now holds: 64 16 32 48
Similarly set 1 will be updated when block number 65 is accessed as : 65 17 33 49
set 2 will be updated when block number 66 is accessed as : 66 18 34 50
set 3 will be updated when block number 67 is accessed as : 67 19 35 51
In this way all words (0 - 1087), i.e. 68 blocks, are covered in one iteration.
Thus number of cache misses in 1st iteration = 64 (compulsory) + 4 (capacity)
= 68
In the subsequent nine iterations, where words 0 - 1087 are accessed again, sets 4 - 15 are unaffected: the blocks mapping to them are already in the cache, and those sets never see a replacement. So from now on we focus only on the first 4 sets, where replacements cause further misses.
Now when word numbers 0 - 15 arrive again, i.e. block number 0, the cache is full, so a capacity miss occurs and LRU replacement comes into the picture. Block number 16 in set 0 is now the least recently used, so it is replaced. Set 0 now holds: 64 0 32 48
Similarly set 1 will be updated as : 65 1 33 49
set 2 will be updated as : 66 2 34 50
set 3 will be updated as : 67 3 35 51
Similarly, when block number 16 comes, set 0 becomes : 64 0 16 48
when block number 17 comes, set 1 becomes : 65 1 17 49
when block number 18 comes, set 2 becomes : 66 2 18 50
when block number 19 comes, set 3 becomes : 67 3 19 51
Continuing the trace, the accesses to blocks 32, 48 and 64 (and likewise 33 - 35, 49 - 51, 65 - 67) also miss in these four sets, so each of sets 0 - 3 suffers 5 misses per iteration.
Hence number of misses per iteration = 4 * 5 = 20
The count is the same for each of the nine subsequent iterations.
Hence total number of misses in 10 iterations = 68(for first iteration) + 9*20 (for next 9 iterations)
= 68 + 180
= 248 misses
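The hand trace above can be checked with a small LRU cache simulator; this is a minimal sketch (my own code, not part of the question) that counts misses over 10 passes through the 68 blocks:

```python
from collections import OrderedDict

# 16 sets, 4-way set associative, LRU replacement; 10 passes over
# blocks 0..67 (words 0..1087).
NUM_SETS, WAYS, ITERATIONS, NUM_BLOCKS = 16, 4, 10, 68

# One OrderedDict per set; the most recently used block sits at the end.
sets = [OrderedDict() for _ in range(NUM_SETS)]
misses = 0
for _ in range(ITERATIONS):
    for block in range(NUM_BLOCKS):
        s = sets[block % NUM_SETS]
        if block in s:
            s.move_to_end(block)       # hit: refresh LRU position
        else:
            misses += 1                # miss: block must be fetched
            if len(s) == WAYS:
                s.popitem(last=False)  # evict the least recently used block
            s[block] = True

print(misses)  # 68 misses in pass 1, 20 in each of the next 9 -> 248
```

Running it confirms the count: 68 + 9 * 20 = 248 misses.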
Thus speedup obtained = Time without cache / Time with cache
Let the cache access time be c; the main memory access time is then 10c (given as 10 * cache access time).
Time without cache = blocks accessed per iteration * number of iterations * main memory access time
= 68 * 10 * 10c = 6800c
Time with cache = 68 * (c + 10c) [first iteration: all 68 accesses miss]
+ 9 * (20 * (c + 10c) + 48 * c) [next 9 iterations: 20 misses and 48 hits each]
= 748c + 9 * 268c
= 3160c
Speedup = 6800c / 3160c = 2.15
Thus the performance improvement (speedup) = 2.15
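The speedup arithmetic can be double-checked in a few lines (c = 1 time unit is an arbitrary choice; only the ratio matters):

```python
# Cache access time c; main memory access time m = 10 * c as given.
c, m = 1, 10
accesses_per_pass, passes, misses_per_pass = 68, 10, 20

time_without = accesses_per_pass * passes * m                    # 6800
time_with = accesses_per_pass * (c + m) \
    + (passes - 1) * (misses_per_pass * (c + m)
                      + (accesses_per_pass - misses_per_pass) * c)
print(time_with)                           # 748 + 9 * 268 = 3160
print(round(time_without / time_with, 2))  # 2.15
```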