retagged by
3,440 views
3 votes
3 votes

Problem: A computer system uses 32-bit memory addresses and it has a main memory consisting of 1G bytes. It has a 4K-byte cache organized in the block-set-associative manner, with 4 blocks per set and 64 bytes per block.
(a) Calculate the number of bits in each of the Tag, Set, and Word fields of the memory address.
(b) Assume that the cache is initially empty. Suppose that the processor fetches 1088 words of four bytes each from successive word locations starting at location 0. It then repeats this fetch sequence nine more times. If the cache is 10 times faster than the memory, estimate the improvement factor resulting from the use of the cache. Assume that the LRU algorithm is used for block replacement.

This is a solved example 8.3 from Carl Hamacher book.

My doubt is, while we have 32 bit for the memory address, we need only 30 bit because the main memory consists of only 1G bytes. So why should those extra 2 bits go to tag field?

And I am not able to understand the solution given for part(b). Can anyone provide a better explanation which gives visual clues on how the blocks are being assigned to different sets. And when they are being replaced. Please keep it simple, it's very confusing for me.

retagged by

1 Answer

Best answer
8 votes
8 votes

Answer to first query : 

The extra bits will go into tag bits because set index bits and block offset bits wont be affected as block size is same and cache size is also same hence number of sets will also be same as well . 

Hence we will have 2 excess bits in the tag field due to specification mentioned in the question .

Answer to second query : 

Here number of words that are in 1 block   =  Block size / Word size

                                                             =  64 B / 4 B     =   16

Number of cache lines                             =  Cache size / Block size

                                                              =  4 KB / 64 B

                                                              =  64

Number of sets hence                              =  64 / 4     =      16 sets

Also number of words which are possible to be accomodated in cache  till it is full  =  4 K /  4   = 1024

Now we have to know about the access sequence of words i.e. a word access results in hit or miss :

a) For word number 0 - 15 , block number = 0 ..

    Thus set number    =   block number %  number of sets  =   0 mod 16  =  0

b) For word number 16 - 31 , block number = 1

    Thus set number   =    1

Thus continuing in the same manner till word no 1023 (at this point cache is full) , let us mention the block number contained by each set : 

    Set number 0   :   0     16      32      48

    Set number 1   :   1     17      33      49

    Set number 2   :   2     18      34      50

    and so on till .....

    Set number 15  :  15    31      47      63

Now when the word numbers 1024-1039 comes i.e. block number 64 comes , cache is full hence capacity miss occurs . Hence LRU replacement comes into picture . So block number 0 in set 0 is least recently used , hence it is replaced.Hence set 0 blocks appear now  as     :     64     16     32    48

Similarly set 1 will be updated when block number 65 is accessed as :       65      17      33       49

             set  2 will be updated when block number 66 is accessed as :       66      18      34       50

              set 3 will be updated when block number 66 is accessed as :       67      19      35       51

In this way all words (0 - 1087) are covered by covering 68 blocks .

Thus number of cache misses in 1st iteration      =     64(compulsory)  +  4(capacity)

                                                                        =     68

Now in subsequent nine iterations where access of the words (0 - 1087) are done again , set number ( 4 - 15 ) will remain unaffected as the words are already in cache which belong to these sets and these sets wont face any block replacement as shown below . So from now on we focus on first 4 sets only where replacement occurs due to cache miss.

Now when the word numbers 0-15 comes i.e. block number 0 comes , cache is full hence capacity miss occurs . Hence LRU replacement comes into picture . So block number 16 in set 0 is least recently used , hence it is replaced.Hence set 0 blocks appear now  as     :     64     0     32    48

Similarly set 1 will be updated as   :    65     1     33     49

              set 2  will be updated as :     66     2     34     50

              set 3  will be updated as :     67     3     35     51

Similarly when block number 16 comes  , set 0 will be updated as   :   64      0     16     48

                      block number 17 comes  , set 1 will be updated as   :    65     1     17    49

                      block number 16 comes ,  set 2  will be updated as :     66     2     18     50

                                                             set 3  will be updated as :     67     3     19    51

This way further misses will occur in these sets 

Hence number of misses                            =           4 * 5          =    20

Likewise number of misses will be same for subsequent iterations .

Hence total number of misses in 10 iterations        =      68(for first iteration) + 9*20 (for next 9 iterations)

                                                                          =      68 + 180

                                                                          =      248 misses

Thus speedup obtained                                        =      Timewithout cache / Timewith cache

                                                                          =     ( Number of  blocks accessed *  Number of iterations * 10 * Cache                                                                                        access time ) /  ( Number of blocks * (Cache access time + Main memory                                                                                access time + (Number of iterations - 1) * (Number of cache misses per                                                                                   iteration *(Cache access time + Main memory access time) + (68 - Number                                                                               of cache misses) * Cache access time))

                                                                    [ As main memory access time = 10 * cache memory access time given ]

                                                                          =     (68 * 10 * 10)  /  ((1 * 68 * 11)  +   9 * (20 * 11 + 48))

                                                                          =     2.15

Thus the performance improvement(speedup)        =     2.15                                             

selected by

Related questions