# GATE2008-71


Consider a machine with a $2$-way set associative data cache of size $64$ $Kbytes$ and block size $16$ $bytes$. The cache is managed using $32$-bit virtual addresses and the page size is $4$ $Kbytes$. A program to be run on this machine begins as follows:

```c
double ARR[1024][1024];
int i, j;
/* Initialize array ARR to 0.0 */
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;
```

The size of double is $8$ bytes. Array $ARR$ is located in memory starting at the beginning of virtual page $0xFF000$ and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array $ARR$.

The total size of the tags in the cache directory is:

1. $32$ $Kbits$
2. $34$ $Kbits$
3. $64$ $Kbits$
4. $68$ $Kbits$

1
Sir why is the cache block size and page size different
8
:O Why should they be same?
0

so this is a virtually indexed virtually tagged cache @Arjun sir

1
question belongs to  "virtually tagged virtually indexed cache"
1
@Arjun Sir
If the cache is managed using physical addresses, must the (main memory) block size and the line size be the same, or can they vary?

Number of sets $=\dfrac{\text{cache size}}{\text{size of a set}}$

$=\dfrac{64\ KB}{(16\ B\times 2)}$ (two blocks per set)

$=2\ K=2^{11}$

So, we need $11\text{-bits}$ for set indexing.

Number of WORD bits required $=4$ as a cache block consists of $16\text{ bytes}$ and we need $4\text{-bits}$ to address each of them.

So, number of tag bits $=32-11-4=17$

Total size of the tag$= 17\times \text{Number of cache blocks}$

$= 17\times 2^{11}\times 2$ (since each set has $2$ blocks)

$=68\text{ Kbits}$

Answer is option D) 68 Kbits
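The arithmetic above can be checked with a short script (a sketch I added; the constants come from the question, the variable names are my own):

```python
# Tag-size arithmetic for the 2-way set associative cache in the question.
CACHE_SIZE = 64 * 1024    # 64 KB cache
BLOCK_SIZE = 16           # bytes per block
WAYS = 2                  # 2-way set associative
ADDR_BITS = 32            # virtual address width

num_lines = CACHE_SIZE // BLOCK_SIZE        # 4096 cache blocks (lines)
num_sets = num_lines // WAYS                # 2048 sets
index_bits = num_sets.bit_length() - 1      # 11 bits to pick a set
offset_bits = BLOCK_SIZE.bit_length() - 1   # 4 bits within a block
tag_bits = ADDR_BITS - index_bits - offset_bits   # 17

total_tag_kbits = tag_bits * num_lines / 1024     # one tag per line
print(tag_bits, total_tag_kbits)   # 17 68.0
```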

We use the top $17$ bits for the tag, the next $11$ bits for indexing and the last $4$ bits for the offset. So, for two addresses to have the same cache index, their $11$ address bits after the $4$ offset bits from the right must be the same.

$ARR[0][0]$ is located at virtual address $\text{0x FF000 000. (FF000 is page address and 000 is page offset).}$
So, index bits are $00000000000$

Address of $ARR[0][4]=\text{0xFF000 000}+4\times \text{sizeof (double)}$
$=\text{0xFF000 000}+4\times 8=\text{0xFF000 020}$ $(32=\text{0x20})$ (index bits differ)

Address of $ARR[4][0] =\text{0xFF000 000}+4\times 1024\times \text{sizeof (double)}$
[since we use row major storage]
$=\text{0xFF000 000}+4096\times 8=\text{0xFF000 000 + 0x8000 = 0xFF008 000}$
( index bits matches that of $ARR [0][0]$ as both read $\text{000 0000 0000}$ )

Address of $ARR[0][5] =\text{0xFF000 000} + 5\times \text{sizeof (double)}=\text{0xFF000 000}+ 5\times 8 =\text{0xFF000 028}$ $(40=\text{0x28})$ (index bits differ)

Address of $ARR[5][0] =\text{0xFF000 000} + 5\times 1024\times \text{sizeof (double)}$ [since we use row major storage]
$=\text{0xFF000 000}+5120\times 8=\text{0xFF000 000 + 0xA000 = 0xFF00A 000}$ (index bits differ)

So, only $ARR[4][0]$ and $ARR[0][0]$ have the same cache index.
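The address comparison above can be replayed programmatically. This is a sketch of my own, using the base address, row-major layout, and field widths stated in the question:

```python
# Check which of the four candidate elements shares ARR[0][0]'s 11 index bits.
BASE = 0xFF000000                     # start of virtual page 0xFF000

def vaddr(i, j):
    """Virtual address of ARR[i][j]; row-major, sizeof(double) = 8."""
    return BASE + (i * 1024 + j) * 8

def set_index(addr):
    """Drop the 4 offset bits, keep the next 11 index bits."""
    return (addr >> 4) & 0x7FF

ref = set_index(vaddr(0, 0))          # set 0
for i, j in [(0, 4), (4, 0), (0, 5), (5, 0)]:
    print((i, j), set_index(vaddr(i, j)) == ref)
# only (4, 0) prints True
```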

The inner loop iterates $j$ from $0$ to $1023,$ so consecutive memory locations are accessed in sequence. Since the cache block size is $16\text{ bytes}$ and each element (a double) is $8\text{ bytes},$ a miss on one element also brings the next element into the cache. That is, every alternate memory access is a cache miss, giving a hit ratio of $50\%.$ (If loops $i$ and $j$ are interchanged, consecutive accesses are $8\text{ KB}$ apart, every access is a miss, and the hit ratio becomes $0.$)
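The $50\%$ figure can be confirmed by directly simulating the access stream. The sketch below (my own) deliberately ignores evictions, which is safe for this particular pattern: each block's two accesses are consecutive, so no block is ever evicted between its first and second use.

```python
# Count hits for the row-major initialization loop.
BASE = 0xFF000000
seen = set()                  # blocks ever brought into the cache
hits = 0
for i in range(1024):
    for j in range(1024):
        block = (BASE + (i * 1024 + j) * 8) >> 4   # 16-byte block number
        if block in seen:
            hits += 1        # second double in an already-fetched block
        else:
            seen.add(block)  # compulsory miss fills the block
print(hits / (1024 * 1024))  # 0.5
```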

0

Sir, why is the cache block size not first converted into bits, i.e. $16\times 8$ bits? In that case we would need $7$ bits to address each of them.

1
Because computers use byte addressing meaning the smallest unit in which memory can be accessed is a byte. That is why in order to change some bits we need to perform bit wise operations.

Word addressing (where a multiple of bytes are addressed) is also there. But by default we can always assume byte addressing.
1
okay, i got it. :)
1
Sir, how can the 32-bit virtual address be used in finding the mapping in the cache, i.e. the tags, blocks etc.? Should we not first convert it into a physical address?
5

Cache can be virtually tagged or physically tagged.

"The cache is managed using 32 bit virtual addresses"

shows that it is virtually tagged for this question.

2
OK, I got it, thank you. But here the question says the virtual address is of 32 bits; then how can they refer to the address of the virtual page as 0xFF000, which is 20 bits?
1

That's a good question. Actually they have given just the page address, and we have to add the page offset (see the phrase "at the beginning of virtual page" in the question). It is also given that the page size is 4 KB, meaning 12 address bits for the page offset. I have now corrected this. Even though I took the address wrong earlier, the answer remains the same :)

1
Ya now its ok...got it fully..thanks once again for ur help...
0
size of double = 8 bytes.

size of cache block = 16 bytes

#elements/block = 2

so should we not say that the WORD offset is required just to address these 2 elements, once we reach the correct block?
Hence I think #bits for WORD offset = 1,
and therefore #tag bits = 32 - 11 - 1 = 20 (no option).
Where am I going wrong in my thinking?
0
The word offset is for addressing a word/byte and is not data-type specific.
0

Can we calculate Physical Address bit size with given information (assuming Physically tagged cache)  ?

I think Main memory detail (# frames OR # Blocks in MM OR Size of MM) is required for Physical Address bit size calculation.

0
Well explained sir!
0
I am having a problem in the tag bit calculation. The page size is given in the question as 4 KB (12 bits), so should the calculation of tag bits not go like this:

32 - 12 (page offset bits) - 11 (index bits) = 9 bits for tag?
0
Sir, how to find the miss rate in this set associative cache problem?
3
I have a big but silly doubt about how this translation of addresses actually takes place. This is what I know:

When a virtual address is converted to a physical address, the page number is converted to a frame number and the offset remains unchanged; that is the physical address. But when the cache comes into the picture, this physical address is again broken into TAG, block number and offset. Now this cache offset and the page offset of virtual memory can be different, right? Can they be different? Please, someone resolve this confusion.
5
@sushmita Why silly? It is really awkward not to have that doubt. But 99.9% of aspirants do not have doubts :)

And they can be different (in fact, they are always different unless page size = cache block size). For problem solving, always consider them separate.

Also, we do not always need the physical address for cache access; it depends on whether the cache is virtually addressed or physically addressed. The most common type of cache is virtually indexed and physically tagged, as the name says.
1
Thanks a lot arjun sir for explaianing this concept. I was so much into doubt  but now mind is relaxed. Thankss a lot.

Sir, virtually indexed I understood as given in the question, but what is physically tagged? Does it mean conversion of the virtual address to a physical address?
0

@Arjun SIR , What is the importance of this statement

32 bit virtual addresses and the page size is 4 Kbytes

4

@sushmita

Virtually indexed means we can index to a cache using virtual address (no need to wait for virtual to physical address conversion).

Physically tagged means we can compare the tag in a cache block only using physical address - comparing using virtual address is obviously faster but has its own problems and is more complex to handle.

So, virtually indexed and physically tagged cache is the most common as TLB look-up and cache indexing can happen simultaneously.

@gokou Just for fun :)

@sittian

ARR[0][0] is located at virtual address 0x FF000 000. (FF000 is page address and 000 is page offset). So, index bits are 00000000000

Why considering page offset here? That is relevant for taking a page from hard disk to memory. Here we are asked for cache only. So,

ARR[0][0] is located at virtual address 0x FF000 000. (FF00 0 (17 bits) is cache block tag, next 11 bits are index bits and 4 least bits are offset bits)

ARR[0][1] - also has same index bits and is located at virtual address 0x FF 000 008

ARR[0][2] - index bits becomes 1 as virtual address is 0x FF 000 010

and so on.

0
Thanx arjun sir. can u plz tell from where to study this indexing part?? Like VIPT, coloring, synonym problem etc??
0

Why are these lines mentioned in the question?

"The cache is initially empty and no pre-fetching is done"

"The only data memory references made by the program are those to array ARR."

What is the meaning of these lines, and why are they in the question?

And "The size of double is 8 bytes." is mentioned because otherwise we would take the size of int in the calculation, right?

3
This was a linked question and these data are needed for the remaining questions.

"Cache is initially empty" to make it clear that no data is already in cache (first access to a memory block is a compulsory miss).

"No-prefetching is done" is again to make it clear that first access is a compulsory miss. Pre-fetching is a technique used in architecture where data is taken from memory to cache in advance.

Size of double is given as 8 bytes as otherwise we cannot measure the usage of cache. And we are never supposed to take size of int for size of double.
0
thank u :)
0
Sir, we are given (or can infer from the question) that every element is a word of 8 bytes, so why do we need four bits for addressing a word? Since our data elements are 8 bytes each, can we not use 1 bit for addressing a word inside a block? @arjun sir
1

@Bikram sir, for part B) please see my method and point out where I am going wrong.

#set=211=2048

#elements/cache block=16/8=2

#elements/set=4

#array elements which can be in cache simultaneously without conflicting the cache block:

total set*#elements per set

$2048\times 4=8192$ elements $=8$ rows

so after 8 rows the cache will be full and the second wrap-around will start, so A[8][0] will conflict in cache index with A[0][0].

But as this is not the answer, something is wrong in this.

1
@reena_kandari, of course you are correct; a[8][0] will index into set 0. Even if you calculate the virtual address of this element you will get 0xFF000 000 + 65536, and you will see that the index number or set number (that is, the 11 bits after the rightmost 4 bits) is the same, that is 0. Here, bit 16 will be 1.
1
But that means between A[0][0] and A[8][0] there must not be any conflicting index, right? Yet the answer is A[4][0]. How is that possible?
0
I am not sure if I understood your question properly. At index 0 (set 0) we have 4 elements a[0][0], a[0][1], a[4][0], a[4][1] after the first round. What conflict are you talking about?
1
See, element A[0][0] will be stored at set 0, and after that all elements will be stored at sets 1, 2, 3, ... until the cache is full, and A[7][1023] will be stored at set number 2047. Between these elements, no two will have the same index number. Then how can the element A[4][0] have the same index as A[0][0] (according to my method)?

I hope it is clear now.
1

We use row major storage (meaning we store in row major order); see Arjun's answer again:

ARR[0][0] is located at virtual address 0x FF000 000. (FF000 is page address and 000 is page offset). So, index bits are 000 0000 0000

Address of ARR[4][0] = 0xFF000 + 4 * 1024 * sizeof (double) [since we use row major storage]

= 0xFF000 000 + 4096*8

= 0xFF000 000 + 0x8000

= 0x FF008 000 ( index bits matches that of ARR [0][0] as both read 000 0000 0000)

ARR[0][0] is located at virtual address 0x FF000 000 (the top 17 bits are the cache block tag, the next 11 bits are the index bits and the 4 least significant bits are the offset bits).

For two addresses to have the same cache index, their 11 address bits after the 4 offset bits from the right must be the same.
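To see this discussion concretely, one can enumerate the elements of the first few rows whose 11 index bits are all zero. This is a sketch of my own, using the addressing scheme from the answer above:

```python
# List elements of rows 0..8 that fall in set 0 of the 2048-set cache.
BASE = 0xFF000000

def set_of(i, j):
    # row-major address, sizeof(double) = 8; drop 4 offset bits, keep 11 index bits
    return ((BASE + (i * 1024 + j) * 8) >> 4) & 0x7FF

in_set0 = [(i, j) for i in range(9) for j in range(1024) if set_of(i, j) == 0]
print(in_set0)   # [(0, 0), (0, 1), (4, 0), (4, 1), (8, 0), (8, 1)]
```

So after the first round, set 0 holds the blocks for ARR[0][0]/ARR[0][1] and ARR[4][0]/ARR[4][1] (two ways), and ARR[8][0] is the next wrap-around into the same set.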

0

Yes sir, I re-read Arjun sir's answer and it is correct. If we calculate the address of each element and then check those 11 bits, the answer is A[4][0]. But sir, my main doubt is about the procedure I commented above: why does it fail, although all the steps feel correct but the conclusion is not true?

#sets $=2^{11}=2048$

#elements/cache block=16/8=2

#elements/set=4

#array elements which can be in cache simultaneously without conflicting the cache index  :

total set * #elements per set

$2048\times 4=8192$ elements $=8$ rows

so after 8 rows the cache will be full and the second wrap-around will start, so A[8][0] will conflict in cache index with A[0][0].

0
@reena_kandari

What exactly is your doubt?
1
1
thank you sir now I got it,  I was taking individual block number as index..thank you so much :)
0
Why are we using virtual memory size here? Tag number bits etc. depend on number of blocks in main memory which in turn depends on physical memory size and size of blocks in memory.  How can we subtract word bits etc. from virtual memory bits. As far as I know, virtual memory size has nothing to do with number of blocks in physical memory and thus won't have anything to do with tag bits in cache memory. Kindly correct me where I am wrong.

As per me, we have 1024*1024 elements i.e. a total of 2^20 elements in physical memory. Each such element is of 8 bytes. Each element has its own address which means there are 2^20 addressable units in physical memory.

Now, block size is 16 bytes and element size is 8 bytes. Thus there are 2 elements per block and hence number of blocks is 2^20/2=2^19

2^19 blocks and 2^11 sets means each set can map to 2^19/2^11 different block and thus tag number bits are 8

Number of cache lines = 2^12 = 4K
Number of tag bits = 8 bits

Thus, tag size = 4 K * 8 = 32 Kbits
Where am I calculating it wrong? I find this approach better than assuming virtual memory size = physical memory size.

If you look at next question belonging to same problem, you can see that main memory address is given as 0xFF000 which is 20 bits. Hence physical memory is addressable with 20 bits not 32 bits.
0
@ Arjun

Should we consider read accesses too for each element, or only write accesses? If read accesses are also there, then the answer should be different?
0
Where in the code is data being read?
0
```c
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;   // line 1
```

Is line 1 a read access, a write access, or both? Write access is there, but I have a doubt about read access because first we are fetching memory location a[i][j]. So should we consider it as a read access, sir?
0
In the question it is given that the cache is managed using 32-bit virtual addresses. So why are you taking a 32-bit physical address to evaluate the tag bit size?

Tag bits = physical address (MM) bits - set index bits - block offset bits.

0

Can you please explain how we get this:

So, index bits are 000 0000 0000

for the statement:

ARR[0][0] is located at virtual address 0x FF000 000. (FF000 is page address and 000 is page offset).

1

Index bits represent the set (in case of K-way set associative mapping) and the line (in case of direct mapping)?

0
virtual memory address is nothing but main memory address. because virtual memory is basically dividing the main memory into pages .
0
@Reena ji,

I calculated the same as you and was confused to see the wrong conclusion. What I understood from going through the comments is that this method would be right for a direct mapped cache. But since this is 2-way set associative, main memory block $2^{11} \bmod 2^{11}=0$ collides with set 0, rather than block $2^{12}$. Please tell me if I got it correctly? So we have to see which elements are in the $2^{11}$-th block, which comes out to be row 4. Please reply if this is the reason for the wrong calculation.


1 vote

Number of lines of cache = $\frac{64\times 2^{10}\ Bytes }{16\ Bytes} = 4096$

It is 2-way set associative, so the number of sets $= 4096/2 = 2048$, needing $11$ index bits.

Each block is of size $16\ B$, needing $4$ offset bits.

Number of tag bits $= 32 - 11 - 4 = 17$

Tag directory size $=$ number of lines $\times$ tag size $= 4096 \times 17 = 69632$ bits $= 68$ Kbits, i.e., option (D).

As per me, we have 1024*1024 elements i.e. a total of 2^20 elements in physical memory. Each such element is of 8 bytes. Each element has its own address which means there are 2^20 addressable units in physical memory.

Now, block size is 16 bytes and element size is 8 bytes. Thus there are 2 elements per block and hence number of blocks is 2^20/2=2^19

2^19 blocks and 2^11 sets means each set can map to 2^19/2^11 different block and thus tag number bits are 8

Number of cache lines = 2^12 = 4K
Number of tag bits = 8 bits

Thus, tag size = 4 K * 8 = 32 Kbits
1
It is already mentioned in the question that the cache is managed using virtual addresses.

Why are you solving using the physical address?
