GATE CSE 2006 | Question: 81

Question

GATE CSE 2006 | Question: 81

go_editor asked Apr 23, 2016 • edited Jun 26, 2018 by Shikha Mallick

10,478 views

A CPU has a $32$ $KB$ direct mapped cache with $128$ byte-block size. Suppose $A$ is two dimensional array of size $512 \times512$ with elements that occupy $8-bytes$ each. Consider the following two $C$ code segments, $P1$ and $P2$.

$P1$:

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[i] [j]; 
    } 
}

P2:

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[j] [i]; 
    } 
}

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers. Let the number of cache misses experienced by $P1$ be $M1$ and that for $P2$ be $M2$.

The value of the ratio $\frac{M_{1}}{M_{2}}$:

$0$
$\frac{1}{16}$
$\frac{1}{8}$
$16$

go_editor asked Apr 23, 2016 • edited Jun 26, 2018 by Shikha Mallick

go_editor

10.5k views

See all

Nirmalya Pratap 1

commented May 30, 2021

We can think 512* 512 2d array like 512 1d array each of size 512 element

It is given that element size is 8 byte so each 1d array size is 512 * 8 byte so 4KB

cache size given 32kb and 128 byte block size so,

total line in cache is 2^15/2^7 is 2^8

so we find out that their are 2^8 line in cache so at a time max 2^8 cache misses can happen then cache will filled up

we already find out that one 1d array size is 4KB and we know cache size is 32kb so number of 1d array we can load into cache at a time is 32kb/4kb = 8.

so 8 1d array at a time then total how many round needed to finish up all the 1d array ?

so their are 512 1d array and we can load 8 at a time so total round is 512/8 = 64 round

lets count now cache miss for this 64 round

ok so how many misses in 1 round?

so in our case in 1st round we are feeling the cache total then in 2nd round we are overriding all the cache data. so for every round 2^8 misses so

64 round 64*2^8 = 2^14 misses so M1 = 16384

Lets move to column major access( i.e. M2) and data store in raw major order by default is C language

so for M2 we have to know when we store the 1d array how many element store per block and how many block it can take to store only one 1d array and their are 512 such 1d array keeping in mind.

Mistake i did when i was solving the question

i think like their are 2^14 misses for 64 round means for here we done with 2^14 cause what we load we access them all and we got to next

but due to column major order we unnecessarily load a whole row and access one element and load one more raw. so their are 512 element are their in a raw so we have to do 512*2^14 that is M2 but this is wrong

Their are 512 element in one 1d array it is correct but at a time all 512 not loaded because we load one block and 512 element we can’t fit into one block so how many element we can fit into one block?

element size is 8 byte block size is 128 byte so number of element is 2^7/2^3 = 2^4

so only 16 element we will load up for misses that is not 512* 2^14 that is 2^4*2^14

so M2 is 2^18

then M1/M2 = 2^14/2^18 = 1/2^4 = 1/16

so answer is option B

6 Answers

Best answer

$\text{Number of Cache Lines}= \dfrac{2^{15}B}{128B}= 256$

$\text{In 1 Cache Line} =\dfrac{128B}{8B} = 16\ elements$

$P_1=\dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$\quad=\dfrac{512 \times 512}{16}= 2^{14}= 16384.$

$P_2= 512 \times 512=2^{18}$

$\dfrac{P_1}{P_2}=\dfrac{16384}{512 \times 512}$

$\quad = 2^{14-18}= 2^{-4}=\dfrac{1}{16}$

It is so, because for $P_1$ for every line there is a miss, and once a miss is processed we get $16$ elements in memory.
So, another miss happens after $16$ elements.
For $P_2$ for every element there is a miss because storage is row major order(by default) and we are accessing column wise.

Hence, answer is option B.

amarVashishth answered Apr 29, 2016 • edited Dec 25, 2017 by pavan singh

amarVashishth

See all

Page:

it means whenever storage will be in row major order(by default) and we will try to access in column wise. then for every element there will be miss — learner_geek, Jul 26, 2017
https://gateoverflow.in/1854/gate-cse-2006-question-80?show=382106#c382106 — Abhrajyoti00, Sep 3, 2022
For p2 for every element there is a miss because storage is row major order(by default) and we are accessing column wise.
I think This line is not always true, consider the following variation

A[8][512], rest info same, column wise access

$for(j=0;j<512;j++)${
$for(i=0;i<8;i++)${
$ x+=A[i][j]$
}
}
then there will 8 be misses for every 128 access

Refer the attached pic — DEBANJAN DAS2k, Dec 14, 2023

Arjun · Answer 1 · 2016-04-29T17:53:41+0000

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When A[0][0] is fetched, 128 consecutive bytes are moved to cache. So, for the next 128/8 -1= 15 memory references there won't be a cache miss. For the next iteration of i loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for P1

$= \frac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

In the case of P2, the memory references are not consecutive. After A[0][0], the next access is A[1][0] which is after 512 * 8 memory locations. Since our cache block can hold only 128 contiguous memory locations, A[1][0] won't be in cache after a[0][0] is accessed. Now, the next location after A[0][0] is A[0][1] which will be accessed only after 512 iterations of the inner loop- after 512 distinct memory block accesses. In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced. So, each of the memory access in P2 results in a cache miss. Total number of cache miss

$ = 512\times 512$

So, $\frac{M_1}{M_2} = \frac{32 \times 512}{512 \times 512} = \frac{1}{16}$

In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced.

The quote above is not entirely obvious to me.

Please elaborate about this; how can we be sure that 512 iterations of the inner loop will definitely remove the block containing A[0][0]. Even if there are 512 accesses to distinct bocks for every iteration of the outer loop, maybe not all blocks are replaced and instead some cache lines are replaced multiple times?

For example, it could be the case that in each of the outer loop, every odd cache line is replaced twice and the even lines are left untouched after the initial compulsory miss. This means that when time comes to visit A[0][1], it will in fact be in the first cache block. — Divy Kala, Apr 2, 2019
@Arjun sir, What would be the miss ratio in P2 if we were using Optimal Block replacement policy instead of LRU? — shashank023, Jan 6, 2021

Tauhin Gangwar · Answer 2 · 2016-04-24T02:38:30+0000

No. of elements/block = 128/8=16

one row contains 512 elements

No. of blocks/row= 512/16 = 32

No. of cache lines = 32KB/128B= 256

FOR M1

Now to access one row 32 miss operations will be occurred

to access whole array it requires = 512 * 32 (miss operations)

= 16384

FOR M2

Now to access one column 512 miss operations will occur(as it is column major)

to access whole array it requires = 512 * 512 (miss operations)

so M1/M2=1/16

ANSWER IS (B)

tags	tag:apple
author	user:martin
title	title:apple
content	content:apple
exclude	-tag:apple
force match	+apple
views	views:100
score	score:10
answers	answers:2
is accepted	isaccepted:true
is closed	isclosed:true

GATE CSE 2006 | Question: 81

Please log in or register to add a comment.

Please log in or register to answer this question.

6 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Check previous question..in that question i calculated cache lines and MM block..Hence here Option B is correct.

Please log in or register to add a comment.

Related questions

3 3 Comments reply

Please log in or register to add a comment.

Please log in or register to answer this question.

6 Answers

4 4 Comments reply

Please log in or register to add a comment.

10 10 Comments reply

Please log in or register to add a comment.

0 reply

Please log in or register to add a comment.

Check previous question..in that question i calculated cache lines and MM block..Hence here Option B is correct.

0 reply

Please log in or register to add a comment.

Related questions

3 3 Comments

4 4 Comments

10 10 Comments

0

0