3.6k views

A CPU has a $32 KB$ direct mapped cache with $128$ byte-block size. Suppose A is two dimensional array of size $512 \times512$ with elements that occupy $8$-bytes each. Consider the following two C code segments, $P1$ and $P2$.

P1:

for (i=0; i<512; i++)
{
for (j=0; j<512; j++)
{
x +=A[i] [j];
}
}

P2:

for (i=0; i<512; i++)
{
for (j=0; j<512; j++)
{
x +=A[j] [i];
}
} 

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers.  Let the number of cache misses experienced by $P1$ be $M_{1}$and that for $P2$ be $M_{2}$.

The value of $M_{1}$ is:

1. $0$
2. $2048$
3. $16384$
4. $262144$
edited | 3.6k views

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When $A[0][0]$ is fetched, $128$ consecutive bytes are moved to cache. So, for the next $\dfrac{128}{8} -1=15$ memory references there won't be a cache miss.
For the next iteration of $i$ loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for $P1$ is

$= \dfrac{512}{16} \times 512$

$= 32 \times 512$

$=2^{14} = 16384$

edited
0
Sir in this question it is not mentioned whether array is arranged in row wise or column wise order in memory, so in such questions do we have to assume it to be row wise ?
+1

Sir I think a typo in line "So, by the time A[1][0] is accessed, its cache block would be replaced"  it should be A[0][1].

+4
Also Sir I think 256 available blocks in cache is not a limiting factor here, because

A[0][0] will map to cache 0

A[1][0] will map to cache 32

A[2][0] will map to cache 64

A[3][0] will map to cache 96

A[4][0] will map to cache 128

A[5][0] will map to cache 160

A[6][0] will map to cache 192

A[7][0] will map to cache 224

A[8][0] will again map to cache 0

so the cache is being replaced every 8 block accesses not 256 block accesses

so even if we have 512 cache blocks then too we will get the same answer.

0
Each cache line will hold 4 array elements so if A[0][0] is mapped to cache line 0, A[1][0] will map to cache line 128.
0
In the question if we would have used a different Cache organization....then also the no of cache miss would remain same...?
+4

@Danish: Between two iterations in P2 there are 512 distinct memory block accesses and we have only 256 blocks in cache. So, AT LEAST 2 memory blocks must be mapped to a cache block and hence the second one would replace the first memory block. So, in this way for the second iteration every memory access will cause a cache miss.

This is just an overall analysis without considering the cache strategy (assuming best mapping possible). If we see the exact mapping by knowing 512 elements of 8 bytes each in the array we can get the exact cache size which will be needed so that no cache miss occurs. (512 is just a lower bound for it)

Row major or column major?

The given code is said to be C and C uses row-major order.

http://en.wikipedia.org/wiki/Row-major_order

+1
Thanks:)

$\text{Number of Cache Lines} = \dfrac{2^{15}B}{128B} = 256$
$\text{In 1 Cache Line} = \dfrac{128B}{8B} = 16\ elements$

$P_1 = \dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$=\dfrac{512 \times 512}{16}$

$= 2^{14}$

$= 16384.$

It is so because for $P_1$ for every line there is a miss, and once a miss is processed
we get $16$ elements in memory. So another miss happens after $16$ elements.

edited
0
@amarVashishth

What will be the answer for P2 applyying the above same procedure ?
0
classs!!!! thank you!!!

## Right answer is option C.

edited
0
Very nice explanation 👌
+1 vote

Cache size=32KB

Block Size=128Byte

Number of Block=256

Since array element are stored in row major fashion and number of element in each block=16(128(Block size)/8(array element size));

Suppose are are stored like below ways as::

A[0][0] A[0][1]..................A[0][15]

So whenever P1 will fetch A[0][0] it will be fetch all 15 so Miss ratio==1/16

Since array size=2^18

Since out of 16 reference only one miss happen ,then out of 2^18 number of misses will be=2^18*1/16=2^14=16384

M1=16384;

Now for P2::;

Number of Misses will be=2^18 bcz of column major order along with all 2^18 references will be Miss.

So M2=2^18

Therefor;

M1/M2=1/16

edited

so Option C

Comment below if u get anything wrong about it

Number of cache lines = 64 64 =1 =210
Number of sets = 21022=28
H1 = 18/5 + 0.8 ns
= 3.6 + 0.8 ns = 4.4 ns
Direct mapped cache:
This is no need of multiplexer in direct mapped cache.
So H2 = 16/ 5 nsec
= 3.2 nsec
Difference = H1 – H2
= 4.4 – 3.2 = 1.2 nsec

1
2