edited by
16,575 views
47 votes
47 votes

A CPU has a $32 KB$ direct mapped cache with $128$ byte-block size. Suppose A is two dimensional array of size $512 \times512$ with elements that occupy $8$-bytes each. Consider the following two C code segments, $P1$ and $P2$.

P1: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[i] [j]; 
    } 
}

P2: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[j] [i]; 
    } 
} 

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers.  Let the number of cache misses experienced by $P1$ be $M_{1}$and that for $P2$ be $M_{2}$.

The value of $M_{1}$ is:

  1. $0$
  2. $2048$
  3. $16384$
  4. $262144$
edited by

9 Answers

Best answer
57 votes
57 votes

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When $A[0][0]$ is fetched, $128$ consecutive bytes are moved to cache. So, for the next $\dfrac{128}{8} -1=15$ memory references there won't be a cache miss.
For the next iteration of $i$ loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for $P1$ is

$= \dfrac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

Correct Answer: $C$

edited by
50 votes
50 votes

$\text{Number of Cache Lines} = \dfrac{2^{15}B}{128B} = 256$
$\text{In 1 Cache Line} = \dfrac{128B}{8B} = 16\ elements$

$P_1 = \dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$=\dfrac{512 \times 512}{16}$

$= 2^{14}$

$= 16384.$

It is so because for $P_1$ for every line there is a miss, and once a miss is processed
we get $16$ elements in memory. So another miss happens after $16$ elements.

Hence, answer is option C.

edited by
40 votes
40 votes

Right answer is option C. 

edited by
Answer:

Related questions