edited by
10,342 views
34 votes
34 votes

A CPU has a $32$ $KB$ direct mapped cache with $128$ byte-block size. Suppose $A$ is two dimensional array of size $512 \times512$ with elements that occupy $8-bytes$ each. Consider the following two $C$ code segments, $P1$ and $P2$.

$P1$: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[i] [j]; 
    } 
}

P2: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[j] [i]; 
    } 
}

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers. Let the number of cache misses experienced by $P1$ be $M1$ and that for $P2$ be $M2$.

The value of the ratio $\frac{M_{1}}{M_{2}}$:

  1. $0$ 
  2. $\frac{1}{16}$
  3. $\frac{1}{8}$
  4. $16$ 
edited by

6 Answers

Best answer
50 votes
50 votes

$\text{Number of Cache Lines}= \dfrac{2^{15}B}{128B}= 256$

$\text{In 1 Cache Line} =\dfrac{128B}{8B} = 16\ elements$

$P_1=\dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$\quad=\dfrac{512 \times 512}{16}= 2^{14}= 16384.$

$P_2= 512 \times 512=2^{18}$

$\dfrac{P_1}{P_2}=\dfrac{16384}{512 \times 512}$

$\quad = 2^{14-18}= 2^{-4}=\dfrac{1}{16}$

It is so, because for $P_1$ for every line there is a miss, and once a miss is processed we get $16$ elements in memory.
So, another miss happens after $16$ elements.
For $P_2$ for every element there is a miss because storage is row major order(by default) and we are accessing column wise.

Hence, answer is option B.

edited by
68 votes
68 votes

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When A[0][0] is fetched, 128 consecutive bytes are moved to cache. So, for the next 128/8 -1= 15 memory references there won't be a cache miss. For the next iteration of i loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for P1

$= \frac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

 

In the case of P2, the memory references are not consecutive. After A[0][0], the next access is A[1][0] which is after 512 * 8 memory locations. Since our cache block can hold only 128 contiguous memory locations, A[1][0] won't be in cache after a[0][0] is accessed. Now, the next location after A[0][0] is A[0][1] which will be accessed only after 512 iterations of the inner loop- after 512 distinct memory  block accesses. In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced. So, each of the memory access in P2 results in a cache miss. Total number of cache miss

$ = 512\times 512$

So, $\frac{M_1}{M_2} = \frac{32 \times 512}{512 \times 512} = \frac{1}{16}$

7 votes
7 votes

No. of elements/block = 128/8=16

one row contains 512 elements 

No. of blocks/row= 512/16 = 32

No. of cache lines = 32KB/128B= 256

 FOR M1

Now to access one row 32 miss operations will be occurred

to access whole array it requires = 512 * 32 (miss operations)

                                                    = 16384

FOR M2 

Now to access one column 512 miss operations will occur(as it is column major)

to access whole array it requires = 512 * 512 (miss operations)

so  M1/M2=1/16 

ANSWER IS (B)

Answer:

Related questions