recategorized by
404 views
4 votes
4 votes

Consider the C code fragment given below:

#define N 1000
double a[N][N], d=0;
....
for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
        d += a[i][j];

The elements of $a$ are $8$ bytes each. Assume that $a$ matrix is not in cache at the beginning of the loop given variable $d$ is in a register and starting addresses of $a$ is at cache line boundary. Suppose the compiler does an loop interchange optimization to interchange the $i$ and $j$ loops without any other code change in the loop as follows.

#define N 1000
int a[N][N], d=0;
....
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        d += a[i][j];

Determine the percentage reduction in cache misses assuming an 8 KB direct-mapped data cache with 64-byte blocks. (Up to one decimal place)

recategorized by

1 Answer

Best answer
7 votes
7 votes
Since, cache size is $8\;KB$ and block size is $64$ bytes, and being direct mapped, number of cache blocks $ = \dfrac{8\;KB}{64 \;B} = 128. $

In the original loop, every adjacent inner loop accesses are to separate cache lines in memory - $10000\times8 = 80K$ bytes apart causing $1000$ cache misses.

Now, we also have to see if any of the cache line accessed in the inner loop can cause a hit in the next iteration of the outer loop. This is not possible because the accesses being $80K$ bytes apart which is  $10$ times cache size (EXACT multiples of cache size will map to same cache line). Thus, in the original loop, number of cache misses $=1000\times 1000 = 1M.$

Now, lets see the loop interchanged version. Now, for the inner loop all memory accesses are to adjacent memory locations. Since a cache line can hold $64/8=8$ elements of $a,$ we'll have a cache miss only once in $8$ iterations. So, number of cache misses $=1000 \times \left\lceil \dfrac{1000}{8} \right\rceil = 125K.$

So, reduction in cache misses $ = 1M - 125K.$

Percentage reduction in cache misses $ = \dfrac{1M-125K}{1M} \times 100\%= 87.5\%.$
selected by
Answer:

Related questions

5 votes
5 votes
1 answer
2