The Gateway to Computer Science Excellence
+29 votes

A CPU has a $32 KB$ direct mapped cache with $128$ byte-block size. Suppose A is two dimensional array of size $512 \times512$ with elements that occupy $8$-bytes each. Consider the following two C code segments, $P1$ and $P2$.


for (i=0; i<512; i++)
    for (j=0; j<512; j++)
        x +=A[i] [j]; 


for (i=0; i<512; i++)
    for (j=0; j<512; j++)
        x +=A[j] [i]; 

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers.  Let the number of cache misses experienced by $P1$ be $M_{1}$and that for $P2$ be $M_{2}$.

The value of $M_{1}$ is:

  1. $0$
  2. $2048$
  3. $16384$
  4. $262144$
in CO and Architecture by Active (3.3k points)
edited by | 5k views
For every 16 array elements there is a for 512*512 elements it will be (512*512)/16 = 16384 misses!

8 Answers

+37 votes
Best answer

Code being C implies array layout is row-major.

When $A[0][0]$ is fetched, $128$ consecutive bytes are moved to cache. So, for the next $\dfrac{128}{8} -1=15$ memory references there won't be a cache miss.
For the next iteration of $i$ loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for $P1$ is

$= \dfrac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

Correct Answer: $C$

by Veteran (431k points)
edited by
Sir in this question it is not mentioned whether array is arranged in row wise or column wise order in memory, so in such questions do we have to assume it to be row wise ?

Sir I think a typo in line "So, by the time A[1][0] is accessed, its cache block would be replaced"  it should be A[0][1].

Also Sir I think 256 available blocks in cache is not a limiting factor here, because

A[0][0] will map to cache 0

A[1][0] will map to cache 32

A[2][0] will map to cache 64

A[3][0] will map to cache 96

A[4][0] will map to cache 128

A[5][0] will map to cache 160

A[6][0] will map to cache 192

A[7][0] will map to cache 224

A[8][0] will again map to cache 0

so the cache is being replaced every 8 block accesses not 256 block accesses

so even if we have 512 cache blocks then too we will get the same answer.

Please clarify ...
Each cache line will hold 4 array elements so if A[0][0] is mapped to cache line 0, A[1][0] will map to cache line 128.
In the question if we would have used a different Cache organization....then also the no of cache miss would remain same...?

@Danish: Between two iterations in P2 there are 512 distinct memory block accesses and we have only 256 blocks in cache. So, AT LEAST 2 memory blocks must be mapped to a cache block and hence the second one would replace the first memory block. So, in this way for the second iteration every memory access will cause a cache miss.

This is just an overall analysis without considering the cache strategy (assuming best mapping possible). If we see the exact mapping by knowing 512 elements of 8 bytes each in the array we can get the exact cache size which will be needed so that no cache miss occurs. (512 is just a lower bound for it)

Row major or column major? 

The given code is said to be C and C uses row-major order.

The number of misses if P2 was executed (ie, column major order) would remain the same, right?
+35 votes

$\text{Number of Cache Lines} = \dfrac{2^{15}B}{128B} = 256$
$\text{In 1 Cache Line} = \dfrac{128B}{8B} = 16\ elements$

$P_1 = \dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$=\dfrac{512 \times 512}{16}$

$= 2^{14}$

$= 16384.$

It is so because for $P_1$ for every line there is a miss, and once a miss is processed
we get $16$ elements in memory. So another miss happens after $16$ elements.

Hence, answer is option C.

by Boss (30.8k points)
edited by

What will be the answer for P2 applyying the above same procedure ?
classs!!!! thank you!!!
Nice explanation !
what does the one cache line meaans
+13 votes

Right answer is option C. 

by Active (2k points)
edited by
Very nice explanation 👌
best and easy!!!!
+3 votes

If you aren't getting clear idea about this question..Then please watch it.

by (247 points)
edited by
+1 vote

First we talk about P1::

Cache size=32KB

Block Size=128Byte

Number of Block=256

Since array element are stored in row major fashion and number of element in each block=16(128(Block size)/8(array element size));

Suppose are are stored like below ways as::

A[0][0] A[0][1]..................A[0][15]

So whenever P1 will fetch A[0][0] it will be fetch all 15 so Miss ratio==1/16

Since array size=2^18

Since out of 16 reference only one miss happen ,then out of 2^18 number of misses will be=2^18*1/16=2^14=16384


Now for P2::;

Number of Misses will be=2^18 bcz of column major order along with all 2^18 references will be Miss.

So M2=2^18



by Boss (10.2k points)
0 votes


so Option C 

Comment below if u get anything wrong about it

by Boss (11.6k points)
0 votes

Hence Option C is correct.

by Junior (767 points)
–3 votes
Number of cache lines = 64 64 =1 =210
Number of sets = 21022=28
H1 = 18/5 + 0.8 ns
= 3.6 + 0.8 ns = 4.4 ns
Direct mapped cache:
This is no need of multiplexer in direct mapped cache.
So H2 = 16/ 5 nsec
= 3.2 nsec
Difference = H1 – H2
= 4.4 – 3.2 = 1.2 nsec
by Boss (10.2k points)

Related questions

Quick search syntax
tags tag:apple
author user:martin
title title:apple
content content:apple
exclude -tag:apple
force match +apple
views views:100
score score:10
answers answers:2
is accepted isaccepted:true
is closed isclosed:true
50,737 questions
57,312 answers
105,046 users