The Gateway to Computer Science Excellence
First time here? Checkout the FAQ!
+22 votes

A CPU has a $32 KB$ direct mapped cache with $128$ byte-block size. Suppose A is two dimensional array of size $512 \times512$ with elements that occupy $8$-bytes each. Consider the following two C code segments, $P1$ and $P2$.


for (i=0; i<512; i++)
    for (j=0; j<512; j++)
        x +=A[i] [j]; 


for (i=0; i<512; i++)
    for (j=0; j<512; j++)
        x +=A[j] [i]; 

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers.  Let the number of cache misses experienced by $P1$ be $M_{1}$and that for $P2$ be $M_{2}$.

The value of $M_{1}$ is:

  1. $0$
  2. $2048$
  3. $16384$
  4. $262144$
asked in CO and Architecture by Active (3.3k points)
edited by | 4.1k views

7 Answers

+31 votes
Best answer

Code being C implies array layout is row-major.

When $A[0][0]$ is fetched, $128$ consecutive bytes are moved to cache. So, for the next $\dfrac{128}{8} -1=15$ memory references there won't be a cache miss.
For the next iteration of $i$ loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for $P1$ is

$= \dfrac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

Correct Answer: $C$

answered by Veteran (413k points)
edited by
Sir in this question it is not mentioned whether array is arranged in row wise or column wise order in memory, so in such questions do we have to assume it to be row wise ?

Sir I think a typo in line "So, by the time A[1][0] is accessed, its cache block would be replaced"  it should be A[0][1].

Also Sir I think 256 available blocks in cache is not a limiting factor here, because

A[0][0] will map to cache 0

A[1][0] will map to cache 32

A[2][0] will map to cache 64

A[3][0] will map to cache 96

A[4][0] will map to cache 128

A[5][0] will map to cache 160

A[6][0] will map to cache 192

A[7][0] will map to cache 224

A[8][0] will again map to cache 0

so the cache is being replaced every 8 block accesses not 256 block accesses

so even if we have 512 cache blocks then too we will get the same answer.

Please clarify ...
Each cache line will hold 4 array elements so if A[0][0] is mapped to cache line 0, A[1][0] will map to cache line 128.
In the question if we would have used a different Cache organization....then also the no of cache miss would remain same...?

@Danish: Between two iterations in P2 there are 512 distinct memory block accesses and we have only 256 blocks in cache. So, AT LEAST 2 memory blocks must be mapped to a cache block and hence the second one would replace the first memory block. So, in this way for the second iteration every memory access will cause a cache miss.

This is just an overall analysis without considering the cache strategy (assuming best mapping possible). If we see the exact mapping by knowing 512 elements of 8 bytes each in the array we can get the exact cache size which will be needed so that no cache miss occurs. (512 is just a lower bound for it)

Row major or column major? 

The given code is said to be C and C uses row-major order.

+27 votes

$\text{Number of Cache Lines} = \dfrac{2^{15}B}{128B} = 256$
$\text{In 1 Cache Line} = \dfrac{128B}{8B} = 16\ elements$

$P_1 = \dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$=\dfrac{512 \times 512}{16}$

$= 2^{14}$

$= 16384.$

It is so because for $P_1$ for every line there is a miss, and once a miss is processed
we get $16$ elements in memory. So another miss happens after $16$ elements.

Hence, answer is option C.

answered by Boss (30.5k points)
edited by

What will be the answer for P2 applyying the above same procedure ?
classs!!!! thank you!!!
+3 votes

Right answer is option C. 

answered by Active (1.7k points)
edited by
Very nice explanation 👌
+1 vote

First we talk about P1::

Cache size=32KB

Block Size=128Byte

Number of Block=256

Since array element are stored in row major fashion and number of element in each block=16(128(Block size)/8(array element size));

Suppose are are stored like below ways as::

A[0][0] A[0][1]..................A[0][15]

So whenever P1 will fetch A[0][0] it will be fetch all 15 so Miss ratio==1/16

Since array size=2^18

Since out of 16 reference only one miss happen ,then out of 2^18 number of misses will be=2^18*1/16=2^14=16384


Now for P2::;

Number of Misses will be=2^18 bcz of column major order along with all 2^18 references will be Miss.

So M2=2^18



answered by Loyal (9.7k points)
0 votes

If you aren't getting clear idea about this question..Then please watch it.

answered by (217 points)
edited by
0 votes


so Option C 

Comment below if u get anything wrong about it

answered by Boss (10.6k points)
–3 votes
Number of cache lines = 64 64 =1 =210
Number of sets = 21022=28
H1 = 18/5 + 0.8 ns
= 3.6 + 0.8 ns = 4.4 ns
Direct mapped cache:
This is no need of multiplexer in direct mapped cache.
So H2 = 16/ 5 nsec
= 3.2 nsec
Difference = H1 – H2
= 4.4 – 3.2 = 1.2 nsec
answered by Loyal (9.7k points)

Related questions

Quick search syntax
tags tag:apple
author user:martin
title title:apple
content content:apple
exclude -tag:apple
force match +apple
views views:100
score score:10
answers answers:2
is accepted isaccepted:true
is closed isclosed:true
49,807 questions
54,504 answers
74,885 users