search
Log In
35 votes
8.4k views

A CPU has a $32 KB$ direct mapped cache with $128$ byte-block size. Suppose A is two dimensional array of size $512 \times512$ with elements that occupy $8$-bytes each. Consider the following two C code segments, $P1$ and $P2$.

P1: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[i] [j]; 
    } 
}

P2: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[j] [i]; 
    } 
} 

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers.  Let the number of cache misses experienced by $P1$ be $M_{1}$and that for $P2$ be $M_{2}$.

The value of $M_{1}$ is:

  1. $0$
  2. $2048$
  3. $16384$
  4. $262144$
in CO and Architecture
edited by
8.4k views
3
For every 16 array elements there is a miss...so for 512*512 elements it will be (512*512)/16 = 16384 misses!
1
how every 16 element is miss???
0
Because whenever there is a miss, the contents of entire block is fetched. Here each block contains 16 elements. Divide block size by size of each element.

8 Answers

48 votes
 
Best answer

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When $A[0][0]$ is fetched, $128$ consecutive bytes are moved to cache. So, for the next $\dfrac{128}{8} -1=15$ memory references there won't be a cache miss.
For the next iteration of $i$ loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for $P1$ is

$= \dfrac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

Correct Answer: $C$


edited by
1
Sir in this question it is not mentioned whether array is arranged in row wise or column wise order in memory, so in such questions do we have to assume it to be row wise ?
1

Sir I think a typo in line "So, by the time A[1][0] is accessed, its cache block would be replaced"  it should be A[0][1].

6
Also Sir I think 256 available blocks in cache is not a limiting factor here, because

A[0][0] will map to cache 0

A[1][0] will map to cache 32

A[2][0] will map to cache 64

A[3][0] will map to cache 96

A[4][0] will map to cache 128

A[5][0] will map to cache 160

A[6][0] will map to cache 192

A[7][0] will map to cache 224

A[8][0] will again map to cache 0

so the cache is being replaced every 8 block accesses not 256 block accesses

so even if we have 512 cache blocks then too we will get the same answer.

Please clarify ...
1
Each cache line will hold 4 array elements so if A[0][0] is mapped to cache line 0, A[1][0] will map to cache line 128.
0
In the question if we would have used a different Cache organization....then also the no of cache miss would remain same...?
5

@Danish: Between two iterations in P2 there are 512 distinct memory block accesses and we have only 256 blocks in cache. So, AT LEAST 2 memory blocks must be mapped to a cache block and hence the second one would replace the first memory block. So, in this way for the second iteration every memory access will cause a cache miss.

This is just an overall analysis without considering the cache strategy (assuming best mapping possible). If we see the exact mapping by knowing 512 elements of 8 bytes each in the array we can get the exact cache size which will be needed so that no cache miss occurs. (512 is just a lower bound for it)

Row major or column major? 

The given code is said to be C and C uses row-major order.

http://en.wikipedia.org/wiki/Row-major_order

2
Thanks:)
0
The number of misses if P2 was executed (ie, column major order) would remain the same, right?
0

@ Sir
In this question for P1, what will be the no. of hits?
Is that 32 X 512 X 15 = 2,45,760?

43 votes

$\text{Number of Cache Lines} = \dfrac{2^{15}B}{128B} = 256$
$\text{In 1 Cache Line} = \dfrac{128B}{8B} = 16\ elements$

$P_1 = \dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$=\dfrac{512 \times 512}{16}$

$= 2^{14}$

$= 16384.$

It is so because for $P_1$ for every line there is a miss, and once a miss is processed
we get $16$ elements in memory. So another miss happens after $16$ elements.

Hence, answer is option C.


edited by
1
@amarVashishth

What will be the answer for P2 applyying the above same procedure ?
1
classs!!!! thank you!!!
1
Nice explanation !
2
what does the one cache line meaans
1
this should be best answer
29 votes

Right answer is option C. 


edited by
1
Very nice explanation πŸ‘Œ
1
best and easy!!!!
0
Best answer , thanks
0
How are the number of references 512Γ—512 in method 1?

Please help.
10 votes

If you aren't getting clear idea about this question..Then please watch it.


edited by
3 votes

Hence Option C is correct.

0
Ye sabse easy explanation tha!... thankyou
1 vote

First we talk about P1::

Cache size=32KB

Block Size=128Byte

Number of Block=256

Since array element are stored in row major fashion and number of element in each block=16(128(Block size)/8(array element size));

Suppose are are stored like below ways as::

A[0][0] A[0][1]..................A[0][15]

So whenever P1 will fetch A[0][0] it will be fetch all 15 so Miss ratio==1/16

Since array size=2^18

Since out of 16 reference only one miss happen ,then out of 2^18 number of misses will be=2^18*1/16=2^14=16384

M1=16384;

Now for P2::;

Number of Misses will be=2^18 bcz of column major order along with all 2^18 references will be Miss.

So M2=2^18

Therefor;

M1/M2=1/16

0 votes

 

so Option C 

Comment below if u get anything wrong about it

–3 votes
Number of cache lines = 64 64 =1 =210
Number of sets = 21022=28
H1 = 18/5 + 0.8 ns
= 3.6 + 0.8 ns = 4.4 ns
Direct mapped cache:
This is no need of multiplexer in direct mapped cache.
So H2 = 16/ 5 nsec
= 3.2 nsec
Difference = H1 – H2
= 4.4 – 3.2 = 1.2 nsec
Answer:

Related questions

28 votes
4 answers
1
5.8k views
Consider two cache organizations. First one is $32$ $kB$ $2$-way set associative with $32$ $byte$ block size, the second is of same size but direct mapped. The size of an address is $32$ $bits$ in both cases . A $2$-to-$1$ multiplexer has latency of $0.6 ns$ while a $k-$bit comparator has ... while that of direct mapped is $h_2$. The value of $h_2$ is: $2.4$ $ns$ $2.3$ $ns$ $1.8$ $ns$ $1.7$ $ns$
asked Apr 24, 2016 in CO and Architecture jothee 5.8k views
27 votes
5 answers
2
5.1k views
A CPU has a $32$ $KB$ direct mapped cache with $128$ byte-block size. Suppose $A$ is two dimensional array of size $512 \times512$ with elements that occupy $8-bytes$ each. Consider the following two $C$ code segments, $P1$ and $P2$. $P1$: for (i=0; i<512; i++) { for (j=0; j<512; j++) ... and that for $P2$ be $M2$. The value of the ratio $\frac{M_{1}}{M_{2}}$: $0$ $\frac{1}{16}$ $\frac{1}{8}$ $16$
asked Apr 23, 2016 in CO and Architecture jothee 5.1k views
47 votes
4 answers
3
14.7k views
Consider two cache organizations. First one is $32 \hspace{0.2cm} KB$ $2-way$ set associative with $32 \hspace{0.2cm} byte$ block size, the second is of same size but direct mapped. The size of an address is $32 \hspace{0.2cm} bits$ in both cases . A $2-to-1$ ... $h_1$ is: $2.4 \text{ ns} $ $2.3 \text{ ns}$ $1.8 \text{ ns}$ $1.7 \text{ ns}$
asked Sep 26, 2014 in CO and Architecture Rucha Shelke 14.7k views
30 votes
3 answers
4
7.9k views
A CPU has a cache with block size 64 bytes. The main memory has $k$ banks, each bank being $c$ bytes wide. Consecutive $c$ βˆ’ byte chunks are mapped on consecutive banks with wrap-around. All the $k$ banks can be accessed in parallel, but two accesses to the same bank ... and $k = 24$, the latency of retrieving a cache block starting at address zero from main memory is: 92 ns 104 ns 172 ns 184 ns
asked Sep 26, 2014 in CO and Architecture Rucha Shelke 7.9k views
...