24,147 views

Consider a machine with a byte addressable main memory of $2^{16}$ bytes. Assume that a direct mapped data cache consisting of $32$ lines of $64$ bytes each is used in the system. A $50 \times 50$ two-dimensional array of bytes is stored in the main memory starting from memory location $1100H$. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses.

How many data misses will occur in total?

1. $48$
2. $50$
3. $56$
4. $59$

If someone like me wondered why first block is in 4th block, this is why.

1100 H = (0001 0001 0000 0000)b excluding lower 6 bits for offset, we get 0001 0001 00 bits for cache block which is the 4th block.

Source: Arjun sir answer here: https://gateoverflow.in/43511/gate2007-81

Assume that the contents of the data cache do not change in between the two accesses

Can someone explain what does this part mean?

Bits used to represent the address = $\log_2{2^{16}} = 16$

Each cache line size $=64$ bytes; means offset requires $6\text{-bits}$

Total number of lines in cache $= 32;$ means line # requires $5\text{-bits}$

So, tag bits $= 16- 6-5=5$

We have a $2\text{D-array}$ each of its element is of size $=1\text{ Byte};$
Total size of this array $= 50 \times 50 \times 1\text{ Byte}=2500\text{ Bytes}$

So, total number of lines it will require to get contain in cache

$=\dfrac{2500B}{64B} = 39.0625 \approx 40$

Starting address of array $= 1100H = 00010 \ 00100 \ 000000$
The group of bits in middle represents Cache Line number $\implies$ array starts from cache line number $4$,
We require $40$ cache lines to hold all array elements, but we have only $32$ cache lines

Lets group/partition our $2500$ array elements in those $40$ array lines, we call this first array line as $A_0$ which will have $64$ of its elements.

This line(group of $64$ elements) of array will be mapped to cache line number $4$ as found by analysis of starting address of array above.

This all means that among those $40$ array lines some array lines will be mapped to same cache line, coz there are just $32$ cache lines but $40$ of array lines.

This is how mapping is:
$\begin{matrix} 0& A_{28} & \\ 1& A_{29} & \\ 2& A_{30} & \\ 3& A_{31} & \\ 4& A_{0} & A_{32} \\ 5& A_{1} & A_{33} \\ 6& A_{2} & A_{34} \\ 7& A_{3} & A_{35} \\ 8& A_{4} & A_{36} \\ 9& A_{5} & A_{37} \\ 10& A_{6} & A_{38} \\ 11& A_{7} & A_{39} \\ 12& A_{8} & \\ \vdots\\ 30& A_{26} & \\ 31& A_{27} & \end{matrix}$

So, if we access complete array twice we get $=32+8+8+8 = 56$ miss because only $8$ lines from cache line number $4$ to $11$ are miss operation, rest are Hits(not counted) or Compulsory misses(first 32).

Why A32 is mapped to 4?. A32 , 32 mod 32 = 0  should be mapped to 0th index. Please let me know on what basis this mapping is done here.

@Harish Hatmode It is being mapped to the $0$th index (or line no.$4$) only. This is because the starting address of the program is $1100H$ which shows that the starting line number is $4$ and not $0$.

Got it. Thanks.

$2^{16} = 64$ KB main memory is mapped to 32 lines of 64 bytes. So, number of offset bits = 6 (to identify a byte in a line) and number of indexing bits = 5 (to identify the line).

Size of array = 50* 50 = 2500 B. If array is stored in row-major order (first row, second-row..), and if elements are also accessed in row-major order (or stored and accessed in column-major order), for the first 64 accesses, there will be only 1 cache miss, and for 2500 accesses, there will be 2500/64 = 40 cache misses during the first iteration.

We have 5 index bits and 6 offset bits. So, for 211 (5 + 6 = 11) continuous memory addresses there wont be any cache conflicts as the least significant bits are offset bits followed by index bits.

So, number of array elements that can be accessed without cache conflict = 2048 (as element size is a byte). The next 452 elements conflict with the first 452 elements. This means 452/64 = 8 cache blocks are replaced. (We used ceil, as even if one element from a new block is accessed, the whole block must be fetched).

So, during second iteration we incur misses for the first 8 cache blocks as well as for the last 8 cache blocks. So, total data cache misses across 2 iterations = 40 + 8 + 8 = 56.

by

how this????the next 452 elements conflict with the first 452 elements. This means 452/64 = 8 cache blocks are replaced.

There are only 64 bytes in a set and we have 32 sets. So, for 32*64 = 2048 B of CONTIGUOUS memory accesses, we won't have a cache conflict as each of them will go to a distinct location in cache. After this the CYCLE repeats.  The next location after 2048, will go to the first location and removes that entry- as we are having direct mapped cache. Like wise for the next 452 elements.
@arjun sir last 8 cache misses for what..?
We do not care what is the starting address to solve this question.

Total 50x50 = 2500 elements.
One block can hold 64 elements.
2500/64 = 39.06  so  40 blocks are needed to hold them.

So for initial 40 blocks, there will be 40 misses.

When the second iteration starts, (9 to 40 of array are present) in 0 to 32 blocks.
Now for the first 8, there will be 8 misses.

It is direct mapped cache, so respective blocks with same reminder will be replaced
The first 8 blocks of array will replace the last 8 blocks( Since last 8 have same reminder as 1st 8 elements.  33 mod 32 =1, 34 mod 32 = 2 so on)

Again after 32 blocks, (33 to 40 will replace  1 to 8)  8 miss.

Total 40+8+8 = 56 misses.
by

Note that, it is a linked question. So to find the answer of 2nd one, you have to draw that figure given in best ans.
But just to answer this question, it is straight forward.
you missed something in the question starting address of the array is line-4....

### SIMPLIFIED VERSION OF THIS QUESTION:

There are 40 blocks in the Main Memory, that are to be "directly mapped" to a cache of 32 lines., twice. If the first block maps to the fourth line, calculate the cache misses.

For the first 32 blocks (block 0 - 31) — 32 misses.

For the next 8 blocks (block 32 - 39), we'll have to replace block 0 - 7 — 8 misses.

Now, in the second iteration, for the first 8 blocks (block 0 - 7), we'll have to replace block 32 - 39 — 8 misses.

Towards the end, we find out block 32 - 39 aren't present. So, we replace block 0 - 7 with them. — 8 misses.

So, total = 56 cache misses.

PS: All this replace, and counter-replace action is going on on blocks 0 - 7 (or 32 - 39), which are present at line no. 4 to 11. (b mod l)