44 votes

A CPU has a cache with block size $64$ bytes. The main memory has $k$ banks, each bank being $c$ bytes wide. Consecutive $c$-byte chunks are mapped on consecutive banks with wrap-around. All the $k$ banks can be accessed in parallel, but two accesses to the same bank must be serialized. A cache block access may involve multiple iterations of parallel bank accesses depending on the amount of data obtained by accessing all the $k$ banks in parallel. Each iteration requires decoding the bank numbers to be accessed in parallel and this takes $\frac{k}{2}$ ns. The latency of one bank access is $80$ ns. If $c = 2$ and $k = 24$, the latency of retrieving a cache block starting at address zero from main memory is:

  1. $92$ ns
  2. $104$ ns
  3. $172$ ns
  4. $184$ ns

4 Answers

60 votes

This question is based on the concept of memory interleaving: instead of serving every access from a single memory module, the memory is divided into modules (banks) and consecutive data are distributed across them, so that the banks can be accessed in parallel and the data transfer rate improves. An additional decoder is used to select the modules to be accessed in parallel, so the decoder latency must be counted along with the module access latency.

Now for the solution:

According to the question there are $k = 24$ banks, each $c = 2$ bytes wide. So one iteration of parallel bank accesses fetches $2 \times 24 = 48$ bytes.

Now we calculate the latency of one iteration.

The decoding time for one iteration is $\frac{k}{2} = \frac{24}{2} = 12$ ns,

and the latency of one bank access is $80$ ns.

(If a per-bank decoder latency were given, the iteration time would be computed as $k \times (\text{decoder latency}) + \text{bank latency}$; but here the total decoding latency of one iteration is given directly as $12$ ns.)

Therefore one iteration requires $12 + 80 = 92$ ns.

Now, as discussed above, one iteration fetches $48$ bytes of data, but the question asks for a full cache block ($64$ bytes) transfer, so we need $2$ iterations: $2 \times 92 = 184$ ns.
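The same arithmetic written as a small Python sketch, purely as a calculator for this answer's model (decode and bank access fully serialized in every iteration; the function and parameter names are my own):

```python
import math

def cache_block_latency(block_size=64, k=24, c=2, bank_latency=80):
    """Latency of fetching one cache block when decode and bank
    access are fully serialized within each iteration."""
    bytes_per_iteration = k * c                               # 24 * 2 = 48 bytes
    iterations = math.ceil(block_size / bytes_per_iteration)  # ceil(64/48) = 2
    decode_time = k / 2                                       # 24/2 = 12 ns
    return iterations * (decode_time + bank_latency)          # 2 * (12 + 80) = 184 ns

print(cache_block_latency())  # 184.0
```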

56 votes
Cache block $= 64$ bytes.

Each bank in memory is $2$ bytes and there are $24$ such banks. So, in $1$ iteration we can get $2\ast 24 = 48$ bytes and getting $64$ bytes requires $2$ iterations.

So, latency $= k/2 + 80 + k/2 + 80$ (since in each iteration we need to select the banks and the bank decoding time $(k/2)$ is independent of the number of banks we are going to access)

$= 12 + 80 + 12 + 80 = 184$ ns

But the total memory capacity here is just $2 \times 24 = 48$ bytes, smaller than a single cache block. What is the need for cache management and a decoding mechanism then?
6 votes

This question is based on the concept of memory interleaving. The answer which follows is based on the concept given in the text “Computer Organization” by Hamacher et al. [5e] (p. 330, section 5.6.1).

Substituting the values $k = 24$ and $c = 2$ into the organization described there (the book's figure is not reproduced here), we have $24$ memory banks in total and each bank is $2$ bytes wide. What is to be noted is that the $\color{goldenrod}{\text{memory address}}$ given to the memory subsystem is divided into two parts: the lower order bits of the address signify which bank the word is in, and the higher order bits determine the location of the word within that particular memory module.
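To make the split concrete, here is a small Python sketch of the wrap-around mapping described in the question (the function and names are mine; since $k = 24$ is not a power of two, I model the split with integer div/mod rather than literal bit fields):

```python
def bank_mapping(address, k=24, c=2):
    """Map a byte address to (bank number, address within module),
    with consecutive c-byte chunks interleaved across k banks."""
    chunk = address // c           # which c-byte chunk the byte falls in
    bank = chunk % k               # consecutive chunks -> consecutive banks
    addr_in_module = chunk // k    # row within the selected bank
    return bank, addr_in_module

# The 64-byte block at address 0 spans chunks 0..31: chunks 0..23 hit
# banks 0..23 (iteration 1); chunks 24..31 wrap around to banks 0..7
# (iteration 2).
print(bank_mapping(0))    # (0, 0)
print(bank_mapping(46))   # (23, 0) -- last chunk of the first iteration
print(bank_mapping(48))   # (0, 1)  -- first chunk of the second iteration
```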

Now, given that we start from address zero of main memory: when this address is applied to the memory subsystem, it takes $12\;(= \frac{k}{2} = \frac{24}{2})$ ns for the address to be decoded. Each module has two buffers: an Address Buffer Register (ABR) and a Data Buffer Register (DBR). (Why are they important? That shall be dealt with shortly...)

At the end of this $12$ ns decode time, I assume that the “address in module” part of the memory address is latched in the ABR of each memory bank. Once this is done, the question says it takes $80$ ns for the data in each bank to appear at the output (or, more precisely, to get latched in the DBR). So we can access this data out of all $24$ memory banks in parallel (provided we have the hardware arrangement, i.e. a bus, to transfer $48$ bytes of data in one go).

Now, as per the question, we need to transfer $64$ bytes of data, but in one go we can access only $48\;(= 24 \times 2)$ bytes (we access $24$ banks and each provides $2$-byte-wide data, a total of $48$ bytes). So in the first go we transfer $48$ bytes, which leaves $64 - 48 = 16$ bytes more to be transferred. But $48$ bytes is the minimum granularity of a parallel transfer, so we must perform another $48$-byte transfer, of which only $16$ bytes are actually needed. This being said, let us look at the timing diagram:
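(The original answer shows the timing as an image, which is not reproduced here; the sketch below is a rough textual reconstruction based on the description that follows.)

```
time (ns):    0          12                   92                   172
iteration 1:  |- decode -|---- bank access ----|
iteration 2:             |- decode -| (waits)  |---- bank access ----|
```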

In the timing diagram above, I could overlap the address decoding for the second iteration with the data retrieval of the first, because we have assumed that the address is latched in the ABR. So we devise the second iteration in such a manner that the moment the first iteration latches its data into the DBR, the address for the second iteration is already latched in the ABR.

If we had assumed that the address was not latched in a buffer, then we would have to keep the addresses to the modules constant during the entire span of data retrieval from the memory banks; any change in the addresses during retrieval, with no buffers in use, would lead to erroneous data output. (In that case the answer would be $2 \times (12 + 80) = 184$ ns.)

But in the setup which I have assumed, the answer is $12 + 80 + 80 = 172$ ns. Option (C).
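The same calculation as a Python sketch, under this answer's overlap assumption (names are mine):

```python
import math

def overlapped_latency(block_size=64, k=24, c=2, bank_latency=80):
    # Decode of iteration i+1 is hidden under the bank access of
    # iteration i (thanks to ABR/DBR buffering), so only the first
    # decode appears on the critical path.
    iterations = math.ceil(block_size / (k * c))   # ceil(64/48) = 2
    return k / 2 + iterations * bank_latency       # 12 + 2*80 = 172 ns

print(overlapped_latency())  # 172.0
```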

----

This approach came to my mind because $172$ ns is given in the options, and also because while solving questions on pipelining we assume that stage results are buffered (latched), which can possibly reduce cycles...

3 votes

Explanation:

Size of cache block $= 64$ B. Number of main memory banks $k = 24$. Size of each bank $c = 2$ bytes.

Time taken for one parallel access: $T = \text{decoding time} + \text{latency} = \frac{k}{2} + 80 = 12 + 80 = 92$ ns.

One parallel access fetches only $k \times c = 48$ bytes, so $2$ accesses are needed: $2 \times 92 = 184$ ns.

So (D) is the correct option.
