This question is based on the concept of memory interleaving. The answer that follows is based on the treatment given in the text “Computer Organization” by Hamacher et al. [5e] (p. 330, Section 5.6.1).
I have directly substituted the values for $k$ and $c$ in the diagram above. So we have $24$ memory banks in total and each bank is $2$ bytes wide. Note that the $\color{goldenrod}{\text{memory address}}$ given to the memory subsystem is divided into two parts: the lower-order bits of the address signify which bank the word is in, and the higher-order bits determine the location of the word within that particular memory module.
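The address split can be sketched in a few lines. This is only an illustration under my own assumptions: consecutive $2$-byte chunks go to consecutive banks with wrap-around, and because $24$ is not a power of two the "lower-order / higher-order bits" are modelled with modulo and integer division rather than a literal bit-field. The function name `locate` is hypothetical.

```python
def locate(address, k=24, c=2):
    """Map a byte address to (bank, word_in_bank, byte_offset).

    Assumes low-order interleaving: chunk i lives in bank i mod k.
    """
    chunk = address // c        # index of the c-byte chunk holding this byte
    bank = chunk % k            # low-order part: which bank
    word_in_bank = chunk // k   # high-order part: location inside that bank
    byte_offset = address % c   # byte within the c-byte word
    return bank, word_in_bank, byte_offset


if __name__ == "__main__":
    for addr in (0, 2, 46, 48, 63):
        print(addr, locate(addr))
```

Under these assumptions, bytes $0$ to $47$ sit at location $0$ of banks $0$ to $23$, and byte $48$ wraps around to location $1$ of bank $0$.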
Now, given that we start from address zero of main memory, when this address is applied to the memory subsystem it takes $12 (= \frac{k}{2}=\frac{24}{2})$ ns for the address to be decoded. Each module has two buffers: an Address Buffer Register (ABR) and a Data Buffer Register (DBR). (Why are they important? That shall be dealt with shortly...)
At the end of this $12$ ns decode time, I assume that the “Address in module” part of the memory address is latched in the ABR of the individual memory banks. Once this is done, the question says, it takes $80$ ns for the data in each bank to appear at the output (or, more precisely, to get latched in the DBR). So we can access this data from all $24$ memory banks in parallel (provided we have a hardware arrangement (bus) to transfer $48$ bytes of data in one go!!)
Now, as per the question, we need to transfer $64$ bytes of data. But in one go we can access $48 =(24\times 2)$ bytes of data (we access $24$ banks and each provides $2$-byte-wide data, so a total of $48$ bytes). So in the first go we can transfer $48$ bytes of data, which leaves us with $64-48=16$ bytes still to be transferred. But $48$ bytes is the minimum granularity of parallel transfer, so we shall perform another transfer of $48$ bytes, of which only $16$ bytes are actually needed. This being said, let us look at the timing diagram:
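The iteration count above is just a ceiling division. A minimal sketch of that arithmetic follows; the function name `iterations_needed` and its return layout are my own, not from the question.

```python
import math


def iterations_needed(block_bytes=64, k=24, c=2):
    """Return (bytes per parallel access, iterations, useful bytes in last iteration)."""
    per_iteration = k * c                                # 24 banks * 2 bytes
    iterations = math.ceil(block_bytes / per_iteration)  # round up to whole accesses
    useful_in_last = block_bytes - (iterations - 1) * per_iteration
    return per_iteration, iterations, useful_in_last


if __name__ == "__main__":
    print(iterations_needed())
```

For the values in this question it gives $48$ bytes per iteration, $2$ iterations, and $16$ useful bytes in the second one.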
In the timing diagram above I could overlap the address decoding for the second iteration with the data retrieval from the banks, which is still in progress, because we have assumed that the address is latched in the ABR. So we devise the second iteration in such a manner that the moment the first iteration latches its data in the DBR, we have the address for the second iteration latched in the ABR.
If we had assumed that the address was not latched in a buffer, then during the entire span of data retrieval from the memory banks we would have to hold the addresses to the modules constant. Any change in the addresses during retrieval, where no buffers are used, would lead to erroneous data output… In such a case the answer shall be $2\times (12+80)=184$ ns.
But in the setup which I have assumed, the second decode is hidden behind the first retrieval, so the answer shall be $12+80+80=172$ ns. Option (C).
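The two timings can be compared with a small sketch. It rests on the assumptions made in this answer: each iteration is a decode stage followed by a bank-access stage, and with ABR/DBR buffering the decode of the next iteration may overlap the access of the current one (a two-stage pipeline). The function names are hypothetical.

```python
def latency_unbuffered(iterations, decode_ns, access_ns):
    """No address latch: every iteration pays decode + access back to back."""
    return iterations * (decode_ns + access_ns)


def latency_buffered(iterations, decode_ns, access_ns):
    """Address latched in ABR: decode of iteration i+1 overlaps access of iteration i.

    Standard two-stage pipeline timing: the first decode and the last access
    are exposed, and every step in between costs the slower of the two stages.
    """
    slower = max(decode_ns, access_ns)
    return decode_ns + (iterations - 1) * slower + access_ns


if __name__ == "__main__":
    k, access_ns, iterations = 24, 80, 2
    decode_ns = k // 2  # k/2 ns decode time, i.e. 12 ns
    print(latency_unbuffered(iterations, decode_ns, access_ns))
    print(latency_buffered(iterations, decode_ns, access_ns))
```

With $2$ iterations, a $12$ ns decode and an $80$ ns access, the unbuffered case gives $184$ ns and the overlapped case gives $172$ ns.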
----
This approach came to my mind because $172$ ns is given in the options, and also because while solving questions about pipelining we assume queuing of the results of stages to possibly reduce cycles...