47 votes

Consider two cache organizations. The first is $32 \hspace{0.2cm} KB$ $2$-way set associative with a $32 \hspace{0.2cm} byte$ block size; the second is of the same size but direct mapped. The size of an address is $32 \hspace{0.2cm} bits$ in both cases. A $2$-to-$1$ multiplexer has a latency of $0.6 \hspace{0.2cm} ns$ while a $k$-bit comparator has a latency of $\frac{k}{10} \hspace{0.2cm} ns$. The hit latency of the set-associative organization is $h_1$ while that of the direct-mapped one is $h_2$.

The value of $h_1$ is: 

  1. $2.4  \text{ ns} $
  2. $2.3  \text{ ns}$ 
  3. $1.8  \text{ ns}$ 
  4. $1.7  \text{ ns}$ 
in CO and Architecture
The question says $32$ kb cache? Is this a typo?
Yes, it was a typo; it is corrected now.

For the value of $h_{2}$, refer to the related question below.

For a $k$-way set-associative cache, $k$ comparators are needed. A direct-mapped cache is $1$-way set associative, hence it also requires a comparator. But a MUX is needed for data/block selection among multiple blocks, which is not needed in a direct-mapped cache: there is only a single candidate block, and whether to select it is known from the comparator alone.
@Aghori In direct mapping, as soon as the index and tag bits are matched, how do we decide which word should be transferred to the processor? For this we have the word offset (in this case $5$ bits).

So, to decide which word is transferred, will a MUX be required or not?

@krishn.jh A word is not transferred; a whole block of words is transferred to the processor. So we don't need to decide which word to transfer but rather which block. For direct mapping only one block matches, so no MUX is used; but for $k$-way set-associative mapping we have $k$ blocks to choose from once the set is identified, so here a MUX is needed.



See the comment: a MUX is also required, but only in the case of a fetch.

For checking hit/miss we do not consider the MUX delay.

In set associative also, is the comparison alone sufficient?

4 Answers

66 votes
Best answer

Cache size is $32 \hspace{0.2cm} KB$ and cache block size is $32 \hspace{0.2cm} B$. So,

$\text{number of sets}  = \dfrac{\text{cache size}}{\text{no. of blocks in a set } \times \text{ block size}}$

$ = \dfrac{32 \hspace{0.2cm}KB}{2 \times 32 \hspace{0.2cm}B} = 512$

So, number of index bits needed $= 9$ (since $2^9 = 512$). Number of offset bits $= 5$ (since $2^5 = 32 \hspace{0.2cm} B$ is the block size, assuming byte addressing). So, number of tag bits $= 32 - 9 - 5 = 18$ (as the memory address is $32 \hspace{0.2cm} bits$).
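The field-width arithmetic above can be double-checked with a short sketch (Python; the variable names are mine, not from the question):

```python
from math import log2

cache_size = 32 * 1024   # 32 KB
block_size = 32          # 32 B
ways = 2                 # 2-way set associative
addr_bits = 32

# number of sets = cache size / (blocks per set * block size)
sets = cache_size // (ways * block_size)
index_bits = int(log2(sets))          # bits to pick a set
offset_bits = int(log2(block_size))   # bits to pick a byte in a block
tag_bits = addr_bits - index_bits - offset_bits

print(sets, index_bits, offset_bits, tag_bits)  # 512 9 5 18
```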

So, $h_1 = \text{time to compare the tag} + \text{time to select the block in the set} \\= 18/10 + 0.6 \text{ ns} \\= 2.4 \text{ ns}.$

(Tag comparisons need to be done for both blocks in the set, but they can be carried out in parallel and the matching block's output multiplexed out.)
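Plugging the given delays into this derivation (a minimal sketch; following the answer, every component other than the tag comparator and the single 2-to-1 MUX is assumed free):

```python
MUX_2TO1_NS = 0.6   # given 2-to-1 MUX latency
tag_bits = 18       # from the field breakdown above

# Both ways' tags are compared in parallel (one 18-bit comparator
# each), then the matching way's output goes through the 2-to-1 MUX.
comparator_ns = tag_bits / 10   # k-bit comparator: k/10 ns
h1 = comparator_ns + MUX_2TO1_NS
print(h1)  # 2.4
```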


Correct Answer: $A$

Why do we not add $0.6$ in case $2$?

The second one is a direct-mapped cache. In a set-associative cache (here it is 2-way), there are two slots in a cache set entry. So, we need to compare the tag in both entries and select the one that matches, and for this a multiplexer is used. We don't need this multiplexer in the case of a direct-mapped cache.
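Following the same reasoning for the direct-mapped case gives $h_2$ (a sketch under the thread's assumption that the line-selection MUX costs nothing; variable names are mine):

```python
from math import log2

cache_size = 32 * 1024   # 32 KB
block_size = 32          # 32 B
addr_bits = 32

lines = cache_size // block_size                 # 1 block per line -> 1024 lines
index_bits = int(log2(lines))                    # 10
offset_bits = int(log2(block_size))              # 5
tag_bits = addr_bits - index_bits - offset_bits  # 17

# Only one tag comparison and no way-selection MUX, so h2 is just
# the comparator delay (the big line-selection MUX is taken as free).
h2 = tag_bits / 10
print(h2)  # 1.7
```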

Sir, one big question in my mind: why have we not considered the delay of the MUX? The delay of a $2 \times 1$ MUX is given, and we can derive the total number of $2 \times 1$ MUXes needed to implement the $2^{10} \times 1$ MUX, as taught in digital logic. We are taking the delay of an OR gate even when it is not mentioned, while neglecting that the large MUX can be built from $2 \times 1$ MUXes. Are we moulding the question according to the answer? I think if we can build it that way, then it should be counted.
In direct mapping, if we want a $2^{10}$-to-$1$ MUX, then why not realize it using $2$-to-$1$ MUXes at multiple levels and take $0.6$ ns for every level? In fact, the OR gate delay is not mentioned either, so we could take it as zero; why are we taking zero for the MUX?
Sir, please help me with this doubt.

(Suppose there are 4 sets, each set has 2 lines, and the tag is, say, 16 bits.)

First a set will be selected; for that we will need a 1×4 DeMUX, so the latency of the DeMUX is required....(1)

Each line of the selected set will be compared using 2 comparators of size 16 bits; the latency of the comparators is required....(2)

Once the line is selected, a 2×1 MUX will be used to select that block, so the latency of the MUX is required....(3)

So total latency = (1) + (2) + (3)

Now all I am asking is: please shed a little light on the latency of the DeMUX in (1). Did we ignore it just because nothing is mentioned about it in the question?
What if the cache were 4-way set associative and we had only 2:1 MUXes? How many 2:1 MUXes would be used, 2 or 3? (i.e., would the 0.6 ns latency be counted twice or thrice?)

Irrespective of the associativity of the cache, we always need one OR gate (or one 2:1 MUX), because we only have to decide whether the required block is present in the cache or not.

Comparison is done in parallel, and we need to select the data, for which we require a $2 \times 1$ MUX.

@Hemant, don't you think this diagram is wrong? It does not use the block offset at all. Also notice that the 2nd comparator (=) matching the tag has only one input! With what is it matching the tag?

@Arjun sir, it's really confusing here. Now that I have taken so much effort, please clarify.

I guess the following steps are performed in 2-way set-associative mapping:

  1. Index the set
    With 9 bits (this can be done by feeding the 9 bits to a DeMUX address input; the DeMUX will in turn drive the corresponding index line HIGH. But no information about such an indexing DeMUX, or any other approach, is given, so assume its delay is negligible)
  2. Match the line tag inside the set indexed in the above step
    With 18 bits, to select the appropriate line within the set (this comparison can be done in $18/10 = 1.8$ ns)
  3. Offset the word inside the line chosen in the above step
    With 5 bits. Again, no information is given about this, but it can be done with a MUX: apply the least significant 5 offset bits as the MUX select lines, with the data bits of the different words within the chosen line as inputs; the output will be the data bits of the word at the given offset. (Somewhat more involved since each word is multi-bit, but anyway we have to select one word at a given offset from many words, and many-to-one is exactly the functionality of a MUX.)
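The three steps above can be sketched as a toy lookup (Python; the cache model and all names are mine, purely illustrative, tracking only valid bits and tags):

```python
OFFSET_BITS, INDEX_BITS = 5, 9

def split(addr):
    """Decompose a 32-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# cache[index] is a set: a list of 2 (valid, tag) ways
cache = [[(False, 0), (False, 0)] for _ in range(1 << INDEX_BITS)]

def lookup(addr):
    tag, index, offset = split(addr)
    ways = cache[index]
    # step 2: both ways' tags compared (in hardware, in parallel);
    # step 3 would then select the word at `offset` from the hit way
    return any(valid and t == tag for valid, t in ways)

addr = 0x12345678
tag, index, offset = split(addr)
cache[index][0] = (True, tag)   # install the block
print(lookup(addr))             # True: hit after installation
```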


  1. No information about how to index the set (can be done with a DeMUX as explained in point 1 above)
  2. No information about how to offset the word inside the chosen line (can be done with a MUX as shown in the diagram below)
  3. Not getting where to use the 2-to-1 MUX stated in the question. I don't understand what you have used it for while adding its delay of 0.6 to 1.8 ns. You have said "succeeding one multiplexed as the output", but I really don't get how we multiplex like that. I feel the reference circuit diagram is flawed:

Very good explanation, Arjun sir.
Why are we not considering the decoder latency to decode the index?

For a $K$-way set-associative cache, we need $K$ comparators of tag-width size and a $K:1$ multiplexer to select the data.

Nice slide:


@Ayush Upadhyaya, in this the delay of the OR gate has also been added; why not so in this question?



K:1 Multiplexer to select data.

That means in one set having $K$ blocks, we select data from the one block whose tag matched, using a MUX. But is that possible with a $K:1$ MUX?

Here the block size is 32 bytes. Does that mean 32 bytes from each block in the selected set are given to the MUX, so the total number of inputs will be $32 \times K$? Then how is it $K:1$?


Here there is a separate MUX for each block in the set.


@Arjun Sir,

We are asked about the “Hit Latency”, so we actually don’t care about the data selection part.

The question is:

  • What exactly is the $2:1$ multiplexer doing here? What exactly do we need to select? We already know the set we are in; now, for that set, we just use the comparators and then an $OR$ gate to know the hit/miss.

Now, if we didn't use the $2:1$ MUX for that, and instead used it for actually implementing the $OR$ gate, then the answer you mentioned in these GATE answers is absolutely fine.

If we used it for that tag selection, then what did we consider for the $OR$ gate?

But then why, in this answer (link) (the question there also asks to calculate the "hit latency"), did you mention this:

For direct mapped entry we assumed that each cache entry can hold a single cache block. But this changes in a set-associative cache and with a 2-way set, each cache entry can hold 2 cache blocks. i.e., after indexing to a cache, we have to compare the tag bits of 2 entries. But this comparison can be done in parallel and result be multiplexed. We also need an OR gate to determine the output of tag comparison for 2 set entries. The tag comparison result of the entries serve as select lines to the MUX. So, here hit time


Here we considered both; can you explain those $0.6 + 0.6$ in a detailed manner?

Thank you.

PS: Also, can you tell me how to tag a user? When I typed "@Arjun" it didn't become blue, nor did it allow me to select a user after typing "@", which generally happens.

Sir, you have written "time to select the block in the set", but according to the figure you have provided, it is selecting "the required word in the selected block using the block offset as the select lines". Can you elaborate on how we are actually selecting the particular block?
@Arjun sir, in the figure that you have provided, can you say what the input lines and the select lines of the 2:1 MUX are?
If in a 2-way set we need 2 comparators, then why is the time taken to compare counted only once?

For set-associative mapping we require $k$ comparators if the mapping is $k$-way. So here we'll require 2 comparators, right?

96 votes

gives a better insight.

The info is not given, but is the info about the OR gate given? It is not; so why not consider that zero as well? We know we can implement the $2^{10} \times 1$ MUX using $2 \times 1$ MUXes, so why have we not derived the total number of MUXes required and then calculated the latency, and likewise for a $2^9 \times 1$ MUX? I think you missed that. Please recheck and edit the image.

Why do we need a $2^{10}:1$ MUX? Why not a $10:2^{10}$ decoder? Also, you have written a $9:2^9$ MUX; should it not be a decoder?

The figure is not correct; see the one at the end of the accepted answer.
In set associative, why is a $2^9 \times 9$ MUX needed? Why not a $2^9 \times 1$ MUX? See the pic you have given.

As we are realizing the OR gate using a $2 \times 1$ MUX, in the same way we can realize the 17 $2^{10} \times 1$ MUXes using $2 \times 1$ MUXes. In that case there will be 17 parallel structures, each having 10 levels of $2 \times 1$ MUXes, so the hit latency should be $(10 \times 0.6 + 17/10)$, i.e., $7.7$ ns. Sir, please clarify; I am having a doubt about this.
@Arjun sir, I think the implementation given in the above answer is correct, because in a real implementation we can't keep shifting the $2 \times 1$ multiplexer based on which set's blocks we need at any point of time.
8 votes

In a Direct-Mapped Cache

Hit latency $= 2^{10}$-to-$1$ MUX delay $+$ $n$-bit tag comparator delay
$= 0 + 17/10 = 1.7$ ns

(No information is given regarding the $2^{10}$-to-$1$ MUX delay, so let us assume it to be $0$.)

Only 1 comparator and 17 $2^{10}$-to-$1$ MUXes are required, but the MUXes operate in parallel.

In a 2-way Set-Associative Cache

Hit latency $= 2^{9}$-to-$1$ MUX delay $+$ comparator delay $+$ OR gate delay
$= 0 + 18/10 + 0.6 = 2.4$ ns

$18$ ($= 2$ comparators per set), $324$ ($= 36$ $2^{9}$-to-$1$ MUXes per set), and $9$ ($= 1$ OR gate per set) are required. The comparators operate in parallel and the MUXes operate in parallel. The OR gate can be realised using a $2$-to-$1$ MUX.
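The accounting above can be wrapped into one function (a sketch under this answer's assumptions: the big selection MUX and any decoder are free, the tag comparators run in parallel, and one 2-to-1 MUX/OR stage is charged only when the associativity exceeds 1; the function name is mine):

```python
def hit_latency_ns(cache_bytes, block_bytes, ways, addr_bits=32,
                   mux_2to1_ns=0.6, comp_ns_per_bit=0.1):
    """Hit latency under the thread's assumptions: set/line-selection
    MUX is free; tag comparators run in parallel; a single 2-to-1
    MUX/OR stage is charged only when ways > 1."""
    sets = cache_bytes // (ways * block_bytes)
    index_bits = (sets - 1).bit_length() if sets > 1 else 0
    offset_bits = (block_bytes - 1).bit_length()
    tag_bits = addr_bits - index_bits - offset_bits
    latency = tag_bits * comp_ns_per_bit
    if ways > 1:
        latency += mux_2to1_ns
    return round(latency, 3)  # round away float dust

print(hit_latency_ns(32 * 1024, 32, ways=2))  # 2.4 (h1)
print(hit_latency_ns(32 * 1024, 32, ways=1))  # 1.7 (h2)
```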

For direct mapped, why 17 MUXes? Because after a cache entry is selected, we have only one possible data source; we don't need to multiplex again here. We use a multiplexer when there are multiple active inputs and we want to select among them.

Arjun sir, AmarVashishth's diagram for 2-way associative uses two comparators. But if the MUX is placed first and the comparator after it, only then can we check hit/miss; a MUX can't compare, a comparator can.


@arjun Sir,
We have a 17-bit tag, and one MUX is used to select 1 bit from the tag bits, so we need 17 MUXes to select the full 17-bit tag for comparison.

@arjun sir

In direct mapped we have a $2^{10} \times 1$ MUX which selects one line (cache block) of the cache. Once a line is selected, each MUX outputs a single bit of the corresponding tag (of the selected line).

Now the tag has 17 bits in total, so we need 17 MUXes.

No. of MUXes in direct mapped = no. of bits in the tag = size of the comparator

Size of each MUX = $N \times 1$, where $N$ is the no. of lines in the cache memory

No. of comparators = 1
See the figure at the end of my answer; all the hand-drawn figures for this question are wrong :(

If you have any query about that figure, we can discuss it.
@Arjun sir, I still don't get why my figure is not correct. @cse23 has explained it very nicely, and what I know is exactly the same.

Please correct us if we are wrong.
First of all, we have 10 bits for indexing. How is this used in the $1024:1$ MUX?
Size of the MUX = $2^{\text{index bits}}$ to $1$.
That means in direct mapping 10 bits are used for indexing, therefore a $2^{10}$-to-$1$ = $1024$-to-$1$ MUX is used. The question specifies nothing about the $2^{10}$-to-$1$ MUX latency, so we take it as zero.
But Arjun sir, in a direct-mapped cache we need the MUX to select the line number, as shown in the hand-made diagrams. That is all I know. It's okay that we consider their delay negligible, but why are these pics wrong?

9 bits to represent 2^9 sets.

$2 \times 2^9$ ($= 2$ comparators per set), $36 \times 2^9$ ($= 36$ $2^9$-to-$1$ MUXes per set), and $2^9$ ($= 1$ OR gate per set) should be the answer, @pC. Please correct me if wrong.


@Arjun  sir 

We need 

For direct-mapped cache :

                   1) Block-index decoder,

                   2) comparators

For fully associative cache

                   1) comparators

                   2) Block-selection MUX

For Set-associative cache,

                   1) set index decoder,

                   2) comparators,

                   3) Way-selection MUX.

Is this correct? Also, we can use a decoder to find the correct word within a block using the block-offset bits, right?

Only one MUX for direct mapping.
Why did we assume zero for the direct-mapping MUX latency? We could also form the $2^{10}$-to-$1$ MUX from $2$-to-$1$ MUXes and find the MUX latency for direct mapping as well.
2 votes
Tag bits $= 18$, set bits $= 9$, offset bits $= 5$

Firstly, one multiplexer or set-index decoder (of size $2^9 \times 1$) will be used to reach the required set; then the tag bits of the 2 lines in that set will be matched with the help of 2 comparators of size 18 (the tag width). Don't bother yourself with how all the tag bits are fetched from the 2 lines (we could use multiplexers, the same way we fetch in direct and associative mapping) because that becomes a complex and advanced topic. Then lastly we require an OR gate or a multiplexer (a MUX can work as an "OR" gate) to check whether there is a hit or a miss.
So ignore the MUX delay for reaching the set number or fetching the tag bits, in all the mapping techniques.
Note: the comparator latency given is per bit, which is why $T_{\text{comparator}} = k \times$ (per-bit latency).

Hit latency $=$ comparator delay $+$ OR gate (MUX) delay

$= 1.8 + 0.6 = 2.4 \text{ ns}$

Related questions

28 votes
4 answers
Consider two cache organizations. First one is $32$ $kB$ $2$-way set associative with $32$ $byte$ block size, the second is of same size but direct mapped. The size of an address is $32$ $bits$ in both cases . A $2$-to-$1$ multiplexer has latency of $0.6 ns$ while a $k-$bit comparator has ... while that of direct mapped is $h_2$. The value of $h_2$ is: $2.4$ $ns$ $2.3$ $ns$ $1.8$ $ns$ $1.7$ $ns$
asked Apr 24, 2016 in CO and Architecture jothee 5.8k views
27 votes
5 answers
A CPU has a $32$ $KB$ direct mapped cache with $128$ byte-block size. Suppose $A$ is two dimensional array of size $512 \times512$ with elements that occupy $8-bytes$ each. Consider the following two $C$ code segments, $P1$ and $P2$. $P1$: for (i=0; i<512; i++) { for (j=0; j<512; j++) ... and that for $P2$ be $M2$. The value of the ratio $\frac{M_{1}}{M_{2}}$: $0$ $\frac{1}{16}$ $\frac{1}{8}$ $16$
asked Apr 23, 2016 in CO and Architecture jothee 5k views
35 votes
8 answers
A CPU has a $32 KB$ direct mapped cache with $128$ byte-block size. Suppose A is two dimensional array of size $512 \times512$ with elements that occupy $8$-bytes each. Consider the following two C code segments, $P1$ and $P2$. P1: for (i=0; i<512; i++) { for (j=0; j<512; j++ ... misses experienced by $P1$ be $M_{1}$and that for $P2$ be $M_{2}$. The value of $M_{1}$ is: $0$ $2048$ $16384$ $262144$
asked Sep 26, 2014 in CO and Architecture Rucha Shelke 8.3k views
30 votes
3 answers
A CPU has a cache with block size 64 bytes. The main memory has $k$ banks, each bank being $c$ bytes wide. Consecutive $c$ − byte chunks are mapped on consecutive banks with wrap-around. All the $k$ banks can be accessed in parallel, but two accesses to the same bank ... and $k = 24$, the latency of retrieving a cache block starting at address zero from main memory is: 92 ns 104 ns 172 ns 184 ns
asked Sep 26, 2014 in CO and Architecture Rucha Shelke 7.8k views