This book was created programmatically by GATE Overflow on Sep 21, 2017. If you have any doubt regarding an answer, click on the question link and comment on the site. Studying all these questions might get you 60 marks in GATE, but that might not be enough for an IIT. So, read standard books, solve exercise questions, use these questions to cement the concepts, and aim for 85+. At the least, if you are not getting the solution to a given problem, first refer to a standard book. If any error is found in any question, it shall be updated at http://gateoverflow.in/corrections.
PDFs for the remaining subjects will be made available at http://classroom.gateoverflow.in and you can enroll in this course to get notified about them. Enrollment is free; the account details are the same as those of GATE Overflow, with a new password that has been sent to all registered emails on GATE Overflow. New users will receive this email within a few minutes of confirming their email address.

You can now join our Facebook group for GATE CSE discussions.

You can visit http://playlists.gatecse.in for high quality videos for GATE CSE and for how to use the GO site/ebook.

This book consists of only previous year GATE, TIFR, ISI and CMI questions (CS from 1987 and all 5 years of IT), all of which are relevant for GATE. Subjects that are out of syllabus as of GATE 2017 have been removed from this book except in rare cases.

Since GATE Overflow started in August 2014, a lot of people have dedicated their time and effort to bringing out this book. Initiated by Omesh Pandita and Arjun Suresh as a Q/A platform for CSE students, it had Kathleen Bankson instrumental in getting all previous year GATE questions here. Then experts like Praveen Saini, Happy Mittal, Sankaranarayanan P.N., Suraj Kumar etc. contributed a lot to the answers here. Pragy Agarwal, even after topping GATE, has continuously contributed here with his knowledge as well as by making the contents beautiful with fine LaTeX skills. We also have to thank the work of Jothee, Misbah, Ishrat and Nataliyah, who are continuously adding to and keeping the contents here neat and clean. There are also many toppers of GATE 2015, 2016, 2017 and probably 2018 who are contributing a lot here. The list of all the contributors can be found here, but even that does not include the contributions of some like Arif Ali Anapparakkal in helping design this book, and Arvind Devaraj and others who have provided guidance and help. Last but not least, we thank all the users of GATE Overflow.

We thank Silpa V.S., Rahul Kumar Yadav and others for their contributions in maintaining the GATECSE Lastrank page. Bikram Ballav is behind most of the exams on GO (http://mockgate.com) and Arindam Sarkar made the interface for it. Pragy Agarwal is also behind the rank and score predictor tool (http://mymarks.gatecse.in) used by GO, which has had 99-100% accuracy over the last 2 years.

Special thanks to Sachin Mittal for making the How to Use GO videos, Silpa V.S. for classifying the questions topicwise for the book, Pooja Palod for making the GATE 2018 schedule and Debashish Deka for GO classroom contributions.

Also thanks to all toppers who took time to write a review for GO.

Table of Contents

  1. CO & Architecture (175)
  2. Computer Networks (189)
  3. Databases (203)
  4. Digital Logic (238)
  5. Operating System (280)

The most relevant addressing mode to write position-independent codes is:

  1. Direct mode
  2. Indirect mode
  3. Relative mode
  4. Indexed mode

Match the pairs in the following questions:

(A) Base addressing (p) Reentrancy
(B) Indexed addressing (q) Accumulator
(C) Stack addressing (r) Array
(D) Implied addressing (s) Position independent

 

The instruction format of a CPU is:

OP CODE | MODE | RegR

(one memory word)

 

Mode and RegR together specify the operand. RegR specifies a CPU register and Mode specifies an addressing mode. In particular, Mode = 2 specifies that 'the register RegR contains the address of the operand; after fetching the operand, the contents of RegR are incremented by 1'.

An instruction at memory location 2000 specifies Mode = 2 and RegR refers to the program counter (PC).

  1. What is the address of the operand?

  2. Assuming that it is a non-jump instruction, what are the contents of PC after the execution of this instruction?

Relative mode of addressing is most relevant to writing

  1. Co-routines

  2. Position – independent code

  3. Shareable code

  4. Interrupt Handlers

 

Which of the following addressing modes permits relocation without any change whatsoever in the code?

  1. Indirect addressing

  2. Indexed addressing

  3. Base register addressing

  4. PC relative addressing

 

A certain processor supports only the immediate and the direct addressing modes. Which of the following programming language features cannot be implemented on this processor?

  1. Pointers
  2. Arrays
  3. Records
  4. Recursive procedures with local variable

 

The most appropriate matching for the following pairs

     X: Indirect addressing          1: Loops 
     Y: Immediate addressing         2: Pointers 
     Z: Auto decrement addressing    3: Constants 

is

  1. X - 3  Y - 2  Z - 1
  2. X - 1  Y - 3  Z - 2
  3. X - 2  Y - 3  Z - 1
  4. X - 3  Y - 1  Z - 2

Which is the most appropriate match for the items in the first column with the items in the second column

X. Indirect Addressing  I. Array implementation
Y. Indexed addressing II. Writing relocatable code
Z. Base Register Addressing  III. Passing array as parameter
  1. (X, III) (Y, I) (Z, II)
  2. (X, II) (Y, III) (Z, I)
  3. (X, III) (Y, II) (Z, I)
  4. (X, I) (Y, III) (Z, II)

In the absolute addressing mode

  1. the operand is inside the instruction
  2. the address of the operand is inside the instruction
  3. the register containing the address of the operand is specified inside the instruction
  4. the location of the operand is implicit

Which of the following addressing modes are suitable for program relocation at run time?

  I. Absolute addressing

  II. Based addressing

  III. Relative addressing

  IV. Indirect addressing

  1. I and IV
  2. I and II
  3. II and III
  4. I, II and IV

Consider a three word machine instruction

ADD A[R0], @B

The first operand (destination) “A[R0]” uses indexed addressing mode with R0 as the index register. The second operand (source) “@B” uses indirect addressing mode. A and B are memory addresses residing at the second and third words, respectively. The first word of the instruction specifies the opcode, the index register designation and the source and destination addressing modes. During execution of ADD instruction, the two operands are added and stored in the destination (first operand).

The number of memory cycles needed during the execution cycle of the instruction is:

  1. 3
  2. 4
  3. 5
  4. 6

Match each of the high level language statements given on the left hand side with the most natural addressing mode from those listed on the right hand side.

(1) A[I] = B[J] (a) Indirect addressing
(2) while (*A++); (b) Indexed addressing
(3) int temp = *x (c) Auto increment
  1. (1, c), (2, b) (3, a)
  2. (1, c), (2, c) (3, b)
  3. (1, b), (2, c) (3, a)
  4. (1, a), (2, b) (3, c)

Which of the following statements about relative addressing mode is FALSE?

  1. It enables reduced instruction size
  2. It allows indexing of array element with same instruction
  3. It enables easy relocation of data
  4. It enables faster address calculation than absolute addressing

The memory locations 1000, 1001 and 1020 have data values 18, 1 and 16 respectively before the following program is executed.

 MOVI  $R_s, 1$  ; Move immediate
 LOAD  $R_d, 1000(R_s)$  ; Load from memory
 ADDI  $R_d, 1000$  ; Add immediate
 STOREI  $0(R_d), 20$  ; Store immediate

Which of the statements below is TRUE after the program is executed ?

  1. Memory location 1000 has value 20
  2. Memory location 1020 has value 20
  3. Memory location 1021 has value 20
  4. Memory location 1001 has value 20

Which of the following is/are true of the auto-increment addressing mode?

  I. It is useful in creating self-relocating code

  II. If it is included in an Instruction Set Architecture, then an additional ALU is required for effective address calculation

  III. The amount of increment depends on the size of the data item accessed

  1. I only
  2. II only
  3. III only
  4. II and III only

Consider a hypothetical processor with an instruction of type $\text{LW  R1 , 20(R2)}$, which during execution reads a 32-bit word from memory and stores it in a 32-bit register $\text{R1}$. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register $\text{R2}$. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory?

(A) Immediate addressing

(B) Register addressing

(C) Register Indirect Scaled Addressing

(D) Base Indexed Addressing

Consider the C struct defined below:

struct data {
    int marks [100];
    char grade;
    int cnumber;
};
struct data student;

The base address of student is available in register R1. The field student.grade can be accessed efficiently using

(A) Post-increment addressing mode, (R1)+

(B) Pre-decrement addressing mode, -(R1)

(C) Register direct addressing mode, R1

(D) Index addressing mode, X(R1), where X is an offset represented in 2's complement 16-bit representation.

Answers: Addressing Modes

C) Relative mode, since operand addresses are specified as offsets from the PC; the code then runs correctly wherever it is loaded, with no change needed.
4 votes -- srestha ( 58.4k points)
Selected Answer
(A) Base addressing → (s) Position independent (by changing the value in the base register, the addresses used can be relocated)
(B) Indexed addressing → (r) Array
(C) Stack addressing → (p) Reentrancy (whenever the code happens to be used again, the addresses need not be the same)
(D) Implied addressing → (q) Accumulator (if an address is not specified, it is assumed/implied to be the accumulator)
6 votes -- Prashant Singh ( 49.2k points)
Selected Answer

a) Address of the operand = content of PC = 2001, as the PC holds the address of the next instruction after the fetch and the instruction size is 1 word as given in the diagram.

b) The operand fetch increments RegR, here the PC, by 1, giving PC = 2002; this should be the default answer here. (If we additionally assume the next instruction has already been fetched, one more increment makes it 2003.)

5 votes -- Arjun Suresh ( 294k points)
Selected Answer
Answer: B

Relative mode addressing is most relevant to writing position-independent code.

Ref: http://en.wikipedia.org/wiki/Addressing_mode#PC-relative
13 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer
PC relative addressing is the best option. For base register addressing we would have to change the address in the base register, while with PC relative addressing absolutely no change in the code is needed.
7 votes -- Arjun Suresh ( 294k points)
Selected Answer
Pointer access requires indirect addressing, which can be simulated with indexed addressing or register indirect addressing, but not with direct and immediate addressing alone. Array and record accesses need pointer access. So, options A, B and C cannot be implemented on such a processor.

Now, to handle recursive procedures we need to use stack. A local variable inside the stack will be accessed as *(SP+offset) which is nothing but a pointer access and requires indirect addressing. Usually this is done by moving the SP value to Base register and then using Base Relative addressing to avoid unnecessary memory accesses for indirect addressing- but not possible with just direct and immediate addressing.

So, options A, B, C and D are correct.
21 votes -- Arjun Suresh ( 294k points)
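The stack argument above can be seen directly in C. Below is a small illustrative sketch (not part of the original answer): each recursive activation gets its own copy of the local variable at a different stack address, so the operand address cannot be a fixed constant encoded in the instruction, as direct addressing would require.

#include <stdio.h>

/* Each activation of f has its own 'local' at a different stack address,
   so the compiler must address it as SP/FP + offset (an indirect access),
   not as a compile-time constant address (direct addressing). */
void f(int depth) {
    int local = depth;
    printf("depth %d: &local = %p\n", depth, (void *)&local);
    if (depth > 0)
        f(depth - 1);
}

int main(void) {
    f(2);
    return 0;
}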
Selected Answer
C is the most appropriate one
9 votes -- Bhagirathi Nayak ( 13.3k points)
Selected Answer
(A) is the answer.

Array implementation can use Indexed addressing

While passing array as parameter we can make use of a pointer (as in C) and hence can use Indirect addressing

Base Register addressing can be used to write relocatable code by changing the content of Base Register.
18 votes -- Arjun Suresh ( 294k points)
Selected Answer

(b) is the answer. Absolute addressing mode means address of operand is given in the instruction.

(a) the operand is inside the instruction → immediate addressing
(c) the register containing the address of the operand is specified in the instruction → register indirect addressing
(d) the location of the operand is implicit → implied addressing

25 votes -- gatecse ( 13.4k points)
Selected Answer
Answer: C

Relocation at run time needs a displacement-type addressing mode: both based and relative addressing compute the operand address from a register (base register or PC) that can simply be reloaded after relocation, so II and III are suitable. Absolute addressing fixes the address in the instruction itself, so I is not the answer.

Indirect addressing leads to an extra memory reference, and the stored addresses would themselves need updating on relocation, so IV is not the answer.
10 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer

1 memory read - get first operand from memory address A+R0 (A is given as part of instruction)
1 memory read - get address of second operand (since second uses indirect addressing)

1 memory read - to get second operand from the address given by the previous memory read

1 memory write - to store to first operand (which is the destination)

So, totally 4 memory cycles once the instruction is fetched.

The second and third words of the instruction are loaded as part of the Instruction fetch and not during the execute stage:
Ref: http://www.cs.iit.edu/~cs561/cs350/fetch/fetch.html

42 votes -- Arjun Suresh ( 294k points)
Selected Answer
(c) is the answer.
A[i] = B[j];     Indexed addressing

while(*A++);      Auto increment

temp = *x;       Indirect addressing
10 votes -- Arjun Suresh ( 294k points)
Selected Answer
(D) is false. Relative addressing cannot be faster than absolute addressing, as the absolute address must still be calculated from the relative address. With a specialized hardware unit this can perform as well as absolute addressing, but not faster.

(A) is true as instead of absolute address we can use a much smaller relative address in instructions which results in smaller instruction size.

(B) By using the base address of array we can index array elements using relative addressing.

(C) is true as we only need to change the base address in case of relocation- instructions remain the same.
23 votes -- Arjun Suresh ( 294k points)
Selected Answer

D) Memory location 1001 has value 20.
$R_s \leftarrow 1$ (Immediate Addressing)
$R_d \leftarrow 1$ (Indexed Addressing, value at memory location 1+1000 = 1001 is loaded to $R_d$ which is 1)
$R_d \leftarrow 1001$ ($R_d$ becomes 1+1000)
store in  address $1001 \leftarrow 20$

18 votes -- Abhinav Rana ( 707 points)
Selected Answer

In auto-increment addressing mode, the register holding the operand address is incremented after the operand fetch. This is useful for fetching successive elements of an array. But it does not help self-relocating code (code that can be loaded at any address), which works on the basis of an initial base address. 

An additional ALU is desirable for better execution especially with pipelining, but never a necessity. 

Amount of increment depends on the size of the data item accessed as there is no need to fetch a part of a data. 

So, answer must be C only. 

24 votes -- Arjun Suresh ( 294k points)
Selected Answer
Answer: D

Base Index Addressing, as the content of register R2 will serve as the index and 20 will be the Base address.
14 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer

Option (D)

Displacement Mode:

Similar to index mode, except that instead of an index register a base register is used. The base register contains a pointer to a memory location, and an integer constant (the displacement) is given in the instruction. The address of the operand is obtained by adding the contents of the base register and the constant. The difference between index mode and displacement mode is in the number of bits used to represent the constant: when the constant is wide enough to address the whole memory, we have index mode. Index mode is more appropriate for array accesses; displacement mode is more appropriate for structure (record) accesses.

 

Reference: http://www.cs.iit.edu/~cs561/cs350/addressing/addsclm.html

5 votes -- Niraj Raghuvanshi ( 209 points)
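For the struct question above, the displacement X in X(R1) is just the compile-time offset of grade within struct data. A minimal C sketch (not part of the original answer; it assumes 4-byte int, giving an offset of 400):

#include <stdio.h>
#include <stddef.h>

struct data {
    int  marks[100];
    char grade;
    int  cnumber;
};

int main(void) {
    /* X is the fixed offset of 'grade' from the struct base address in R1;
       with 4-byte ints it is 100 * 4 = 400, known at compile time. */
    printf("offset of grade = %zu\n", offsetof(struct data, grade));
    return 0;
}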

A file system uses an in-memory cache to cache disk blocks. The miss rate of the cache is shown in the figure. The latency to read a block from the cache is $1$ ms and to read a block from the disk is $10$ ms. Assume that the cost of checking whether a block exists in the cache is negligible. Available cache sizes are in multiples of $10$ MB.

The smallest cache size required to ensure an average read latency of less than $6$ ms is _________ MB.

[Figure: miss rate (%) as a function of cache size (MB)]

A block-set associative cache memory consists of 128 blocks divided into four-block sets. The main memory consists of 16,384 blocks and each block contains 256 eight-bit words.

  1. How many bits are required for addressing the main memory?
  2. How many bits are needed to represent the TAG, SET and WORD fields?

The access times of the main memory and the cache memory in a computer system are 500 nsec and 50 nsec, respectively. It is estimated that 80% of the main memory requests are for read and the rest are for write. The hit ratio for read accesses only is 0.9 and a write-through policy (where both main and cache memories are updated simultaneously) is used. Determine the average access time of the memory system.

In the three-level memory hierarchy shown in the following table, $p_i$ denotes the probability that an access request will refer to $M_i$.

Hierarchy Level $(M_i)$   Access Time $(t_i)$   Probability of access $(p_i)$   Page Transfer Time $(T_i)$
$M_1$                     $10^{-6}$             0.99000                         0.001 sec
$M_2$                     $10^{-5}$             0.00998                         0.1 sec
$M_3$                     $10^{-4}$             0.00002                         --

If a miss occurs at level $M_i$, a page transfer occurs from $M_{i+1}$ to $M_i$ and the average time required for such a page swap is $T_i$. Calculate the average time $t_A$ required for a processor to read one word from this memory system.

The principle of locality justifies the use of

(a) Interrupts
(b) DMA
(c) Polling
(d) Cache Memory

A computer system has a 4 K word cache organized in block-set-associative manner with 4 blocks per set, 64 words per block. The number of bits in the SET and WORD fields of the main memory address format is:
 
(A) 15, 40
(B) 6, 4
(C) 7, 2
(D) 4, 6

A computer system has a three level memory hierarchy, with access time and hit ratios as shown below:

Level 1 (Cache memory)

Access time = 50 nsec/byte

Size Hit Ratio
8 M byte 0.80
16 M byte 0.90
64 M byte 0.95

 

Level 2 (main memory)

Access time = 200 nsec/byte

Size Hit ratio
4M byte 0.98
16 M byte 0.99
64 M byte 0.995

 

Level 3

Access time = 5000 nsec/byte

Size Hit ratio
260 Mbyte 1.0

 

  1. What should be the minimum sizes of the level 1 and level 2 memories to achieve an average access time of less than 100 nsec?

  2. What is the average access time achieved using the chosen sizes of level 1 and level 2 memories?

For a set-associative Cache organization, the parameters are as follows:

$t_c$ : cache access time
$t_m$ : main memory access time
$l$ : number of sets
$b$ : block size
$k\times b$ : set size

 

Calculate the hit ratio for a loop executed 100 times where the size of the loop is $n\times b$, and $n = k\times m$ is a non-zero integer and $1 < m \leq l$.

Give the value of the hit ratio for $l = 1$.

The main memory of a computer has $2cm$ blocks while the cache has $2c$ blocks. If the cache uses the set associative mapping scheme with 2 blocks per set, then block $k$ of the main memory maps to the set

  1. $(k \bmod m)$ of the cache
  2. $(k \bmod c)$ of the cache
  3. $(k \bmod 2c)$ of the cache
  4. $(k \bmod 2cm)$ of the cache

 

More than one word is put in one cache block to

  1. exploit the temporal locality of reference in a program
  2. exploit the spatial locality of reference in a program
  3. reduce the miss penalty
  4. none of the above

A CPU has 32-bit memory address and a 256 KB cache memory. The cache is organized as a 4-way set associative cache with cache block size of 16 bytes.

  1. What is the number of sets in the cache?
  2. What is the size (in bits) of the tag field per cache block?
  3. What is the number and size of comparators required for tag matching?
  4. How many address bits are required to find the byte offset within a cache block?
  5. What is the total amount of extra memory (in bytes) required for the tag bits?

In a C program, an array is declared as float A[2048]. Each array element is 4 Bytes in size, and the starting address of the array is 0x00000000. This program is run on a computer that has a direct mapped data cache of size 8 Kbytes, with block (line) size of 16 Bytes.

  1. Which elements of the array conflict with element A[0] in the data cache? Justify your answer briefly.
  2. If the program accesses the elements of this array one by one in reverse order i.e., starting with the last element and ending with the first element,  how many data cache misses would occur? Justify your answer briefly. Assume that the data cache is initially empty and that no other data or instruction accesses are to be considered.

Consider a small two-way set-associative cache memory, consisting of four blocks. For choosing the block to be replaced, use the least recently used (LRU) scheme. The number of cache misses for the following sequence of block addresses is

8, 12, 0, 12, 8

  1. 2
  2. 3
  3. 4
  4. 5
Consider a system with a 2-level cache. Access times of Level 1 cache, Level 2 cache and main memory are 1 ns, 10 ns, and 500 ns, respectively. The hit rates of Level 1 and Level 2 caches are 0.8 and 0.9, respectively. What is the average access time of the system ignoring the search time within the cache?
 
  1. 13.0
  2. 12.8
  3. 12.6
  4. 12.4

Consider a fully associative cache with 8 cache blocks (numbered 0-7) and the following sequence of memory block requests:

4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If LRU replacement policy is used, which cache block will have memory block 7?

  1. 4
  2. 5
  3. 6
  4. 7

Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32 bit addresses. The number of bits needed for cache indexing and the number of tag bits are respectively,

  1. 10, 17
  2. 10, 22
  3. 15, 17
  4. 5, 17

Consider a 2-way set associative cache memory with 4 sets and total 8 cache blocks (0-7) and a main memory with 128 blocks (0-127). What memory blocks will be present in the cache after the following sequence of memory block references if LRU policy is used for cache block replacement. Assuming that initially the cache did not have any memory block from the current job?
0 5 3 9 7 0 16 55

  1. 0 3 5 7 16 55
  2. 0 3 5 7 9 16 55
  3. 0 5 7 9 16 55
  4. 3 5 7 9 16 55

Consider two cache organizations. The first one is 32 KB 2-way set associative with 32 byte block size; the second is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of $0.6 ns$ while a $k$-bit comparator has a latency of $\frac{k}{10} ns$. The hit latency of the set associative organization is $h_1$ while that of the direct mapped one is $h_2$.

The value of $h_1$ is: 

  1. 2.4 ns 
  2. 2.3 ns 
  3. 1.8 ns 
  4. 1.7 ns 

Consider two cache organizations. The first one is 32 KB 2-way set associative with 32 byte block size; the second is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of $0.6 ns$ while a $k$-bit comparator has a latency of $\frac{k}{10} ns$. The hit latency of the set associative organization is $h_1$ while that of the direct mapped one is $h_2$.
The value of $h_2$ is:  

  1. 2.4 ns 
  2. 2.3 ns 
  3. 1.8 ns 
  4. 1.7 ns 

 

A CPU has a 32 KB direct mapped cache with 128-byte block size. Suppose A is a two-dimensional array of size $512 \times 512$ with elements that occupy 8 bytes each. Consider the following two C code segments, P1 and P2.

P1: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x += A[i][j]; 
    } 
}

P2: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x += A[j][i]; 
    } 
} 

P1 and P2 are executed independently with the same initial state, namely, the array A is not in the cache and i, j, x are in registers.  Let the number of cache misses experienced by P1 be M1 and that for P2 be M2.

The value of $M_1$ is:

  1. 0
  2. 2048
  3. 16384
  4. 262144

A CPU has a 32 KB direct mapped cache with 128-byte block size. Suppose A is a two-dimensional array of size $512 \times 512$ with elements that occupy 8 bytes each. Consider the following two C code segments, P1 and P2.

P1: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x += A[i][j]; 
    } 
}

P2: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x += A[j][i]; 
    } 
}

P1 and P2 are executed independently with the same initial state, namely, the array A is not in the cache and i, j, x are in registers.  Let the number of cache misses experienced by P1 be M1 and that for P2 be M2.

The value of the ratio $\frac{M_{1}}{M_{2}}$ is:

  1. $0$ 
  2. $\frac{1}{16}$
  3. $\frac{1}{8}$
  4. $16$ 

A cache line is 64 bytes. The main memory has latency 32 ns and bandwidth 1 GByte/s. The time required to fetch the entire cache line from the main memory is

  1. 32 ns
  2. 64 ns
  3. 96 ns
  4. 128 ns

A computer system has a level-1 instruction cache (I-cache), a level-1 data cache (D-cache) and a level-2 cache (L2-cache) with the following specifications:

            Capacity   Mapping Method                 Block size
I-cache     4K words   Direct mapping                 4 Words
D-cache     4K words   2-way set associative mapping  4 Words
L2-cache    64K words  4-way set associative mapping  16 Words

The length of the physical address of a word in the main memory is 30 bits. The capacity of the tag memory in the I-cache, D-cache and L2-cache is, respectively,

  1. 1 K x 18-bit, 1 K x 19-bit, 4 K x 16-bit
  2. 1 K x 16-bit, 1 K x 19-bit, 4 K x 18-bit
  3. 1 K x 16-bit, 512 x 18-bit, 1 K x 16-bit
  4. 1 K x 18-bit, 512 x 18-bit, 1 K x 18-bit

Consider a 4-way set associative cache consisting of 128 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the TAG, LINE and WORD fields are respectively:

  1. 9, 6, 5
  2. 7, 7, 6
  3. 7, 5, 8
  4. 9, 5, 6

Consider a machine with a byte addressable main memory of $2^{16}$ bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 x 50 two-dimensional array of bytes is stored in the main memory starting from memory location 1100H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses. 

How many data misses will occur in total?

  1. 48 
  2. 50
  3. 56
  4. 59

Consider a machine with a byte addressable main memory of $2^{16}$ bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 x 50 two-dimensional array of bytes is stored in the main memory starting from memory location 1100H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses. 

Which of the following lines of the data cache will be replaced by new blocks in accessing the array for the second time?

  1. line 4 to line 11
  2. line 4 to line 12
  3. line 0 to line 7
  4. line 0 to line 8

Consider a Direct Mapped Cache with 8 cache blocks (numbered 0-7). If the memory block requests are in the following order

3, 5, 2, 8, 0, 63, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24.

Which of the following memory blocks will not be in the cache at the end of the sequence ?

  1. 3
  2. 18
  3. 20
  4. 30

For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?

  I. L1 must be a write-through cache

  II. L2 must be a write-through cache

  III. The associativity of L2 must be greater than that of L1

  IV. The L2 cache must be at least as large as the L1 cache

  1. IV only
  2. I and IV only
  3. I, II and IV only
  4. I, II, III and IV

Consider a machine with a 2-way set associative data cache of size 64 Kbytes and block size 16 bytes. The cache is managed using 32 bit virtual addresses and the page size is 4 Kbytes. A program to be run on this machine begins as follows:

double ARR[1024][1024];
int i, j;
/*Initialize array ARR to 0.0 */
for(i = 0; i < 1024; i++)
    for(j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;

The size of double is 8 bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR.

The total size of the tags in the cache directory is

  1. 32 Kbits
  2. 34 Kbits
  3. 64 Kbits
  4. 68 Kbits

Consider a machine with a 2-way set associative data cache of size 64 Kbytes and block size 16 bytes. The cache is managed using 32 bit virtual addresses and the page size is 4 Kbytes. A program to be run on this machine begins as follows:

double ARR[1024][1024];
int i, j;
/*Initialize array ARR to 0.0 */
for(i = 0; i < 1024; i++)
    for(j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;


The size of double is 8 bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR.

Which of the following array elements have the same cache index as ARR[0][0]?

  1. ARR[0][4]
  2. ARR[4][0]
  3. ARR[0][5]
  4. ARR[5][0]

Consider a machine with a 2-way set associative data cache of size 64 Kbytes and block size 16 bytes. The cache is managed using 32 bit virtual addresses and the page size is 4 Kbytes. A program to be run on this machine begins as follows:

double ARR[1024][1024];
int i, j;
/*Initialize array ARR to 0.0 */
for(i = 0; i < 1024; i++)
    for(j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;


The size of double is 8 bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR.

The cache hit ratio for this initialization loop is

  1. 0%
  2. 25%
  3. 50%
  4. 75%

 

Consider a computer with a 4-way set-associative mapped cache of the following characteristics: a total of 1 MB of main memory, a word size of 1 byte, a block size of 128 words and a cache size of 8 KB.

The number of bits in the TAG, SET and WORD fields, respectively are:

  1. 7, 6, 7
  2. 8, 5, 7
  3. 8, 6, 6
  4. 9, 4, 7

Consider a computer with a 4-way set-associative mapped cache of the following characteristics: a total of 1 MB of main memory, a word size of 1 byte, a block size of 128 words and a cache size of 8 KB.

While accessing the memory location 0C795H by the CPU, the contents of the TAG field of the corresponding cache line is

  1. 000011000
  2. 110001111
  3. 00011000
  4. 110010101

Consider a 4-way set associative cache (initially empty) with total 16 cache blocks. The main memory consists of 256 blocks and the request for memory blocks are in the following order:

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155.

Which one of the following memory block will NOT be in cache if LRU replacement policy is used?

  1. 3
  2. 8
  3. 129
  4. 216

A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and the main memory unit respectively.

When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?

  1. 2 nanoseconds
  2. 20 nanoseconds
  3. 22 nanoseconds
  4. 88 nanoseconds

A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and the main memory unit respectively.

When there is a miss in both L1 cache and L2 cache, first a block is transferred from main memory to L2 cache, and then a block is transferred from L2 cache to L1 cache. What is the total time taken for these transfers?

  1. 222 nanoseconds
  2. 888 nanoseconds
  3. 902 nanoseconds
  4. 968 nanoseconds

An 8 KB direct-mapped write-back cache is organized as multiple blocks, each of size 32 bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising the following.

  • 1 valid bit
  • 1 modified bit
  • As many bits as the minimum needed to identify the memory block mapped in the cache.

What is the total size of memory needed at the cache controller to store meta-data (tags) for the cache?

(A) 4864 bits

(B) 6144 bits

(C) 6656 bits

(D) 5376 bits

A computer has a $256$-$\text{KByte}$, 4-way set associative, write back data cache with block size of $32$ $\text{Bytes}$. The processor sends $32$ $\text{bit}$ addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, $2$ valid bits, $1$ modified bit and $1$ replacement bit.

The number of bits in the tag field of an address is

  1. 11
  2. 14
  3. 16
  4. 27

A computer has a $256$-$\text{KByte}$, 4-way set associative, write back data cache with block size of $32$ $\text{Bytes}$. The processor sends $32$ $\text{bit}$ addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, $2$ valid bits, $1$ modified bit and $1$ replacement bit.

The size of the cache tag directory is

  1. 160 $\text{Kbits}$
  2. 136 $\text{Kbits}$
  3. 40 $\text{Kbits}$
  4. 32 $\text{Kbits}$

In a $k$-way set associative cache, the cache is divided into $v$ sets, each of which consists of $k$ lines. The lines of a set are placed in sequence one after another. The lines in set $s$ are sequenced before the lines in set $(s+1)$. The main memory blocks are numbered 0 onwards. The main memory block numbered $j$ must be mapped to any one of the cache lines from

(A) $(j\text{ mod }v) * k \text{ to } (j \text{ mod } v) * k + (k-1) $

(B) $(j \text{ mod } v) \text{ to } (j \text{ mod } v) + (k-1) $

(C) $(j \text{ mod } k) \text{ to } (j \text{ mod } k) + (v-1) $

(D) $(j \text{ mod } k) * v \text{ to } (j \text{ mod } k) * v + (v-1) $

An access sequence of cache block addresses is of length $N$ and contains $n$ unique block addresses. The number of unique block addresses between two consecutive accesses to the same block address is bounded above by $k$. What is the miss ratio if the access sequence is passed through a cache of associativity $ A\geq k $ exercising least-recently-used replacement policy?

  1. $n/N$
  2. $1/N$
  3. $1/A$
  4. $k/n$

In designing a computer's cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?

  1. A smaller block size implies better spatial locality
  2. A smaller block size implies a smaller cache tag and hence lower cache tag overhead
  3. A smaller block size implies a larger cache tag and hence lower cache hit time
  4. A smaller block size incurs a lower cache miss penalty

If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?

  1. Width of tag comparator
  2. Width of set index decoder
  3. Width of way selection multiplexer
  4. Width of processor to main memory data bus
A 4-way set-associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is ____

The memory access time is 1 nanosecond for a read operation with a hit in cache, 5 nanoseconds for a read operation with a miss in cache, 2 nanoseconds for a write operation with a hit in cache and 10 nanoseconds for a write operation with a miss in cache. Execution of a sequence of instructions involves 100 instruction fetch operations, 60 memory operand read operations and 40 memory operand write operations. The cache hit-ratio is 0.9. The average memory access time (in nanoseconds) in executing the sequence of instructions is ______.

Assume that for a certain processor, a read request takes 50 nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose while running a program, it was observed that 80% of the processor's read requests result in a cache hit. The average read access time in nanoseconds is _____.

Consider a machine with a byte addressable main memory of $2^{20}$ bytes, block size of 16 bytes and a direct mapped cache having $2^{12}$ cache lines. Let the addresses of two consecutive bytes in main memory be $(E201F)_{16}$ and $(E2020)_{16}$. What are the tag and cache line addresses ( in hex) for main memory address $(E201F)_{16}$?

 

  1. E, 201
  2. F, 201
  3. E, E20
  4. 2, 01F
The width of the physical address on a machine is $40$ bits. The width of the tag field in a $512$ KB $8$-way set associative cache is ________ bits.

Consider a two-level cache hierarchy with $L1$ and $L2$ caches. An application incurs $1.4$ memory accesses per instruction on average. For this application, the miss rate of $L1$ cache is $0.1$; the $L2$ cache experiences, on average, $7$ misses per $1000$ instructions. The miss rate of $L2$ expressed correct to two decimal places is ________.

A cache memory unit with capacity of $N$ words and block size of $B$ words is to be designed. If it is designed as a direct mapped cache, the length of the TAG field is 10 bits. If the cache unit is now designed as a 16-way set-associative cache, the length of the TAG field is ____________ bits.

In a two-level cache system, the access times of $L_1$ and $L_2$ caches are 1 and 8 clock cycles, respectively. The miss penalty from the $L_2$ cache to main memory is 18 clock cycles. The miss rate of $L_1$ cache is twice that of $L_2$. The average memory access time (AMAT) of this cache system is 2 cycles. The miss rates of $L_1$ and $L_2$ respectively are

  1. 0.111 and 0.056
  2. 0.056 and 0.111
  3. 0.0892 and 0.1784
  4. 0.1784 and 0.0892

The read access times and the hit ratios for different caches in a memory hierarchy are as given below:

Cache Read access time (in nanoseconds) Hit ratio
I-cache 2 0.8
D-cache 2 0.9
L2-cache 8 0.9

The read access time of main memory is 90 nanoseconds. Assume that the caches use the referred-word-first read policy and the writeback policy. Assume that all the caches are direct mapped caches. Assume that the dirty bit is always 0 for all the blocks in the caches. In execution of a program, 60% of memory reads are for instruction fetch and 40% are for memory operand fetch. The average read access time in nanoseconds (up to 2 decimal places) is _________

 

Consider a machine with a byte addressable main memory of $2^{32}$ bytes divided into blocks of size 32 bytes. Assume that a direct mapped cache having 512 cache lines is used with this machine. The size of the tag field in bits is _______

Answers: Cache Memory

Selected Answer

Look-aside cache latency = 1 ms; disk latency = 10 ms. Since checking the cache costs nothing, average read latency = hit rate × 1 + miss rate × 10.

  • With 20 MB: miss rate = 60%, hit rate = 40%

    Avg = 0.4 × 1 + 0.6 × 10 = 6.4 ms > 6 ms

  • With 30 MB: miss rate = 40%, hit rate = 60%

    Avg = 0.6 × 1 + 0.4 × 10 = 4.6 ms < 6 ms

So the answer is 30 MB.

20 votes -- Akash ( 43.8k points)
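The same computation as a small C sketch (not part of the original answer; the miss rates are the values read off the figure, as used above):

#include <stdio.h>

/* Average read latency = hit_rate * cache_latency + miss_rate * disk_latency,
   since checking the cache costs nothing. */
int main(void) {
    double cache_ms = 1.0, disk_ms = 10.0;
    double size_mb[] = {20.0, 30.0};
    double miss[]    = {0.6, 0.4};   /* read from the figure, per the answer */
    for (int i = 0; i < 2; i++) {
        double avg = (1.0 - miss[i]) * cache_ms + miss[i] * disk_ms;
        printf("%2.0f MB cache: avg = %.1f ms\n", size_mb[i], avg);
    }
    return 0;
}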
Selected Answer

For the main memory, there are $2^{14}$ blocks and each block contains $2^8$ eight-bit words.

1. Size of main memory = $2^{14} \times 2^8 = 2^{22}$ words (4 MB), so 22 bits are required for addressing the main memory.

2. For the WORD field we require 8 bits (256 words per block).
The cache has 128 blocks; with 4 blocks per set there are 32 sets, so the SET field requires 5 bits.
TAG bits = 22 - (5 + 8) = 9.

9 bits (for TAG), 5 bits (for SET), 8 bits (for WORD)
8 votes -- kirti singh ( 3.4k points)
Selected Answer
Average memory access time = time spent for reads + time spent for writes

= read time on cache hit + read time on cache miss + write time on cache hit + write time on cache miss

= 0.8 ⨯ 0.9 ⨯ 50 + 0.8 ⨯ 0.1 ⨯ (500 + 50) (assuming hierarchical read from cache and then memory, as only simultaneous write is mentioned in the question)
+ 0.2 ⨯ 0.9 ⨯ 500 + 0.2 ⨯ 0.1 ⨯ 500 (simultaneous write, as mentioned in the question)

= 36 + 44 + 90 + 10 = 180 ns

http://www.howardhuang.us/teaching/cs232/24-Cache-writes-and-examples.pdf
11 votes -- Arjun Suresh ( 294k points)
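A quick check of the same arithmetic in C (a sketch, not part of the original answer; it assumes the hierarchical-read, write-through model used above):

#include <stdio.h>

/* Reads: a hit costs cache time; a miss costs cache + memory time.
   Writes: write-through, so every write costs the memory time (500 ns). */
int main(void) {
    double tc = 50.0, tm = 500.0;
    double read_frac = 0.8, read_hit = 0.9;
    double t = read_frac * (read_hit * tc + (1 - read_hit) * (tc + tm))
             + (1 - read_frac) * tm;
    printf("average access time = %.0f ns\n", t);   /* prints 180 ns */
    return 0;
}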
Selected Answer

We are given the probability of an access being satisfied at each level (clear since their sum adds to 1). So, we can get the average access time as:

$t_A = 0.99 \times {10}^{-6} + 0.00998 \times ({10}^{-6} +{10}^{-5} + 0.001) + 0.00002 \times ({10}^{-6} + {10}^{-5} + {10}^{-4} + 0.001 + 0.1) \approx (0.99 + 10.09 + 2.02) \times 10^{-6} \approx 13.1 \mu s.$

We can also use the following formula: $M_1$ is accessed on 100% of accesses; whenever $M_1$ misses, $M_2$ is accessed; and only when both miss is $M_3$ accessed. So, average memory access time,

$t_A = 10^{-6} + (1-0.99) \times (10^{-5} + 0.001) + 0.00002 \times (10^{-4} + 0.1) \approx (1 + 10.1 + 2.0) \times 10^{-6} \approx 13.1 \mu s.$

 

5 votes -- Arjun Suresh ( 294k points)
Selected Answer
It is D.

Locality of reference is the tendency of a program to access the same or nearby storage locations repeatedly. Values that are used frequently can therefore be kept in a small fast memory close to the processor, which is exactly the purpose of cache memory.
10 votes -- Gate Keeda ( 19.1k points)
Selected Answer
Number of sets = 4K /(64*4) = 16. So, we need 4 bits to identify a set => SET = 4 bits.

64 words per block means WORD is 6 bits.

So, (D)
8 votes -- Arjun Suresh ( 294k points)
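The same bit counts can be computed mechanically; a small C sketch (not part of the original answer):

#include <stdio.h>

/* Number of bits needed to index n items, n a power of two. */
static int bits(unsigned n) { int b = 0; while (n >>= 1) b++; return b; }

int main(void) {
    unsigned cache_words = 4 * 1024, words_per_block = 64, ways = 4;
    unsigned sets = cache_words / (words_per_block * ways);   /* = 16 */
    printf("SET = %d bits, WORD = %d bits\n",
           bits(sets), bits(words_per_block));                /* 4 and 6 */
    return 0;
}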
Selected Answer

The equation for access time can be written as follows (assuming $a,b$ are the hit ratios of level1 and level2 respectively).

$T=T_1 + (1-a)T_2+(1-a)\times(1-b)T_3$

Here $T \leq 100ns$, $T_1 = 50ns$, $T_2 = 200ns$ and $T_3 = 5000ns$. On substituting $a, b$ we get:

T = 95 ns for a = 0.8 and b = 0.995, i.e., L1 = 8M and L2 = 64M.
T = 80 ns for a = 0.9 and b = 0.98, i.e., L1 = 16M and L2 = 4M, which is the minimum-size combination.

b.

  1. $L_1 = 8M, a = 0.8, L_2 = 4M, b = 0.98$. So, 
    $T = 50 + 0.2 \times 200 + 0.2 \times 0.02 \times 5000 \\= 50 + 40 + 20 = 110ns$
  2. $L_1 = 16M, a = 0.9, L_2 = 16M, b = 0.99$. So,
    $T = 50 + 0.1 \times 200 + 0.1 \times 0.01 \times 5000 \\=50 + 20 + 5 = 75ns$
  3. $L_1 = 64M, a = 0.95, L_2= 64M, b = 0.995$. So, 
    $T = 50 + 0.05 \times 200 + 0.05 \times 0.005 \times 5000 \\= 50 + 10 + 1.25 = 61.25ns$

 

 

3 votes -- kireeti ( 1.1k points)
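A small C sketch (not part of the original answer) that tabulates $T$ for every (L1, L2) size pair using the hit ratios from the tables above, from which the size combinations meeting the 100 ns target can be read off:

#include <stdio.h>

/* T = t1 + (1-a)*t2 + (1-a)*(1-b)*t3, with t3 = 5000 ns as in the answer. */
int main(void) {
    int    l1[] = {8, 16, 64};   double a[] = {0.80, 0.90, 0.95};
    int    l2[] = {4, 16, 64};   double b[] = {0.98, 0.99, 0.995};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            double t = 50 + (1 - a[i]) * 200 + (1 - a[i]) * (1 - b[j]) * 5000;
            printf("L1 = %2dM, L2 = %2dM : T = %7.2f ns\n", l1[i], l2[j], t);
        }
    return 0;
}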
Selected Answer
$\text{Size of the loop} = n\times b = k \times m \times b$

$\text{Size of a set} = k \times b$ ($k-way$ associative)

Here the loop consists of $n = k \times m$ consecutive blocks, and consecutive blocks map to consecutive sets, so each set receives at most $k$ blocks since $m \leq l$. Hence, however the mapping works out, the entire loop fits in the cache without any replacement.

For the first iteration:

$\text{No. of accesses} = n \times b$
$\text{No. of misses} = n$ as each new block access is a miss and loop body has $n$ blocks each of size $b$ for a total size of $n \times b$.

For, the remaining 99 iterations:

$\text{No. of accesses} = n \times b$
$\text{No. of misses} = 0$

So, $\text{total no. of accesses} = 100 nb$

$\text{Total no. of hits} = \text{Total no. of accesses} - \text{Total no. of misses}$

$= 100 n b - n  $

So, $\text{hit ratio} = \frac{100nb - n}{100nb} = 1- \frac{1}{100b}$

The hit ratio is independent of $l$, so for $l=1$ also we have $\text{hit ratio} = 1- \frac{1}{100b}$
8 votes -- Arjun Suresh ( 294k points)
Selected Answer
Number of cache blocks = 2c

Number of sets in cache = 2c/2 = c, since each set has 2 blocks. Now, a block of main memory gets mapped to a set (an associativity of 2 just means there is space for 2 memory blocks in a cache set), and we have 2cm blocks being mapped to c sets. So, 2m different main memory blocks can come to each set, and block k of main memory will be mapped to set (k mod c).
10 votes -- Arjun Suresh ( 294k points)
Selected Answer
exploit the spatial locality of reference in a program as, if the next locality is addressed immediately, it will already be in the cache.

Consider the scenario similar to cooking, where when an ingredient is taken from cupboard, you also take the near by ingredients along with it- hoping that they will be needed in near future.
26 votes -- Arjun Suresh ( 294k points)
Selected Answer

What is the number of sets in the cache?

Number of sets = cache size / (set associativity × cache block size) = 256 KB / (4 × 16 B) = 4096

What is the size (in bits) of the tag field per cache block?

Memory address size = 32 bits
Number of bits required to identify a particular set = 12 (number of sets = 4096)
Number of bits required to identify a particular byte within a block = 4 (cache block size = 16 B)
Size of tag field = 32 - 12 - 4 = 16 bits

What is the number and size of comparators required for tag matching?

Since it is a 4-way set associative cache, we need 4 comparators, each of 16 bits.

http://ecee.colorado.edu/~ecen2120/Manual/caches/cache.html

How many address bits are required to find the byte offset within a cache block?

Cache block size is 16 bytes, so 4 bits are required to find the byte offset within a cache block.

What is the total amount of extra memory (in bytes) required for the tag bits?

Extra memory required to store the tag bits = 16 × 4096 × 4 bits = 2^18 bits = 2^15 bytes

15 votes -- suraj ( 5.1k points)
Selected Answer

(a)

Data cache size = 8 KB. Block (line) size = 16 B.

Since each array element occupies 4 B, four consecutive array elements occupy one block line (the elements are aligned, as the starting address is 0).

Number of cache blocks = 8 KB/16 B = 512. Number of cache blocks needed for the array = 2048/4 = 512. So, every array block has its own cache block and there is no collision.

We can also explain this with respect to the array addresses. The starting address is 0x00000000 = 0b0000...0 (32 zeros). The ending address is 0x00001FFF (4 × 2048 = 8192 locations).

Here, the last 4 bits are used as OFFSET bits and the next 9 bits as index (SET) bits. Since the addresses never extend beyond these 13 bits, all array blocks map to different cache lines.

(b) If the last element is accessed first, its cache block is fetched (which also contains the previous 3 elements of the array, since each cache block holds 4 elements and 2048 is an exact multiple of 4). Thus, every 4 accesses produce one cache miss, so 2048 accesses produce 512 cache misses. (This would be the same even if we accessed the array in forward order.)

15 votes -- Arjun Suresh ( 294k points)
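The count in part (b) can be cross-checked by simulating the direct mapped cache; a short C sketch (not part of the original answer):

#include <stdio.h>

/* 8 KB direct mapped cache, 16 B lines => 512 lines; A[2048] of 4-byte
   floats starting at address 0, traversed in reverse order. */
int main(void) {
    int valid[512] = {0};
    unsigned tag[512];
    int misses = 0;
    for (int i = 2047; i >= 0; i--) {
        unsigned addr = 4u * i;
        unsigned line = (addr >> 4) & 511;   /* 4 offset bits, 9 index bits */
        unsigned t    = addr >> 13;
        if (!valid[line] || tag[line] != t) {
            misses++;
            valid[line] = 1;
            tag[line] = t;
        }
    }
    printf("misses = %d\n", misses);         /* prints 512 */
    return 0;
}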
Selected Answer

We have 4 blocks and 2 blocks in a set, so there are 2 sets, and the lowest bit of the block address is used for indexing into the set. Blocks map to sets as follows:

Set Number   Block Numbers
0            0, 8, 12
1            (none)

So, 8, 12 and 0 first miss in cache with 0 replacing 8 (there are two slots in each set due to 2-way set) and then 12 hits in cache and 8 again misses. So totally 4 misses. 

8 votes -- Arjun Suresh ( 294k points)
Selected Answer
By default we consider hierarchical access, because that is the common implementation and a simultaneous-access cache has great practical difficulty. But here the question is a bit ambiguous: it says to ignore the search time within the cache. Usually "search" applies to an associative cache, but no such information is given here. So perhaps the intent is to ignore the L1 time on an L1 miss and just count the L2 time, and similarly to just count the memory access time on an L2 miss. This is nothing but simultaneous access.

Access time for hierarchical access $ =  t_1 + (1-h_1) \times t_2 + (1-h_1) (1-h_2) t_m \\= 1 + 0.2 \times 10 +  0.2 \times 0.1 \times 500 \\= 13ns.$

Access time for simultaneous access $ = h_1 \times t_1 + (1-h_1)h_2 \times t_2 + (1-h_1)(1-h_2)t_m \\= 0.8 + 0.2 \times 0.9 \times 10 + 0.2 \times 0.1 \times 500 \\= 12.6 ns.$

Both options in choice -  :O
21 votes -- Arjun Suresh ( 294k points)
Option C.

$t_1 h_1 + (1-h_1) h_2 t_2 + (1-h_1)(1-h_2) t_m$, where $t_m$ is the main memory access time.

10 votes -- rajsh3kar ( 1.3k points)
Selected Answer
When 45 comes, the cache contents are
4 3 25 8 19 6 16 35

LRU array (first element being least recently used)

[4 3 19 6 25 8 16 35]

So, 45 replaces 4

45 3 25 8 19 6 16 35 [3 19 6 25 8 16 35 45]

Similarly 22 replaces 3 to give

45 22 25 8 19 6 16 35 [19 6 25 8 16 35 45 22]

8 hits in cache

45 22 25 8 19 6 16 35 [19 6 25 16 35 45 22 8]

3 replaces 19

45 22 25 8 3 6 16 35 [6 25 16 35 45 22 8 3]

16 and 25 hits in cache

45 22 25 8 3 6 16 35 [6 35 45 22 8 16 25 3]

Finally 7 replaces 6, which is in block 5.

So, answer is (B)
10 votes -- Arjun Suresh ( 294k points)
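The trace above can be verified with a small LRU simulation; a C sketch (not part of the original answer) that fills the empty cache blocks in order 0-7 and otherwise evicts the least recently used block:

#include <stdio.h>

#define N 8
int main(void) {
    int  block[N];       /* block[i] = memory block held by cache block i */
    long last_used[N];   /* timestamp of last access, for LRU */
    int  used = 0;       /* number of cache blocks filled so far */
    int  seq[] = {4,3,25,8,19,6,25,8,16,35,45,22,8,3,16,25,7};
    int  n = sizeof seq / sizeof *seq;
    for (int k = 0; k < n; k++) {
        int hit = -1;
        for (int i = 0; i < used; i++)
            if (block[i] == seq[k]) { hit = i; break; }
        if (hit >= 0) { last_used[hit] = k; continue; }
        int victim;
        if (used < N) victim = used++;        /* fill an empty block first */
        else {                                /* evict the LRU block */
            victim = 0;
            for (int i = 1; i < N; i++)
                if (last_used[i] < last_used[victim]) victim = i;
        }
        block[victim] = seq[k];
        last_used[victim] = k;
    }
    for (int i = 0; i < N; i++)
        if (block[i] == 7)
            printf("memory block 7 is in cache block %d\n", i);  /* 5 */
    return 0;
}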
Selected Answer
Number of blocks = cache size/block size = 32 KB/32 B = 1024 blocks. So, indexing requires 10 bits. Number of OFFSET bits required to address a byte within a 32-byte block = 5. So, number of TAG bits = 32 - 10 - 5 = 17. So, answer is (A).
11 votes -- Arjun Suresh ( 294k points)
Selected Answer

128 main memory blocks are mapped to 4 sets in cache. So, each set maps 32 blocks each. And in each set there is place for two blocks (2-way set). 

Now, we have 4 sets meaning 2 index bits. Also, 32 blocks going to one set means 5 tag bits. 

Now, these 7 bits identify a memory block and tag bits are placed before index bits. (otherwise adjacent memory references- spatial locality- will hamper cache performance)

So, based on the two index bits (the lower 2 bits of the block number), blocks go to sets as follows:

Set Number   Block Numbers
0            0, 16
1            5, 9
2            (none)
3            3, 7, 55

 

Since, each set has only 2 places, 3 will be throw out as its the least recently used block. So, final content of cache will be

0 5 7 9 16 55

(C) choice.

 

22 votes -- Arjun Suresh ( 294k points)
Selected Answer

Cache size is 32 KB and cache block size is 32 B. So,

$\text{number of sets}  = \frac{\text{cache size}}{\text{no. of blocks in a set } \times \text{ block size}}$

$ = \frac{32KB}{2 \times 32B} = 512$

So, number of index bits needed = 9 (since $2^9 = 512$). Number of offset bits = 5 (since $2^5 = 32$ B is the block size, assuming byte addressing). So, number of tag bits = 32 - 9 - 5 = 18 (as the memory address is of 32 bits). 

So, hit latency = comparator delay + multiplexer delay = 18/10 + 0.6 = 2.4 ns. (The tag comparisons for the two blocks in a set are carried out in parallel, and the data of the matching way is selected by the 2-to-1 multiplexer.)

Ref: https://courses.cs.washington.edu/courses/cse378/09au/lectures/cse378au09-19.pdf


22 votes -- Arjun Suresh ( 294k points)

Word offset = 5 bits (32-byte blocks).

Number of lines = 32 KB/32 B = 1024, so the index needs 10 bits.

Tag bits = 32 - 10 - 5 = 17.

Hit latency = comparator delay + multiplexer delay. A multiplexer is not required in a direct mapped cache, since there is only one comparator (in a 2-way set associative cache there are 2 comparators and a 2-to-1 MUX is needed to decide hit/miss), so the MUX delay = 0.

Comparator delay = k/10 = 17/10 = 1.7 ns.

So, $h_2$ = 1.7 ns.

5 votes -- asutosh kumar Biswal ( 10.2k points)
Selected Answer

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When A[0][0] is fetched, 128 consecutive bytes are moved to cache. So, for the next 128/8 -1= 15 memory references there won't be a cache miss. For the next iteration of i loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for P1

$= \frac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$
 

14 votes -- Arjun Suresh ( 294k points)

$\begin{align*} \text{Number of Cache Lines} &= \frac{2^{15}B}{128B}\\ &= 256\\ \text{In 1 Cache Line} &= \frac{128B}{8B} = 16\ elements\\ \\ P_1 &= \frac{\text{total elements in array}}{\text{elements in a cache line}}\\ &= \frac{512 \times 512}{16}\\ &= 2^{14}\\ &= 16384 \end{align*}$

It is so because for $P_1$ there is one miss per cache line: once a miss is serviced, the line holds 16 consecutive elements, so the next miss happens only after 16 accesses.

Hence, answer = option C

11 votes -- Amar Vashishth ( 28.7k points)
Selected Answer

$\begin{align*} \text{Number of Cache Lines} &= \frac{2^{15}B}{128B}\\ &= 256\\ \text{In 1 Cache Line} &= \frac{128B}{8B} = 16\ elements\\ \\ P_1 &= \frac{\text{total elements in array}}{\text{elements in a cache line}}\\ &= \frac{512 \times 512}{16}\\ &= 2^{14}\\ &= 16384\\ \\ P_2 &= 512 \times 512 \\ &= 2^{18}\\ \\ \frac{P_1}{P_2} &= \frac{16384}{512 \times 512}\\ &= 2^{14-18}\\ &= 2^{-4}\\ &= \frac{1}{16} \end{align*}$

It is so because for $P_1$ there is one miss per cache line: once a miss is serviced, the line holds 16 consecutive elements, so the next miss happens only after 16 accesses.
For $P_2$ every access is a miss, because the storage is row-major (the C default) while we access the array column-wise.

Hence, answer = option B

 

10 votes -- Amar Vashishth ( 28.7k points)

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When A[0][0] is fetched, 128 consecutive bytes are moved to cache. So, for the next 128/8 -1= 15 memory references there won't be a cache miss. For the next iteration of i loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for P1

$= \frac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

 

In the case of P2, the memory references are not consecutive. After A[0][0], the next access is A[1][0] which is after 512 * 8 memory locations. Since our cache block can hold only 128 contiguous memory locations, A[1][0] won't be in cache after a[0][0] is accessed. Now, the next location after A[0][0] is A[0][1] which will be accessed only after 512 iterations of the inner loop- after 512 distinct memory  block accesses. In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced. So, each of the memory access in P2 results in a cache miss. Total number of cache miss

$ = 512\times 512$

So, $\frac{M_1}{M_2} = \frac{32 \times 512}{512 \times 512} = \frac{1}{16}$
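
For reference, the two access patterns being compared are essentially the following loop nests (a sketch reconstructed from the discussion above; the array name A and the 512 x 512 size are from the question, the rest is illustrative):

#include <stdio.h>

static double A[512][512];   /* 8-byte elements, stored row-major in C */

int main(void) {
    double x = 0.0;

    /* P1: row-wise traversal - consecutive accesses fall in the same
       128 B cache block, so only every 16th access misses */
    for (int i = 0; i < 512; i++)
        for (int j = 0; j < 512; j++)
            x += A[i][j];

    /* P2: column-wise traversal - consecutive accesses are 512*8 B
       apart, so every access misses */
    for (int i = 0; i < 512; i++)
        for (int j = 0; j < 512; j++)
            x += A[j][i];

    printf("%f\n", x);
    return 0;
}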

15 votes -- Arjun Suresh ( 294k points)
Selected Answer

Ans: C

The memory bandwidth is $10^9$ bytes per second.

So, transferring 64 bytes takes

$64/10^9$ s = 64 ns. The main memory latency is 32 ns, so the total time required to place the cache line is

64 + 32 = 96 ns

15 votes -- rajsh3kar ( 1.3k points)
Selected Answer

1. I-cache

  • Number of blocks in cache = 4K/4 = $2^{10}$ blocks
  • Bits to represent blocks = 10
  • Number of words in a block = 4 = $2^2$ words
  • Bits to represent words = 2
  • Tag bits = 30 - (10+2) = 18
  • Each block will have its own tag bits. So total tag bits = 1K x 18 bits.

2. D-cache

  • Number of blocks in cache = 4K/4 = $2^{10}$ blocks
  • Number of sets in cache = $2^{10}/2 = 2^9$ sets
  • Bits to represent sets = 9
  • Number of words in a block = 4 = $2^2$ words
  • Bits to represent words = 2
  • Tag bits = 30 - (9+2) = 19
  • Each block will have its own tag bits. So total tag bits = 1K x 19 bits.

3. L2 cache

  • Number of blocks in cache = 64K/16 = $2^{12}$ blocks
  • Number of sets in cache = $2^{12}/4 = 2^{10}$ sets
  • Bits to represent sets = 10
  • Number of words in a block = 16 = $2^4$ words
  • Bits to represent words = 4
  • Tag bits = 30 - (10+4) = 16
  • Each block will have its own tag bits. So total tag bits = $2^{12}$ x 16 bits = 4K x 16 bits

Option A.
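
The same arithmetic for all three caches can be packed into one helper; a C sketch (assuming, as in the question, 30-bit word addresses and sizes counted in words):

#include <stdio.h>

static unsigned lg(unsigned x) { unsigned b = 0; while (x > 1) { x >>= 1; b++; } return b; }

/* tag width of a set-associative cache; all sizes in words */
static unsigned tag_bits(unsigned addr_bits, unsigned cache, unsigned block, unsigned ways) {
    unsigned sets = cache / (block * ways);
    return addr_bits - lg(sets) - lg(block);
}

int main(void) {
    printf("I-cache: %u\n", tag_bits(30, 4*1024,  4, 1));  /* 18 */
    printf("D-cache: %u\n", tag_bits(30, 4*1024,  4, 2));  /* 19 */
    printf("L2:      %u\n", tag_bits(30, 64*1024, 16, 4)); /* 16 */
    return 0;
}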

13 votes -- Viral Kapoor ( 2k points)
Selected Answer
Number of sets = cache size/(size of a block * No. of blocks in a set)

= 128 * 64 / (64 * 4) (4 way set associative means 4 blocks in a set)

= 32.

So, number of index (LINE) bits = 5, and number of WORD bits = 6 since the cache block (line) size is 64 B. So, number of TAG bits = 20 - 6 - 5 = 9.

Answer is (D) choice
10 votes -- Arjun Suresh ( 294k points)
Selected Answer

bits used to represent the address = $\log_2{2^{16}} = 16$

each cache line size = 64 bytes; means offset requires 6 bits

total number of lines in cache = 32; means line# requires 5 bits

so, tag bits $= 16 - 6 - 5 = 5$

we have a 2D array each of its element is of size = 1 Byte;
total size of this array = $50 \times 50 \times 1Byte = 2500 Bytes$

so, the total number of cache lines required to contain the array $= \frac{2500B}{64B} = 39.0625 \approx 40$

starting address of array = 1100H = $00010 \ 00100 \ 000000$
the group of bits in middle represents Cache Line number $\implies$ array starts from cache line number $4$,
we require 40 cache lines to hold all array elements, but we have only 32 cache lines

Let us group/partition our 2500 array elements into those 40 array lines; we call the first array line $A_0$, which holds 64 of the elements. This line (group of 64 elements) of the array will be mapped to cache line number 4, as found by the analysis of the array's starting address above.

This means that among those 40 array lines, some will be mapped to the same cache line, because there are just 32 cache lines but 40 array lines.

this is how mapping is :
$$\begin{matrix} 0& A_{28} & \\ 1& A_{29} & \\ 2& A_{30} & \\ 3& A_{31} & \\ 4& A_{0} & A_{32} \\ 5& A_{1} & A_{33} \\ 6& A_{2} & A_{34} \\ 7& A_{3} & A_{35} \\ 8& A_{4} & A_{36} \\ 9& A_{5} & A_{37} \\ 10& A_{6} & A_{38} \\ 11& A_{7} & A_{39} \\ 12& A_{8} & \\ \vdots\\ 30& A_{26} & \\ 31& A_{27} & \end{matrix}$$

So, if we access the complete array twice, we get $32+8+8+8 = 56$ misses,
because only the 8 cache lines numbered 4 to 11 suffer misses again; the rest are hits (not counted) or compulsory misses (the first 40 lines).

Hence, Q.80 answer = option C
 

54 votes -- Amar Vashishth ( 28.7k points)

$2^{16} = 64$ KB main memory is mapped to 32 lines of 64 bytes. So, number of offset bits = 6 (to identify a byte in a line) and number of indexing bits = 5 (to identify the line).

Size of array = 50* 50 = 2500 B. If array is stored in row-major order (first row, second-row..), and if elements are also accessed in row-major order (or stored and accessed in column-major order), for the first 64 accesses, there will be only 1 cache miss, and for 2500 accesses, there will be 2500/64 = 40 cache misses during the first iteration. 

We have 5 index bits and 6 offset bits. So, for $2^{11}$ (5 + 6 = 11) continuous memory addresses there won't be any cache conflicts, as the least significant bits are offset bits followed by index bits.

So, number of array elements that can be accessed without cache conflict = 2048 (as element size is a byte). The next 452 elements conflict with the first 452 elements. This means 452/64 = 8 cache blocks are replaced. (We used ceil, as even if one element from a new block is accessed, the whole block must be fetched). 

So, during second iteration we incur misses for the first 8 cache blocks as well as for the last 8 cache blocks. So, total data cache misses across 2 iterations = 40 + 8 + 8 = 56.
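
Both derivations can be cross-checked with a tiny direct-mapped cache simulation; a C sketch (assumes byte-wise sequential access over the 2500 B array starting at 1100H, two full passes, 32 lines of 64 B each):

#include <stdio.h>

int main(void) {
    int tagv[32];                                /* tag stored in each of the 32 lines */
    for (int i = 0; i < 32; i++) tagv[i] = -1;   /* cache initially empty */

    unsigned base = 0x1100, misses = 0;
    for (int pass = 0; pass < 2; pass++)
        for (unsigned a = base; a < base + 2500; a++) {
            unsigned block = a >> 6;             /* 64-byte lines  */
            unsigned line  = block & 31;         /* 5 index bits   */
            unsigned tag   = block >> 5;
            if (tagv[line] != (int)tag) { misses++; tagv[line] = (int)tag; }
        }
    printf("misses = %u\n", misses);             /* prints 56 */
    return 0;
}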

 

16 votes -- Arjun Suresh ( 294k points)
Selected Answer

Cache Organization:

Starting address = 1100H $= 16^3+16^2+0+0 = 4352$, so 4352 B is the starting address.

Starting block = 4352 B/64 B = 68th block in main memory, from where the array starts storing elements.

Array size = 50*50 B; 50*50 B/64 B = 39.0625 blocks needed $\approx$ 40 blocks

So we need blocks 68, 69, 70, ...., 107 = 40 blocks

Starting cache line: 68 mod 32 = 4th cache line, and after that the blocks are accessed in sequence.

As shown in the table below, lines 4 to 11 are replaced by the array in the second pass.

Cache line   First pass   Second pass
0            96
1            97
2            98
3            99
4            68, 100      68
5            69, 101      69
6            70, 102      70
7            71, 103      71
8            72, 104      72
9            73, 105      73
10           74, 106      74
11           75, 107      75
12           76
13           77
14           78
15           79
16           80
17           81
18           82
19           83
20           84
21           85
22           86
23           87
24           88
25           89
26           90
27           91
28           92
29           93
30           94
31           95

 

13 votes -- papesh ( 24.1k points)
Selected Answer
ans is B

Cache line for a memory block = block number mod number of cache lines. Since each block has only one possible location (associativity is 1), the last mod-8 request to a line stays in cache (no replacement policy is needed, as the mapping is direct).

3, 5, 2, 8, 0, 63, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24

Line 0 - 8, 0, 16, 24, 24. At the end contains 24.
1 - 9, 17, 25, 17.
2 - 2, 18, 2, 82.
3 - 3.
4 - 20.
5 - 5, 5.
6 - 30.
7 - 63, 63.

So, memory block 18 is not in cache while 3, 20 and 30 are in cache.
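
A few lines of C confirm the final contents (a sketch; the last block mapped to each line wins, since the mapping is direct):

#include <stdio.h>

int main(void) {
    int seq[] = {3,5,2,8,0,63,9,16,20,17,25,18,30,24,2,63,5,82,17,24};
    int n = sizeof seq / sizeof *seq;
    int line[8];                                   /* direct mapped: 8 cache lines */
    for (int i = 0; i < 8; i++) line[i] = -1;
    for (int i = 0; i < n; i++) line[seq[i] % 8] = seq[i];
    for (int i = 0; i < 8; i++) printf("line %d: %d\n", i, line[i]);
    return 0;
}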
10 votes -- rajsh3kar ( 1.3k points)
Selected Answer
1st is not correct, as the data need not be exactly the same at the same point of time, so a write-back policy can be used between L1 and L2.

2nd is not needed when talking only about L1 and L2.

For the 3rd, the associativities can be equal.

So, only the 4th statement is necessarily true - choice A.
11 votes -- Shaun Patel ( 6.9k points)
Selected Answer

Number of sets = cache size/ size of a set
= 64 KB / (16 B * 2) (two blocks per set)
= 2 K = $2^{11}$
So, we need 11 bits for set indexing.

Number of WORD bits required = 4, as a cache block consists of 16 bytes and we need 4 bits to address each of them.

So, number of tag bits = 32 - 11 - 4 = 17

Total size of the tag = 17 * Number of cache blocks
= 17 * $2^{11}$ * 2 (since each set has 2 blocks)

= 68 Kbits

answer = option D) 68 Kbits

We use the top 17 bits for tag and the next 11 bits for indexing and next 4 for offset. So, for two addresses to have the same cache index, their 11 address bits after the 4 offset bits from right must be same. 

ARR[0][0] is located at virtual address 0x FF000 000. (FF000 is page address and 000 is page offset). So, index bits are 00000000000

Address of ARR[0][4] = 0xFF000 + 4 * sizeof (double) = 0xFF000 000 + 4*8 = 0xFF000 020 (32 = 20 in hex) (index bits differ)

Address of ARR[4][0] = 0xFF000 + 4 * 1024 * sizeof (double) [since we use row major storage] = 0xFF000 000 + 4096*8 = 0xFF000 000 + 0x8000 = 0xFF008 000 ( index bits matches that of ARR [0][0] as both read 000 0000 0000)
 
Address of ARR[0][5] = 0xFF000 + 5 * sizeof (double) = 0xFF000 000+ 5*8 = 0xFF000 028 (40 = 28 in hex) (index bits differ)

Address of ARR[5][0] = 0xFF000 + 5 * 1024 * sizeof (double) [since we use row major storage] = 0xFF000 000 + 5120*8 = 0xFF000 000 + 0xA000 = 0xFF00A 000 (index bits differ)

So, only ARR[4][0] and ARR[0][0] have the same cache index.

The inner loop is iterating from 0 to 1023, so consecutive memory locations are accessed in sequence. Since cache block size is only 16 bytes and our element being double is of size 8 bytes, during a memory access only the next element gets filled in the cache. i.e.; every alternative memory access is a cache miss giving a hit ratio of 50%. (If loops i and j are reversed, all accesses will be misses and hit ratio will become 0). 
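
The index-bit matching can also be checked with plain address arithmetic; a C sketch (0xFF000000 taken as the address of ARR[0][0], 8-byte elements, 1024 elements per row, 4 offset + 11 index bits as derived above):

#include <stdio.h>

/* set index = bits [4..14] of the byte address: 4 offset bits, then 11 index bits */
static unsigned set_of(unsigned addr) { return (addr >> 4) & 0x7FF; }

int main(void) {
    unsigned base = 0xFF000000u;                                  /* &ARR[0][0] */
    printf("ARR[0][0] -> set %u\n", set_of(base));
    printf("ARR[0][4] -> set %u\n", set_of(base + (0*1024 + 4) * 8));
    printf("ARR[4][0] -> set %u\n", set_of(base + (4*1024 + 0) * 8));  /* same set as ARR[0][0] */
    printf("ARR[0][5] -> set %u\n", set_of(base + (0*1024 + 5) * 8));
    printf("ARR[5][0] -> set %u\n", set_of(base + (5*1024 + 0) * 8));
    return 0;
}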
 

25 votes -- Arjun Suresh ( 294k points)
Selected Answer

[ Need @arjun sir to verify this solution ]
Every element = 8 B
Total size required by the array = $1024 \times 1024 \times 8$ B $= 2^{23}$ B
Every block = 16 B
Number of blocks for the array $=\frac{2^{23}}{2^{4}} = 2^{19}$
Elements in one block $= \frac{16}{8} = 2$
Number of blocks in cache $=\frac{2^{16}}{2^{4}} = 2^{12}$
Number of sets $= \frac{2^{12}}{2^{1}} = 2^{11}$

$\Rightarrow$ 2 elements in one block

MM block 0 $\in$ 0 mod $2^{11}$ = set 0
MM block 1 $\in$ 1 mod $2^{11}$ = set 1
..
..
MM block 2048 $\in$ 2048 mod $2^{11}$ = set 0  // sharing the same set

Each row of the array has 1024 elements $\Rightarrow$ 512 blocks required per row

Row 0 : blocks 0-511
Row 1 : blocks 512-1023
..
..
Row 'x' : blocks 2048-2559

We need to find 'x'

x $= \frac{2048}{512} = 4$
$\Rightarrow$ A[4][0] shares the same set as that of A[0][0]

11 votes -- pC ( 21.4k points)
Number of sets = cache size/ size of a set
= 64 KB / (16 B * 2) (two blocks per set)
= 2 K = $2^{11}$
So, we need 11 bits for set indexing.

Number of WORD bits required = 4, as a cache block consists of 16 bytes and we need 4 bits to address each of them.

So, number of tag bits = 32 - 11 - 4 = 17

Total size of the tag = 17 * Number of cache blocks
= 17 * $2^{11}$ * 2 (since each set has 2 blocks)

= 68 Kbits

We use the top 17 bits for tag and the next 11 bits for indexing and next 4 for offset. So, for two addresses to have the same cache index, their 11 address bits after the 4 offset bits from right must be same.

ARR[0][0] is located at virtual address 0x FF000 000. (FF000 is page address and 000 is page offset). So, index bits are 00000000000

Address of ARR[0][4] = 0xFF000 + 4 * sizeof (double) = 0xFF000 000 + 4*8 = 0xFF000 020 (32 = 20 in hex) (index bits differ)

Address of ARR[4][0] = 0xFF000 + 4 * 1024 * sizeof (double) [since we use row major storage] = 0xFF000 000 + 4096*8 = 0xFF000 000 + 0x8000 = 0xFF008 000 ( index bits matches that of ARR [0][0] as both read 000 0000 0000)
 
Address of ARR[0][5] = 0xFF000 + 5 * sizeof (double) = 0xFF000 000+ 5*8 = 0xFF000 028 (40 = 28 in hex) (index bits differ)

Address of ARR[5][0] = 0xFF000 + 5 * 1024 * sizeof (double) [since we use row major storage] = 0xFF000 000 + 5120*8 = 0xFF000 000 + 0xA000 = 0xFF00A 000 (index bits differ)

So, only ARR[4][0] and ARR[0][0] have the same cache index.

The inner loop is iterating from 0 to 1023, so consecutive memory locations are accessed in sequence. Since cache block size is only 16 bytes and our element being double is of size 8 bytes, during a memory access only the next element gets filled in the cache. i.e.; every alternative memory access is a cache miss giving a hit ratio of 50%. (If loops i and j are reversed, all accesses will be misses and hit ratio will become 0).
12 votes -- Arjun Suresh ( 294k points)
Selected Answer
Block size = 16 B and one element = 8 B, so 2 elements are stored in one block.

For the $1024 \times 1024$ elements, the number of blocks required $= 1024 \times 1024/2 = 2^{19}$ blocks.

In each block the first element accessed is a miss and the second one is a hit (since two elements are brought in at a time).

$\Rightarrow$ hit ratio = total hits/total references
$= 2^{19}/2^{20}$
$= 1/2 = 0.5$
$= 0.5 \times 100 = 50\%$
9 votes -- asutosh kumar Biswal ( 10.2k points)
Selected Answer

Number of cache blocks = 8KB/(128*1) = 64
 

Number of sets in cache = Number of cache blocks/ 4 (4-way set)
= 64 / 4 = 16

So, number of SET bits required = 4 (as $2^4 = 16$, and with 4 bits we can get 16 possible outputs)

We can now straight away choose (D) as answer, but for confirmation we can proceed further.

Since only physical memory information is given, we can assume the cache is physically tagged (which is anyway the common case even with virtual memory). So, we can divide the physical memory into 16 regions so that each set maps into only its assigned region. So, size of the region a set can address = 1 MB/16 = $2^{16}$ Bytes $= 2^{16}/128 = 2^9$ cache blocks (as cache block size is 128 words = 128 bytes). So, when an access comes to a cache entry, it must be able to determine which of the $2^9$ possible physical blocks it is. In short, it needs 9 bits for TAG.

Now, cache block size is 128 words and so to identify a word we need 7 bits for WORD. 
 


 

12 votes -- Arjun Suresh ( 294k points)
Selected Answer
As shown in https://gateoverflow.in/3403/gate2008-it_80 we have 16 sets in cache and correspondingly 16 regions in physical memory to which each set is mapped. Now, WORD bit size is 7 as we need 7 bits to address 128 possible words in a cache block. So, the lowest 7 bits of 0C795H will be used for this giving us the remaining bits as
0000 1100 0111 1

Of these bits, the lower 4 are used for addressing the 16 possible sets, giving us the tag bits: 0000 1100 0 in (A) choice.
9 votes -- Arjun Suresh ( 294k points)
Selected Answer

16 blocks and sets with 4 blocks each means there are 4 sets. So, the lower 2 bits are used for selecting a set. And 4-way associative means a set keeps only the 4 most recently accessed blocks (with LRU replacement).

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155

Mod 4 gives

0, 3, 1, 0, 3, 0, 1, 3, 0, 1, 3, 0, 0, 0, 1, 0, 3

Now for each of 0..3, the last 4 accesses will be in cache. So, {92, 32, 48, 8}, {155, 63, 159, 3}, {73, 129, 133, 1} and {} will be in cache. So, the missing element from choice is 216. 
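
The contents can be verified with a short LRU simulation; a C sketch (4 sets, 4 ways, set = block mod 4 as above; it prints each set's resident blocks in way order rather than recency order, with -1 for empty ways):

#include <stdio.h>

int main(void) {
    int seq[] = {0,255,1,4,3,8,133,159,216,129,63,8,48,32,73,92,155};
    int n = sizeof seq / sizeof *seq;
    int set[4][4], age[4][4];                /* 4 sets x 4 ways, with LRU ages */
    for (int s = 0; s < 4; s++)
        for (int w = 0; w < 4; w++) { set[s][w] = -1; age[s][w] = 0; }

    for (int i = 0; i < n; i++) {
        int s = seq[i] % 4, hit = -1, victim = 0;
        for (int w = 0; w < 4; w++) {
            age[s][w]++;                     /* everyone in the set grows older */
            if (set[s][w] == seq[i]) hit = w;
            if (age[s][w] > age[s][victim]) victim = w;
        }
        int w = (hit >= 0) ? hit : victim;   /* reuse on hit, evict LRU on miss */
        set[s][w] = seq[i];
        age[s][w] = 0;
    }
    for (int s = 0; s < 4; s++) {
        printf("set %d:", s);
        for (int w = 0; w < 4; w++) printf(" %d", set[s][w]);
        printf("\n");
    }
    return 0;
}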

20 votes -- Arjun Suresh ( 294k points)
Selected Answer
Ideally the answer should be 20 ns, as that is the time to transfer a block from L2 to L1, and only this time is asked in the question. But there is confusion regarding the access time of L2: it normally means the time to read data from L2 all the way to the CPU, whereas here we need the time till L1 only. So, I assume the following is what is meant by the question.

A block is transferred from L2 to L1. And L1 block size being 4 words (since L1 is requesting we need to consider L1 block size and not L2 block size) and data width being 4 bytes, it requires one L2 access (for read) and one L1 access (for store). So, time = 20+2 = 22 ns.
39 votes -- Arjun Suresh ( 294k points)
Selected Answer

The transfer time should be 4*200 + 20 = 820 ns. But this is not in option. So, I assume the following is what is meant by the question.

L2 block size being 16 words and data width between memory and L2 being 4 words, we require 4 memory accesses(for read) and 4 L2 accesses (for store). Now, we need to send the requested block to L1 which would require one more L2 access (for read) and one L1 access (for store). So, total time

= 4 * (200 + 20) +  (20 + 2)

= 880 + 22

= 902 ns

18 votes -- Arjun Suresh ( 294k points)
Selected Answer
Number of cache blocks = cache size / size of a block
=8 KB/32 B
=256

So, we need 8 bits for indexing the 256 blocks of the cache. And since a block is 32 bytes, we need 5 WORD bits to address each byte. So, the remaining 19 bits (32 - 8 - 5) are tag bits.

So, a tag entry size = 19 + 1(valid bit) + 1(modified bit) = 21 bits.
Total size of metadata = 21 * Number of cache blocks
= 21 * 256
= 5376 bits
14 votes -- Arjun Suresh ( 294k points)
Selected Answer
Total cache size = 256 KB
Cache block size = 32 Bytes
So, number of cache entries = 256 K / 32 = 8 K

Number of sets in cache = 8 K/4 = 2 K as cache is 4-way associative.

So, log(2048) = 11 bits are needed for accessing a set. Inside a set we need to identify the cache entry.

No. of memory block possible = Memory size/Cache block size

$=\frac{2^{32}}{32} = 2^{27}$.

So, no. of memory block that can go to a single cache set

$=\frac{2^{27}}{2^{11}}\\=2^{16}.$

So, we need 16 tag bits along with each cache entry to identify which of the possible $2^{16}$ blocks is being mapped there.
13 votes -- Arjun Suresh ( 294k points)
Selected Answer

Total cache size = 256 KB
Cache block size = 32 Bytes
So, number of cache entries = 256 K / 32 = 8 K

Number of sets in cache = 8 K/4 = 2 K as cache is 4-way associative. 

So, log(2048) = 11 bits are needed for accessing a set. Inside a set we need to identify the cache entry. 
Total number of distinct cache entries = $2^{32}$/cache entry size = $2^{32}/32 = 2^{27}$
Out of these $2^{27}$, each set will be getting only $2^{27}/2^{11} = 2^{16}$ possible distinct cache entries, as we use the first 11 bits to identify a set. So, we need 16 bits to identify a cache entry in a set, which is the number of bits in the tag field.

Size of cache tag directory = Size of a tag entry * Number of tag entries
= (16 + (2+1+1)) bits (2 valid, 1 modified, 1 replacement, as given in the question) * 8 K
= 20 * 8 K = 160 Kbits

Not needed for this question, still:

Valid bit: Tells if the memory referenced by the cache entry is valid. Initially, when a process is loaded all entries are invalid. Only when a page is loaded, its entry becomes valid. 

Modified bit: When processor writes to a cache location its modified bit is made 1. This information is used when a cache entry is replaced- entry 0 means no update to main memory needed. Entry 1 means an update is needed. 

Replacement bit: This is needed for the cache replacement policy. Explained in the below link:
https://www.seas.upenn.edu/~cit595/cit595s10/handouts/LRUreplacementpolicy.pdf

8 votes -- Arjun Suresh ( 294k points)
Selected Answer

Number of sets in cache = v. The question gives a sequencing for the cache lines. For set 0, the cache lines are numbered 0, 1, .., k-1. Now for set 1, the cache lines are numbered k, k+1,... k+k-1 and so on. So, main memory block j will be mapped to set (j mod v), which will be any one of the cache lines from (j mod v) * k to (j mod v) * k + (k-1). (Associativity plays no role in mapping- k-way associativity means there are k spaces for a block and hence reduces the chances of replacement.)

26 votes -- Arjun Suresh ( 294k points)
Selected Answer

There are N accesses to cache.
Out of these n are unique block addresses.
Now, we need to find the number of misses. (min. n misses are guaranteed whatever be the access sequence due to n unique block addresses)

We are given that between two consecutive accesses to the same block, there can be only k unique block addresses. So, for a block to get replaced, we can assume that all the next k block addresses go to the same set (given the cache is set-associative), which is the worst case scenario (they may also go to a different set, but then there is less chance of a replacement). Now, if the associativity is >= k, and if we use the LRU (Least Recently Used) replacement policy, we can guarantee that these k accesses won't throw out our previously accessed cache entry (for that we would need at least k other accesses). So, this means we are at the best-case scenario for cache replacement -- out of N accesses we miss only n (which are unique, cannot be helped from getting missed, and involve no block replacement in cache). So, the miss ratio is n/N.

PS: In question it is given "bounded above by k", which should mean k unique block accesses as k is an integer, but to ensure no replacement this must be 'k-1'. Guess, a mistake in question. 

47 votes -- Arjun Suresh ( 294k points)
Selected Answer
(A) A smaller block size means that during a memory access only a smaller range of nearby addresses is brought to cache, meaning spatial locality is exploited less.

(B) A smaller block size means a larger number of blocks (assuming cache size constant) and hence we need more cache tag bits to identify the correct block. So, the cache tag becomes bigger.

(C) A smaller block size implying a larger cache tag is true, but this can't lower the cache hit time in any way.

(D) A smaller block size incurs a lower cache miss penalty. This is because during a cache miss an entire cache block is fetched from the next lower level of memory. So, a smaller block size means only a smaller amount of data needs to be fetched, which reduces the miss penalty (the cache block size can go till the size of the data bus to the next level of memory; beyond this, increasing the cache block size only increases the cache miss penalty).
25 votes -- Arjun Suresh ( 294k points)
Selected Answer
If associativity is doubled, keeping the capacity and block size constant, then the number of sets gets halved. So, width of set index decoder can surely decrease - (B) is false.
Width of way-selection multiplexer must be increased as we have to double the ways to choose from- (C) is false

As the number of sets gets decreased, the number of possible cache block entries that a set maps to gets increased. So, we need more tag bits to identify the correct entry. So, (A) is also false.

(D) is the correct answer- main memory data bus has nothing to do with cache associativity- this can be answered without even looking at other options.
17 votes -- Arjun Suresh ( 294k points)
Selected Answer

Number of sets = cache size / sizeof a set

Size of a set = blocksize * no. of blocks in a set 
= 8 words * 4 (4-way set-associative)
= 8*4*4 (since a word is 32 bits = 4 bytes)
= 128 bytes.

So, number of sets = 16 KB / (128 B) = 128

Now, we can divide the physical address space equally between these 128 sets. So, the number of bytes each set can access
= 4 GB / 128
= 32 MB
= 32/4 = 8 M words = 1 M blocks ($2^{20}$ blocks).

So, we need 20 tag bits to identify these $2^{20}$ blocks.

21 votes -- Arjun Suresh ( 294k points)
Selected Answer
The question asks for the average time per operation over
"100 fetch operations, 60 memory operand read operations and 40 memory operand write operations",
i.e., the total time divided by the total number of operations.

Total number of operations = 100+60+40 = 200

Time taken for 100 fetch operations(fetch =read)
= 100*((0.9*1)+(0.1*5)) // 1 corresponds to time taken for read 
                        // when there is cache hit

= 140 ns //0.9 is cache hit rate

Time taken for 60 read operations = 60*((0.9*1)+(0.1*5))
                                  = 84ns

Time taken for 40 write operations = 40*((0.9*2)+(0.1*10)) 
                                   = 112 ns

// Here 2 and 10 are the times taken for a write when there is a
// cache hit and a cache miss respectively

So, the total time taken for 200 operations = 140+84+112
                                            = 336 ns

Average time taken = time per operation = 336/200
                                        = 1.68 ns
4 votes -- Divya Bharti ( 4k points)
Fetch is also a memory read operation.

Avg access time $= \frac{160 (0.9 \times 1+0.1 \times 5)+ 40(0.9 \times 2+0.1 \times 10)}{200} \\=\frac{160\times 1.4+40 \times 2.8}{200} \\=\frac{336}{200} \\=1.68$
22 votes -- aravind90 ( 589 points)
Selected Answer
Ans 14 ns = 0.8(5) + 0.2(50)
18 votes -- Vikrant Singh ( 13.4k points)
Selected Answer

Block size of 16 bytes means we need 4 offset bits. (The lowest 4 bits of the memory address are offset bits.)

Number of sets in cache (cache lines) = $2^{12}$, so the next 12 bits are used for set indexing.

The top 4 bits (out of 20) are tag bits.

So, Answer A. 

13 votes -- Arjun Suresh ( 294k points)
Selected Answer

Physical Address = 40

   Tag + Set + Block Offset = 40

   T + S + B = 40    ----------- (1)


We have , Cache Size  =  number of sets $\times$ blocks per set $\times$ Block size

               512 KB = number of sets $\times$ 8 $\times$ Block size

               Number of sets $\times$ Block size = 512/8 KB = 64 KB

               S + B = 16    ---------(2)

 from (1) & (2) 

T = 24  bits (Ans)

Second way

Cache size = $2^{19}$ B
MM size = $2^{40}$ B
This means we need to map $2^{40}/2^{19} = 2^{21}$ blocks to one line. And a set contains $2^3$ lines.
Therefore $2^{21} \times 2^3 = 2^{24}$ blocks are mapped to one set.
Using the Tag field, I need to identify which one block out of these $2^{24}$ blocks is present in this set.
Hence 24 bits are needed in the Tag field.

25 votes -- Himanshu Agarwal ( 16.2k points)

In the question the block size has not been given, so we can assume a block size of $2^x$ bytes.

Number of blocks $= \frac{512 \times 2^{10}}{2^x} = 2^{19-x}$

Number of sets $= \frac{2^{19-x}}{8} = 2^{16-x}$

So the number of bits for sets $= 16-x$

Let the number of bits for Tag $= T$

And we have already assumed a block size of $2^x$ bytes, therefore the number of bits for the block offset is $x$.

And finally, $T + (16-x) + x = 40$

$T + 16 = 40$

$T = 24$

13 votes -- ajit ( 3.4k points)
Selected Answer

Answer = $0.05$.

17 votes -- Debashish Deka ( 51.4k points)
Selected Answer

In the 16-way set-associative organization, 1 set = 16 lines. So, the number of index bits will be $4$ less than in the direct mapped case.

So, Tag bits increased to $14$ bits.

13 votes -- Ahwan Mishra ( 5.3k points)
Selected Answer
In a two-level (hierarchical) memory system, it is clear that the second level is accessed only when the first level access is a miss. So, we must include the first level access time in all memory access calculations. Continuing this way for any level, we must include that level's access time (without worrying about the hit rate in that level) in all memory accesses coming to that level (i.e., by just considering the miss rate in the previous level). So, for the given question, we get the following equation:

$\text{AMAT} = \text{L1 access time} \\+ \text{L1 miss rate} \times \text{L2 access time} \\+ \text{L1 miss rate} \times \text{L2 miss rate} \times \text{Main memory access time}$

$2 = 1 + x \times 8 + 0.5 x^2 \times 18$
$\implies 9x^2 + 8x -1 = 0$

$\implies x = \frac{-8 \pm \sqrt{64 + 36}}{18} = \frac{2}{18} = 0.111$.

So, A.
10 votes -- Arjun Suresh ( 294k points)
Selected Answer

L2 cache is shared between Instruction and Data (is it always?, see below)

So, average read time

= Fraction of Instruction Fetch * Average Instruction fetch time + Fraction of Data Fetch * Average Data Fetch Time

Average Instruction fetch Time = L1 access time + L1 miss rate * L2 access time + L1 miss rate * L2 miss rate * Memory access time

= 2 + 0.2 * 8 + 0.2 * 0.1 * 90 
= 5.4 ns

Average Data fetch Time = L1 access time + L1 miss rate * L2 access time + L1 miss rate * L2 miss rate * Memory access time

= 2 + 0.1 * 8 + 0.1 * 0.1 * 90 
= 3.7 ns

So, average memory access time

$$= 0.6 \times 5.4 + 0.4 \times 3.7 = 4.72ns$$




Now, why L2 must be shared? Because we can otherwise use it for either Instruction or Data and it is not logical to use it for only 1. Ideally this should have been mentioned in question, but this can be safely assumed also (not enough merit for Marks to All). Some more points in the question:

Assume that the caches use the referred-word-first read policy and the writeback policy

Irrelevant for solving the given question as we do not care for writes

Assume that all the caches are direct mapped caches.

Again irrelevant as average access times are given

Assume that the dirty bit is always 0 for all the blocks in the caches

Again irrelevant as dirty bits are for cache replacement- which is not asked in the given question

12 votes -- Arjun Suresh ( 294k points)
Selected Answer
No. of blocks of main Memory $= \frac{2^{32}}{2^5} = 2^{27}$

And there are $512 = 2^9$ lines in Cache Memory.

Tag bits tell us how many memory blocks can map to a single cache line.

$1$ cache line can hold any one of $\large \frac{2^{27}}{2^9} = 2^{18}$ blocks.

So, $18$ bits are required as TAG bits.
11 votes -- Manish Joshi ( 25.2k points)
The main difference(s) between a CISC and a RISC processor is/are that a RISC processor typically

(a) has fewer instructions

(b) has fewer addressing modes

(c) has more registers

(d) is easier to implement using hard-wired logic

Answers: Cisc Risc Architecture

Selected Answer
All are properties of a RISC processor.

http://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/whatis/index.html

http://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/index.html

http://homepage3.nifty.com/alpha-1/computer/Control_E.html
5 votes -- Digvijay ( 47k points)
(iii) Many microprocessors have a specified lower limit on clock frequency (apart from the maximum clock frequency limit) because _____

The floating point unit of a processor using a design $D$ takes $2t$ cycles compared to $t$ cycles taken by the fixed point unit. There are two more design suggestions $D_1$ and $D_2$. $D_1$ uses $30\%$ more cycles for fixed point unit but $30\%$ less cycles for floating point unit as compared to design $D$. $D_2$ uses $40\%$ less cycles for fixed point unit but $10\%$ more cycles for floating point unit as compared to design $D$. For a given program which has $80\%$ fixed point operations and $20\%$ floating point operations, which of the following ordering reflects the relative performances of three designs?
($D_i > D_j$ denotes that $D_i$ is faster than $D_j$)

  1. $D_1 > D > D_2$
  2. $D_2 > D > D_1$
  3. $D > D_2 > D_1$
  4. $D > D_1 > D_2$

Answers: Clock Frequency

Selected Answer
Clock frequency becoming low means the time period of the clock becomes high. When this time period increases beyond the interval within which the dynamic memory (and dynamic logic) contents must be refreshed, we lose those contents. So, the clock frequency can't go below this limit.

Ref: http://gateoverflow.in/261/microprocessors-specified-frequency-frequency-__________ ​
3 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer
(B)

$T = 0.8 \times \text{time taken in fixed point} + 0.2 \times \text{time taken in floating point}$
$D = 0.8 \times t + 0.2 \times 2t = 1.2t$

$D_1 = 0.8\times 1.3 t + 0.2 \times 0.7 \times 2t = 1.04 t + .28t = 1.32t$

$D_2 = 0.8 \times 0.6 t + 0.2 \times 1.1 \times 2t = 0.48t + .44t = 0.92t$
 

So, $D_2$ is the best design for this given program followed by $D$ and then $D_1$. Option B.
14 votes -- Vicky Bajoria ( 4.9k points)

Consider a $2-$way set associative cache with $256$ blocks and uses $LRU$ replacement. Initially the cache is empty. Conflict misses are those misses which occur due to the contention of multiple blocks for the same cache set. Compulsory misses occur due to first time access to the block. The following sequence of access to memory blocks :

      $\big \{0,128,256,128,0,128,256,128,1,129,257,129,1,129,257,129 \big \}$

is repeated $10$ times. The number of conflict misses experienced by the cache is _________ .

Answers: Conflict Misses

Selected Answer

I am reiterating the same thing what Arjun already explained.

{0,128,256,128,0,128,256,128,1,129,257,129,1,129,257,129}

1st Iteration:

for {0,128,256,128,0,128,256,128}

Block   Type             Set 0 contents (LRU, MRU)
0       compulsory miss  0
128     compulsory miss  0, 128
256     compulsory miss  128, 256
128     hit              256, 128
0       conflict miss    128, 0
128     hit              0, 128
256     conflict miss    128, 256
128     hit              256, 128

Total number of conflict misses =2;

Similarly for  {1,129,257,129,1,129,257,129}, total number of conflict misses in set1 = 2

Total number of conflict misses in 1st iteration = 2+2=4

2nd iteration:

for {0,128,256,128,0,128,256,128}

Block   Type             Set 0 contents (LRU, MRU)
0       conflict miss    128, 0
128     hit              0, 128
256     conflict miss    128, 256
128     hit              256, 128
0       conflict miss    128, 0
128     hit              0, 128
256     conflict miss    128, 256
128     hit              256, 128

 Total number of conflict misses = 4

Similarly for  {1,129,257,129,1,129,257,129}, total number of conflict misses in set1 = 4

Total Number of conflict misses in 2nd iteration = 4+4=8

Note that content of each set is same, before and after 2nd iteration. Therefore each of the remaining iterations will also have 8 conflict misses.

 

Therefore, overall conflict misses = 4+8*9 = 76
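
The counting can be confirmed with a small LRU simulation; a C sketch (256 blocks and 2-way associativity give 128 sets; here every non-compulsory miss is a conflict miss, since the cache is far larger than the working set):

#include <stdio.h>

int main(void) {
    int seq[] = {0,128,256,128,0,128,256,128,1,129,257,129,1,129,257,129};
    int n = sizeof seq / sizeof *seq;
    int way[128][2];                       /* way[s][0] = LRU block, way[s][1] = MRU block */
    int seen[512] = {0}, conflict = 0;
    for (int s = 0; s < 128; s++) way[s][0] = way[s][1] = -1;

    for (int it = 0; it < 10; it++)
        for (int i = 0; i < n; i++) {
            int b = seq[i], s = b % 128;
            if (way[s][1] == b) continue;              /* hit on MRU */
            if (way[s][0] == b) {                      /* hit on LRU: promote */
                way[s][0] = way[s][1]; way[s][1] = b;
                continue;
            }
            if (seen[b]) conflict++;                   /* miss on an already-seen block */
            seen[b] = 1;
            way[s][0] = way[s][1]; way[s][1] = b;      /* evict LRU, insert as MRU */
        }
    printf("conflict misses = %d\n", conflict);        /* prints 76 */
    return 0;
}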

10 votes -- suraj ( 5.1k points)
Microprogrammed control unit:

a)is faster than a hardwired control unit

b)facilitates easy implementation of new instructions

c)is useful when very small programs are to be run

d)usually refers to control unit of a microprocessor

Answers: Control Unit

Selected Answer
(a) is wrong. A microprogrammed CU can never be faster than a hardwired CU: it has an extra layer (control memory) on top of the hardwired logic and hence can only be slower.

(b) is a suitable answer as we can add new instruction by changing the content of control memory.

(c) is not correct as when only small programs are there, hardwired control makes more sense.

(d) control unit can also be hardwired, so this is also not correct.

Reference: http://www2.mta.ac.il/~carmi/Teaching/Architecture/SlidesOriginal/ch07%20-%20Microprogrammed%20Control.pdf
5 votes -- Arjun Suresh ( 294k points)

Data forwarding techniques can be used to speed up the operation in presence of data dependencies. Consider the following replacements of LHS with RHS.

  1.  R1→ Loc, Loc→ R2 $\quad$ ≡  R1→ R2, R1 → Loc
  2.  R1→ Loc, Loc→ R2 $\quad$ ≡ R1→ R2
  3.  R1→ Loc, R2 → Loc $\quad$ ≡ R1→ Loc
  4.  R1→ Loc, R2 → Loc $\quad$ ≡  R2→ Loc

In which of the following options, will the result of executing the RHS be the same as executing the LHS irrespective of the instructions that follow ?

  1. i and iii
  2. i and iv
  3. ii and iii
  4. ii and iv

Consider the following code sequence having five instructions from $I_1 \text{ to } I_5$. Each of these instructions has the following format. 

OP Ri, Rj, Rk

Where operation OP is performed on contents of registers Rj and Rk and the result is stored in register Ri.

$I_1$: ADD R1, R2, R3

$I_2$: MUL R7, R1, R3

$I_3$: SUB R4, R1, R5

$I_4$: ADD R3, R2, R4

$I_5$: MUL R7, R8, R9

Consider the following three statements.

S1: There is an anti-dependence between instructions $I_2 \text{ and } I_5$

S2: There is an anti-dependence between instructions $I_2 \text{ and } I_4$

S3: Within an instruction pipeline an anti-dependence always creates one or more stalls

Which one of the above statements is/are correct?

 

  1. Only S1 is true
  2. Only S2 is true
  3. Only S1 and S3 are true
  4. Only S2 and S3 are true

Answers: Data Dependences

Selected Answer
(i) is true. Both LOC and R2 are getting the value of R1 in LHS and RHS.

(ii) false, because R2 gets the correct data in both LHS and RHS, but LOC is not updated in RHS.

(iii) is wrong because in LHS the last write to Loc comes from R2, whereas in RHS it comes from R1.

(iv) is true. The first write to Loc in LHS is useless as it is overwritten by the next write.

So, answer is B.
11 votes -- Vicky Bajoria ( 4.9k points)
Selected Answer
Answer should be B.

 Anti-dependence can be overcome in pipeline using register renaming. So, "always" in S3 makes it false. Also, if I2 is completed before I4 (execution stage of MUL), then also there won't be any stall.
18 votes -- ppm ( 627 points)
Suppose the functions $F$ and $G$ can be computed in $5$ and $3$ nanoseconds by functional units $U_{F}$ and $U_{G}$, respectively. Given two instances of $U_{F}$ and two instances of $U_{G}$, it is required to implement the computation $F(G(X_{i}))$ for $1 \leq i \leq 10$. Ignoring all other delays, the minimum time required to complete this computation is ____________ nanoseconds.

Consider the following data path of a simple non-pipelined CPU. The registers A, B, A1, A2, MDR, the bus and the ALU are 8-bit wide. SP and MAR are 16-bit registers. The MUX is of size $8 \times (2:1)$ and the DEMUX is of size $8 \times (1:2)$. Each memory operation takes 2 CPU clock cycles and uses MAR (Memory Address Register) and MDR (Memory Data Register). SP can be decremented locally.

The CPU instruction "push r" where, r = A or B has the specification

M[SP] ← r 

SP ← SP - 1

How many CPU clock cycles are required to execute the "push r" instruction?

  1. 2
  2. 3
  3. 4
  4. 5

Consider the following data path of a CPU.

     

The ALU, the bus and all the registers in the data path are of identical size. All operations including incrementation of the PC and the GPRs are to be carried out in the ALU. Two clock cycles are needed for memory read operation – the first one for loading address in the MAR and the next one for loading data from the memory bus into the MDR.

The instruction “add R0, R1” has the register transfer interpretation R0 <= R0 + R1. The minimum number of clock cycles needed for execution cycle of this instruction is:

  1. 2
  2. 3
  3. 4
  4. 5

The ALU, the bus and all the registers in the data path are of identical size. All operations including incrementation of the PC and the GPRs are to be carried out in the ALU. Two clock cycles are needed for memory read operation – the first one for loading address in the MAR and the next one for loading data from the memory bus into the MDR.

The instruction "call Rn, sub” is a two word instruction. Assuming that PC is incremented during the fetch cycle of the first word of the instruction, its register transfer interpretation is

Rn <= PC + 1;

PC <= M[PC];

The minimum number of CPU clock cycles needed during the execution cycle of this instruction is:

  1. 2
  2. 3
  3. 4
  4. 5

Answers: Data Path

Selected Answer
The same concept is used in pipelining. Bottleneck here is $U_F$ as it takes 5 ns while $U_G$ takes 3ns only. We have to do 10 such calculations and we have 2 instances of $U_F$ and $U_G$ respectively. So, $U_F$ can be done in $50/2 = 25$ nano seconds. For the start $U_F$ needs to wait for $U_G$ output for 3 ns and rest all are pipelined and hence no more wait. So, answer is

$$3 + 25 = 28 ns.$$
33 votes -- Arjun Suresh ( 294k points)

If somebody is trying to understand it through a pipeline diagram: (the diagram from the original answer is omitted here).

18 votes -- Motamarri Anusha ( 11.6k points)
Selected Answer
A microinstruction can't be further broken down into two or more; it can take more than a cycle if it involves a memory access. The first instruction given here isn't a microinstruction; it is an assembly language instruction.

It can be broken down as:

T1 , T2: MAR<--SP

T3.      : MDR<-- r , SP<-- SP-1 ( it is not mandatory to decrement it in this cycle. Anyway, it can be decremented locally)

T4, T5     : M [MAR] <-- MDR

The problem specifies an 8-bit MDR, an 8-bit data bus and 8-bit registers, so the given CPU is 8-bit. The 8 multiplexers transfer 8 bits when the selection input is 0 and 1 respectively: during cycle 1, the bits in even positions are moved to MAR; during cycle 2, the bits in odd positions are transferred. We certainly need to move the 16-bit SP to the 16-bit MAR via an 8-bit bus, so it takes 2 cycles to get SP into MAR.

The given data path has a single bus, which requires r to be carried in a separate cycle. For the contents of r to be moved to MDR during cycle T1 or T2, the address and data buses would have to be separate; here that is not the case.

Memory read takes 2 more cycles. In total, we need 5 of them clock cycles to execute a push.

https://www.cise.ufl.edu/~mssz/CompOrg/CDA-proc.html

Computer organization pal chaudari page 334-335

Computer architecture by behrooz parahmi exercise 7.6
Thank me later.
4 votes -- Rajaneesh Polavarapu ( 241 points)
Selected Answer
Instruction fetch requires two cycles, but the question asks how many clock cycles are required for the execution part only.

Now for execution:

1) R0out, Sin : S ← R0 ...... 1st cycle

2) R1out, Tin : T ← R1 ...... 2nd cycle

3) Add, R0in : R0 ← S + T ..... 3rd cycle

So, 3 cycles for execution.

As only execution cycles are asked, the number of cycles = 3; if the complete instruction cycle had been asked, the answer would be 5.

Hence option B is correct.
12 votes -- Pooja Palod ( 32.4k points)
Selected Answer

$MAR$ $\leftarrow $ $PC$ ..... $1$ cycle
$S$ $\leftarrow $ $PC$ (Since these two actions are independent they can be done in same cycle)
$MDR$ $\leftarrow $ $M[MAR]$ ..... $2$nd cycle (System BUS)
$RN$ $\leftarrow $ $S +1$     (ALU Is free and the two actions are independent.) (Internal BUS)
$PC$ $\leftarrow $ $MDR$ ----- $3$rd cycle

Therefore $3$ cycles needed.


18 votes -- Riya Roy(Arayana) ( 7.1k points)

The size of the data count register of a DMA controller is $16$ bits. The processor needs to transfer a file of 29,154 kilobytes from disk to main memory. The memory is byte addressable. The minimum number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is _________.

On a non-pipelined sequential processor, a program segment, which is the part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.

        Initialize the address register
        Initialize the count to 500
LOOP:   Load a byte from device              
        Store in memory at address given by address register
        Increment the address register
        Decrement the count
        If count !=0 go to LOOP

Assume that each statement in this program is equivalent to a machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute.

The designer of the system also has an alternate approach of using the DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory.

What is the approximate speed up when the DMA controller based design is used in a place of the interrupt driven program based input-output?

(A) 3.4

(B) 4.4

(C) 5.1

(D) 6.7

Answers: Dma

Selected Answer

The data count register gives the number of words the DMA can transfer in a single bus mastership.

Here it is 16 bits, so at most $2^{16}$ words can be transferred at a time.

Since memory is byte addressable, 1 word = 1 byte,
so $2^{16}$ bytes in one go.

Now for the given file:
File size = 29154 KB $= 29154 \times 2^{10}$ B
In one bus mastership the DMA transfers $2^{16}$ B,
i.e.,
1 B is transferred by the DMA in $1/2^{16}$ of a bus mastership.

Now for the full file of size 29154 KB, the minimum number of times the DMA controller needs the bus $= \frac{29154 \times 2^{10}}{2^{16}} = 455.53$.
Since a whole number of bus masterships is needed, $455.53 \rightarrow 456$.

17 votes -- Abhilash Panicker ( 8.8k points)
Selected Answer

                        STATEMENT                                           CLOCK CYCLE(S) NEEDED
              Initialize the address register                                        1
              Initialize the count to 500                                            1
        LOOP: Load a byte from device                                                2
              Store in memory at address given by address register                   2
              Increment the address register                                         1
              Decrement the count                                                    1
              If count != 0 go to LOOP                                               1

        Interrrupt driven transfer time = 1+1+500×(2+2+1+1+1) = 3502
        DMA based transfer time = 20+500*2 = 1020
        Speedup = 3502/1020 = 3.4
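
The same counts in executable form, as a C sketch:

#include <stdio.h>

int main(void) {
    long interrupt_cycles = 1 + 1 + 500L * (2 + 2 + 1 + 1 + 1);         /* 3502 */
    long dma_cycles       = 20 + 500L * 2;                              /* 1020 */
    printf("speedup = %.1f\n", (double)interrupt_cycles / dma_cycles);  /* ~3.4 */
    return 0;
}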
31 votes -- Manu Thakur ( 6k points)

Using an expanding opcode encoding for instructions, is it possible to encode all of the following in an instruction format shown in the below figure. Justify your answer.

14 double address instructions
127 single address instructions
60  no address (zero address) instructions

Answers: Expanding Opcode

Selected Answer

4 bits are for the opcode, so the number of possible two-address opcodes is $2^4 = 16$; hence 14 double-address instructions are possible.

Out of those 16 only 14 are used, so 2 opcodes are left over, which can be used for one-address instructions. For a one-address instruction we can also use the 6 bits of operand 1, so each leftover opcode expands into $2^6 = 64$ encodings, giving $2 \times 64 = 128$ one-address encodings; hence 127 single-address instructions are possible.

Out of those 128, 127 are used, so 1 is left over, which can be expanded for zero-address instructions using the 6 bits of the operand 2 address: $1 \times 64 = 64$ zero-address encodings are possible, enough for the required 60.

So all the encodings are possible.
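
The expansion can be verified mechanically; a C sketch of the counting argument above:

#include <stdio.h>

int main(void) {
    int opcodes    = 1 << 4;              /* 4-bit opcode field: 16 opcodes */
    int left2      = opcodes - 14;        /* opcodes free after 14 two-address */
    int one_avail  = left2 * (1 << 6);    /* expand into the 6-bit operand-1 field */
    int left1      = one_avail - 127;     /* encodings free after 127 one-address */
    int zero_avail = left1 * (1 << 6);    /* expand into the 6-bit operand-2 field */
    printf("one-address slots: %d, zero-address slots: %d\n", one_avail, zero_avail);
    /* 128 >= 127 and 64 >= 60, so the encoding fits */
    return 0;
}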

 

1 votes -- Pavan Kumar Munnam ( 10k points)

Which of the following statements is true?

  1. ROM is a Read/Write memory

  2. PC points to the last instruction that was executed

  3. Stack works on the principle of LIFO

  4. All instructions affect the flags

 

Which of the following is not a form of memory

  1. instruction cache
  2. instruction register
  3. instruction opcode
  4. translation look-a-side buffer

Consider a new instruction named branch-on-bit-set (mnemonic bbs). The  instruction “bbs reg, pos, label” jumps to label if bit in position pos of register  operand reg is one. A register is 32 bits wide and the bits are numbered 0 to 31,  bit in position 0 being the least significant. Consider the following emulation of  this instruction on a processor that does not have bbs implemented.

$temp\leftarrow reg \& mask$

Branch to label if temp is non-zero. The variable temp is a temporary register. For correct emulation, the variable  mask must be generated by

  1. $ mask\leftarrow \text{0x1} << pos$
  2. $ mask\leftarrow \text{0xffffffff} << pos$
  3. $ mask\leftarrow pos $
  4. $ mask\leftarrow \text{0xf}$

Consider a RISC machine where each instruction is exactly 4 bytes long. Conditional and unconditional branch instructions use PC-relative addressing mode with Offset specified in bytes to the target location of the branch instruction. Further the Offset is always with respect to the address of the next instruction in the program sequence. Consider the following instruction sequence

Instr. No.

Instruction

i       : add  R2, R3, R4
i + 1 : sub  R5, R6, R7
i + 2 : cmp R1, R9, R10
i + 3 : beq  R1, Offset

If the target of the branch instruction is i, then the decimal value of the Offset is ____________ .

Answers: Instruction Execution

Selected Answer
It is C.

Only the top of the stack can be accessed at any time. You can imagine a stack as a data structure open at only one end, so that if we put one thing over the other, the last thing we inserted is the first we can access. That is Last In First Out (LIFO).

ROM is Read Only Memory.

PC points to the next instruction to be executed.

Not all instructions affect the flags.
8 votes -- Gate Keeda ( 19.1k points)
Selected Answer
The instruction opcode is a part of the instruction which tells the processor what operation is to be performed so it is not a form of memory while the others are
10 votes -- Bhagirathi Nayak ( 13.3k points)
Selected Answer
a. $mask\leftarrow \text{0x1} << pos$

We want to check for a particular bit position say 2 (third from right). Let the number be 0xA2A7 (last 4 bits being 0111). Here, the bit at position 2 from right is 1. So, we have to AND this with 0x0004 as any other flag would give wrong value (may count other bits or discard the bit at position "pos"). And 0x0004 is obtained by 0x1 << 2 (by shifting 1 "pos" times to the left we get a flag with 1 being set only for the "pos" bit position).
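
A C sketch of the emulation (the value 0xA2A7 is just the example used above):

#include <stdio.h>
#include <stdint.h>

/* emulate "bbs reg, pos, label": the branch is taken iff bit 'pos' of reg is 1 */
static int bbs_taken(uint32_t reg, unsigned pos) {
    uint32_t mask = (uint32_t)0x1 << pos;   /* option (a) */
    uint32_t temp = reg & mask;
    return temp != 0;
}

int main(void) {
    printf("%d\n", bbs_taken(0xA2A7, 2));   /* bit 2 of ...0111 is 1 -> prints 1 */
    printf("%d\n", bbs_taken(0xA2A7, 3));   /* bit 3 is 0 -> prints 0 */
    return 0;
}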
10 votes -- Arjun Suresh ( 294k points)
Selected Answer
Ans:(-16)

Assume addresses start at 2000 for the first instruction.

2000 -- add R2, R3, R4

2004 -- sub R5, R6, R7

2008 -- cmp R1, R9, R10

2012 -- beq R1, Offset  // PC after fetching this instruction will be 2016; the branch target is 2000, so Offset = 2000 - 2016 = -16

2016 -- next instruction
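
The same arithmetic as a C sketch (addresses as assumed above):

#include <stdio.h>

int main(void) {
    int branch_addr = 2012;   /* address of the beq instruction (assumed above) */
    int target      = 2000;   /* instruction i */
    /* the offset is relative to the address of the *next* instruction */
    int offset = target - (branch_addr + 4);
    printf("Offset = %d\n", offset);   /* -16 */
    return 0;
}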
6 votes -- jatin saini ( 2.1k points)
Consider a processor with $64$ registers and an instruction set of size twelve. Each instruction has five distinct fields, namely, opcode, two source register identifiers, one destination register identifier, and twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, the amount of memory (in bytes) consumed by the program text is _________.

In an 11-bit computer instruction format, the size of address field is 4-bits. The computer uses expanding OP code technique and has 5 two-address instructions and 32 one-address instructions. The number of zero-address instructions it can support is ________

 

State True or False with one line explanation

Expanding opcode instruction formats are commonly employed in RISC. (Reduced Instruction Set Computers) machines.
A machine has a 32-bit architecture, with 1-word long instructions. It has 64 registers, each of which is 32 bits long. It needs to support 45 instructions, which have an immediate operand in addition to two register operands. Assuming that the immediate operand is an unsigned integer, the maximum value of the immediate operand is ____________

Answers: Instruction Format

Selected Answer

Answer => 500 bytes

Number of register = 64

Number of bits to address register = $\left \lceil log_{2}64 \right \rceil$ = 6 bits

Number of Instruction = 12

Opcode size =  $\left \lceil log_{2}12 \right \rceil$ = 4

Opcode(4) reg1(6) reg2(6) reg3(6) Immediate(12)

Total bits per instruction = 34

Total bytes per instruction= 4.25

Due to byte alignment we cannot store 4.25 bytes, without wasting 0.75 bytes ,

So Total bytes per instruction = 5

Total instruction = 100

Total size = Number of instruction $\times$ Size of instruction

100 $\times$ 5 = 500 Bytes

23 votes -- Akash ( 43.8k points)
Selected Answer
No. of possible instruction encoding = $2^{11} = 2048$

No. of encoding taken by two-address instructions = $5 \times 2^4 \times 2^4 = 1280$

No. of encoding taken by one-address instructions = $32 \times 2^4 = 512$

So, no. of possible zero-address instructions = $2048 - (1280 + 512) = 256$
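
The same count in executable form (a C sketch):

#include <stdio.h>

int main(void) {
    int total    = 1 << 11;                   /* 2^11 possible 11-bit encodings */
    int two_addr = 5 * (1 << 4) * (1 << 4);   /* 5 x 2^4 x 2^4 = 1280 */
    int one_addr = 32 * (1 << 4);             /* 32 x 2^4 = 512 */
    printf("zero-address instructions: %d\n", total - two_addr - one_addr);  /* 256 */
    return 0;
}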
21 votes -- Arjun Suresh ( 294k points)
Selected Answer

I think the answer is TRUE.

RISC systems use fixed length instruction to simplify pipeline.

eg: MIPS, PowerPC: Instructions are 4 bytes long.

 

CISC systems use Variable-length instructions.

eg: Intel 80x86: Instructions vary from 1 to 17 bytes long.

 

Now the challenge is: How to fit multiple sets of instruction types into same (limited) number of bits (Fixed size instruction)?

Here comes Expanding opcode into the picture. 

RISC systems commonly uses Expanding opcode technique to have fixed size instructions. 

6 votes -- Sachin Mittal ( 7.1k points)
Selected Answer
64 registers means 6 bits $(\lceil \log_2 64 \rceil = 6)$ for a register operand. So, 2 register operand requires 12 bits. Now, 45 instructions require another 6 bits for opcode $(\lceil \log_2 45 \rceil = 6)$. So, totally 18 bits. So, we have 32 - 18 = 14 bits left for the immediate operand. So, the max value will be $2^{14} - 1 = 16383$ (as the operand is unsigned we do not need a sign bit and with 14 bits we can represent from 0 to $2^{14} -1$)
30 votes -- Arjun Suresh ( 294k points)
Many of the advanced microprocessors prefetch instructions and store it in an instruction buffer to speed up processing. This speed up is achieved because ________

Answers: Instruction Prefetch

Selected Answer
Because CPU is faster than memory. Fetching instructions from memory would require considerable amount of time while CPU is much faster. So, prefetching the instructions to be executed can save considerable amount of waiting time.
2 votes -- Arjun Suresh ( 294k points)

On receiving an interrupt from a I/O device the CPU:

  1. Halts for a predetermined time.
  2. Hands over control of address bus and data bus to the interrupting device.
  3. Branches off to the interrupt service routine immediately.
  4. Branches off to the interrupt service routine after completion of the current instruction.

In a vectored interrupt

  1. The branch address is assigned to a fixed location in memory

  2. The interrupting source supplies the branch information to the processor through an interrupt vector

  3. The branch address is obtained from a register in the processor

  4. None of the above

 

Which of the following is true?

  1. Unless enabled, a CPU will not be able to process interrupts.

  2. Loop instructions cannot be interrupted till they complete.

  3. A processor checks for interrupts before executing a new instruction.

  4. Only level triggered interrupts are possible on microprocessors.

 

A device employing INTR line for device interrupt puts the CALL instruction on the data bus while

  1. $\overline{INTA}$ is active
  2. HOLD is active
  3. READY is inactive
  4. None of the above

A device with data transfer rate 10 KB/sec is connected to a CPU. Data is transferred byte-wise. Let the interrupt overhead be 4$\mu$sec. The byte transfer time between the device interface register and CPU or memory is negligible. What is the minimum performance gain of operating the device under interrupt mode over operating it under program-controlled mode?

  1. 15
  2. 25
  3. 35
  4. 45

A CPU generally handles an interrupt by executing an interrupt service routine

  1. As soon as an interrupt is raised.

  2. By checking the interrupt register at the end of fetch cycle.

  3. By checking the interrupt register after finishing the execution of the current instruction.

  4. By checking the interrupt register at fixed time intervals.

 

Answers: Interrupts

Selected Answer

Answer should be D, i.e., the CPU branches off to the ISR after completing the current instruction.

The CPU checks the interrupt status bit at the completion of each instruction; when an interrupt is pending, it services it using the ISR.

http://gateoverflow.in/18581/isro2009-78

1 votes -- Gate Mission ( 3.7k points)
Selected Answer
Answer: B

A vectored interrupt is a processing technique in which the interrupting device directs the processor to the appropriate interrupt service routine. This is in contrast to a polled interrupt system, in which a single interrupt service routine must determine the source of the interrupt by checking all potential interrupt sources, a slow and relatively laborious process.
5 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer
Ans is A.

Options B and D are obviously false.

A processor checks for the interrupt before FETCHING an instruction, so Option C is also false.
7 votes -- Hardi Shah ( 329 points)
Selected Answer
INTR is the interrupt request line. If interrupts are enabled, the microprocessor, on receiving a high INTR signal, activates the $\overline{INTA}$ (interrupt acknowledge) signal, and while $\overline{INTA}$ is active the device puts the CALL instruction on the data bus. Another request can't be accepted till the CPU finishes servicing this interrupt. Hence (A) is the correct option.
6 votes -- Tejas Jaiswal ( 661 points)
Selected Answer
In Programmed I/O, the CPU issues a command and waits for I/O operations to complete.

So here, CPU will wait for 1sec to transfer 10KB of data.

overhead in programmed I/O = 1 sec

In Interrupt mode , data is transferred word by word (here word size is 1 byte as mentioned in question "Data is transferred byte-wise").
So to transfer 1 byte of data overhead is $4 \times 10^{-6}$ sec
Thus to transfer 10 KB of data overhead is= $4 \times 10^{-6} \times 10^{4}$  sec

Performance gain  $= \frac{1}{4 \times 10^{-6} \times 10^4} = \frac{1}{4 \times 10^{-2}} = 25$

Thus, (b) is correct answer.
25 votes -- ashwini anand ( 319 points)
Selected Answer
It will be C.

After finishing the execution of each instruction the CPU reads the interrupt pins to recognize the interrupts.

INTR = 1 = Interrupt is present.(Service the Interrupt)

        = 0 = Interrupt is not present.(Goto next Instruction fetch from user program)
7 votes -- Gate Keeda ( 19.1k points)

For the daisy chain scheme of connecting I/O devices, which of the following statements is true?

  1. It gives non-uniform priority to various devices
  2. It gives uniform priority to all devices
  3. It is only useful for connecting slow devices to a processor device
  4. It requires a separate interrupt pin on the processor for each device

 

A hard disk is connected to a 50 MHz processor through a DMA controller.  Assume that the initial set-up of a DMA transfer takes 1000 clock cycles for the processor, and assume that the handling of the interrupt at DMA completion requires 500 clock cycles for the processor.  The hard disk has a transfer rate of 2000 Kbytes/sec and average block transferred is 4 K bytes.  What fraction of the processor time is consumed by the disk, if the disk is actively transferring 100% of the time?

The correct matching for the following pairs is:

(A) DMA I/O (1) High speed RAM
(B) Cache (2) Disk
(C) Interrupt I/O (3) Printer
(D) Condition Code Register (4) ALU
  1. A-4 B-3 C-1 D-2

  2. A-2 B-1 C-3 D-4

  3. A-4 B-3 C-2 D-1

  4. A-2 B-3 C-4 D-1

Which of the following statements about synchronous and asynchronous I/O is NOT true?

  1. An ISR is invoked on completion of I/O in synchronous I/O but not in asynchronous I/O

  2. In both synchronous and asynchronous I/O, an ISR (Interrupt Service Routine) is invoked after completion of the I/O

  3. A process making a synchronous I/O call waits until I/O is complete, but a process making an asynchronous I/O call does not wait for completion of the I/O

  4. In the case of synchronous I/O, the process waiting for the completion of I/O is woken up by the ISR that is invoked after the completion of I/O

Answers: Io Handling

Selected Answer

The daisy chaining approach tells the processor in which order interrupts should be handled, by assigning priority to the devices.
In the daisy chaining method, all the devices are connected in series. The device with the highest priority is placed in the first position, followed by lower priority devices. The interrupt pin is common to all.

So, the answer is A.

17 votes -- No Need ( 14.1k points)
Selected Answer

Set-up and interrupt handling cost the CPU $1000 + 500 = 1500$ cycles per block; at 50 MHz that is 30 $\mu$s. Transferring a 4 Kbyte block at 2000 Kbytes/sec takes 2 ms.

CPU time is consumed only for initialisation and termination.

% of CPU time consumed $= \frac{30\ \mu s}{30\ \mu s + 2\ ms} \times 100 \approx 1.5\%$

10 votes -- Pooja Palod ( 32.4k points)
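The same numbers can be reproduced with a short Python sketch (only values from the question are used):

# 50 MHz processor; 1000 set-up + 500 interrupt-handling cycles per block;
# 4 Kbyte blocks at a 2000 Kbytes/sec disk transfer rate.
cpu_time = (1000 + 500) / 50e6      # 30 microseconds of CPU work per block
block_time = 4 / 2000               # 2 ms to transfer one 4 Kbyte block

fraction = cpu_time / (cpu_time + block_time)
print(f"{fraction:.2%}")            # ~1.48%, i.e. about 1.5%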
Selected Answer
Answer: B
7 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer
Answer is (B).

In synchronous I/O, the process performing the I/O operation is placed in the blocked state till the I/O operation completes. An ISR is invoked after the completion of the I/O operation, and it moves the process from the blocked state to the ready state.

In asynchronous I/O, a handler function is registered when the I/O operation is initiated. The process is not blocked; it continues to execute the remaining instructions. When the I/O operation completes, a signal mechanism is used to notify the process that data is available.
24 votes -- gate_asp ( 755 points)
A processor has $40$ distinct instructions and $24$ general purpose registers. A $32$-bit instruction word has an opcode, two register operands and an immediate operand. The number of bits available for the immediate operand field is _______.

 

  1. Assume that a CPU has only two registers $R_1$ and $R_2$ and that only the following instruction is available $XOR \: R_i, R_j;\{R_j \leftarrow R_i \oplus R_j, \text{ for } i, j =1, 2\}$

    Using this XOR instruction, find an instruction sequence in order to exchange the contents of the registers $R_1$ and $R_2$

  2. The line p of the circuit shown in figure has stuck at 1 fault. Determine an input test to detect the fault.

Consider the following program fragment in the assembly language of a certain hypothetical processor. The processor has three general purpose registers R1, R2 and R3. The meanings of the instructions are shown by comments (starting with ;) after the instructions.

X:  CMP R1, 0;  Compare R1 and 0, set flags appropriately in status register
    JZ  Z;   Jump if zero to target Z
    MOV R2, R1; Copy contents of R1 to R2
    SHR R1; Shift right R1 by 1 bit
    SHL R1; Shift left R1 by 1 bit
    CMP R2, R1; Compare R2 and R1 and set flag in status register
    JZ  Y;    Jump if zero to target Y
    INC R3; Increment R3 by 1;
Y:  SHR R1; Shift right R1 by 1 bit
    JMP X;  Jump to target X
Z:...

  1. Initially R1, R2 and R3 contain the values 5, 0 and 0 respectively, what are the final values of R1 and R3 when control reaches Z?

  2. In general, if R1, R2 and R3 initially contain the values n, 0, and 0 respectively. What is the final value of R3 when control reaches Z?

 

Consider the following assembly language program for a hypothetical processor A, B, and C are 8 bit registers. The meanings of various instructions are shown as comments.

  MOV B, #0 ; $B \leftarrow 0$
  MOV C, #8 ; $C \leftarrow 8$
Z: CMP C, #0 ; compare C with 0
  JZ X ; jump to X if zero flag is set
  SUB C, #1 ; $C \gets C-1$
  RRC A, #1 ; right rotate A through carry by one bit. Thus:
    ; If the initial values of A and the carry flag are $a_7..a_0$ and 
    ; $c_0$ respectively, their values after the execution of this
    ; instruction will be $c_0a_7..a_1$ and $a_0$ respectively.
  JC Y ; jump to Y if carry flag is set
  JMP Z ; jump to Z
Y: ADD B, #1 ; $B \gets B+1$
  JMP Z ; jump to Z
X:      

 

If the initial value of register A is $A_0$, the value of register B after the program execution will be

  1. the number of 0 bits in $A_0$
  2. the number of 1 bits in $A_0$
  3. $A_0$
  4. 8

Consider the following assembly language program for a hypothetical processor A, B, and C are 8 bit registers. The meanings of various instructions are shown as comments.

 

  MOV B, #0 ; $B \leftarrow 0$
  MOV C, #8 ; $C \leftarrow 8$
Z: CMP C, #0 ; compare C with 0
  JZ X ; jump to X if zero flag is set
  SUB C, #1 ; $C \gets C-1$
  RRC A, #1 ; right rotate A through carry by one bit. Thus:
    ; If the initial values of A and the carry flag are $a_7..a_0$ and 
    ; $c_0$ respectively, their values after the execution of this
    ; instruction will be $c_0a_7..a_1$ and $a_0$ respectively.
  JC Y ; jump to Y if carry flag is set
  JMP Z ; jump to Z
Y: ADD B, #1 ; $B \gets B+1$
  JMP Z ; jump to Z
X:

 Which of the following instructions when inserted at location X will ensure that the value of the register A after program execution is as same as its initial value?

  1. RRC A, #1
  2. NOP  ; no operation
  3. LRC A, #1;   left rotate A through carry flag by one bit
  4. ADD A, #1

Consider the following program segment for a hypothetical CPU having three user registers R1, R2 and R3.

Instruction Operation Instruction Size (in words)
MOV R1, 5000 R1 $\leftarrow$ Memory[5000] 2
MOV R2, (R1) R2 $\leftarrow$ Memory[(R1)] 1
ADD R2, R3 R2 $\leftarrow$ R2 + R3 1
MOV 6000, R2 Memory[6000] $\leftarrow$ R2 2
HALT Machine halts 1

Consider that the memory is byte addressable with word size 32 bits, and the program has been loaded starting from memory location 1000 (decimal). If an interrupt occurs while the CPU has been halted after executing the HALT instruction, the return address (in decimal) saved in the stack will be

 

  1. 1007
  2. 1020
  3. 1024
  4. 1028

Consider the following program segment for a hypothetical CPU having three user registers R1, R2 and R3.

Instruction Operation Instruction Size (in words)
MOV R1, 5000 R1 $\leftarrow$ Memory[5000] 2
MOV R2, (R1) R2 $\leftarrow$ Memory[(R1)] 1
ADD R2, R3 R2 $\leftarrow$ R2 + R3 1
MOV 6000, R2 Memory[6000] $\leftarrow$ R2 2
HALT Machine halts 1

Let the clock cycles required for various operations be as follows:

Register to/from memory transfer : 3 clock cycles
ADD with both operands in register : 1 clock cycle
Instruction fetch and decode : 2 clock cycles per word

The total number of clock cycles required to execute the program is

  1. 29
  2. 24
  3. 23
  4. 20

 

If we use internal data forwarding to speed up the performance of a CPU (R1, R2 and R3 are registers and M[100] is a memory reference), then the sequence of operations

R1 → M[100]
M[100] → R2
M[100] → R3

can be replaced by

  1. R1 → R3
    R2 → M[100]
  2. M[100] → R2
    R1 → R2
    R1 → R3
  3. R1 → M[100]
    R2 → R3
  4. R1 → R2
    R1 → R3
    R1 → M[100]

A CPU has 24-bit instructions. A program starts at address 300 (in decimal). Which one of the following is a legal program counter (all values in decimal)?

  1. 400
  2. 500
  3. 600
  4. 700

In a simplified computer the instructions are:

OP $R_j$, $R_i$ -Performs $R_j$ OP $R_i$ and stores the result in register $R_j$.
OP $m,R_i$  -Performs $val$ OP $R_i$ and stores the result in register $R_i$. $val$
denotes the content of the memory location $m$.
MOV $m, R_i$ -Moves the content of memory location $m$ to register $R_i$
MOV $R_i, m$ -Moves the content of register $R_i$ to memory location $m$

The computer has only two registers, and OP is either ADD or SUB. Consider the following basic block:

$t_1\: = \: a+b$

$t_2\: = \: c+d$

$t_3\: = \: e-t_2$

$t_4\: = \: t_1 - t_3$

Assume that all operands are initially in memory. The final value of the computation should be in memory. What is the minimum number of MOV instructions in the code generated for this basic block?

  1. 2
  2. 3
  3. 5
  4. 6

Consider the following program segment. Here R1, R2 and R3 are the general purpose registers.

  Instruction Operation Instruction size (no. of words)
  MOV R1, (3000) R1 $\leftarrow$ m[3000] 2
LOOP: MOV R2, (R3) R2 $\leftarrow$ M[R3] 1
  ADD R2, R1 R2 $\leftarrow$ R1 + R2 1
  MOV (R3), R2 M[R3] $\leftarrow$ R2 1
  INC R3 R3 $\leftarrow$ R3 +1 1
  DEC R1 R1 $\leftarrow$ R1 - 1 1
  BNZ LOOP Branch on not zero 2
  HALT Stop 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

Assume that the memory is word addressable. The number of memory references for accessing the data in executing the program completely is

  1. 10
  2. 11
  3. 20
  4. 21

Consider the following program segment. Here R1, R2 and R3 are the general purpose registers.

 

  Instruction Operation Instruction size (no. of words)
  MOV R1, (3000) R1 $\leftarrow$ m[3000] 2
LOOP: MOV R2, (R3) R2 $\leftarrow$ M[R3] 1
  ADD R2, R1 R2 $\leftarrow$ R1 + R2 1
  MOV (R3), R2 M[R3] $\leftarrow$ R2 1
  INC R3 R3 $\leftarrow$ R3 +1 1
  DEC R1 R1 $\leftarrow$ R1 - 1 1
  BNZ LOOP Branch on not zero 2
  HALT Stop 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

Assume that the memory is word addressable. After the execution of this program, the content of memory location 2010 is:

  1. 100
  2. 101
  3. 102
  4. 110

Consider the following program segment. Here R1, R2 and R3 are the general purpose registers.

 

  Instruction Operation Instruction size (no. of words)
  MOV R1, (3000) R1 $\leftarrow$ m[3000] 2
LOOP: MOV R2, (R3) R2 $\leftarrow$ M[R3] 1
  ADD R2, R1 R2 $\leftarrow$ R1 + R2 1
  MOV (R3), R2 M[R3] $\leftarrow$ R2 1
  INC R3 R3 $\leftarrow$ R3 +1 1
  DEC R1 R1 $\leftarrow$ R1 - 1 1
  BNZ LOOP Branch on not zero 2
  HALT Stop 1

Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal.

 Assume that the memory is byte addressable and the word size is 32 bits. If an interrupt occurs during the execution  of the instruction “INC R3”, what return address will be pushed on to the stack?

  1. 1005
  2. 1020
  3. 1024
  4. 1040

Following table indicates the latencies of operations between the instruction producing the result and instruction using the result.

Instruction producing the result Instruction using the result Latency
ALU Operation ALU operation 2
ALU Operation Store 2
Load ALU Operation 1
Load Store 0

 

Consider the following code segment.

 Load R1, Loc 1;	 Load R1 from memory location Loc1
 Load R2, Loc 2;	 Load R2 from memory location Loc 2
 Add R1, R2, R1;	 Add R1 and R2 and save result in R1
 Dec R2;	 Decrement R2
 Dec R1;	 Decrement R1
 Mpy R1, R2, R3;	 Multiply R1 and R2 and save result in R3
 Store R3, Loc 3;	 Store R3 in memory location Loc 3


What is the number of cycles needed to execute the above code segment assuming each instruction takes one cycle to execute?

  1. 7
  2. 10
  3. 13
  4. 14

Which of the following must be true for the RFE (Return From Exception) instruction on a general purpose processor?

  1. It must be a trap instruction

  2. It must be a privileged instruction

  3. An exception cannot be allowed to occur during execution of an RFE instruction

  1. I only
  2. II only
  3. I and II only
  4. I, II and III only

Assume that EA = (X)+ is the effective address equal to the contents of location X, with X incremented by one word length after the effective address is calculated; EA = −(X) is the effective address equal to the contents of location X, with X decremented by one word length before the effective address is calculated; EA = (X)− is the effective address equal to the contents of location X, with X decremented by one word length after the effective address is calculated. The format of the instruction is (opcode, source, destination), which means (destination ← source op destination). Using X as a stack pointer, which of the following instructions can pop the top two elements from the stack, perform the addition operation and push the result back to the stack.

  1. ADD (X)−, (X)
  2. ADD (X), (X)−
  3. ADD −(X), (X)+
  4. ADD −(X), (X)

Consider a processor with byte-addressable memory. Assume that all registers, including the program counter (PC) and the Program Status Word (PSW), are of size two bytes. A stack in the main memory is implemented from memory location $(0100)_{16}$ and it grows upward. The stack pointer (SP) points to the top element of the stack. The current value of SP is $(016E)_{16}$. The CALL instruction is of two words, the first word is the op-code and the second word is the starting address of the subroutine (one word = 2 bytes). The CALL instruction is implemented as follows:

  • Store the current value of PC in the stack
  • Store the value of PSW register in the stack
  • Load the starting address of the subroutine in PC

The content of PC just before the fetch of a CALL instruction is $(5FA0)_{16}$. After execution of the CALL instruction, the value of the stack pointer is

 

  1. $(016A)_{16}$
  2. $(016C)_{16}$
  3. $(0170)_{16}$
  4. $(0172)_{16}$

Assume a machine has 4 registers (one of which is the accumulator $A$) and the following instruction set.

  • $LOAD$ and $STORE$ are indirect memory operations that load and store, using the address stored in the given register operand. Thus, $LOAD \: R$ loads the contents of $memory[R]$ into $A$, and $STORE \: R$ stores the contents of $A$ in $memory[R]$.
  • $MOV$ copies any register into any other register.
  • $ADD$ and $SUB$ operate on the accumulator and one other register, such that $A = A \: op \: R$.
  • $LDC$ stores a given 7-bit constant in the accumulator.
  • $BRA$, $BZ$, and $BNE$ are branch instructions, each taking a 5-bit offset.

Design an instruction encoding scheme that allows each of the above instructions (along with operands) to be encoded in 8 bits.

Answers: Machine Instructions

Selected Answer

Instruction opcode size $= \lceil \log_2 40 \rceil = 6$

Register operand size $= \lceil \log_2 24 \rceil = 5$

Total bits available $= 32$

Bits required for opcode + two register operands $= 6 + 2 \times 5 = 16$

Bits available for the immediate operand $= 32 - 16 = 16$

16 votes -- Akash ( 43.8k points)
Selected Answer
(a)
R2 $\leftarrow$ R1 $\oplus$ R2
R1 $\leftarrow$ R2 $\oplus$ R1
R2 $\leftarrow$ R1 $\oplus$ R2

(b) A=1, B=1, C=1 should give the output 1, but as p has a stuck-at-1 fault, the output comes out to be 0.
6 votes -- Rajarshi Sarkar ( 35k points)
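The three-XOR exchange can be verified directly; a minimal Python sketch mirroring the register transfers above (the helper name xor_swap is illustrative):

def xor_swap(r1, r2):
    # Exchange two values using only XOR, as in part (a) above.
    r2 = r1 ^ r2      # R2 <- R1 xor R2
    r1 = r2 ^ r1      # R1 <- R2 xor R1 (R1 now holds the old R2)
    r2 = r1 ^ r2      # R2 <- R1 xor R2 (R2 now holds the old R1)
    return r1, r2

assert xor_swap(5, 9) == (9, 5)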
Selected Answer
SHR R1 (the lowest bit is lost, the uppermost bit becomes 0, and all other bits shift right by 1)
SHL R1 (the uppermost bit is lost, the lowest bit becomes 0, and all other bits shift left by 1)

These two operations change the value of R1 exactly when its lowest bit is 1. So, the given program checks the lowest bit of R1 in each iteration, increments R3 only when that bit is 1, and the loop terminates when R1 becomes 0. Thus, at the end, R3 holds the count of the number of bits set to 1 in R1.

a. R1 = 0, R3 = 2, as 101 has two 1's.

b. R3 = the number of 1 bits in n.
7 votes -- Arjun Suresh ( 294k points)
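A quick way to confirm this behaviour is to simulate the fragment; a minimal Python sketch (the helper name count_ones is illustrative, not from the question):

def count_ones(n):
    # Simulate the program: R1 = n; R3 counts the iterations in which
    # SHR-then-SHL changed R1, i.e. its lowest bit was 1.
    r1, r3 = n, 0
    while r1 != 0:              # X: CMP R1, 0 / JZ Z
        r2 = r1                 # MOV R2, R1
        r1 = (r1 >> 1) << 1     # SHR R1; SHL R1 (clears the lowest bit)
        if r2 != r1:            # CMP R2, R1 / JZ Y
            r3 += 1             # INC R3
        r1 >>= 1                # Y: SHR R1; JMP X
    return r3

assert count_ones(5) == 2       # 101 has two 1's
assert count_ones(0b1011) == 3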
Selected Answer
B. The code is counting the number of 1 bits in A0. When a 1 is moved to carry, B is incremented.
10 votes -- Arjun Suresh ( 294k points)
Selected Answer

A. RRC A, #1. The 8-bit register is rotated through carry 8 times:

$a_7a_6a_5a_4a_3a_2a_1a_0$, with carry $c_0$
$c_0a_7a_6a_5a_4a_3a_2a_1$, now $a_0$ is the new carry. So, after the next rotation,
$a_0c_0a_7a_6a_5a_4a_3a_2$

So, after 8 rotations,

$a_6a_5a_4a_3a_2a_1a_0c_0$ and the carry is $a_7$.

Now, one more rotation will restore the original value $A_0$ of register A.

10 votes -- Arjun Suresh ( 294k points)
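This rotation argument can also be simulated, treating A plus the carry flag as a 9-bit ring; a minimal Python sketch (the helper name rrc is illustrative):

def rrc(a, carry):
    # Right rotate the 8-bit value a through the carry flag, once.
    return (a >> 1) | (carry << 7), a & 1

a0, c0 = 0b10110100, 1
a, c = a0, c0
for _ in range(8):              # the 8 rotations done inside the loop
    a, c = rrc(a, c)
a, c = rrc(a, c)                # the extra RRC A, #1 inserted at X
assert (a, c) == (a0, c0)       # original A (and carry) restored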
Selected Answer

Option D.

Word size is 32 bits (4 bytes). The interrupt occurs after the execution of the HALT instruction, NOT during it, so the address of the next instruction will be saved on the stack, which is 1028.

(We have 5 instructions starting from address 1000, of sizes 2, 1, 1, 2 and 1 words, totaling 7 words = 7 * 4 = 28 bytes.)

1000 + 28 = 1028

1028 is the starting address of the NEXT instruction.

After the HALT instruction the CPU enters a HALT state, and if an interrupt happens, the return address will be that of the instruction after the HALT.

 References :

  1. http://x86.renejeschke.de/html/file_module_x86_id_134.html  [ X86 Instructors Manual ]
  2. http://electronics.stackexchange.com/questions/277735/what-happens-if-the-interrupt-occurs-during-the-execution-of-halt-instruction
23 votes -- Vikrant Singh ( 13.4k points)
Selected Answer

B. 24 cycles.

Instruction Size Fetch and Decode + Execute
MOV 2 2*2 + 3 = 7
MOV 1 2*1 + 3 = 5
ADD 1 2*1 + 1 = 3
MOV 2 2*2 + 3 = 7
HALT 1 2*1 + 0 = 2
  Total 24 cycles

 

21 votes -- Vikrant Singh ( 13.4k points)
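The tally can be scripted as a cross-check; a minimal Python sketch with the per-instruction figures from the table above:

# 2 cycles per instruction word for fetch+decode, plus execute cycles:
# 3 for a register/memory transfer, 1 for register ADD, 0 for HALT.
program = [(2, 3),   # MOV R1, 5000
           (1, 3),   # MOV R2, (R1)
           (1, 1),   # ADD R2, R3
           (2, 3),   # MOV 6000, R2
           (1, 0)]   # HALT
print(sum(2 * words + execute for words, execute in program))   # 24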
Selected Answer
Data forwarding means that if the CPU writes to a memory location and subsequently reads from the same memory location, the second instruction can fetch the value directly from the register used to do the write instead of waiting for memory. So, this increases performance.

Here, choices A, B and C don't really make sense, as the data is in R1 and it must be moved to R2, R3 and M[100]. So, (D) is the answer.
13 votes -- Arjun Suresh ( 294k points)
Selected Answer
Option C. 24-bit instructions are 3 bytes each, so the PC always holds 300 plus a multiple of 3; of the given values only 600 satisfies this.
14 votes -- anshu ( 3.2k points)
Selected Answer

MOV a, R1
ADD b, R1
MOV c, R2
ADD d, R2
SUB e, R2
SUB R1, R2
MOV R2, m

Total no. of MOV instructions = 3

21 votes -- Gate Keeda ( 19.1k points)
Selected Answer

The loop is executed 10 times and there are two memory references in the loop (each MOV loads or stores 1 word, so 1 memory reference per MOV inside the loop). So, the number of memory references inside the loop is 2 (MOVs) * 10 (iterations) * 1 (word access per MOV) = 20 memory accesses.

One memory access is outside the loop, for the first instruction

MOV R1, (3000)

So, totally $20+1 = 21$

14 votes -- Vicky Bajoria ( 4.9k points)
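The whole program is small enough to simulate; a minimal Python sketch (word-addressable memory modelled as a dict) that confirms both the 21 data references here and the final memory contents used in the next two answers:

mem = {3000: 10}
mem.update({addr: 100 for addr in range(2000, 2011)})
data_refs = 0

r1 = mem[3000]; data_refs += 1        # MOV R1, (3000)
r3 = 2000
while True:
    r2 = mem[r3]; data_refs += 1      # MOV R2, (R3)
    r2 = r1 + r2                      # ADD R2, R1
    mem[r3] = r2; data_refs += 1      # MOV (R3), R2
    r3 += 1                           # INC R3
    r1 -= 1                           # DEC R1
    if r1 == 0:                       # BNZ LOOP
        break

print(data_refs)                        # 21
print(mem[2000], mem[2009], mem[2010])  # 110 101 100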
Selected Answer
The loop is executed 10 times and it modifies the contents from memory location 2000-2009. Memory location 2010 is untouched - contains 100 as before.
14 votes -- Vicky Bajoria ( 4.9k points)
The loop runs 10 times.

When R1=10 , Memory[2000] = 110,

When R1=9 , Memory[2001] = 109,

When R1=8 , Memory[2002] = 108,

When R1=7 , Memory[2003] = 107,

When R1=6 , Memory[2004] = 106,

When R1=5 , Memory[2005] = 105,

When R1=4 , Memory[2006] = 104,

When R1=3 , Memory[2007] = 103,

When R1=2 , Memory[2008] = 102,

When R1=1 , Memory[2009] = 101,

When R1=0 break the loop, Memory[2010]= 100
10 votes -- Arnab Bhadra ( 5.3k points)
Selected Answer

An interrupt is checked for after the execution of the current instruction, and the contents of PC (the address of the next instruction to be executed) are pushed on to the stack. Here, the address of INC R3 $= 1000 + (2 + 1 + 1 + 1) \times 32/8 = 1020$ and the next instruction address $= 1020 + 4 = 1024$, which is pushed on to the stack.
Ref: http://www.ece.utep.edu/courses/web3376/Notes_files/ee3376-interrupts_stack.pdf

11 votes -- Vicky Bajoria ( 4.9k points)
Selected Answer

In the given question there are 7 instructions, each of which takes 1 clock cycle to complete (pipelining may be used).
If an instruction is in the execution phase, no other instruction can be in the execution phase. So, at least 7 clock cycles will be taken.
Now, it is given that between two instructions a latency or delay should be there based on their operations. E.g., the 1st line of the table says that between two operations, where the first produces the result of an ALU operation and the 2nd uses that result, there should be a delay of 2 clock cycles.

Going instruction by instruction:

1) Load R1, Loc 1; Load R1 from memory location Loc1
Takes 1 clock cycle: simply loads Loc 1 into R1.

2) Load R2, Loc 2; Load R2 from memory location Loc2
Takes 1 clock cycle: simply loads Loc 2 into R2.


3) Add R1, R2, R1; Add R1 and R2 and save result in R1
R1=R1+R2;

Hence, this instruction is using the result of R1 and R2, i.e. result of Instruction 1 and Instruction 2.

As instruction 1 is a load operation and instruction 3 is an ALU operation, there should be a delay of 1 clock cycle between instruction 1 and instruction 3, which is already there due to I2.
As instruction 2 is a load operation and instruction 3 is an ALU operation, there should be a delay of 1 clock cycle between instruction 2 and instruction 3.

4) Dec R2; Decrement R2
This instruction is dependent on instruction 2, and there should be a delay of one clock cycle between instruction 2 and instruction 4, as instruction 2 is a load and instruction 4 is an ALU operation; this delay is already there due to instruction 3.

5) Dec R1 Decrement R1
This instruction is dependent on instruction 3.
As I3 is an ALU operation and I5 is also an ALU operation, a delay of 2 clock cycles should be there between them, of which 1 clock cycle is already there due to I4; so one more clock cycle of delay is needed between I4 and I5.

6) MPY R1, R2, R3; Multiply R1 and R2 and save result in R3
R3=R1*R2;
This instruction uses the result of instruction 5; as both instructions 5 and 6 are ALU operations, there should be a delay of 2 clock cycles.

7) Store R3, Loc 3 Store R3 in memory location Loc3
This instruction is dependent on instruction 6, which is an ALU operation, while instruction 7 is a store; so there should be a delay of 2 clock cycles between them.

Hence, a total of 13 clock cycles will be there.

15 votes -- Madhab Paul Choudhury ( 5.1k points)

Answer is (C)

Here each instruction takes 1 cycle, but apart from that we have to consider the latencies between instructions: if there are two ALU operations I1 and I2 such that I2 uses the value produced by I1 in some register, then I2 will be executed ONLY after waiting TWO more cycles after I1 has executed, because the latency between two ALU operations is 2.

See here:

Clock:  1   2   3   4   5   6   7   8   9   10  11  12  13
Inst.:  I1  I2  -   I3  I4  -   I5  -   -   I6  -   -   I7

 

I3 is an ALU operation which uses the result of the LOAD in I2, so the latency is 1 cycle.

I5 is an ALU operation using the result of the ALU operation in I3, therefore it has to wait 2 cycles after I3.

I6 is an ALU operation and uses the result of the ALU operation in I5, therefore it waits 2 cycles.

 

10 votes -- Sandeep_Uniyal ( 7.3k points)
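The schedule can also be derived mechanically; a minimal Python sketch of an in-order, single-issue machine with the latency table above (instruction kinds and register names transcribed from the code segment):

# Latency = extra cycles required between a producer and a dependent consumer.
latency = {('ALU', 'ALU'): 2, ('ALU', 'STORE'): 2,
           ('LOAD', 'ALU'): 1, ('LOAD', 'STORE'): 0}

code = [('LOAD',  [],           'R1'),   # Load R1, Loc 1
        ('LOAD',  [],           'R2'),   # Load R2, Loc 2
        ('ALU',   ['R1', 'R2'], 'R1'),   # Add R1, R2, R1
        ('ALU',   ['R2'],       'R2'),   # Dec R2
        ('ALU',   ['R1'],       'R1'),   # Dec R1
        ('ALU',   ['R1', 'R2'], 'R3'),   # Mpy R1, R2, R3
        ('STORE', ['R3'],       None)]   # Store R3, Loc 3

produced = {}      # register -> (producer kind, cycle in which it issued)
cycle = 0
for kind, srcs, dst in code:
    earliest = cycle + 1               # in-order: one instruction per cycle
    for s in srcs:
        if s in produced:
            pk, pc = produced[s]
            earliest = max(earliest, pc + 1 + latency[(pk, kind)])
    cycle = earliest
    if dst:
        produced[dst] = (kind, cycle)
print(cycle)                           # 13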
Selected Answer

RFE (Return From Exception) is a privileged trap instruction that is executed when an exception occurs; an exception is not allowed to occur during its execution. (D) is the correct option.

Ref: http://www.cs.rochester.edu/courses/252/spring2014/notes/08_exceptions

5 votes -- Vikrant Singh ( 13.4k points)
Selected Answer
I think it should be A, as 998 $\leftarrow$ 1000 + 998 (writing only memory locations for the sake of brevity). Say SP is 1000 initially. After the EA of the source is calculated (1000, since X is decremented after the EA), the destination becomes 998, and that is where we want to store the result, as the stack is decrementing. In cases C and D it becomes 998 $\leftarrow$ 998 + 998.
12 votes -- Shaun Patel ( 6.9k points)
Selected Answer

First, note that the memory is byte-addressable and the stack grows upward.

The CALL instruction is implemented as follows:

  • Store the current value of PC in the stack:
    PC is 2 bytes, so pushing it increases SP by 2; SP becomes $(016E)_{16} + 2$.

  • Store the value of the PSW register in the stack:
    PSW is also 2 bytes, so pushing it increases SP by 2 more; SP becomes $(016E)_{16} + 2 + 2 = (0172)_{16}$.
32 votes -- Anoop Sonkar ( 4.9k points)
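The stack-pointer arithmetic can be checked in a few lines of Python; a minimal sketch assuming only the word size and initial SP from the question:

sp = 0x016E       # stack grows upward, one word = 2 bytes
sp += 2           # push PC
sp += 2           # push PSW
print(hex(sp))    # 0x172, i.e. $(0172)_{16}$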
A processor can support a maximum memory of $4GB$, where the memory is word-addressable (a word consists of two bytes). The size of address bus of the processor is at least _________bits.

Answers: Memory Interfacing

Selected Answer

Size of Memory = No of words (Addresses)  $\times$ No of bits per word

$2^{32} B$ =  No of words (Addresses)  $\times$ $2B$

No of words (Addresses)  = $2^{31}$

Number of Address lines = $31$

25 votes -- Praveen Saini ( 53.6k points)

Max memory = 4 GB = $2^{32}$ bytes. Since 1 word = 2 B, the total number of words $= 2^{32}/2 = 2^{31}$. So, to address these words we need a minimum 31-bit address bus.

12 votes -- Abhilash Panicker ( 8.8k points)

A micro program control unit is required to generate a total of 25 control signals. Assume that during any microinstruction, at most two control signals are active. Minimum number of bits required in the control word to generate the required control signals will be

  1. 2
  2. 2.5
  3. 10
  4. 12

A micro instruction is to be designed to specify

  1. none or one of the three micro operations of one kind and
  2. none or upto six micro operations of another kind

The minimum number of bits in the micro-instruction is

  1. 9
  2. 5
  3. 8
  4. None of the above

Arrange the following configuration for CPU in decreasing order of operating speeds:

Hard wired control, Vertical microprogramming, Horizontal microprogramming.

 

  1. Hard wired control, Vertical microprogramming, Horizontal microprogramming.

  2. Hard wired control, Horizontal microprogramming, Vertical microprogramming.

  3. Horizontal microprogramming, Vertical microprogramming, Hard wired control.

  4. Vertical microprogramming, Horizontal microprogramming, Hard wired control.

 

Horizontal microprogramming

  1. does not require use of signal decoders
  2. results in larger sized microinstructions than vertical  microprogramming
  3. uses one bit for each control signal
  4. all of the above

The microinstructions stored in the control memory of a processor have a width of 26 bits. Each microinstruction is divided into three fields: a micro-operation field of 13 bits, a next address field ($X$), and a MUX select field ($Y$). There are 8 status bits in the input of the MUX.

    

How many bits are there in the $X$ and $Y$ fields, and what is the size of the control memory in number of words?

  1. 10, 3, 1024
  2. 8, 5, 256
  3. 5, 8, 2048
  4. 10, 3, 512

A CPU has only three instructions I1, I2 and I3, which use the following signals in time steps T1-T5:

I1 : T1 : Ain, Bout, Cin
      T2 : PCout, Bin
      T3 : Zout, Ain
      T4 : Bin, Cout
      T5 : End

I2 : T1 : Cin, Bout, Din
      T2 : Aout, Bin
      T3 : Zout, Ain
      T4 : Bin, Cout
      T5 : End

I3 : T1 : Din, Aout
      T2 : Ain, Bout
      T3 : Zout, Ain
      T4 : Dout, Ain
      T5 : End

Which of the following logic functions will generate the hardwired control for the signal Ain ?

  1. T1.I1 + T2.I3 + T4.I3 + T3
  2. (T1 + T2 + T3).I3 + T1.I1
  3. (T1 + T2 ).I1 + (T2 + T4).I3 + T3
  4. (T1 + T2 ).I2 + (T1 + T3).I1 + T3

A hardwired CPU uses $10$ control signals $S_1$ to $S_{10}$, in various time steps $T_1$ to $T_5$, to implement $4$ instructions $I_1$ to $I_4$ as shown below:

[Figure GATE2005-IT_45: table of control signals $S_1$ to $S_{10}$ used by instructions $I_1$ to $I_4$ in time steps $T_1$ to $T_5$]

Which of the following pairs of expressions represent the circuit for generating control signals $S_5$ and $S_{10}$ respectively?

($(I_j + I_k)T_n$ indicates that the control signal should be generated in time step $T_n$ if the instruction being executed is $I_j$ or $l_k$)

  1. $S_5 = T_1 + I_2 \cdot T_3$ and

    $S_{10} = (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$

  2. $S_5 = T_1 + (I_2 + I_4) \cdot T_3$ and

    $S_{10} = (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$

  3. $S_5 = T_1 + (I_2 + I_4) \cdot T_3$ and

    $S_{10} = (I_2 + I_3 + I_4) \cdot T_2 + (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$

  4. $S_5 = T_1 + (I_2 + I_4) \cdot T_3$ and

    $S_{10} = (I_2 + I_3) \cdot T_2 + I_4 \cdot T_3 + (I_1 + I_3) \cdot T_4 + (I_2 + I_4) \cdot T_5$

An instruction set of a processor has 125 signals which can be divided into 5 groups of mutually exclusive signals as follows:

Group 1 : 20 signals, Group 2 : 70 signals, Group 3 : 2 signals, Group 4 : 10 signals, Group 5 : 23 signals.

How many bits of the control words can be saved by using vertical microprogramming over horizontal microprogramming?

  1. 0
  2. 103
  3. 22
  4. 55

The data path shown in the figure computes the number of 1s in the 32-bit input word corresponding to an unsigned even integer stored in the shift register.
The unsigned counter, initially zero, is incremented if the most significant bit of the shift register is 1.

[Figure GATE2006-IT_41: data path with the 32-bit shift register and the counter]

The microprogram for the control is shown in the table below with missing control words for microinstructions I1, I2, ..... In.

Microinstruction reset_counter shift_left load_output
BEGIN 1 0 0
I1 ? ? ?
: :   :
In ? ? ?
END 0 0 1

The counter width (k), the number of missing microinstructions (n), and the control word for microinstructions I1, I2, ..... In are, respectively,

  1. 32, 5, 010
  2. 5, 32, 010
  3. 5, 31, 011
  4. 5, 31, 010

Consider a CPU where all the instructions require 7 clock cycles to complete execution. There are 140 instructions in the instruction set. It is found that 125 control signals are needed to be generated by the control unit. While designing the horizontal microprogrammed control unit, single address field format is used for branch control logic. What is the minimum size of the control word and control address register?

  1. 125, 7
  2. 125, 10
  3. 135, 9
  4. 135, 10

Consider the following sequence of micro-operations.
 

  MBR ← PC
  MAR ← X
  PC ← Y
  Memory ← MBR


Which one of the following is a possible operation performed by this sequence?

(A) Instruction fetch

(B) Operand fetch

(C) Conditional branch

(D) Initiation of interrupt service

Answers: Microprogramming

Selected Answer
To generate 25 different control signals, $\lceil \log_2 25 \rceil = 5$ bits are required. At any time at most 2 signals are active, so the control word length = 5 + 5 = 10 bits.
8 votes -- aravind90 ( 589 points)
Selected Answer

Actually, the given question incorporates both the concept of horizontal μprogramming (also known as the decoded form of control signals) and vertical μprogramming (also known as the encoded form of control signals).

Part a) says:

none or one of the three micro operations of one kind

This refers to the encoded, vertical form, since at most one signal can be active at a time in vertical microprogramming: an external decoder selects one control signal out of the given control signals.

As we know, the number of bits required for vertical microprogramming, given $n$ control signals, is $\lceil \log_2 n \rceil$. Here $n = 3$, so the number of bits required for part a) $= \lceil \log_2 3 \rceil = 2$.

Now coming to part b), it says:

none or upto six micro operations of another kind

So, at most 6 microoperations of this kind can be active at a time. To accommodate that, we need the decoded form of control signals, which is the horizontal form.

So, the number of bits required for part b) = the number of control signals of this kind = 6.

Therefore, the overall number of bits required to accommodate both a) and b) $= 2 + 6 = 8$ bits.

Besides this, an address field, flags etc. are also present in a control word; that is why the question asks for the minimum number of bits in the micro-instruction.

Hence, the minimum number of bits required = 8.

Hence C) is the correct answer.

14 votes -- HABIB MOHAMMAD KHAN ( 76.8k points)
Selected Answer
Hard-wired control involves only hardware, whereas microprogramming is a software approach. So, hard-wired control should be faster than both microprogramming approaches.

Between vertical and horizontal microprogramming: horizontal is faster because its control signals are not encoded, whereas in vertical microprogramming the signals are encoded to save memory. So, horizontal microprogramming takes less time because no decoding of signals is required. Therefore, the final order is:

hard-wired control > horizontal microprogramming > vertical microprogramming
9 votes -- Shikhar Vashishth ( 5.7k points)
Selected Answer
option (d). All statements are true.

Ref: http://www.cs.virginia.edu/~cs333/notes/microprogramming.pdf
15 votes -- Suvojit Mondal ( 431 points)
Selected Answer
$x + y + 13 = 26  \rightarrow (1)$
$y = 3$ // y is no of bits used to represent 8 different states of multiplexer $ \rightarrow (2)$
$x$ is no of bits required represent size of control memory
$x = 10$ from (1) and (2)

size of control memory $= 2^x = 2^{10}= 1024$
17 votes -- Digvijay ( 47k points)
Selected Answer

We just have to see which all options give 1 whenever Ain is 1 and 0 otherwise. 

So, Ain is 1 in T3 of I1, I2 and I3. Also during T1 of I1, and T2 and T4 of I3. So, answer will be 

T1.I1 + T2.I3 + T4.I3 + T3.I1 + T3.I2 + T3.I3

Since CPU is having only 3 instructions, T3.I1 + T3.I2 + T3.I3 can be replaced with T3 (we don't need to see which instruction and Ain will be activated in time step 3 of all the instructions).

So, T1.I1 + T2.I3 + T4.I3 + T3 

is the answer. Option A. 

14 votes -- Arjun Suresh ( 294k points)
Selected Answer
Option 4 is the answer for this question.
If we look at the table, we need to find those time steps and instructions which use these control signals.

For example, $S_5 = T_1$ has used control signal $S_5$ for all the instructions, or we can say irrespective of the instructions. Also, $S_5$ is used by instructions $I_2$ and $I_4$ for the time stamp $T_3$ so that comes to:
$$S_5 = T_1 + I_2 \cdot T_3 + I_4 \cdot T_3 = T_1 + (I_2+I_4) \cdot T_3$$

In the same way, we'll calculate for $S_{10}$.
It's an example of hardwired CU design, used in RISC processors. It gives accurate results, but isn't good for debugging, since a minor change requires restructuring the control unit.
14 votes -- Manu Thakur ( 6k points)
Selected Answer

In horizontal microprogramming we need 1 bit for every control signal, therefore the total bits in

horizontal microprogramming $= 20 + 70 + 2 + 10 + 23 = 125$.

Now let's consider vertical microprogramming. In vertical microprogramming we use a decoder ($n$ to $2^n$) whose output lines equal the number of control signals; an input is given according to which control signal we have to select.

Now, in this question the 5 groups contain mutually exclusive signals, i.e., within a group only one signal can be active at a time, so we can safely use a decoder per group.

Group 1 $= \lceil \log_2 20 \rceil = 5$ (number of input bits for the decoder, whose outputs are the control signals of the group)

Group 2 $= \lceil \log_2 70 \rceil = 7$

Group 3 $= \lceil \log_2 2 \rceil = 1$

Group 4 $= \lceil \log_2 10 \rceil = 4$

Group 5 $= \lceil \log_2 23 \rceil = 5$

Total bits required in vertical microprogramming $= 5 + 7 + 1 + 4 + 5 = 22$.

So, the number of bits saved $= 125 - 22 = 103$; hence (B) is the answer.

23 votes -- Prateeksha Keshari ( 2.1k points)
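The per-group encoding is easy to recompute; a minimal Python sketch:

from math import ceil, log2

groups = [20, 70, 2, 10, 23]
horizontal = sum(groups)                        # 1 bit per signal: 125
vertical = sum(ceil(log2(n)) for n in groups)   # encoded per group: 22
print(horizontal - vertical)                    # 103 bits saved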
Selected Answer

Answer should be D.

Here I1 to In are microinstructions, and reset_counter, shift_left and load_output are control signals that activate the corresponding hardware (e.g., the shift register or the output load).

Counter width (k) is 5 bits, as the shift register holds a 32-bit unsigned even integer: its LSB is 0, so at most 31 bits can be 1 and a 5-bit counter suffices.

The number of missing microinstructions (n) should be 31, as the shift register contains an unsigned EVEN integer only; the LSB will always be 0, so there is no need to shift for the LSB.

The control word contains 1 for active/enable and 0 for inactive/disable:

reset_counter resets the counter, so it must be 0 for all these microinstructions.

shift_left should be 1, to shift the given data in the shift register.

load_output has no purpose during these microinstructions, as it is used only at END, so it should be 0.

5 votes -- khush tak ( 6.8k points)
Selected Answer
The answer should be D, because 140 instructions each requiring 7 cycles means $140 \times 7 = 980$ microinstructions, whose addresses need 10 bits ($2^{10} = 1024 \geq 980$); that is the size of the control address register.

Since it is horizontal, the control word = 125 control signals + 10 address bits = 135 bits.
8 votes -- nagendra2016 ( 171 points)
Selected Answer

Here the PC value is being stored in memory, which happens either when CALL/RETURN is involved or when there is an interrupt, as we will have to come back and resume the current execution.

So, options (A) and (B) are clearly incorrect.

Option (C) is incorrect because a conditional branch does not require saving the PC contents.

Option (D) is correct, as the sequence matches the generic interrupt cycle.

11 votes -- Himanshu Agarwal ( 16.2k points)

If an instruction takes $i$ microseconds and a page fault takes an additional $j$ microseconds, the effective instruction time if on the average a page fault occurs every $k$ instruction is:

  1. $i + \frac{j}{k}$

  2. $i + j^*k$

  3. $\frac{i+j}{k}$

  4. $({i+j})^*{k}$

 

Answers: Page Fault

Selected Answer
Page fault rate $= \frac{1}{k}$

Page hit rate $= 1-\frac{1}{k}$

Service time $= i$

Page fault service time $= i+j$

Now, effective instruction time
$= \frac{1}{k}(i+j)+\left(1-\frac{1}{k}\right)i = \frac{i+j}{k}+i-\frac{i}{k} = i+\frac{j}{k}$

So, option A is correct.
11 votes -- shashi shekhar ( 515 points)
The stage delays in a $4$-stage pipeline are $800, 500, 400$ and $300$ picoseconds. The first stage (with delay $800$ picoseconds) is replaced with a functionality equivalent design involving two stages with respective delays $600$ and $350$ picoseconds. The throughput increase of the pipeline is ___________ percent.

Consider a $3 \ \text{GHz}$ (gigahertz) processor with a three stage pipeline and stage latencies $\tau_1,\tau_2$ and $\tau_3$ such that $\tau_1 =\frac{3 \tau_2}{4}=2\tau_3$. If the longest pipeline stage is split into two pipeline stages of equal latency , the new frequency is __________ $GHz$, ignoring delays in the pipeline registers.

 

An instruction pipeline consists of 4 stages: Fetch (F), Decode field (D), Execute (E) and Result Write (W). The 5 instructions in a certain instruction sequence need these stages for different numbers of clock cycles, as shown by the table below

 

No. of cycles needed for
Instruction F D E W
1 1 2 1 1
2 1 2 2 1
3 2 1 3 2
4 1 3 2 1
5 1 2 1 2

 

Find the number of clock cycles needed to perform the 5 instructions.

 

 

Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that

  1. T1 ≤ T2
  2. T1 ≥ T2
  3. T1 < T2
  4. T1 and T2 plus the time taken for one instruction fetch cycle

 

An instruction pipeline has five stages where each stage takes 2 nanoseconds and all instructions use all five stages. Branch instructions are not overlapped, i.e., the instruction after the branch is not fetched till the branch instruction is completed. Under ideal conditions,

  1. Calculate the average instruction execution time assuming that 20% of all instructions executed are branch instruction. Ignore the fact that some branch instructions may be conditional.
  2. If a branch instruction is a conditional branch instruction, the branch need not be taken. If the branch is not taken, the following instructions can be overlapped. When 80% of all branch instructions are conditional branch instructions, and 50% of the conditional branch instructions are such that the branch is taken, calculate the average instruction execution time.

Consider a 5-stage pipeline - IF (Instruction Fetch), ID (Instruction Decode and register read), EX (Execute), MEM (memory), and WB (Write Back). All (memory or register) reads take place in the second phase of a clock cycle and all writes occur in the first phase. Consider the execution of the following instruction sequence:

I1: sub r2, r3, r4; /* $r2 \leftarrow r3 - r4$ */
I2: sub r4, r2, r3; /* $r4 \leftarrow r2 - r3$ */
I3: sw r2, 100(r1) /* $M[r1+100] \leftarrow r2$ */
I4: sub r3, r4, r2 /* $r3 \leftarrow r4 -r2$ */
  1. Show all data dependencies between the four instructions.
  2. Identify the data hazards.
  3. Can all hazards be avoided by forwarding in this case.

 

The performance of a pipelined processor suffers if

  1. the pipeline stages have different delays
  2. consecutive instructions are dependent on each other
  3. the pipeline stages share hardware resources
  4. All of the above

 

For a pipelined CPU with a single ALU, consider the following situations

  1. The j+1 –st instruction uses the result of the j-th instruction as an operand

  2. The execution of a conditional jump instruction

  3. The j-th and j+1st instructions require the ALU at the same time.

Which of the above can cause a hazard

  1. I and II only
  2. II and III only
  3. III only
  4. All the three

A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be

  1. 120.4 microseconds

  2. 160.5 microseconds

  3. 165.5 microseconds

  4. 590.0 microseconds

Consider a pipeline processor with 4 stages S1 to S4. We want to execute the following loop:

for (i = 1; i < = 1000; i++) 
    {I1, I2, I3, I4} 

where the time taken (in ns) by instructions I1 to I4 for stages S1 to S4 are given below:

  S1 S2 S3 S4
I1: 1 2 1 2
I2: 2 1 2 1
I3: 1 1 2 1
I4: 2 1 2 1

The output of  I1 for i = 2 will be available after

  1. 11 ns
  2. 12 ns
  3. 13 ns
  4. 28 ns

A 5 stage pipelined CPU has the following sequence of stages:

  • IF – instruction fetch from instruction memory
  • RD – Instruction decode and register read
  • EX – Execute: ALU operation for data and address computation
  • MA – Data memory access – for write access, the register read at RD state is used.
  • WB – Register write back

Consider the following sequence of instructions:

  • $I_1$: L R0, loc 1; R0 <= M[loc1]
  • $I_2$: A R0, R0; R0 <= R0 +R0
  • $I_3$: S R2, R0; R2 <= R2-R0

Let each stage take one clock cycle

What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of $I_1$?

  1. 8
  2. 10
  3. 12
  4. 15

We have two designs D1 and D2 for a synchronous pipeline processor. D1 has 5 pipeline stages with execution times of 3 nsec, 2 nsec, 4 nsec, 2 nsec and 3 nsec while the design D2 has 8 pipeline stages each with 2 nsec execution time How much time can be saved using design D2 over design D1 for executing 100 instructions?

  1. 214 nsec
  2. 202 nsec
  3. 86 nsec
  4. -200 nsec

A CPU has a five-stage pipeline and runs at 1 GHz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes $10^9$ instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is: 

  1. 1.0 second
  2. 1.2 seconds
  3. 1.4 seconds
  4. 1.6 seconds

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement X = (S - R * (P + Q))/T is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

 ADD  R5, R0, R1  ; R5 ← R0 + R1
 MUL  R6, R2, R5  ; R6 ← R2 * R5
 SUB  R5, R3, R6  ; R5 ← R3 - R6
 DIV  R6, R5, R4  ; R6 ← R5/R4
 STORE  R6, X  ; X  ← R6

The number of Read-After-Write (RAW) dependencies, Write-After-Read( WAR) dependencies, and Write-After-Write (WAW) dependencies in the sequence of instructions are, respectively,

  1. 2, 2, 4
  2. 3, 2, 3
  3. 4, 2, 2
  4. 3, 3, 2

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement X = (S - R * (P + Q))/T is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.

 ADD  R5, R0, R1  ; R5 ← R0 + R1
 MUL  R6, R2, R5  ; R6 ← R2 * R5
 SUB  R5, R3, R6  ; R5 ← R3 - R6
 DIV  R6, R5, R4  ; R6 ← R5/R4
 STORE  R6, X  ; X  ← R6

 

The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is

  1. 10
  2. 12
  3. 14
  4. 16

Consider a pipelined processor with the following four stages:

  • IF: Instruction Fetch
  • ID: Instruction Decode and Operand Fetch
  • EX: Execute
  • WB: Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

ADD R2, R1, R0 R2 $\leftarrow$ R1 + R0
MUL R4, R3, R2 R4 $\leftarrow$ R3 * R2
SUB R6, R5, R4 R6 $\leftarrow$ R5 - R4

 

  1. 7
  2. 8
  3. 10
  4. 14

 

A processor takes 12 cycles to complete an instruction I. The corresponding pipelined processor uses 6 stages with the execution times of 3, 2, 5, 4, 6 and 2 cycles respectively. What is the asymptotic speedup assuming that a very large number of instructions are to be executed?

  1. 1.83
  2. 2
  3. 3
  4. 6

Which of the following are NOT true in a pipelined processor?

  1. Bypassing can handle all RAW hazards
  2. Register renaming can eliminate all register carried WAR hazards
  3. Control hazard penalties can be eliminated by dynamic branch prediction
  1. I and II only
  2. I and III only
  3. II and III only 
  4. I, II and III

Delayed branching can help in the handling of control hazards

For all delayed conditional branch instructions, irrespective of whether the condition evaluates to true or false,

  1. The instruction following the conditional branch instruction in memory is executed

  2. The first instruction in the fall through path is executed

  3. The first instruction in the taken path is executed

  4. The branch takes longer to execute than any other instruction

Delayed branching can help in the handling of control hazards

The following code is to run on a pipelined processor with one branch delay slot:

I1: ADD $R2 \leftarrow R7 + R8$

I2: Sub $R4 \leftarrow R5 – R6$

I3: ADD $R1 \leftarrow R2 + R3$

I4: STORE Memory $[R4] \leftarrow R1$

     BRANCH to Label if $R1 == 0$

Which of the instructions I1, I2, I3 or I4 can legitimately occupy the delay slot without any program modification?

  1. I1
  2. I2
  3. I3
  4. I4

A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is 0.5 nsec. The speedup of the pipeline processor for a large number of instructions is

  1. 4.5
  2. 4.0
  3. 3.33
  4. 3.0

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

  S1 S2 S3 S4
I1 2 1 1 1
I2 1 3 2 2
I3 2 1 1 3
I4 1 2 2 2

 

What is the number of cycles needed to execute the following loop?

For (i=1 to 2) {I1; I2; I3; I4;}

  1. 16
  2. 23
  3. 28
  4. 30

A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

Instruction    Meaning of instruction
$t_0: \text{MUL} \:\text{R}_2,\text{R}_0,\text{R}_1$ $\text{R}_2  \gets \text{R}_0*\text{R}_1$
$t_1: \text{DIV}\: \text{R}_5,\text{R}_3,\text{R}_4$ $\text{R}_5 \gets \text{R}_3 / \text{R}_4$
$t_2: \text{ADD}\: \text{R}_2,\text{R}_5,\text{R}_2$ $\text{R}_2 \gets \text{R}_5 + \text{R}_2$
$t_3: \text{SUB} \:\text{R}_5,\text{R}_2,\text{R}_6$   $\text{R}_5 \gets \text{R}_2 - \text{R}_6$
  1. 13
  2. 15
  3. 17
  4. 19

Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure.

What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

(A) 4.0

(B) 2.5

(C) 1.1

(D) 3.0

 

 

Register renaming is done in pipelined processors

  1. as an alternative to register allocation at compile time
  2. for efficient access to function parameters and local variables
  3. to handle certain kinds of hazards
  4. as part of address translation

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

(A) 132      (B) 165      (C) 176      (D) 328

 

Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is ____________

An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are $P$ and $Q$ nanoseconds, respectively. The value of $P/Q$ is __________.

Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency. 

P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns. 

P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns. 

P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns. 

P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns. 

Which processor has the highest peak clock frequency?

  1. P1
  2. P2
  3. P3
  4. P4
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speedup achieved in this pipelined processor is_______________.

Consider the sequence of machine instruction given below:

MUL R5, R0, R1
DIV R6, R2, R3
ADD R7, R5, R6
SUB R8, R7, R4

In  the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register shows the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instruction is _________.

Consider the following reservation table for a pipeline having three stages $S_1, S_2 \text{ and } S_3$.

$Time \: \rightarrow$
  1 2 3 4 5
$S_1$ $X$       $X$
$S_2$   $X$   $X$  
$S_3$     $X$    

The minimum average latency (MAL) is ______

Instruction execution in a processor is divided into 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Operand fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated:

(i) a naive pipeline implementation (NP) with 5 stages and

(ii) an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively.

The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is _________ .

A machine $\mathcal{M}$ has the following five pipeline stages; their respective time requirements in nanoseconds (ns) are given within parentheses:

  • $F$-stage — instruction fetch (9 ns),
  • $D$-stage — instruction decode and register fetch (3 ns),
  • $X$-stage — execute/address calculation (7 ns),
  • $M$-stage — memory access (9 ns),
  • $W$-stage — write back to a register (2 ns).

Assume that for each stage, the pipeline overhead is 1 ns. A program $P$ having 100 machine instructions runs on $\mathcal{M}$, where every 3rd instruction needs a 1-cycle stall before the $X$-stage. Calculate the CPU time in seconds for completing $P$.

Answers: Pipelining

Selected Answer
In a pipeline, ideally $CPI=1$, so in 1 cycle 1 instruction gets completed.
Throughput is instructions completed per unit time.
In pipeline 1, cycle time = max stage delay $= 800$ psec.
In $800$ psec we expect to finish 1 instruction, so per psec $\frac{1}{800}$ instructions are expected to be completed, which is the throughput of pipeline 1.

Similarly, for pipeline 2, throughput $= \frac{1}{600}$.

Throughput increase in percentage
$=\frac{\text{new}-\text{old}}{\text{old}} \times 100$
$= \frac{\frac{1}{600}-\frac{1}{800}}{\frac{1}{800}} \times 100$
$=\frac{200}{600} \times 100$
$=33.33 \%$
44 votes -- Anurag Semwal ( 8k points)

The maximum throughput of a pipeline, i.e., in the best case without any stalls, equals the clock frequency of the pipeline.

In the first case, clock cycle time = max stage delay = max(800, 500, 400, 300) = 800, so clock frequency = 1/800 (ignore the units, as we only have to calculate a percentage).

In the second case, clock cycle time = max(600, 350, 500, 400, 300) = 600, so clock frequency = 1/600.

Percentage increase in pipeline throughput = percentage increase in clock frequency = $\frac{1/600 - 1/800}{1/800} \times 100 = 33.33\%$

11 votes -- Mehak Sharma ( 1.5k points)
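Both answers boil down to comparing two cycle times; a minimal Python check:

old = max(800, 500, 400, 300)        # cycle time = max stage delay (ps)
new = max(600, 350, 500, 400, 300)   # after splitting the 800 ps stage
print(round((1 / new - 1 / old) / (1 / old) * 100, 2))   # 33.33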
Selected Answer

Answer is 4 GHz.

Given 3 stage pipeline , with 3 GHz processor.

Given: $e_1 = \frac{3e_2}{4} = 2e_3$

Put $e_1 = 6x$; we get $e_2 = 8x$, $e_3 = 3x$.

Now largest stage time is 8x.

So, frequency is $\frac{1}{8x}$

=> $\frac{1}{8x}$ = 3 GHz  

=>  $\frac{1}{x}$ = 24 GHz ---------(1)


Now, we divide e2  into two stages  of 4x & 4x.

New processor has 4 stages -

6x , 4x, 4x, 3x.

Now largest stage time is 6x.

So, the new frequency is

$\frac{1}{6x} = \frac{24}{6} = 4$ GHz (Ans), using (1).

43 votes -- Himanshu Agarwal ( 16.2k points)
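A small sketch of the same stage-splitting arithmetic (x is an arbitrary time unit, as in the answer above):

```python
# e1 = 3*e2/4 = 2*e3; putting e1 = 6x gives e2 = 8x, e3 = 3x.
e1, e2, e3 = 6, 8, 3

inv_x = 3 * max(e1, e2, e3)            # 1/(8x) = 3 GHz, so 1/x = 24 GHz
new_stages = [e1, e2 / 2, e2 / 2, e3]  # e2 split into two 4x stages
print(inv_x / max(new_stages), "GHz")  # 24/6 = 4.0 GHz
```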
Selected Answer

answer = $15$ cycles are required.

6 votes -- Amar Vashishth ( 28.7k points)
Selected Answer

Here we are comparing the execution time of only a single instruction. Pipelining in no way increases the execution time of a single instruction (the time from its start to end). It increases the overall performance by splitting the execution to multiple pipeline stages so that the following instructions can use the finished stages of the previous instructions. But in doing so pipelining causes some problems also as given in the below link, which might slow some instructions. So, (B) is the answer. 

http://www.cs.wvu.edu/~jdm/classes/cs455/notes/tech/instrpipe.html

22 votes -- Arjun Suresh ( 294k points)
Selected Answer

Each stage is 2ns. So, after 5 time units each of 2ns, the first instruction finishes (i.e., after 10ns), in every 2ns after that a new instruction gets finished. This is assuming no branch instructions. Now, once the pipeline is full, we can assume that the initial fill time doesn't matter our calculations and average execution time for each instruction is 2ns assuming no branch instructions.

(a) Now, we are given that 20% of instructions are branch (like JMP) and when a branch instruction is executed, no further instruction enters the pipeline. So, we can assume every 5th instruction is a branch instruction. So, with this assumption, total time to finish 5 instruction will be 5 * 2 + 8 = 18 ns (as when a branch instruction enters the pipeline and before it finishes, 4 pipeline stages will be empty totaling 4 * 2 = 8 ns, as it is mentioned in question that the next instruction fetch starts only when branch instruction completes). And this is the same for every set of 5 instructions, and hence the average instruction execution time = 18/5 = 3.6 ns

(b) This is just a complex statement. But what we need is to identify the % of branch instructions which cause a branch to be taken as others will have no effect on the pipeline flow.
20% of instructions are branch instructions, and 80% of those branch instructions are conditional.
That means .2*.8 = 16% of instructions are conditional branch instructions and it is given that 50% of those result in a branch being taken.
So, 8% of instructions are conditional branches being taken and we also have 20% of 20% = 4% of unconditional branch instructions which are always taken.

So, percentage of instructions where a branch is taken is 8+4 = 12% instead of 20% in (a) part.

So, in 100 instructions there will be 12 branch instructions. We can do a different calculation here as compared to (a) as 12 is not a divisor of 100. Each branch instruction causes a pipeline delay of 4*2 = 8 ns. So, 12 instructions will cause a delay of 12 * 8 = 96 ns. For 100 instructions, we need 100 * 2 = 200 ns without any delay and with delay we require 200 + 96 = 296 ns for 100 instructions.

So, average instruction execution time = 296/100 = 2.96 ns

(We can also use this method for part (a) which will give 100 * 2 + 20*8 = 360 ns for 100 instructions)

 

21 votes -- Arjun Suresh ( 294k points)

If an instruction branches, then it takes $2ns \times 5 = 10ns$, because if the branch is taken then the instruction after that branch instruction is not fetched until the entire current branch instruction completes; this means it goes through all stages.

if an instruction is non-branch or branching does not happen then, it takes $2ns$ to get completed. 

Q.a) $\text{average time taken} = 0.8 \times 2ns + 0.2 \times 10ns = 3.6ns$

Q.b) 

$$\text{Average time taken} = 0.8 \times 2ns + 0.2 \Big( 0.2 \times 10ns + 0.8 \big( 0.5\times 10ns + 0.5 \times 2ns\big)\Big) =2.96ns$$

17 votes -- Amar Vashishth ( 28.7k points)
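The two computations above can also be checked numerically; a minimal sketch:

```python
stage_time = 2            # ns; 5 stages, so a drained pipe costs 10 ns
normal, flushed = 2, 10   # taken branch: next fetch waits for all 5 stages

# (a) every branch (20% of instructions) holds up the pipeline
avg_a = 0.8 * normal + 0.2 * flushed
# (b) only taken branches hurt: 20% unconditional + 80% conditional taken 50%
taken = 0.2 * (0.2 + 0.8 * 0.5)
avg_b = (1 - taken) * normal + taken * flushed
print(round(avg_a, 2), round(avg_b, 2))   # 3.6 2.96
```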
Selected Answer

4 RAW

3 WAR 

With operand forwarding:

 

Without it:
(both tables represent the same pipeline)


 

13 votes -- Amar Vashishth ( 28.7k points)
Selected Answer
Answer: D

A: Yes. Total delay = Max (All delays) + Register Delay.

B: Yes, if data forwarding is not there.

C: Yes, e.g., ID and EX share the ID/EX register.
14 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer

1. Data hazard
2. Control hazard
3. Structural hazard as only one ALU is there

So, D all of these. 

http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/hazards.html

16 votes -- Arjun Suresh ( 294k points)
Selected Answer
Pipelining requires all stages to be synchronized, meaning we have to make the delay of all stages equal to the maximum pipeline stage delay, which here is 160.

Time for execution of the first instruction = (160+5) * 3 + 160 = 655 ns (5 ns for intermediate registers which is not needed for the final stage).

Now, in every 165 ns, an instruction can be completed. So,

Total time for 1000 instructions = 655 + 999*165 = 165.49 microseconds
18 votes -- Arjun Suresh ( 294k points)
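A one-line check of the arithmetic above:

```python
cycle = 160 + 5                      # ns: max stage delay + register delay
first = (160 + 5) * 3 + 160          # 655 ns; the last stage needs no register
total = first + 999 * cycle          # afterwards one instruction per cycle
print(total / 1000, "microseconds")  # 165.49
```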
Selected Answer
time 1 2 3 4 5 6 7 8 9 10 11 12 13
I1 s1 s2 s2 s3 s4 s4              
I2   s1 s1 s2 s3 s3 s4            
I3       s1 s2 - s3 s3 s4        
I4         s1 s1 s2 - s3 s3 s4    
I1             s1 - s2 s2 s3 s4 s4

So the total time would be 13 ns, hence option (C).

Correct me if I am wrong...

17 votes -- Suvojit Mondal ( 431 points)
Selected Answer

Answer here is option A

Without data forwarding:

13 clock cycles - WB and RD stages non-overlapping.

T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13
IF RD EX MA WB                
  IF       RD EX MA WB        
          IF       RD EX MA WB

Here, the WB and RD stages operate in non-overlapping mode.

 

11 clock cycles - WB and RD stages overlapping.

T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11
IF RD EX MA WB            
  IF     RD EX MA WB      
        IF     RD EX MA WB

Split-phase access between WB and RD means:

the WB stage produces the output during the rising edge of the clock, and the RD stage fetches the output during the falling edge.

 

In the question it is mentioned:

for write access, the register read at RD stage is used.

This means that for writing operands back to memory, the register read at the RD stage is used (no operand forwarding for STORE instructions).

Note

  • As in any question in any subject, unless otherwise stated we always consider the best case. So, do overlap unless otherwise stated. But this holds only for WB/RD.
  1. Why is there a stall for I2 in T3 and T4?
     RD is instruction decode and register read. If we executed RD of I2 in T3, the data from memory would not yet be stored in R0, so proper operands are not available at T3. Hence I2 has to wait until I1 writes the value.
  2. WB of I1 and RD of I2 operate in the same clock. Why is that so?
     If nothing is mentioned in the question, this scenario is taken into consideration by default. After MA the operands will be available in the register, so RD and WB can overlap.

 

With data forwarding 

(This should be the case here, as the question rules out operand forwarding to memory only for STORE instructions.)

8 clock cycles

  1. Why is there a stall for I2 in T4?
     Data is being forwarded from MA of I1 to EX of I2. The MA operation of I1 must complete so that the correct data is available in the register.
  2. Why RD of I2 in T3? Will it not fetch incorrect information if executed before the operands are forwarded from MA of I1?
     Yes, RD of I2 will definitely fetch INCORRECT data at T3. But don't worry about it; the operand forwarding technique will take care of it.
  3. Why can't RD of I2 be placed in T4?
     Yes, we can place RD of I2 in T4 as well. But what is the fun in that? Pipelining is a technique used to reduce the execution time of instructions, so why make an extra stall? Moreover, there is one more problem, discussed just below. After reading the next point, just think what would happen if we had created a stall at T3!
  4. Why can't RD of I3 be placed at T4?
     This cannot be done. I3 cannot use RD because the previous instruction I2 should start its next stage (EX) before the current instruction (I3) can utilise that (RD) stage. This is because the data resides in the buffers between stages.
  5. Can an operand be forwarded from one clock cycle to the same clock cycle?
     No, the previous clock cycle must complete before the data is forwarded, unless the split-phase technique is used.
  6. Can't there be forwarding from the EX stage (T3) of I1 to the EX stage (T4) of I2?
     This is not possible. See what is happening in I1: it is a memory read, so the data will be available in the register only after the memory read. So data cannot be forwarded from EX of I1.
  7. In some cases data is forwarded from MA and in some cases from EX. Why is that so?
     Data is forwarded when it is ready. It solely depends on the type of instruction.
  8. When to use split-phase?
     [Mostly when it is given in the question that there is operand forwarding from stage A to stage B, e.g. http://gateoverflow.in/8218/gate2015-2_44]

Split-phase can be used even when there is no operand forwarding, because the two aren't related.

 

References

Similar Question

Discussions

32 votes -- pC ( 21.4k points)

answer = option A

$8$ cycles required with operand forwarding.

it is not given that RD and WB stage could overlap.


18 votes -- Amar Vashishth ( 28.7k points)
Selected Answer
(B) is the correct option for this question.

Execution time for a pipeline $= (k + n - 1) \times t$, where $k$ = number of pipeline stages, $n$ = number of instructions, and
$t$ = max of all the stage execution times.

D1 = (5+100-1)*4 = 416

D2 = (8+100-1)*2 = 214

Time saved using D2 = 416-214  =202
12 votes -- Manu Thakur ( 6k points)
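A sketch of the $(k + n - 1) \times t$ formula used above:

```python
def pipe_time(stages, instructions, cycle_ns):
    # classic pipeline formula: (k + n - 1) cycles of the slowest stage
    return (stages + instructions - 1) * cycle_ns

d1 = pipe_time(5, 100, 4)   # 416 ns
d2 = pipe_time(8, 100, 2)   # 214 ns
print(d1 - d2)              # 202 ns saved using D2
```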
Selected Answer
The delay slots in the pipeline caused by a branch instruction are 2, as the IF of the next instruction begins only after the 3rd stage of the current instruction (during its 4th stage). Ideally this should happen during the 2nd stage.

So, for total no. of instructions = $10^9$ and $20\%$ branch, we have $0.2 \times 2 \times 10^9 = 4 \times 10^8 $ cycle penalty.

Since, clock speed is 1GHz and each instruction on average takes 1 cycle, total execution time in seconds will be

$\frac{10^9}{10^9} + \frac{4 \times 10^8}{10^9} = 1.4$
18 votes -- Arjun Suresh ( 294k points)
Selected Answer

(C) is the correct option for this question:

RAW

  1. I1 - I2 (R5)
  2. I2 - I3 (R6)
  3. I3 - I4 (R5)
  4. I4 - I5 (R6)


WAR 

  1. I2 - I3 (R5)
  2. I3 - I4 (R6)


WAW

  1. I1 - I3 (R5)
  2. I2 - I4 (R6)
16 votes -- Manu Thakur ( 6k points)
Selected Answer


This is what I have solved, so the answer is 12.

28 votes -- Manu Thakur ( 6k points)

answer = option D = $16$ cycles are required

14 votes -- Amar Vashishth ( 28.7k points)
Selected Answer

Answer: option B
considering EX to EX data forwarding.

  1 2 3 4 5 6 7 8
I1 IF ID EX WB        
I2   IF ID EX EX EX WB  
I3     IF ID     EX WB

 

18 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer

For the non-pipelined processor we have $n$ instructions and each instruction takes 12 cycles, so $12n$ cycles in total.

For the pipelined processor each stage is stretched to 6 ns, so the time to complete $n$ instructions is $6 \times 6 + (n-1) \times 6$.

$\lim_{n\rightarrow\infty} \dfrac{12n}{36 + (n-1) \times 6} = \dfrac{12}{6} = 2$

 

23 votes -- Arpit Dhuriya ( 3k points)
Selected Answer
(B) I and III

I - False. Bypassing can't handle all RAW hazards; consider an instruction that depends on the result of a LOAD instruction. LOAD updates the register value only at the Memory Access (MA) stage, so the data will not be available directly at the Execute stage.

II - True, register renaming can eliminate all WAR Hazard.

III- False, It cannot completely eliminate, though it can reduce Control Hazard Penalties
28 votes -- Prateeksha Keshari ( 2.1k points)
Selected Answer

76. Answer is A. In order to avoid the pipeline delay due to conditional branch instruction, a suitable instruction is placed below the conditional branch instruction such that the instruction will be executed irrespective of whether branch is taken or not and won't affect the program behaviour. 

77. Answer is D) I4. The STORE instruction can be moved below the conditional branch instruction. Whether the branch is taken or not, STORE will be executed as the next instruction after conditional branch instruction, due to delayed branching.

Here, I3 is not the answer because the branch conditional variable R1 is dependent on it. Same for I1. Similarly, I4 has a dependency on I2 and hence I2 must be executed before I4. 

23 votes -- Arjun Suresh ( 294k points)
Selected Answer

What is Delayed Branching ?
One way to maximize the use of the pipeline is to find an instruction that can be safely executed whether the branch is taken or not, and execute that instruction. So, when a branch instruction is encountered, the hardware puts the instruction following the branch into the pipe and begins executing it, just as in predict-not-taken. However, unlike in predict-not-taken, we do not need to worry about whether the branch is taken or not, and we do not need to clear the pipe, because no matter whether the branch is taken or not, we know the instruction is safe to execute.
More Read : https://www.cs.umd.edu/class/fall2001/cmsc411/projects/branches/delay.html


Moving I1 after branch

  • I1 updates the value of R2
  • R2 is used to compute R1, the branch condition
  • If moved, the value of R2 would be available only after the branch
    $\Rightarrow$ Cannot be moved

Moving I3 after branch

  • value of R1 is computed in this instruction
  • R1 is the branch condition
    $\Rightarrow$ Cannot be moved

Moving I4 after branch

  • I4 is simple store instruction used to store R1 in memory

  • program execution will have no effect if this is placed after conditional branch
    $\Rightarrow$ Can be moved

Moving I2 after branch

  • I2 produces a value that I4 (the store of R1) depends on
  • If I2 alone were moved after the branch, I4 would execute with a wrong value, changing the program behaviour
    $\Rightarrow$ Cannot be moved
  • However, I2 and I4 both together could be moved after the branch instruction

The apt choice is therefore I4.

Hence Option D

13 votes -- pC ( 21.4k points)
Selected Answer

Here we have to keep in mind the phrase : 

a non pipelined single cycle processor

This signifies that an instruction in the non-pipelined scenario incurs only a single cycle to execute the entire instruction. Hence no concept of stages arises in a single-cycle non-pipelined system.

The cycle time can be calculated from the clock frequency given for the non-pipelined system = 100 MHz.

Therefore, clock cycle time in the non-pipelined system $= \frac{1}{100 \times 10^{6}}\ s = 10\ ns$

Now, cycle time in the pipelined system = max(stage delay) + interface delay = 2.5 + 0.5 = 3 ns

Therefore,

$\text{Speedup} = \frac{CPI_{\text{non-pipeline}} \times \text{cycle time}_{\text{non-pipeline}}}{CPI_{\text{pipeline}} \times \text{cycle time}_{\text{pipeline}}} = \frac{1 \times 10}{1 \times 3} = 3.33$

[Since in the non-pipelined case we have a single-cycle processor, $CPI_{\text{non-pipeline}} = 1$, and $CPI_{\text{pipeline}} = 1$ by default.]

 

Hence C) is the correct answer..

16 votes -- HABIB MOHAMMAD KHAN ( 76.8k points)
answer is C..

explanation:

for non pipeline system time required = 2.5  + 1.5 + 2.0 + 1.5 + 2.5 = 10

for pipelined system = max(stage delay) + max(latch delay) = 2.5 + 0.5 = 3

speedup = time in non-pipeline / time in pipeline = 10/3 = 3.33
17 votes -- jayendra ( 8.1k points)
Selected Answer

Here the bounds of the loop are constants, therefore the compiler will do loop unrolling (if the compiler doesn't, the prefetcher will) to increase instruction-level parallelism. After loop unrolling, 23 cycles are required for execution. Therefore the correct answer is (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

14 votes -- suraj ( 5.1k points)
This is a loop-level parallelism question, but when we apply loop-level parallelism we get 25 cycles, which is not among the options. So we have to do it without loop-level parallelism: the first iteration of the loop outputs its result at the 15th clock cycle, so in total 30 clock cycles for 2 iterations.
13 votes -- Shreyans Dhankhar ( 2.6k points)
Selected Answer
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15
IF ID OF PO PO PO WO                
  IF ID OF     PO PO PO PO PO PO WO    
    IF ID     OF           PO WO  
      IF     ID           OF PO WO

Operand forwarding allows an output to be passed on to the next instruction. Here, the operand is forwarded from the output of the PO stage of the DIV instruction to the PO stage of the ADD instruction, and similarly between the ADD and SUB instructions. Hence, $15$ cycles are required.

http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html

22 votes -- Arjun Suresh ( 294k points)
Selected Answer

Answer is (B) 2.5

In the pipelined system, the time taken is determined by the max delay at any stage, i.e., 11 ns, plus the delay incurred by the pipeline stages, i.e., 1 ns, giving 12 ns. In the non-pipelined system, delay = 5 ns + 6 ns + 11 ns + 8 ns = 30 ns.

$\therefore \text{ The speedup is } \dfrac{30}{12}=2.5.$

 

26 votes -- Sona Praneeth Akula ( 4k points)
Selected Answer

Register renaming is done to eliminate WAR (Write after Read) and WAW (Write after Write) dependencies between instructions, which could otherwise cause pipeline stalls. Hence (C) is the answer.

Example:

I1: Read A to B
I2: Write C to A

Here, there is a WAR dependency and pipeline would need stalls. In order to avoid it register renaming is done and 

Write C to A 
will be 
Write C to A' 

WAR dependency is actually called anti-dependency and there is no real dependency except the fact that both uses same memory location. Register renaming can avoid this. Similarly WAW also. 

http://people.ee.duke.edu/~sorin/ece252/lectures/4.2-tomasulo.pdf

21 votes -- Arjun Suresh ( 294k points)
Selected Answer

After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is 10ns. Buffer delay given is 1ns. So, each stage takes 11ns in total. 

FI of I9 can start only after the EI of I4. So, the total execution time will be

$$15 \times 11 = 165$$

 

  T1  T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15
I1 FI DI FO EI WO                    
I2   FI DI FO EI WO                  
I3     FI DI FO EI WO                
I4       FI DI FO EI WO              
          stall                    
            stall                  
              stall                
I9               FI DI FO EI WO      
I10                 FI DI FO EI WO    
I11                   FI DI FO EI WO  
I12                     FI DI FO EI WO

 

40 votes -- gatecse ( 13.4k points)

answer = option B

cycles in pink are stall cycles, at EI-4 it was notified to the system that instruction 9 has to be loaded next. 
We have completed execution in a total of 15 cycles where each cycle was (10+1)ns long,

Hence, answer = $15 \times 11 = 165$ns

13 votes -- Amar Vashishth ( 28.7k points)
Selected Answer
Time without pipeline = 6 stages = 6 cycles

Time with pipeline = 1 + stall frequency × stall cycles
                   = 1 + 0.25 × 2
                   = 1.5

Speedup = 6/1.5 = 4
23 votes -- aravind90 ( 589 points)
Selected Answer

 five stages:

(IF), instruction decode and register fetch (ID/RF),

instruction execution (EX),

memory access (MEM), and register writeback (WB)


P: old design:

 with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns

MAX( 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns) = 2.2nsec

AVG instruction execution time is 

$T_{avg} = (1 + \text{stall frequency} \times \text{branch penalty}) \times \text{cycle time}$

$= (1 + 0.20 \times 2) \times 2.2$   (the branch penalty is 2 because the next instruction pointer is available at the end of the EX stage in the old design)

$= 3.08$ nsec


Q: new design:

the designers decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages.

time of stages in new design ={1 ns, 0.73ns, 0.73ns, 0.73ns , 1ns,1ns, 1 ns, and 0.75 ns}

(IF) instruction fetch,

(ID/RF) instruction decode and register fetch → further divided into 3 stages, each with latency 0.73 ns,

(EX) instruction execution → further divided into 2 stages of 1 ns each,

(MEM) memory access,

(WB) register writeback.

MAX( 1 ns, 0.73ns, 0.73ns, 0.73ns , 1ns,1ns, 1 ns, and 0.75 ns) =1nsec

 

AVG instruction execution time is 

$T_{avg} = (1 + \text{stall frequency} \times \text{branch penalty}) \times \text{cycle time}$

$= (1 + 0.20 \times 5) \times 1$   (the branch penalty is 5 because the next instruction pointer is available at the end of the EX2 stage in the new design)

$= 2$ nsec


final result 

P/Q=3.08/2  =1.54

 

  

18 votes -- kunal ( 20.8k points)
Average time per instruction for the first case = 2.2 × (1 + 2 × 0.2), as the stall requirement is 2 and 2.2 ns is the maximum stage delay.

Average time per instruction for the second case = 1 × (1 + 5 × 0.2), as the stalls now increase to 5 (there are five stages before the branch address is calculated) and the maximum stage delay now is 1 ns.

cpu_time1/cpu_time2 = 3.08/2 = 1.54
18 votes -- Arpit Dhuriya ( 3k points)
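Both answers above reduce to the same two lines of arithmetic; a quick check:

```python
old = (1 + 0.20 * 2) * 2.2   # 2-cycle branch penalty, 2.2 ns cycle time
new = (1 + 0.20 * 5) * 1.0   # 5-cycle branch penalty, 1 ns cycle time
print(round(old, 2), new, round(old / new, 2))   # 3.08 2.0 1.54
```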
Selected Answer

frequency = 1 / max(stage delays)

For P3 it is 1/1 = 1 GHz.

For P1 it is 1/2 = 0.5 GHz,
for P2 it is 1/1.5 = 0.67 GHz,
for P4 it is 1/1.1 = 0.91 GHz.

So P3 has the highest peak clock frequency.

18 votes -- Arpit Dhuriya ( 3k points)
Selected Answer
Speed up = Old execution time/New execution time

Old execution time = CPI/2.5 = 4/2.5 = 1.6 ns

At the new 2 GHz clock but still without pipelining, each instruction needs old execution time × old frequency / new frequency = 1.6 × 2.5/2 = 2 ns.

There are 5 stages and when there is no pipeline stall, this can give a speed up of up to 5 (happens when all stages take same number of cycles). In our case this time will be 2/5 = 0.4 ns. But clock frequency being 2 GHz, clock cycle is 1/2 GHz = 0.5 ns and a pipeline stage cannot be faster than this.

So, average instruction execution time after pipelining = max (0.4, 0.5) = 0.5 ns.

So, speed up compared to non-pipelined version = 1.6 / 0.5 = 3.2.
25 votes -- Arjun Suresh ( 294k points)
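A compact check of the speedup above:

```python
old = 4 / 2.5     # ns per instruction: CPI 4 at 2.5 GHz, non-pipelined
new = 1 / 2.0     # ns per instruction: a pipeline stage cannot beat the 0.5 ns clock
print(old / new)  # 3.2
```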

answer = 3.2

To compute the cycle time, note that a 2.5 GHz processor completes 2.5G cycles in a second. So an instruction which on average takes 4 cycles to complete will take $\frac{4}{2.5G}$ seconds.

With successful pipelining (i.e., one which has no stalls), $CPI = 1$, as an instruction then effectively takes just one cycle time to complete. So,

 

44 votes -- naresh1845 ( 1.4k points)
Selected Answer
  t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15
I1 IF OF PO PO PO WB                  
I2   IF OF - - PO PO PO PO PO WB        
I3     IF - - - - - - - OF PO WB    
I4       - - - - - - - IF - OF PO WB

It is mentioned in the question that operand forwarding takes place from PO stage to OF stage and not to PO stage. So, 15 clock cycles.

But since operand forwarding is from PO-OF, we can do like make the PO stage produce the output during the rising edge of the clock and OF stage fetch the output during the falling edge. This would mean the final PO stage and OF stage can be done in one clock cycle making the total number of cycles = 13. And 13 is the answer given in GATE key. 

  t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13
I1 IF OF PO PO PO WB              
I2   IF OF - - PO PO PO PO PO WB    
I3     IF - - - - - - OF PO WB  
I4       - - - - - - IF OF PO WB

Ref: http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html

35 votes -- Arjun Suresh ( 294k points)
Selected Answer

Ref: Page 24 http://www2.cs.siu.edu/~cs401/Textbook/ch3.pdf

S1 is needed at time 1 and 5, so its forbidden latency is 5-1 = 4.

S2 is needed at time 2 and 4, so its forbidden latency is 4-2 = 2.

So, forbidden latency = (2,4,0) (0 by default is forbidden)

Allowed latency = (1,3,5) (any value more than 5 also). 

Collision vector (4,3,2,1,0) = 10101 which is the initial state as well.

From initial state we can have a transition after "1" or "3" cycles and we reach new states with collision vectors (10101 >> 1 + 10101 = 11111) and (10101 >> 3 + 10101 = 10111) respectively. These 2 becomes states 2 and 3 respectively. For "5" cycles we come back to state 1 itself.

From state 2 (11111), the new collision vector is 11111. We can have a transition only when we see first 0 from right. So, here it happens on 5th cycle only which goes to initial state. (Any transition after 5 or more cycles goes to initial state as we have 5 time slices). 

From state 3 (10111), the new collision vector is 10111. So, we can have a transition on 3, which will give (10111 >> 3 + 10101 = 10111) third state itself. For 5, we get the initial state. Thus all the transitions are complete.

 

 

State\Time 1 3 5
1 (10101) 2 3 1
2 (11111) - - 1
3 (10111) - 3 1

So, minimum length cycle is of length 3 either from 3-3 or from 1-3, 3-1. 
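The forbidden and allowed latencies above can be recomputed mechanically; a minimal sketch (the stage usage times are read off the reservation table):

```python
# Times (cycles) at which each stage is used, from the reservation table.
usage = {"S1": [1, 5], "S2": [2, 4], "S3": [3]}

forbidden = {b - a for times in usage.values()
             for i, a in enumerate(times) for b in times[i + 1:]}
forbidden.add(0)                       # 0 is forbidden by default
allowed = [l for l in range(1, 6) if l not in forbidden]
print(sorted(forbidden), allowed)      # [0, 2, 4] [1, 3, 5]
```

From the allowed latencies, the state diagram above then gives MAL = 3.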

 

Not asked in question, still.

Pipeline throughput is the number of instructions initiated per unit time. So, with MAL = 3, we have 2 initiations in 1+3 = 4 units of time (one at time unit 1 and another at time unit 4). So, throughput = 2/4 = 0.5.

Pipeline efficiency is the % of time every stage of pipeline is being used. For the given question we can extend the reservation table and taking MAL = 3, we can initiate new tasks after every 3 cycles. So, we can consider the time interval from 4-6 in below figure. (The red color shows a stage not being used- affects efficiency).

Time→
  1 2 3 4 5 6 7 8 9 10 11
S1 X     Y X \ Z Y   A Z
S2   X   X Y \ Y Z   Z A
S3     X \  \ Y     Z    

 

Here (during cycles 4-6), stage 1 is used 2/3, stage 2 is used 2/3 and stage 3 is used 1/3. So, total stage utilization = (2+2+1)/9 = 5/9 and efficiency = 500/9% = 55.55%.


 

For simulation, Ref: http://www.ecs.umass.edu/ece/koren/architecture/ResTable/SimpRes/

Similar Question

11 votes -- Arjun Suresh ( 294k points)
Selected Answer
CASE 1

stages 5, max delay= 22(after adding buffer delay), number of instructions= 20

CASE 2

stages 6(since OF is split), max delay=14, number of instructions=20

so execution time is (K+N-1)*Max delay

speedup =528/350=1.508 (Execution time case 1/Execution time case 2)

so answer is 1.508
8 votes -- sriv_shubham ( 2.7k points)
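A quick check of the NP-vs-EP numbers above:

```python
def pipe_time(k, n, cycle):      # (k + n - 1) * cycle
    return (k + n - 1) * cycle

np = pipe_time(5, 20, 20 + 2)    # NP: max stage 20 ns + 2 ns buffer
ep = pipe_time(6, 20, 12 + 2)    # EP: OF split, max stage 12 ns + 2 ns buffer
print(np, ep, round(np / ep, 2)) # 528 350 1.51
```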
Selected Answer
As there are 5 stages, the first instruction takes 5 clock cycles, and every subsequent instruction takes 1 cycle each. So without any stall cycles the total is 5 + (100 - 1) = 104 cycles.

Now every 3rd instruction takes one stall cycle, so the total stall cycles for 100 instructions are 100/3 = 33.

Total cycles to complete the 100 instructions = 104 + 33 = 137 cycles.

The max time taken among all the stages is 9 ns, and with 1 ns of pipeline overhead the clock cycle time is 10 ns. So the total time taken is 137 × 10 = 1370 ns   ...:)
2 votes -- shayal chhabra ( 821 points)
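The cycle count above in one line:

```python
cycles = 5 + (100 - 1) + 100 // 3   # pipeline fill + remaining 99 + stalls
print(cycles, cycles * 10, "ns")    # 137 1370 ns
```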

Suppose a processor does not have any stack pointer registers, which of the following statements is true?

  1. It cannot have subroutine call instruction
  2. It cannot have nested subroutines call
  3. Interrupts are not possible
  4. All subroutine calls and interrupts are possible

The use of multiple register windows with overlap causes a reduction in the number of memory accesses for

  1. Function locals and parameters

  2. Register saves and restores

  3. Instruction fetches

  1. I only
  2. II only
  3. III only
  4. I, II and III

Answers: Runtime Environments

Selected Answer
I think the answer is B,

because in nested subroutine calls we push the return addresses of the outer subroutine calls onto a stack and point to the most recent call with the stack pointer.
14 votes -- jayendra ( 8.1k points)
Selected Answer

I. Function locals and parameters:
true, because overlapped register windows eliminate the need for memory accesses here; we get to use registers instead.

II. Register saves and restores:
false, because we need to see where memory accesses are reduced. Registers were being referenced before (as "register saves" says), and after using multiple register windows registers are still what is referenced. So NO memory accesses are reduced here.

III. Instruction fetches:
this has nothing to do with a reduction in memory accesses.

Hence, option A is correct.

11 votes -- Amar Vashishth ( 28.7k points)

In an enhancement of a design of a CPU, the speed of the floating point unit has been increased by 20% and the speed of the fixed point unit has been increased by 10%. What is the overall speedup achieved if the ratio of the number of floating point operations to the number of fixed point operations is 2:3 and the floating point operation used to take twice the time taken by the fixed point operation in the original design?

  1. 1.155
  2. 1.185
  3. 1.255
  4. 1.285
Consider two processors $P_1$ and $P_2$ executing the same instruction set. Assume that under identical conditions, for the same input, a program running on $P_2$ takes 25% less time but incurs 20% more CPI (clock cycles per instruction) as compared to the program running on $P_1$. If the clock frequency of $P_1$ is 1GHZ, then the clock frequency of $P_2$ (in GHz) is ______.

Answers: Speedup

Selected Answer
Speedup = original time taken / new time taken
Let x be the time for a fixed point operation.
Original time taken = (3x + 2 × 2x)/5 = 7x/5
New time taken = ((3x/1.1) + (4x/1.2))/5 = 8x/(1.32 × 5)
So, speedup = 7 × 1.32/8 = 1.155
22 votes -- gatecse ( 13.4k points)
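A numeric check of the weighted speedup above (x is the fixed-point operation time):

```python
x = 1.0
old = (3 * x + 2 * (2 * x)) / 5               # 7x/5
new = (3 * x / 1.1 + 2 * (2 * x) / 1.2) / 5   # 8x/(1.32 * 5)
print(round(old / new, 3))                    # 1.155
```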
Selected Answer

CPU TIME (T) = No. of Instructions( I ) x No. of Cycles Per Instruction (c) x Cycle Time (t)

                                                                  OR

CPU TIME (T) = $\frac{No. of Instructions(I) \times No. of Cycles Per Instruction (c)}{Clock frequency (f)}$

$\rightarrow T = I_{c} \times CPI \times F^{-1}$

$\rightarrow \frac{T \times F}{CPI} = I_{c}$

P1 & P2 execute the same instruction set, so the number of instructions is the same for both: $I = I_1 = I_2$

If P1 takes T1 time $\rightarrow$  T2 = 0.75 x T1 $\rightarrow$  $\frac{T_{2}}{ T_{1}}$ = 0.75

If P1 incurs C1 clock cycles per instruction $\rightarrow$  C2 = 1.2 x  C1 $\rightarrow \frac{C_{2}}{C_{1}}$ = 1.2

Since I is same for both $\rightarrow \frac{ ( f_{1} \times T_{1} )}{c1} = \frac{ ( f_{2} \times T_{2} )}{c2}$  and  f1 = 1 GHz 

 $\rightarrow$ F2 =$(\frac{C_{2}}{C_{1}}) \times (\frac{T_{1}}{T_{2}}) \times F_{1}$ = $\frac{1.2 \times 1 GHz}{0.75}$ = 1.6 GHz

Hence,the clock frequency of P2  is = 1.6 GHz.

28 votes -- Suraj Kaushal ( 361 points)

Execution time (T) = CPI * #instructions * time for a clock
= CPI * #instructions / clock frequency (F)

Given P1 and P2 execute the same set of instructions and 
$T_2 = 0.75 \; T_1$,
$CPI_2 = 1.2 \; CPI_1$, and
$F_1 = 1$ GHz.

So, 

$\frac{T_1}{CPI_1} \times F_1 =\frac{T_2}{CPI_2} \times F_2$

$\frac{T_1}{CPI_1} \times 1 GHz = \frac{0.75 T_1} {1.2 CPI_1} \times F_2 $

$ \implies F_2 = \frac {1.2}{0.75} GHz$

$=1.6 GHz$

10 votes -- Arjun Suresh ( 294k points)

03. Choose the correct alternatives (more than one may be correct) and write the corresponding letters only:

(iii) The total size of address space in a virtual memory system is limited by

  1. the length of MAR
  2. the available secondary storage
  3. the available main memory
  4. all of the above
  5. none of the above

 

Consider a system with a two-level paging scheme in which a regular memory access takes 150 nanoseconds, and servicing a page fault takes 8 milliseconds. An average instruction takes 100 nanoseconds of CPU time, and two memory accesses. The TLB hit ratio is 90%, and the page fault rate is one in every 10,000 instructions. What is the effective average instruction execution time?

  1. 645 nanoseconds
  2. 1050 nanoseconds
  3. 1215 nanoseconds
  4. 1230 nanoseconds

In an instruction execution pipeline, the earliest that the data TLB (Translation Lookaside Buffer) can be accessed is

  1. before effective address calculation has started

  2. during effective address calculation

  3. after effective address calculation has completed

  4. after data cache lookup has completed

Answers: Virtual Memory

Selected Answer
Answer is (a) and (b)

Virtual memory concept is independent of size of main memory and depends only on the availability of the secondary storage.

MAR holds the address generated by CPU and this obviously limits the total virtual memory address space.
14 votes -- Kalpna Bhargav ( 3.2k points)
Selected Answer

Average Instruction execution time

 = Average CPU execution time + Average time for getting data(instruction operands from memory for each instruction)

 =   Average CPU execution time
   + Average address translation time for each instruction
   + Average memory fetch time for each instruction
   + Average page fault time for each instruction

 = $100+2\Big(0.9 (0) + 0.1 (2 \times 150)\Big) + 2\times 150 + \frac{1}{10000} \times 8 \times 10^6$ 

(The page fault rate per 10,000 instructions is directly given in the question. There are two memory accesses per instruction, and hence we need 2 $\times$ the address translation time in the average instruction execution time.)

[ TLB access time assumed as 0 and 2 page tables need to be accessed in case of TLB miss as the   system uses two-level paging ]

= $100 + 60 + 300 + 800$

= $1260 ns$

56 votes -- Arjun Suresh ( 294k points)

(GATE 2004 question)

Effective average instruction execution time
= [CPU execution time] + (TLB hit ratio × 2 × memory access time + TLB miss ratio × (page table access time + 2 × memory access time)) + (page fault probability × page fault service time)

That is, 100 ns + (0.9 × 2 × 150 + 0.1 × (150 + 150 + 2 × 150)) + 800 ns   [the 800 ns here comes from 0.0001 × 8 × 10^6]

= 100 ns + 2 × (135 + 30) ns + 800 ns
= 100 ns + 2 × 165 ns + 800 ns
= 1230 ns

12 votes -- Rohan Ghosh ( 1.9k points)
Selected Answer
C, as only after the calculation of the virtual address can you look it up in the TLB.
13 votes -- Shaun Patel ( 6.9k points)

Which of the following is/are example(s) of stateful application layer protocol?

  1. HTTP
  2. FTP
  3. TCP
  4. POP3

 

  1. (i) and (ii) only
  2. (ii) and (iii) only
  3. (ii) and (iv) only
  4. (iv) only

Consider the three commands : PROMPT, HEAD and RCPT.
Which of the following options indicate a correct association of these commands with protocols where these are used?

  1. HTTP, SMTP, FTP
  2. FTP, HTTP, SMTP
  3. HTTP, FTP, SMTP
  4. SMTP, HTTP, FTP

HELO and PORT, respectively, are commands from the protocols

  1. FTP and HTTP
  2. TELNET and POP3
  3. HTTP and TELNET
  4. SMTP and FTP

What is the maximum size of data that the application layer can pass on to the TCP layer below?

  1. Any size

  2. $2^{16}$ bytes - size of TCP header

  3. $2^{16}$ bytes

  4. 1500 bytes

Provide the best matching between the entries in the two columns given in the table below:

I. Proxy Server a. Firewall
II. Kazaa, DC++ b. Caching
III. Slip c. P2P
IV. DNS d. PPP
  1. I-a, II-d, III-c, IV-b
  2. I-b, II-d, III-c, IV-a
  3. I-a, II-c, III-d, IV-b
  4. I-b, II-c, III-d, IV-a

Consider the different activities related to email.

  • m1: Send an email from mail client to mail server
  • m2: Download an email from mailbox server to a mail client
  • m3: Checking email in a web browser

Which is the application level protocol used in each activity?

  1. m1: HTTP  m2: SMTP  m3: POP
  2. m1: SMTP  m2: FTP  m3: HTTP
  3. m1: SMTP  m2: POP  m3: HTTP
  4. m1: POP  m2: SMTP  m3: IMAP
The protocol data unit (PDU) for the application layer in the Internet stack is

(A) Segment
(B) Datagram
(C) Message
(D) Frame

Answers: Application Layer Protocols

Selected Answer

HTTP - stateless

FTP - stateful

TCP - not application layer protocol

POP3 - Stateful

And according to options answer would be C)
 

29 votes -- Abhilash Panicker ( 8.8k points)
Selected Answer
RCPT → "recipient to"; as the name suggests, it is used in SMTP (Simple Mail Transfer Protocol).

HEAD → used in HTTP to fetch only the meta-information (headers) of a resource.

PROMPT → turns prompting for individual files on or off when using the FTP mget or mput commands.
14 votes -- nagalla pruthvi ( 883 points)
Selected Answer
Answer: D

Ref:

http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#SMTP_transport_example

http://en.wikipedia.org/wiki/File_Transfer_Protocol#Protocol_overview
6 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer

OPTION A

It is the transport layer's responsibility to divide the data into segments/packets; the application layer need not worry about it.

14 votes -- Desert_Warrior ( 9.6k points)
Selected Answer
Ans is C) I-a, II-c, III-d, IV-b

I. Proxy Server → a proxy server and a firewall are often combined ⇒ a. Firewall

II. Kazaa, DC++ → these are P2P applications ⇒ c. P2P

III. SLIP → SLIP is the predecessor of PPP ⇒ d. PPP

IV. DNS → DNS responses are often cached ⇒ b. Caching
7 votes -- Akash ( 43.8k points)
Selected Answer
The sender/client sends mail from the client to the server mailbox with the help of the SMTP protocol, whereas the receiver retrieves the mail from its mailbox for reading using the POP3 protocol.

When we want to see email in a web browser, we use HTTP, which renders the mailbox as a web page.
10 votes -- Paras Singh ( 5.5k points)
Selected Answer
(C) Message is answer.

For Application, Presentation and Session layers, the PDU is message

For Transport layer, PDU is segment for TCP and datagram for UDP

For Network layer, PDU is packet

For Datalink layer, PDU is frames

For physical layer, PDU is stream of bits
29 votes -- gatecse ( 13.4k points)

A bit-stuffing based framing protocol uses an 8-bit delimiter pattern of 01111110. If the output bit-string after stuffing is 01111100101, then the input bit-string is

  1. 0111110100
  2. 0111110101
  3. 0111111101
  4. 0111111111

Answers: Bit Stuffing

Selected Answer
011111 *stuffed 0 inserted here* 0101; removing it gives the input bit-string 0111110101 (option B).
14 votes -- abhishek1317 ( 299 points)
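For reference, a minimal bit-stuffing sketch for this framing (the function name is ours, not from the question):

```python
def stuff(bits: str) -> str:
    # insert a 0 after every run of five consecutive 1s
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")
            run = 0
    return "".join(out)

print(stuff("0111110101"))   # 01111100101, matching the given output (option B)
```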

Consider the diagram shown below where a number of LANs are connected by (transparent) bridges. In order to avoid packets looping through circuits in the graph, the bridges organize themselves in a spanning tree. First, the root bridge is identified as the bridge with the least serial number. Next, the root sends out (one or more) data 
units to enable the setting up of the spanning tree of shortest paths from the root bridge to each bridge. 

Each bridge identifies a port (the root port) through which it will forward frames to the root bridge. Port conflicts are always resolved in favour of the port with the lower index value. When there is a possibility of multiple bridges forwarding to the same LAN (but not through the root port), ties are broken as follows: bridges closest to the root get preference and between such bridges, the one with the lowest serial number is preferred.

For the given connection of LANs by bridges, which one of the following choices represents the depth first traversal of the spanning tree of bridges?

  1. B1, B5, B3, B4, B2
  2. B1, B3, B5, B2, B4
  3. B1, B5, B2, B3, B4
  4. B1, B3, B4, B5, B2

Consider the diagram shown below where a number of LANs are connected by (transparent) bridges. In order to avoid packets looping through circuits in the graph, the bridges organize themselves in a spanning tree. First, the root bridge is identified as the bridge with the least serial number. Next, the root sends out (one or more) data 
units to enable the setting up of the spanning tree of shortest paths from the root bridge to each bridge. 

Each bridge identifies a port (the root port) through which it will forward frames to the root bridge. Port conflicts are always resolved in favour of the port with the lower index value. When there is a possibility of multiple bridges forwarding to the same LAN (but not through the root port), ties are broken as follows: bridges closest to the root get preference and between such bridges, the one with the lowest serial number is preferred.

Consider the spanning tree B1, B5, B3, B4, B2 for the given connection of LANs by bridges, that represents the depth first traversal of the spanning tree of bridges. Let host H1 send out a broadcast ping packet. Which of the following options represents the correct forwarding table on B3?

Answers: Bridges

Selected Answer
  • First select B1 as the root bridge. This selection is based on lower serial ID as given in the question.
  • All ports of root bridge are designated ports and they are in forwarding state.

 

 

 


 

  • Every non-root bridge must have a root port. All root ports are placed in forwarding state.
  • Root port is the port that is closest to the root bridge
  • For example, we observe bridge $B3$.
  • It has two ports leading to the root bridge. If we assume bridge-to-bridge cost as $1$ unit, both these paths have the same cost. Then we will select the lower port index as given in the question as the root port for the bridge $B3$.
  • port 3 of $B3$ becomes the root port.

 

 

 

  • Using the same logic we will find out the root ports for $B5$ also.

 

  • Coming to $B4$ for root port selection.

 

 

  • We have again two different paths with the same cost. We will select port $1$ as the root port for $B4$ 
  • Using the same logic port $1$ is selected as root port for $B2$ as well.

 

 

 


 

  • Now we have to consider the designated ports
  • The designated ports are the ports responsible for forwarding traffic onto a network segment
  • We have a total of $6$ network segments or LANs. Each segment will have one designated port.

 

 

  • $S1$ and $S2$ are connected to the root bridge itself via two designated ports. So no issue with segments $S1$ and $S2$ traffic.
  • Let's consider other segments.
  • For example $S3$.

 

 

  • $B2$,$B3$,$B4$,$B5$ all can forward traffic to this segment $S3$
  • According to the question in this situation, we will consider only those bridges which are nearer to the root bridge $B1$.
  • $B5$ and $B3$ are both nearer to the root bridge.
  • Then we will break this tie by selecting the lower bridge serial ID i.e. $B3$ is selected and designated port is port $1$ of $B3$ for the segment $S3$

 

 

  • Similarly, we can choose designated ports for $S4$ , $S5$ and $S6$

 

 

 

 


 

  • Remaining all ports are blocked ports.

 

 

  • This is the final spanning tree

 

 

 

  • DFS traversal will give answer as A

28 votes -- Debashish Deka ( 51.4k points)

 

 

This makes it clear that:
Q.82 answer = option A
Q.83 answer = option A

13 votes -- Amar Vashishth ( 28.7k points)
Selected Answer

The answer is option A; checking the options against the resulting tree, only option A matches this picture.

 

4 votes -- Bikram ( 44.7k points)

Match the pairs in the following questions:

(A) Cyclic redundancy code (p) Error correction
(B) Serial communication (q) Wired-OR
(C) Open collector (r) Error detection
(D) Hamming code  (s) RS-232-C

 

Assume that each character code consists of 8 bits. The number of characters that can be transmitted per second through an asynchronous serial line at 2400 baud rate, and with two stop bits is

  1. 109
  2. 216
  3. 218
  4. 219

 

Purpose of a start bit in RS 232 serial communication protocol is

  1. to synchronize receiver for receiving every byte
  2. to synchronize receiver for receiving a sequence of bytes
  3. a parity bit
  4. to synchronize receiver for receiving the last byte

 

In serial data transmission, every byte of data is padded with a '0' in the beginning and one or two '1's at the end of byte because

  1. receiver is to be synchronized for byte reception
  2. receiver recovers lost '0's and '1's from these padded bits
  3. padded bits are useful in parity computation
  4. none of the above

How many 8-bit characters can be transmitted per second over a 9600 baud serial communication link using asynchronous mode of transmission with one start bit, eight data bits, two stop bits and one parity bit?

  1. 600
  2. 800
  3. 876
  4. 1200

A serial transmission T1 uses 8 information bits, 2 start bits, 1 stop bit and 1 parity bit for each character. A synchronous transmission T2 uses 3 eight-bit sync characters followed by 30 eight-bit information characters. If the bit rate is 1200 bits/second in both cases, what are the transfer rates of  T1 and T2?

  1. 100 characters/sec, 153 characters/sec
  2. 80 characters/sec, 136 characters/sec
  3. 100 characters/sec, 136 characters/sec
  4. 80 characters/sec, 153 characters/sec

Let us consider a statistical time division multiplexing of packets. The number of sources is 10. In a time unit, a source transmits a packet of 1000 bits. The number of sources sending data for the first 20 time units is 6, 9, 3, 7, 2, 2, 2, 3, 4, 6, 1, 10, 7, 5, 8, 3, 6, 2, 9, 5 respectively. The output capacity of multiplexer is 5000 bits per time unit. Then the average number of backlogged of packets per time unit during the given period is

  1. 5
  2. 4.45
  3. 3.45
  4. 0

A broadcast channel has 10 nodes and total capacity of 10 Mbps. It uses polling for medium access. Once a node finishes transmission, there is a polling delay of 80 μs to poll the next node. Whenever a node is polled, it is allowed to transmit a maximum of 1000 bytes. The maximum throughput of the broadcast channel is

  1. 1 Mbps
  2. 100/11 Mbps
  3. 10 Mbps
  4. 100 Mbps

Consider a source computer $(S)$ transmitting a file of size $10^{6}$ bits to a destination computer $(D)$ over a network of two routers $(R_{1}\text{ and }R_{2})$ and three links $(L_{1},L_{2},\text{ and } L_{3})$. $L_{1}$ connects $S$ to $R_{1}$; $L_{2}$ connects $R_{1}$ to $R_{2}$; and $L_{3}$ connects $R_{2}$ to $D$. Let each link be of length 100 km. Assume signals travel over each link at a speed of $10^{8}$ meters per second. Assume that the link bandwidth on each link is 1 Mbps. Let the file be broken down into 1000 packets each of size 1000 bits. Find the total sum of transmission and propagation delays in transmitting the file from S to D?

(A) 1005 ms
(B) 1010 ms
(C) 3000 ms
(D) 3003 ms

Answers: Communication

Selected Answer
(A) Cyclic redundancy code  (r) Redundancy checking technique (error detection)
(B) Serial communication (s) RS-232-C
(C) Open collector (q) wired OR
(D) Hamming code  (p) error correction method

Explanations
A) A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data

B) RS-232C is the physical interface that your computer uses to talk to and exchange data with your modem and other serial devices

C) ref: http://www.ni.com/white-paper/3544/en/

D)  The Hamming code is an error correction method using redundant bits. By rearranging the order of bit transmission of the data units, the Hamming code can correct burst errors.

1 votes -- Lokesh . ( 9.8k points)
Selected Answer
Total bits per character = 8 data bits + 2 stop bits + 1 start bit = 11 bits
Number of characters = 2400/11 = 218.18

Since the number of complete characters transmitted is asked, we take the floor, and the answer is 218.
13 votes -- Digvijay ( 47k points)
A) Because RS-232 requires a start bit before each byte transmission, for synchronization.
2 votes -- Hunaif ( 485 points)
Selected Answer
In serial communication, a '0' is padded at the beginning as the start bit and one or two '1's are padded at the end as stop bits; these bits synchronize the receiver.

http://www.powerbasic.com/support/help/pbcc/start_and_stop_bits.htm

http://esd.cs.ucr.edu/labs/serial/serial.html
6 votes -- srestha ( 58.4k points)
Selected Answer

The baud rate is the rate at which information is transferred in a communication channel. Serial ports use two-level (binary) signaling, so the data rate in bits per second is equal to the symbol rate in bauds. Ref: https://en.wikipedia.org/wiki/Serial_port#Speed.

"9600 baud" means that the serial port is capable of transferring a maximum of 9600 bits per second."

So, transmission rate here = 9600 bps

An eight bit data (which is a char) requires 1 start bit, 2 stop bits and 1 parity bit = 12 bits.

So, number of characters transmitted per second = 9600 / 12 = 800

15 votes -- Arjun Suresh ( 294k points)
Selected Answer
  1. T1: 1 character = (8 + 2 + 1 + 1) = 12 bits

          Transfer rate = 1200/12 = 100 characters/sec.

          T2: one frame = 3 sync characters + 30 information characters = (24 + 240) = 264 bits, carrying 30 characters.

          If 264 bits carry 30 characters, then 1200 bits per second carry X characters per second:

          264/30 = 1200/X

          X = 136.3 characters/sec.

          So the correct option is (C).

14 votes -- Shreyans Dhankhar ( 2.6k points)
Selected Answer
Answer is B

Here we can send at most 5 packets per time unit (5000/1000).

So whatever is not sent becomes backlog.

So

First Time Unit => 6 ,

Backlog in First time unit => 6-5 => 1 This one gets added to next Time units load

Second time unit => 9 + 1 (One from Previous Time Unit)

Backlog in Second time Unit = 10-5 => 5 (This one gets added to next Time Units load.)

 

Total Backlog this way  = 1+5+3+5+2+0+0+0+0+1+0+5+7+7+10+8+9+6+10+10=89

Avg Backlog=89/20=4.45

The average number of backlogged of packets per time unit during the given period is 4.45 , (Option B) .
19 votes -- Akash ( 43.8k points)
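The bookkeeping above can be simulated directly; a short sketch:

```python
arrivals = [6, 9, 3, 7, 2, 2, 2, 3, 4, 6, 1, 10, 7, 5, 8, 3, 6, 2, 9, 5]
capacity = 5                 # 5000 bits per time unit / 1000-bit packets

backlog, total = 0, 0
for a in arrivals:
    backlog = max(0, backlog + a - capacity)   # carry unsent packets forward
    total += backlog
print(total, total / len(arrivals))            # 89 4.45
```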
Selected Answer
Propagation time is not given, so it is negligible here.
Efficiency = transmission time / (transmission time + polling time)
$T_x$ = 1000 bytes / 10 Mbps = 800 μs
Polling delay = 80 μs
Efficiency of the channel, e = transmission delay / total delay = 800/(800 + 80) = 10/11
Maximum throughput = (10/11) × 10 Mbps = 100/11 Mbps
23 votes -- Manu Thakur ( 6k points)
Selected Answer
Routers are store-and-forward devices.
Propagation time = 100 km / ($10^8$ m/s) = 1 millisecond
Transmission time for a packet = 1000/$10^6$ = 1 millisecond

Packets are forwarded in a pipelined manner: after the first packet reaches the receiver, a new one arrives every 1 ms.

Now, the time taken by packet no. 1 to reach the destination is:
1 ms (TT at sender) + 1 ms (PT from sender to R1) + 1 ms (TT at R1) + 1 ms (PT from R1 to R2) + 1 ms (TT at R2) + 1 ms (PT from R2 to destination)
= 6 ms

So, time for packet 1000 = 6 ms + 999 ms = 1005 ms
31 votes -- Digvijay ( 47k points)
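A check of the store-and-forward arithmetic above:

```python
tt = 1000 / 10**6 * 1000       # transmission time per packet, in ms (= 1 ms)
pt = 100_000 / 10**8 * 1000    # propagation time per link, in ms (= 1 ms)

first = 3 * (tt + pt)            # first packet crosses 3 links: 6 ms
total = first + (1000 - 1) * tt  # the rest arrive pipelined, 1 ms apart
print(total, "ms")               # 1005.0 ms
```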

On a TCP connection, current congestion window size is Congestion Window = 4 KB. The window size advertised by the receiver is Advertise Window = 6 KB. The last byte sent by the sender is LastByteSent = 10240 and the last byte acknowledged by the receiver is LastByteAcked = 8192. The current window size at the sender is

  1. 2048 bytes
  2. 4096 bytes
  3. 6144 bytes
  4. 8192 bytes

In the slow start phase of the TCP congestion algorithm, the size of the congestion window

  1. does not increase

  2. increase linearly

  3. increases quadratically

  4. increases exponentially

Consider an instance of TCP’s Additive Increase Multiplicative Decrease (AIMD) algorithm where the window size at the start of the slow start phase is 2 MSS and the threshold at the start of the first transmission is 8 MSS. Assume that a timeout occurs during the fifth transmission. Find the congestion window size at the end of the tenth transmission.

(A) 8 MSS
(B) 14 MSS
(C) 7 MSS
(D) 12 MSS

Let the size of congestion window of a TCP connection be 32 KB when a timeout occurs. The round trip time of the connection is 100 msec and the maximum segment size used is 2 KB. The time taken (in msec) by the TCP connection to get back to 32 KB congestion window is _________.

Answers: Congestion Control

Selected Answer

Ans should be (B)

Current Sender window = min (Congestion Window, Advertised Window)= min(4KB, 6KB)= 4KB

29 votes -- srestha ( 58.4k points)
Selected Answer

increase is exponential in the Slow Start Phase.

answer = option D

9 votes -- Amar Vashishth ( 28.7k points)
Selected Answer

The answer is correct, but the method of solving (in the answer below) is wrong.

At

t=1, =>2mss

t=2, =>4mss

t=3, =>8mss

t=4, =>9mss (after threshold additive increase)

t=5, =>10mss (fails)

Threshold will be reduced by n/2 i.e. 10/2 = 5.

t=6, =>1mss, 

(The window size is always = 1 mss after a time out irrespective of what value it started from, http://www.cs.ccsu.edu/~stan/classes/CS490/Slides/Networks4-Ch3-5.pdf )

t=7 =>2mss

t=8, =>4mss

t=9, =>5mss

t=10, =>6mss.

So at the end of the 10th successful transmission, the congestion window size will be (6+1) = 7 MSS.

18 votes -- Harsh181996 ( 2.7k points)

At

t=1, =>2mss

t=2, =>4mss

t=3, =>8mss

t=4, =>9mss (after threshold additive increase)

t=5, =>10mss (fails)

Threshold will be reduced by n/2 i.e. 10/2 = 5.

t=6, =>2mss

t=7 =>4mss

t=8, =>5mss

t=9, =>6mss

t=10, =>7mss.

So at the end of 10th transmission congestion window size will be 8 mss.

24 votes -- Gate Keeda ( 19.1k points)
Selected Answer

Ans:  Given that at the time of Time Out, Congestion Window Size is $32KB$ and RTT = $100ms$ ,

          When Time Out occurs, for the next round of Slow Start, 

          Threshold = $\frac{size\ of\ congestion\ window}{2}$ ,

          Threshold = 16KB

Suppose  we have a slow start ==>> $2KB \mid 4KB \mid 8KB \mid 16KB$ (As the threshold is reached,  Additive increase starts) $\mid 18KB \mid 20KB \mid 22KB \mid 24KB \mid 26KB \mid 28KB \mid 30KB \mid 32KB$

Here '|' (a vertical line) represents one RTT; the total number of vertical lines is 11, so the time is $11 \times 100\,ms = 1100\,msec$, and this is the answer.

 

50 votes -- Jay ( 1.2k points)
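The RTT count above can be simulated; a minimal sketch (window in KB, one step per RTT):

```python
w, threshold, rtts = 2, 16, 0
while w < 32:
    # slow start doubles up to the threshold, then additive increase by MSS
    w = min(w * 2, threshold) if w < threshold else w + 2
    rtts += 1
print(rtts, "RTTs ->", rtts * 100, "ms")   # 11 RTTs -> 1100 ms
```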

Consider the following message M = 1010001101. The cyclic redundancy check (CRC) for this message using the divisor polynomial $x^5 + x^4 + x^2 + 1$ is :

  1. 01110
  2. 01011
  3. 10101
  4. 10110

The message 11001001 is to be transmitted using the CRC polynomial $x^3 +1$ to protect it from errors. The message that should be transmitted is:

 

  1. 11001001000

  2. 11001001011

  3. 11001010

  4. 110010010011

 

A computer network uses polynomials over $GF(2)$ for error checking with 8 bits as information bits and uses $x^{3}+x+1$ as the generator polynomial to generate the check bits. In this network, the message 01011011 is transmitted as

(A) 01011011010                                                        (B) 01011011011

(C) 01011011101                                                        (D) 01011011100

Answers: Crc Polynomial

Selected Answer
Answer: A

Divide 101000110100000 by 110101 to get 01110 as remainder. And as we know, remainder is the CRC.
11 votes -- Rajarshi Sarkar ( 35k points)

The CRC calculation is the same long division (worked image omitted); the answer is (A).

 

11 votes -- kvkumar ( 4.1k points)
Selected Answer

answer - B

The degree of the generator polynomial is 3, hence 3 zero bits are appended before performing the division.

After performing the division using modulo-2 (XOR) arithmetic, the remainder is 011.

The remainder is appended to the original data bits.

By Anurag Pandey

14 votes -- ankitrokdeonsns ( 9.1k points)
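Both CRC answers above use the same modulo-2 long division; a small sketch (the function name is ours):

```python
def crc_remainder(message: str, generator: str) -> str:
    # append len(generator)-1 zeros, then divide modulo 2 (XOR)
    bits = list(message + "0" * (len(generator) - 1))
    for i in range(len(message)):
        if bits[i] == "1":
            for j, g in enumerate(generator):
                bits[i + j] = str(int(bits[i + j]) ^ int(g))
    return "".join(bits[-(len(generator) - 1):])

print(crc_remainder("11001001", "1001"))      # 011 -> send 11001001011
print(crc_remainder("1010001101", "110101"))  # 01110, as in the answer above
```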
Selected Answer
Solution:

 

6 votes -- Smriti012 ( 3.1k points)

Anarkali digitally signs a message and sends it to Salim. Verification of the signature by Salim requires.

  1. Anarkali's public key.
  2. Salim's public key.
  3. Salim's private key.
  4. Anarkali's private key.

 

Answers: Cryptography

Selected Answer

In a digital signature,
the sender (Alice/Anarkali :P)
first signs (encrypts) with her own private key, and then encrypts again with the receiver's (Bob/Salim's) public key.

Thus to verify, the receiver needs the sender's (Anarkali's) public key, after first decrypting with his own private key.

So the answer is A!

23 votes -- Shashank Chavan ( 3.4k points)

A network has a data transmission bandwidth of $20 \times 10^{6}$ bits per second. It uses CSMA/CD in the MAC layer. The maximum signal propagation time from one node to another node is $40$ microseconds. The minimum size of a frame in the network is __________ bytes.

 

 

Which of the following statements is TRUE about CSMA/CD

  1. IEEE 802.11 wireless LAN runs CSMA/CD protocol
  2. Ethernet is not based on CSMA/CD protocol
  3. CSMA/CD is not suitable for a high propagation delay network like satellite network
  4. There is no contention in a CSMA/CD network

A network with CSMA/CD protocol in the MAC layer is running at 1 Gbps over a 1 km cable with no repeaters. The signal speed in the cable is $2 \times 10^8$ m/sec. The minimum frame size for this network should be

  1. 10000 bits
  2. 10000 bytes
  3. 5000 bits
  4. 5000 bytes

The minimum frame size required for a CSMA/CD based computer network running at 1 Gbps on a 200 m cable with a link speed of $2 \times 10^8$ m/s is

  1. 125 bytes
  2. 250 bytes
  3. 500 bytes
  4. None of the above

Consider a CSMA/CD network that transmits data at a rate of 100 Mbps ($10^8$ bits per second) over a 1 km (kilometer) cable with no repeaters. If the minimum frame size required for this network is 1250 bytes, what is the signal speed (km/sec) in the cable?

 

  1. 8000
  2. 10000
  3. 16000
  4. 20000

Answers: Csma Cd

Selected Answer

Since this is CSMA/CD, the minimum transmission delay equals the RTT. Hence,

$L = B \times RTT = B \times 2 \times t_{propagation}$

$L = (20 \times 10^6) \times 2 \times 40 \times 10^{-6} = 20 \times 2 \times 40 = 1600$ bits $= 200$ bytes

Hence 200 bytes is the answer.

16 votes -- Shashank Chavan ( 3.4k points)
Selected Answer
Answer: C

CSMA/CD is used in IEEE 802.3 (wired Ethernet), not in IEEE 802.11 wireless LANs, so option 1 is false. Ethernet has been based on CSMA/CD since the early 1980s, so option 2 is false. There is contention in a CSMA/CD network, so option 4 is false.

Option 3 is true: CSMA/CD is unsuitable for high propagation delay networks like satellite links, since collision detection requires the frame transmission time to exceed the round-trip propagation time, which would force impractically large minimum frame sizes.
12 votes -- nagalla pruthvi ( 883 points)
Selected Answer

Minimum frame size is needed to ensure that collisions are detected properly. It ensures that before a frame is completely sent, the sender would be notified of any possible collision, and hence collision detection works perfectly.

In CSMA/CD a sender won't send a packet if it senses that another sender is using the channel. So, assume a sender A and a receiver B. When A sends a packet, B might use the cable until it is notified that a packet is being sent to it. B will be notified as soon as the first bit arrives that a packet is coming, and it won't start a transmission of its own after this until that packet is finished. So, in the worst case for collision, B will transmit a packet back to A just before the first bit of A's packet reaches it (if $t_d$ is the propagation delay of the channel, this time would be just $t_d$). In this case, surely there will be a collision. But for A to detect it, it should be notified of B's packet before the sending of its own packet finishes: when B's packet arrives at A (taking another $t_d$), A shouldn't have finished transmitting, i.e., A should still be sending during this time interval of $2 \times t_d$. Thus,

The number of bits that A can transmit in $2\times t_d$ time should be less than the frame size $S$ (sending of the frame shouldn't finish in this time).

The number of bits transmitted in time $t$ is $\text{bandwidth} \times t$, and the propagation delay $t_d$ is $\frac{\text{distance}}{\text{link speed}}$

So, $S  \geq 2\times \text{bandwidth} \times t_d$

$\geq 2 \times 10^9 \times \frac{ 1000}{ 2 \times 10^8}$

$\geq 10000$ bits

17 votes -- Arjun Suresh ( 294k points)
Selected Answer

The reasoning is the same as in the previous answer: the minimum frame size must satisfy

$S  \geq 2\times \text{bandwidth} \times t_d$

$\geq 2 \times 10^9 \times \frac{ 200}{ 2 \times 10^8}$

$\geq 2000$ bits

$\geq 250$ bytes

17 votes -- Arjun Suresh ( 294k points)
Selected Answer

For collision to be detected, the frame size should be such that the transmission time of the frame should be greater than twice the propagation delay (So, before the frame is completely sent, any possible collision will be discovered).

So, $\frac{1250 \times 8}{10^8} \geq \frac{2 \times 1}{x}$

$x = 2 \times 10^4 = 20000$ km/s

18 votes -- Arjun Suresh ( 294k points)
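All four CSMA/CD questions above reduce to the same inequality, $S \geq 2 \times \text{bandwidth} \times t_d$. A quick Python check with the values from the questions (the helper name is ours):

    # Minimum CSMA/CD frame size in bits: S >= 2 * bandwidth * propagation_delay.
    def min_frame_bits(bandwidth_bps, prop_delay_s):
        return 2 * bandwidth_bps * prop_delay_s

    print(min_frame_bits(20e6, 40e-6))      # 1600.0 bits  = 200 bytes
    print(min_frame_bits(1e9, 1000 / 2e8))  # 10000.0 bits
    print(min_frame_bits(1e9, 200 / 2e8))   # 2000.0 bits  = 250 bytes

    # The last question solves the same relation for the signal speed instead:
    # S / B = 2 * d / v  =>  v = 2 * d * B / S
    print(2 * 1 * 1e8 / (1250 * 8))         # 20000.0 km/s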

Count to infinity is a problem associated with

  1. link state routing protocol.
  2. distance vector routing protocol
  3. DNS while resolving host name
  4. TCP for congestion control

For the network given in the figure below, the routing tables of the four nodes A, E, D and G are shown. Suppose that F has estimated its delay to its neighbors, A, E, D and G as 8, 10, 12 and 6 msecs respectively and updates its routing table using distance vector routing technique.

           

 

[Figure: GATE 2007-IT Q.60, network topology with the routing tables of nodes A, E, D and G]

 

  1.  A 8
     B  20
     C  17
     D  12
     E  10
     F  0
     G  6
  2.  A  21
     B  8
     C  7
     D  19
     E  14
     F  0
     G  22
  3.  A  8
     B  20
     C  17
     D  12
     E  10
     F  16
     G  6
  4.  A  8
     B  8
     C  7
     D  12
     E  10
     F  0
     G  6

Consider a network with 6 routers R1 to R6 connected with links having weights as shown in the following diagram.

All the routers use the distance vector based routing algorithm to update their routing tables. Each router starts with its routing table initialized to contain an entry for each neighbour with the weight of the respective connecting link. After all the routing tables stabilize, how many links in the network will never be used for carrying any data?

  1. 4
  2. 3
  3. 2
  4. 1

Consider a network with 6 routers R1 to R6 connected with links having weights as shown in the following diagram.


Suppose the weights of all unused links are changed to 2 and the distance vector algorithm is used again until all routing tables stabilize. How many links will now remain unused?

  1. 0
  2. 1
  3. 2
  4. 3

 

Consider a network with five nodes, N1 to N5, as shown below.

The network uses a Distance Vector Routing protocol. Once the routes have been stabilized, the distance vectors at different nodes are as following.

N1: (0, 1, 7, 8, 4)

N2: (1, 0, 6, 7, 3)

N3: (7, 6, 0, 2, 6)

N4: (8, 7, 2, 0, 4)

N5: (4, 3, 6, 4, 0)

Each distance vector is the distance of the best known path at that instance to nodes, N1 to N5, where the distance to itself is 0. Also all links are symmetric and the cost is identical in both directions. In each round, all nodes exchange their distance vectors with their respective neighbors. Then all nodes update their distance vectors. In between two rounds, any change in cost of a link will cause the two incident nodes to change only that entry in their distance vectors.

 

The cost of link N2-N3 reduces to 2 (in both directions). After the next round of updates, what will be the new distance vector at node, N3?

  1. (3, 2, 0, 2, 5)
  2. (3, 2, 0, 2, 6)
  3. (7, 2, 0, 2, 5)
  4. (7, 2, 0, 2, 6)

Consider a network with five nodes, N1 to N5, as shown below.

The network uses a Distance Vector Routing protocol. Once the routes have been stabilized, the distance vectors at different nodes are as following.

N1: (0, 1, 7, 8, 4)

N2: (1, 0, 6, 7, 3)

N3: (7, 6, 0, 2, 6)

N4: (8, 7, 2, 0, 4)

N5: (4, 3, 6, 4, 0)

Each distance vector is the distance of the best known path at that instance to nodes, N1 to N5, where the distance to itself is 0. Also all links are symmetric and the cost is identical in both directions. In each round, all nodes exchange their distance vectors with their respective neighbors. Then all nodes update their distance vectors. In between two rounds, any change in cost of a link will cause the two incident nodes to change only that entry in their distance vectors.

The cost of link N2-N3 reduces to 2 (in both directions). After the next round of updates, the link N1-N2 goes down. N2 will reflect this change immediately in its distance vector as cost, $\infty$. After the NEXT ROUND of update, what will be the cost to N1 in the distance vector of N3?

  1. 3
  2. 9
  3. 10
  4. $\infty$

Answers: Distance Vector Routing

Selected Answer
Answer: B

Count to infinity is a classic problem of distance vector routing: after a link failure, routers may keep increasing their distance estimates through each other indefinitely.
7 votes -- nagalla pruthvi ( 883 points)
Selected Answer
Answer: A

Distance from F to F is 0, which eliminates option C.

Using the distance vector routing protocol, F -> D -> B yields a distance of 20, which eliminates options B and D.
17 votes -- Rajarshi Sarkar ( 35k points)
Selected Answer
Answer (C)

Following will be distance vectors of all nodes.

Shortest Distances from R1 to R2, R3, R4, R5 and R6
R1 (5, 3, 12, 12, 16)
Links used: R1-R3, R3-R2, R2-R4, R3-R5, R5-R6

Shortest Distances from R2 to R3, R4, R5 and R6
R2 (2, 7, 8, 12)
Links used: R2-R3, R2-R4, R4-R5, R5-R6

Shortest Distances from R3 to R4, R5 and R6
R3 (9, 9, 13)
Links used: R3-R2, R2-R4, R3-R5, R5-R6

Shortest Distances from R4 to R5 and R6
R4 (1, 5)
Links used: R4-R5, R5-R6

Shortest Distance from R5 to R6
R5 (4)
Links Used: R5-R6

If we mark all the used links one by one, we can see that the following links are never used:
R1-R2
R4-R6
11 votes -- pC ( 21.4k points)
C is the right answer. The links R1-R2 and R4-R6 will never be used for data transfer because shorter paths are available in every case.
16 votes -- Hunaif ( 485 points)
Selected Answer
First we need to find the unused links in the graph. For that we need not build the distance vector tables; simply inspecting the graph is enough (the DV tables would give the same answer). So, R1-R2 and R4-R6 will remain unused.

Now, if we change the weights of these unused links to 2, R5-R6 will become the unused link.

So the correct answer is option B)
13 votes -- saif ahmed ( 3.8k points)

 

Only one link is not used

 

 

11 votes -- pC ( 21.4k points)
Selected Answer

Q.52 Answer is (A)

As soon as N2-N3 reduces to 2, both N2 and N3 instantly update their distance to each other to 2. So N2 becomes (1, 0, 2, 7, 3) and N3 becomes (7, 2, 0, 2, 6).

After this starts the first round of updates, in which each node shares its vector with its respective neighbours only. Keep in mind that only the old vectors are shared: the vectors used for this round of updates are N1: (0, 1, 7, 8, 4), N2: (1, 0, 2, 7, 3), N3: (7, 2, 0, 2, 6), N4: (8, 7, 2, 0, 4), N5: (4, 3, 6, 4, 0). At this point all entries are old except those of N2 and N3, where the mutual distance has changed to 2 instead of 6.

The question asks for N3, so focus on that. N3 receives the vectors of N2: (1, 0, 2, 7, 3) and N4: (8, 7, 2, 0, 4). Using these alone, the original N3: (7, 2, 0, 2, 6) updates to N3: (3, 2, 0, 2, 5). (For the update procedure and the resulting tables, refer to Forouzan; a runnable sketch follows this answer.)

So the answer is (A).

24 votes -- Sandeep_Uniyal ( 7.3k points)
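One synchronous round of distance-vector updates is easy to replay in Python. The link costs below are our reconstruction from the stabilized vectors, so treat the topology as an assumption: N1-N2 = 1, N2-N5 = 3, N3-N4 = 2, N4-N5 = 4, and N2-N3 already reduced to 2.

    # One Bellman-Ford round: each node recomputes its vector from its
    # neighbours' OLD vectors only.
    cost = {('N1','N2'): 1, ('N2','N3'): 2, ('N2','N5'): 3,
            ('N3','N4'): 2, ('N4','N5'): 4}
    cost.update({(b, a): w for (a, b), w in cost.items()})  # symmetric links

    nodes = ['N1', 'N2', 'N3', 'N4', 'N5']
    # Old vectors, with only the direct N2-N3 entries already changed to 2.
    old = {'N1': [0,1,7,8,4], 'N2': [1,0,2,7,3], 'N3': [7,2,0,2,6],
           'N4': [8,7,2,0,4], 'N5': [4,3,6,4,0]}

    def one_round(n):
        nbrs = [m for m in nodes if (n, m) in cost]
        return [0 if d == n else
                min(cost[(n, m)] + old[m][nodes.index(d)] for m in nbrs)
                for d in nodes]

    print(one_round('N3'))  # [3, 2, 0, 2, 5] -> option (A)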
Selected Answer

First, as soon as N1-N2 goes down, N2 and N1 both update that entry in their vectors to infinity. So N2 at this moment is N2: ($\infty$, 0, 2, _, _); the blanks are details that do not matter here.

Now, for N3 to get updated in the subsequent round, it will receive vectors from N2 and N4 only. But first we need the N4 calculated in the previous round. In the previous question, N4 received updates from N3 and N5, namely N3: (7, 6, 0, 2, 6) and N5: (4, 3, 6, 4, 0).

Why did N4 not get the updated vector from N3? Because the vectors are shared at the same moment: in a particular round of updates, the old values of all the vectors are used, not the updated ones. N3 was updated only after it had passed its old vector to its neighbours, since N4 will not wait for N3 to get updated first. So N4 updates its vector (in the previous question) to N4: (8, 7, 2, 0, 4).

Note that N4's path to N1 here exists via N5 and not via N3, because the vector shared by N3 contained a distance to N1 of 7, and N1 via N3 sums to 7 + 2 = 9, while via N5 it is 4 + 4 = 8. Now N3 receives the vectors N2: ($\infty$, 0, _, _, _) and N4: (8, 7, 2, 0, 4).

At first N3 will set its distance to N1 to $\infty$ and not 3, because $\infty$ is the new distance with the same next hop N2 (if the next hop is unchanged, the new entry replaces the old one even when it is larger). But at the same time it sees a distance to N1 of 8 from N4, and so updates with the value (N3-N4) + (N4-N1) = 2 + 8 = 10. So the N3-N1 distance in N3: (10, _, 0, _, _) is 10.

So the answer is (C)

Ref: http://www.cs.princeton.edu/courses/archive/spr11/cos461/docs/lec14-distvector.pdf

12 votes -- Sandeep_Uniyal ( 7.3k points)

Assume that "host1.mydomain.dom" has an IP address of 145.128.16.8. Which of the following options would be most appropriate as a subsequence of steps in performing the reverse lookup of 145.128.16.8? In the following options "NS" is an abbreviation of "nameserver".

  1. Query a NS for the root domain and then NS for the "dom" domains
  2. Directly query a NS for "dom" and then a NS for "mydomain.dom" domains
  3. Query a NS for in-addr.arpa and then a NS for 128.145.in-addr.arpa domains
  4. Directly query a NS for 145.in-addr.arpa and then a NS for 128.145.in-addr.arpa domains

Answers: Dns

Selected Answer

Answer is C)

A and B are clearly wrong, as we are doing a reverse lookup.

C is the closest to the process given in RFC 1033: we need to get the NS for in-addr.arpa before querying 8.16.128.145.in-addr.arpa.

D is not correct; it skips the in-addr.arpa step.

 

Relevant excerpt from https://tools.ietf.org/html/rfc1033:

IN-ADDR.ARPA

   The structure of names in the domain system is set up in a
   hierarchical way such that the address of a name can be found by
   tracing down the domain tree contacting a server for each label of
   the name.  Because of this 'indexing' based on name, there is no easy
   way to translate a host address back into its host name.

   In order to do the reverse translation easily, a domain was created
   that uses hosts' addresses as part of a name that then points to the
   data for that host.  In this way, there is now an 'index' to hosts'
   RRs based on their address.  This address mapping domain is called
   IN-ADDR.ARPA.  Within that domain are subdomains for each network,
   based on network number.  Also, for consistency and natural
   groupings, the 4 octets of a host number are reversed.

   For example, the ARPANET is net 10.  That means there is a domain
   called 10.IN-ADDR.ARPA.  Within this domain there is a PTR RR at
   51.0.0.10.IN-ADDR that points to the RRs for the host SRI-NIC.ARPA
   (who's address is 10.0.0.51).  Since the NIC is also on the MILNET
   (Net 26, address 26.0.0.73), there is also a PTR RR at 73.0.0.26.IN-
   ADDR.ARPA that points to the same RR's for SRI-NIC.ARPA.  The format
   of these special pointers is defined below along with the examples
   for the NIC.
 The PTR record is used to let special names point to some other
   location in the domain tree.  They are mainly used in the IN-
   ADDR.ARPA records for translation of addresses to names.  PTR's
   should use official names and not aliases.

   For example, host SRI-NIC.ARPA with addresses 10.0.0.51 and 26.0.0.73
   would have the following records in the respective zone files for net
   10 and net 26:

           51.0.0.10.IN-ADDR.ARPA.  PTR   SRI-NIC.ARPA.
           73.0.0.26.IN-ADDR.ARPA.  PTR   SRI-NIC.ARPA.
8 votes -- Akash ( 43.8k points)
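Constructing the reverse-lookup name itself is mechanical: reverse the octets and append in-addr.arpa. A one-line Python sketch (the helper name is ours):

    # PTR query name for a reverse lookup: reversed octets + '.in-addr.arpa'.
    def reverse_name(ip):
        return '.'.join(reversed(ip.split('.'))) + '.in-addr.arpa'

    print(reverse_name('145.128.16.8'))  # 8.16.128.145.in-addr.arpa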

In the 4B/5B encoding scheme, every 4 bits of data are encoded in a 5-bit codeword. It is required that the codewords have at most 1 leading and at most 1 trailing zero. How many such codewords are possible?

  1. 14
  2. 16
  3. 18
  4. 20

Answers: Encoding

Selected Answer

Answer is (C)

It says we have a 5-bit codeword that cannot have two consecutive zeros in the first two bits and cannot have two consecutive zeros in the last two bits.

Codewords with the first two bits zero $= 0|0|x|x|x| = 8$

Codewords with the last two bits zero $= x|x|x|0|0| = 8$

Codewords with both the first and last two bits zero $= 0|0|x|0|0| = 2$

Codewords with the first OR last two bits zero $= 8 + 8 - 2 = 14$

Therefore, possible codewords $= 32 - 14 = 18$

15 votes -- Sandeep_Uniyal ( 7.3k points)
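The count of 18 can also be confirmed by brute force, enumerating all 32 five-bit words (our sketch):

    # Valid 4B/5B codewords: at most one leading zero and at most one trailing
    # zero, i.e. the word neither starts nor ends with '00'.
    valid = [w for w in (format(i, '05b') for i in range(32))
             if not w.startswith('00') and not w.endswith('00')]
    print(len(valid))  # 18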
Consider a 3-bit error detection and 1-bit error correction Hamming code for 4-bit data. The extra parity bits required would be ___ and the 3-bit error detection is possible because the code has a minimum distance of ____

What is the distance of the following code 000000, 010101, 000111, 011001, 111111?

  1. 2
  2. 3
  3. 4
  4. 1

 

In a communication network, a packet of length $L$ bits takes link $L_1$ with a probability of $p_1$ or link $L_2$ with a probability of $p_2$. Link $L_1$ and $L_2$ have bit error probability of $b_1$ and $b_2$ respectively. The probability that the packet will be received without error via either $L_1$ or $L_2$ is

  1. $(1 - b_1)^Lp_1 + (1 - b_2)^Lp_2$
  2. $[1 - (b_1 + b_2)^L]p_1p_2$
  3. $(1 - b_1)^L (1 - b_2)^Lp_1p_2$
  4. $1 - (b_1^Lp_1 + b_2^Lp_2)$

An error correcting code has the following code words: 00000000, 00001111, 01010101, 10101010, 11110000. What is the maximum number of bit errors that can be corrected?

  1. 0
  2. 1
  3. 2
  4. 3

Data transmitted on a link uses the following $2D$ parity scheme for error detection:
Each sequence of $28$ bits is arranged in a $4\times 7$ matrix (rows $r_0$ through $r_3$, and columns $d_7$  through $d_1$) and is padded with a column $d_0$ and row $r_4$ of parity bits computed using the Even parity scheme. Each bit of column $d_0$ (respectively, row $r_4$) gives the parity of the corresponding row (respectively, column). These $40$ bits are transmitted over the data link.

[Figure: GATE 2008-IT Q.66, the received 40-bit matrix]

The table shows data received by a receiver and has $n$ corrupted bits. What is the minimum possible value of $n$?

  1. 1
  2. 2
  3. 3
  4. 4

Let $G(x)$ be the generator polynomial used for CRC checking. What is the condition that should be satisfied by $G(x)$ to detect odd number of bits in error?
 

  1. $G(x)$ contains more than two terms
  2. $G(x)$ does not divide $1+x^k$, for any $k$ not exceeding the frame length
  3. $1+x$ is a factor of $G(x)$
  4. $G(x)$ has an odd number of terms.

Answers: Error Detection

Selected Answer

The Hamming distance between two bit strings is the number of bits that would have to flip to make the strings identical.

To detect $d$ errors requires a minimum Hamming distance of $d + 1$.

To correct $d$ bit flips requires a minimum Hamming distance of $2d + 1$.

For the first blank, we use the relation $d + p + 1 \leq 2^p$, where $d$ = number of data bits and $p$ = number of parity bits. Here $d = 4$, so $4 + p + 1 \leq 2^p$, i.e., $p + 5 \leq 2^p$. For $p = 2$ this becomes $7 \leq 4$, which does not hold; for $p = 3$ it becomes $8 \leq 8$, which holds. So $p$ must be 3 (the minimum value of $p$ is 3).

For the second blank, the 3-bit error detection is possible because the code has a minimum distance of $3 + 1 = 4$ (the $d + 1$ formula with $d = 3$).

Answers for the 2 blanks: [3, 4]

2 votes -- Bikram ( 44.7k points)
Let the minimum Hamming distance be $t$. With distance $t$, up to $t-1$ bit errors can be detected and up to $(t-1)/2$ bit errors corrected.

For 3-bit error detection, minimum Hamming distance $= 3 + 1 = 4$.
For 1-bit error correction, minimum Hamming distance $= 2 \times 1 + 1 = 3$.

Number of parity bits $p$, with $d = 4$ data bits:
$d + p + 1 \leq 2^p$
$4 + p + 1 \leq 2^p$
$p = 3$
12 votes -- Digvijay ( 47k points)
Selected Answer

Distance (also called min-distance) of a block code is the minimum number of positions in which any two distinct codes differ. Here, min-distance occurs for the codes 2 and 3 and they differ only in 2 positions. So, d = 2.

https://en.wikipedia.org/wiki/Block_code

16 votes -- Arjun Suresh ( 294k points)
Selected Answer

 Probability of choosing link $L_1=p_1$

Probability for no bit error (for any single bit)$=(1-b_1)$

Similarly for link $L_2$ 

Probability of no bit error $=(1-b_2)$

The packet goes through either link $L_1$ or link $L_2$; these are mutually exclusive events (if one happens, the other does not), so we can simply add their respective probabilities for the favourable case.

Probability packet will be received without any error = Probability of $L_1$ being chosen and no error in any of the $L$ bits + Probability of $L_2$ being chosen and no error in any of the $L$ bits

$=(1-b_1)^Lp_1+(1-b_2)^Lp_2.$

Hence answer is option A .

---------

Why is option D not the correct choice here?

Option D gives the probability that the frame does not arrive with every bit in error, i.e., that at least one bit is received correctly.

24 votes -- Pooja Palod ( 32.4k points)
Selected Answer
Answer: B

The minimum Hamming distance of the given code is 4 (e.g., 00000000 and 00001111 differ in 4 positions, and no pair differs in fewer).

For correction: $\lfloor (4-1)/2 \rfloor = \lfloor 1.5 \rfloor = 1$ bit error.

For detection: $4 - 1 = 3$ bit errors.
18 votes -- Rajarshi Sarkar ( 35k points)
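Both of the distance answers above can be checked by enumerating pairwise Hamming distances (a small sketch; the function name is ours):

    from itertools import combinations

    # Minimum pairwise Hamming distance of a block code.
    def min_distance(code):
        return min(sum(a != b for a, b in zip(u, v))
                   for u, v in combinations(code, 2))

    print(min_distance(['000000', '010101', '000111', '011001', '111111']))
    # 2 -> the distance of the first code is 2

    print(min_distance(['00000000', '00001111', '01010101',
                        '10101010', '11110000']))
    # 4 -> corrects floor((4-1)/2) = 1 bit, detects 4-1 = 3 bits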
Selected Answer

Here we need to change a minimum of 3 bits; with those corrections, the parity becomes consistent both row-wise and column-wise.

C is the answer.

11 votes -- Prashant Singh ( 49.2k points)
Selected Answer

Let me first explain the building blocks of this problem. Before answering, we should know the relationship between the sent codeword, the received codeword, the CRC generator and the error polynomial.

Let's take an example:

Sent codeword $= 10010$  $(= x^4+x)$

Received codeword $= 10110$ (error at the 2nd bit)  $(= x^4+x^2+x)$

Now, I can write Sent codeword = Received codeword + error ($10010 = 10110 + 00100$; here we do modulo-2 arithmetic, i.e., $1+1=0$ without carry).

In polynomial form we can see the same thing: $x^4+x = x^4+x^2+x+x^2 = x^4+2x^2+x = x^4+x$ (a coefficient of 2 means 0, because this corresponds to binary modulo-2 arithmetic, in which $1+1 = 0$, not 2).

OR

We can also write Received codeword = Sent codeword + error (check it using the same method as above).

Call the sent codeword $C(x)$, the received codeword $R(x)$ and the error $E(x)$.

Now we have $R(x) = C(x) + E(x)$, and let the CRC polynomial be $G(x)$. $G(x)$ always divides $C(x)$, and if there is an error then $G(x)$ should not divide $R(x)$. Let's check:

$R(x) \bmod G(x) = (C(x) + E(x)) \bmod G(x)$

If $G(x)$ divides $E(x)$ too, this would mean $G(x)$ divides $R(x)$. We know that if $G(x)$ does not divide $R(x)$ then there is an error, but we are never sure whether there is an error or not when $G(x)$ divides $R(x)$.

As we saw, whether $G(x)$ divides $R(x)$ depends entirely on whether $G(x)$ divides $E(x)$. The whole strength of $G(x)$ lies in not dividing any possible $E(x)$.

Look at $E(x)$ again. If there is an error in the 3rd and 4th bits from the left (the LSB is the 0th bit), then $E(x) = x^4+x^3$. (It does not matter whether an error toggles 1 to 0 or 0 to 1; check with the example above.)

Now come to the question: it says $G(x)$ should detect an odd number of bits in error. If the number of bits in error is odd, then the number of terms in $E(x)$ is odd. For instance, if the 1st, 2nd and 5th bits got corrupted, then $E(x) = x^5+x^2+x$.

It is clear that if any function $f(x)$ has a factor of $x-k$, then $f(x)$ is zero at $x=k$, i.e., $f(k) = 0$.

  • We want to detect an odd number of bits in error; that means the received message $R(x)$ contains an odd number of inverted bits, so $E(x)$ must contain an odd number of terms with coefficients equal to 1.
  • As a result, $E(1)$ must equal 1 (remember $1+1 = 0$ and $1+1+1 = 1$: any odd number of ones sums to 1). $E(1)$ is not zero, which means $x+1$ is not a factor of $E(x)$.
  • Now I want $G(x)$ not to divide $E(x)$, so that the check fails and I can happily detect the odd number of bits in error.
  • So, if we make sure that $G(1) = 0$, we can conclude that $G(x)$ does not divide any $E(x)$ corresponding to an odd number of error bits. In this case, a CRC based on $G(x)$ will detect any odd number of errors.
  • As long as $1+x$ is a factor of $G(x)$, $G(x)$ can never divide $E(x)$, because we know $E(x)$ does not have $1+x$ as a factor.

Option C.

(Option B might confuse you: if $G(x)$ has some factor of the form $x^k+1$ then $G(x)$ would also detect all odd numbers of errors. But option B's wording is different, and in any case there should be no upper bound on $k$.)

15 votes -- Sachin Mittal ( 7.1k points)
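The central claim, that $G(1) = 0$ (equivalently $(1+x) \mid G(x)$) catches every odd-weight error, can be sanity-checked by brute force over short error patterns (a sketch; polynomials are written as bit strings, MSB first):

    # E is undetected iff G divides E (modulo-2). With G = x^3+x^2+x+1 ('1111'),
    # which has an even number of terms (so G(1) = 0 and (x+1) | G), no
    # odd-weight error pattern should be divisible by G.
    def divides(g, e):
        e = list(e)
        for i in range(len(e) - len(g) + 1):
            if e[i] == '1':
                for j, d in enumerate(g):
                    e[i + j] = str(int(e[i + j]) ^ int(d))
        return '1' not in e

    g = '1111'
    undetected = [format(n, 'b') for n in range(1, 2 ** 10)
                  if format(n, 'b').count('1') % 2 == 1
                  and divides(g, format(n, 'b'))]
    print(undetected)  # [] -> every odd-weight error up to 10 bits is detected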

In an Ethernet local area network, which one of the following statements is TRUE?

  1. A station stops to sense the channel once it starts transmitting a frame.
  2. The purpose of the jamming signal is to pad the frames that are smaller than the minimum frame size.
  3. A station continues to transmit the packet even after the collision is detected.
  4. The exponential back off mechanism reduces the probability of collision on retransmissions.

 

 

A and B are the only two stations on an Ethernet. Each has a steady queue of frames to send. Both A and B attempt to transmit a frame, collide, and A wins the first backoff race. At the end of this successful transmission by A, both A and B attempt to transmit and collide. The probability that A wins the second backoff race is

  1. 0.5
  2. 0.625
  3. 0.75
  4. 1.0

Which of the following statements is TRUE?

  1. Both Ethernet frame and IP packet include checksum fields
  2. Ethernet frame includes a checksum field and IP packet includes a CRC field
  3. Ethernet frame includes a CRC field and IP packet includes a checksum field
  4. Both Ethernet frame and IP packet include CRC fields

Determine the maximum length of the cable (in km) for transmitting data at a rate of 500 Mbps in an Ethernet LAN with frames of size 10,000 bits. Assume the signal speed in the cable to be 200,000 km/s.

(A) 1      (B) 2      (C) 2.5      (D) 5

Answers: Ethernet

Selected Answer
On Ethernet:

A) False: a station keeps sensing the channel even while transmitting; that is exactly how collisions are detected.

B) False: the purpose of the jamming signal is to make sure every station notices the collision, not to pad short frames.

C) False: a station aborts its transmission and sends the jamming signal when a collision is detected (this is also why B is false).

D) True: exponential backoff spreads out retransmission attempts, reducing the probability of another collision.

So the answer is D)
12 votes -- Akash ( 43.8k points)
Selected Answer

Answer: B (0.625).

After the first collision, both A and B choose a backoff slot from {0, 1}; A wins and transmits, and its collision counter resets. At the second collision, A again chooses uniformly from {0, 1}, while B, now on its second consecutive collision, chooses from {0, 1, 2, 3}. A wins the race when its slot number is strictly smaller:

$P(A \text{ wins}) = \frac{1}{2}\left(\frac{3}{4} + \frac{2}{4}\right) = \frac{5}{8} = 0.625$

24 votes -- Bijendra Behera ( 171 points)
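Enumerating the backoff slot choices directly confirms this (a sketch, assuming standard binary exponential backoff: A draws from {0, 1}, B from {0, 1, 2, 3}):

    from itertools import product

    # A wins the race outright when its slot number is strictly smaller than B's.
    wins = sum(a < b for a, b in product(range(2), range(4)))
    print(wins / (2 * 4))  # 0.625 -> option B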
Selected Answer

Answer: C

The Ethernet frame carries a 4-byte CRC field in its trailer (the Frame Check Sequence), while the IPv4 header carries a 16-bit checksum field.

5 votes -- Prateek kumar ( 6k