Why are we not considering the fact that each 1-byte word is sent to memory as soon as it is ready? The disk buffer is only 1 B, so we cannot accumulate 4 B in it and send them in one go. We also cannot overlap anything: once the disk buffer receives 1 B, that byte is put on the system bus, then the next byte, and so on for all 4 bytes. So we have to send 1 B at a time even though the bus is 4 B wide. The full 4 B width of the bus is never utilised, and we need 4 cycles of 40 ns each (160 ns in total) instead of 1 cycle of 40 ns. For all of that time the system bus has to be reserved / stolen from the processor by the DMA.

If the disk buffer (D bytes) were larger than the system bus width (S bytes), we could send S bytes as soon as they were received, then the next S bytes, and so on until all D bytes were sent. So parallelization is possible when the disk buffer is bigger than the bus width, but not when the bus width is bigger than the disk buffer.

Or is there some intermediate buffer into which we empty the disk buffer, so that the data goes over the system bus only after 4 B have accumulated there?
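
To make the two readings concrete, here is a rough Python sketch of the arithmetic. The function name `bus_time_ns` and the `packing` flag are made up for illustration; `packing=True` stands for the hypothetical intermediate buffer, which the question does not confirm exists.

```python
import math

def bus_time_ns(total_bytes, buffer_bytes, bus_bytes, cycle_ns, packing=False):
    """Bus time (ns) the DMA steals to move total_bytes to memory.

    packing=False: each bus cycle carries at most what the disk buffer holds.
    packing=True:  assumed intermediate buffer that collects a full bus word
                   before each transfer (an assumption, not given in the question).
    """
    if packing:
        bytes_per_cycle = bus_bytes
    else:
        bytes_per_cycle = min(buffer_bytes, bus_bytes)
    cycles = math.ceil(total_bytes / bytes_per_cycle)
    return cycles * cycle_ns

# Values from the question: 1 B disk buffer, 4 B bus, 40 ns per bus cycle.
print(bus_time_ns(4, 1, 4, 40))                # 160 (4 cycles, 1 B each)
print(bus_time_ns(4, 1, 4, 40, packing=True))  # 40  (1 cycle, 4 B at once)
```

With no intermediate buffer the bus is held for 160 ns per 4 bytes; with one it is held for 40 ns. Which figure applies depends on whether such a buffer is assumed.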