I think you are charging the full cost to every byte (i.e. 12 cycles x 16 B = 192 cycles, where 12 = 10 initialization + 2 transfer cycles, and 1 cycle = 2.5 microsec, so 192 x 2.5 = 480 microsec) because the DMA is operating in cycle stealing mode.
But it is not always true that cycle stealing transfers a single byte at a time. The disk buffer may instead be allowed to fill with x B (here 16 B) before its contents are sent to memory.
Hence initialization happens only once, and the transfer itself takes 2 x 16 = 32 cycles.
So initializing the DMA and transferring the data to memory takes 10 + 32 = 42 cycles, i.e. 42 x 2.5 microsec = 105 microsec = 0.105 ms.
Prep time is 16 B x 0.001 s/B = 0.016 s = 16 ms. [Take 1 KB = 10^3 B when dealing with frequency/bandwidth figures.]
% CPU blocked = (transfer time / prep time) x 100 = (0.105 ms / 16 ms) x 100 ≈ 0.656%
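
To check the arithmetic, here is a small Python sketch that compares the two readings: initialization charged per byte versus initialization charged once per 16 B block. The constant and function names are my own, and the figures are just the ones used in this answer (2.5 microsec per cycle, 10 initialization cycles, 2 cycles per byte, 0.001 s of prep time per byte).

```python
# Figures taken from the answer above; all names here are illustrative.
CYCLE_TIME_US = 2.5       # 1 cycle = 2.5 microsec
INIT_CYCLES = 10          # DMA initialization
CYCLES_PER_BYTE = 2       # transfer cost per byte
BLOCK_BYTES = 16          # bytes buffered before the transfer
PREP_US_PER_BYTE = 1000.0 # 0.001 s per byte, assuming 1 KB = 1000 B


def per_byte_init_cycles():
    """Initialization is repeated for every byte (the 192-cycle reading)."""
    return (INIT_CYCLES + CYCLES_PER_BYTE) * BLOCK_BYTES


def single_init_cycles():
    """Initialization happens once per block (the 42-cycle reading)."""
    return INIT_CYCLES + CYCLES_PER_BYTE * BLOCK_BYTES


def percent_blocked(cycles):
    """Transfer time as a percentage of the time to prepare one block."""
    transfer_us = cycles * CYCLE_TIME_US
    prep_us = BLOCK_BYTES * PREP_US_PER_BYTE
    return transfer_us / prep_us * 100


if __name__ == "__main__":
    for label, cycles in (("per-byte init", per_byte_init_cycles()),
                          ("single init", single_init_cycles())):
        print(f"{label}: {cycles} cycles, {percent_blocked(cycles):.3f}% blocked")
```

With these numbers the per-byte reading comes out to 3% and the single-initialization reading to roughly 0.656%, so the two interpretations differ only in how often the 10 initialization cycles are counted.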