Since the cache line size is 8 bytes, the smallest unit of data transferred into the cache from the L2 cache or memory is 8 bytes. So if we have a miss on A[0], both A[0] and A[1] get fetched into the cache.
2. The cache is addressed by the lower bits of the address. However, the address is a byte address, and since a cache line holds 8 bytes, the lowest three bits of the address are used to address bytes inside a cache line. Since the cache is 2K bytes in size, it has 2K/8 = 256 cache lines, which are addressed by 8 bits. Hence (a code sketch of the decomposition follows the list):
bits 0-2 form the “offset”, which is used to address bytes inside a cache line;
bits 3-10 form the cache line index;
bits 11-31 form the TAG (assuming a 32-bit architecture).
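As a sanity check, here is a minimal C sketch of this decomposition. The example address is arbitrary; the shifts and masks follow directly from the bit positions above.

```c
#include <stdint.h>
#include <stdio.h>

/* Decompose a 32-bit byte address for a 2K-byte direct-mapped cache
 * with 8-byte lines: 3 offset bits, 8 index bits, 21 tag bits. */
int main(void) {
    uint32_t addr   = 0x12345678;          /* arbitrary example address */
    uint32_t offset = addr & 0x7;          /* bits 0-2: byte within the line */
    uint32_t index  = (addr >> 3) & 0xFF;  /* bits 3-10: cache line number */
    uint32_t tag    = addr >> 11;          /* bits 11-31: tag */
    printf("offset=%u index=%u tag=0x%x\n", offset, index, tag);
    return 0;
}
```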
Now consider the following sequence (the first two iterations of the loop):
load A[0] → miss; A[0] and A[1] are brought into cache line 0
load B[0] → *also maps to cache line 0* – so it overwrites A[0] and A[1] above
store A[0] → nothing happens to the cache (no write allocate); 8 bytes are written to the next level (8 is the unit of transfer)
load A[1] → accesses the SAME cache line as A[0], so we load A[0] and A[1] into line 0 again
load B[2] → maps to cache line 1 – B[2] and B[3] are loaded into cache line 1
store A[1] → again nothing is allocated in the cache; as above, 8 bytes are written to the next level
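For reference, a loop with this access pattern might look like the sketch below. The arithmetic in the body is an assumption; only the per-iteration sequence load A[i], load B[2*i], store A[i] matters for the cache behavior.

```c
/* A holds 256 4-byte words and B holds 512, so two consecutive
 * elements fit in one 8-byte cache line. The addition is hypothetical:
 * any body that loads A[i] and B[2*i] and then stores A[i] behaves
 * identically with respect to this cache. */
for (int i = 0; i < 256; i++) {
    A[i] = A[i] + B[2 * i];
}
```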
The pattern should now be clear:
At iteration i of the loop, B[2*i] brings a new cache line into the cache (line i), while A[i] moves on to a new cache line (line i/2) only every two iterations. Since the loop runs for 256 iterations, B will have just reached cache line 255 when the loop finishes.
Since A sweeps through the cache at half B's rate, every line in the top half is refilled by A after B has passed over it, so we end up with A in the top half of the cache and B in the bottom half.
Thus the cache contains:
A[0]–A[255] (in the top half, lines 0–127) and B[256]–B[511] (in the bottom half, lines 128–255).
Also, since the cache is write-through, the next level always holds up-to-date data: every store is sent down immediately, so nothing needs to be written back at the end. Over the whole loop, the stores write 256 words = 1024 bytes (all of A) to the next level (L2 cache or memory).
Assuming that the minimum transfer from the cache to the lower level is one cache line (8 bytes), this translates to 256 stores × 8 bytes = 2048 bytes.
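To double-check the whole analysis, here is a small, self-contained C sketch that simulates the cache described above: 256 direct-mapped lines of 8 bytes, no write allocate, and one 8-byte transfer per store. The base addresses of A and B are assumptions, chosen so that A[0] and B[0] both map to cache line 0 as the walkthrough implies.

```c
#include <stdint.h>
#include <stdio.h>

#define LINES      256   /* 2K bytes / 8-byte lines */
#define LINE_BYTES 8

/* One entry per cache line: a valid bit and a tag. The owner string
 * records which element pair the line holds, purely for reporting. */
static int      valid[LINES];
static uint32_t tags[LINES];
static char     owner[LINES][24];

static long bytes_written = 0;   /* write-through traffic to the next level */

static void load(uint32_t addr, const char *name, int elem) {
    uint32_t index = (addr >> 3) & (LINES - 1);
    uint32_t tag   = addr >> 11;
    if (!valid[index] || tags[index] != tag) {   /* miss: allocate the line */
        valid[index] = 1;
        tags[index]  = tag;
        snprintf(owner[index], sizeof owner[index], "%s[%d..%d]",
                 name, elem & ~1, (elem & ~1) + 1);
    }
}

static void store(uint32_t addr) {
    /* Write-through, no write allocate: the write always goes to the
     * next level and never changes which lines are cached. As in the
     * text, the minimum transfer is one 8-byte line. */
    (void)addr;
    bytes_written += LINE_BYTES;
}

int main(void) {
    /* Assumed placement: A at byte address 0 and B at 2048, so both
     * A[0] and B[0] map to cache line 0. Elements are 4-byte words. */
    uint32_t baseA = 0, baseB = 2048;

    for (int i = 0; i < 256; i++) {
        load(baseA + 4 * i,       "A", i);      /* load A[i]   */
        load(baseB + 4 * (2 * i), "B", 2 * i);  /* load B[2*i] */
        store(baseA + 4 * i);                   /* store A[i]  */
    }

    printf("line   0: %s   line 127: %s\n", owner[0],   owner[127]);
    printf("line 128: %s   line 255: %s\n", owner[128], owner[255]);
    printf("write traffic: %ld bytes\n", bytes_written);
    return 0;
}
```

Running this prints A owning lines 0–127, B[256]–B[511] spread over lines 128–255, and 2048 bytes of write traffic, matching the figures above.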