Consider the following code written in C:
Loop A:
    sum = 0;
    for (i = 0; i < 128; i++)
        for (j = 0; j < 64; j++)
            sum += A[i][j];
Loop B:
    sum = 0;
    for (j = 0; j < 64; j++)
        for (i = 0; i < 128; i++)
            sum += A[i][j];
The matrix A is stored contiguously in memory in row-major order. Consider a 4KB direct-mapped data cache with 8-word (32-byte) cache lines.
Calculate the number of cache misses that will occur when running Loop A.
i) 1392 misses
ii) 1024 misses
iii) 1020 misses
iv) 1323 misses
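One way to check a choice is to simulate the cache directly. The sketch below is not part of the original question; it rests on several assumptions: words are 4 bytes (implied by 8 words per 32-byte line), A starts at byte address 0 and is therefore line-aligned, the cache starts empty, and only accesses to A are counted (sum, i and j are taken to live in registers). The names count_misses and access_is_miss are hypothetical helpers introduced for illustration.

```c
/* Minimal direct-mapped cache model. Assumptions (not stated in the
   question): 4-byte words, A starts at byte address 0, the cache starts
   empty, and only accesses to A touch the data cache. */
#define ROWS        128
#define COLS        64
#define WORD_BYTES  4
#define LINE_BYTES  32
#define CACHE_BYTES 4096
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)   /* 128 cache lines */

static long tags[NUM_LINES];
static int  valid[NUM_LINES];

/* Returns 1 if the access misses (and fills the line), 0 on a hit. */
static int access_is_miss(long byte_addr)
{
    long block = byte_addr / LINE_BYTES;      /* memory block number    */
    int  index = (int)(block % NUM_LINES);    /* cache line it maps to  */
    long tag   = block / NUM_LINES;           /* remaining address bits */

    if (valid[index] && tags[index] == tag)
        return 0;
    valid[index] = 1;
    tags[index]  = tag;
    return 1;
}

/* row_major_order != 0 models Loop A (i outer, j inner);
   row_major_order == 0 models Loop B (j outer, i inner). */
int count_misses(int row_major_order)
{
    int outer_n = row_major_order ? ROWS : COLS;
    int inner_n = row_major_order ? COLS : ROWS;
    int misses  = 0;

    for (int k = 0; k < NUM_LINES; k++)       /* start from an empty cache */
        valid[k] = 0;

    for (int outer = 0; outer < outer_n; outer++) {
        for (int inner = 0; inner < inner_n; inner++) {
            int i = row_major_order ? outer : inner;
            int j = row_major_order ? inner : outer;
            /* Row-major layout: element A[i][j] sits at word i*COLS + j. */
            long addr = ((long)i * COLS + j) * WORD_BYTES;
            misses += access_is_miss(addr);
        }
    }
    return misses;
}
```

Under these assumptions, count_misses(1) models Loop A and returns 1024: the 128 x 64 = 8192 sequential word accesses miss once per 8-word line, which matches option ii. Calling count_misses(0) models Loop B in the same way.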