I think page faults will occur when referencing elements of both matrices, B and A.
B is accessed in row-major order and A in column-major order, but both are stored in memory in row-major order.
As the page size is 2^10 and each entry has size 2^3, one page holds 2^10 / 2^3 = 2^7 = 128 entries. One row has 2^16 entries, so it spans 2^16 / 2^7 = 2^9 pages.
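A quick sketch of that page arithmetic. It assumes each matrix starts on a page boundary and is laid out row-major; `page_of` is a hypothetical helper for illustration, not part of the original question.

```python
# Parameters taken from the numbers used in this answer
PAGE_SIZE = 2**10    # page size
ENTRY_SIZE = 2**3    # size of one matrix entry (same unit as PAGE_SIZE)
N = 2**16            # entries per row (square matrix assumed)

entries_per_page = PAGE_SIZE // ENTRY_SIZE   # 2^10 / 2^3 = 2^7 = 128
pages_per_row = N // entries_per_page        # 2^16 / 2^7 = 2^9 = 512

def page_of(i, j, n=N, per_page=entries_per_page):
    """Page index (relative to the matrix base) of element [i][j], row-major layout."""
    return (i * n + j) // per_page

print(entries_per_page, pages_per_row)  # 128 512
```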
In the first iteration, the reference to B[0,0] causes 1 miss, and the page that is loaded holds B[0,0] and the next 127 elements, so the following 127 references to B all hit. The 129th reference (counting B[0,0]) causes the next miss, which again loads 128 elements, and so on. Since one row contains 2^16 elements, accessing B in the first iteration causes 2^9 misses, and over all 2^16 iterations B causes (2^9)*(2^16) = 2^25 misses.
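The B count can be checked by walking a matrix in row-major order and counting how often the page changes. This sketch assumes, as argued in this answer, that B's current page is never evicted, so a fault happens only when the walk moves to a new page. The full 2^16 x 2^16 walk is too long to run, so the loop is checked on a small hypothetical size and the closed form is used for the real one.

```python
def count_b_faults(n, per_page):
    """Faults for a row-major walk over an n x n row-major matrix, assuming the
    page currently in use is never evicted (so only page changes fault)."""
    faults = 0
    current_page = None
    for i in range(n):
        for j in range(n):
            page = (i * n + j) // per_page
            if page != current_page:   # crossed into a new page: one miss
                faults += 1
                current_page = page
    return faults

# Scaled-down check (hypothetical size): 256 x 256, 128 entries per page
print(count_b_faults(256, 128))        # 512  (= 2 pages/row * 256 rows)

# Real size via the closed form: (pages per row) * (number of rows)
N, PER_PAGE = 2**16, 2**7
print((N // PER_PAGE) * N == 2**25)    # True
```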
For A, at the beginning of the first iteration there is 1 miss, and A[0,0] together with the next 127 elements, i.e. A[0,1], A[0,2], ..., A[0,127], is loaded. But the next reference is A[1,0], which lies on a different page and is again a miss. Continuing this way, by the end of the first iteration 1 of the 8 frames holds B's elements and the remaining 7 hold A's elements. The page holding A[0,1] is not referenced again until the start of the second iteration, after 2^16 - 1 other pages of A have passed through those 7 frames, so it has already been replaced; hence every reference to an element of A is a miss. So the total number of misses for A is (2^16)*(2^16) = 2^32.
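The argument for A comes down to two facts, sketched below under the same assumptions (square matrix, page-aligned, row-major, 7 frames left for A): consecutive references down a column are 2^9 pages apart, and a page is not reused until the other N - 1 rows have been touched. `page_of` is again a hypothetical helper.

```python
N = 2**16          # rows = columns (square matrix assumed)
PER_PAGE = 2**7    # entries per page
A_FRAMES = 7       # frames left for A while B holds one frame

def page_of(i, j, n=N, per_page=PER_PAGE):
    """Page index of A[i][j] in row-major layout (matrix assumed page-aligned)."""
    return (i * n + j) // per_page

# Column-major walk: A[0][c], A[1][c], ..., A[N-1][c].
# Consecutive references are N entries apart, i.e. N // PER_PAGE = 2^9 pages apart.
assert page_of(1, 0) - page_of(0, 0) == N // PER_PAGE

# The page loaded for A[r][c] is next needed for A[r][c+1], but in between the walk
# touches the other N - 1 rows, each on its own page. Since N - 1 > A_FRAMES,
# LRU has already evicted it, so every reference to A is a miss.
reuse_gap = N - 1
assert reuse_gap > A_FRAMES
a_faults = N * N
print(a_faults == 2**32)   # True
```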
So the total number of page faults is (2^9)*(2^16) + (2^16)*(2^16) = 2^25 + 2^32.
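Evaluating that expression numerically (plain arithmetic on the two counts above):

```python
N = 2**16          # rows = columns
PER_PAGE = 2**7    # entries per page

b_faults = (N // PER_PAGE) * N    # 2^9 * 2^16 = 2^25
a_faults = N * N                  # 2^16 * 2^16 = 2^32
total = b_faults + a_faults

print(b_faults, a_faults, total)  # 33554432 4294967296 4328521728
```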
Because LRU is used, the current B page is never replaced by one of A's pages: B's page is referenced in every iteration of the inner loop, so it is always among the most recently used pages. One frame therefore always holds B's elements. The remaining 7 frames hold A's elements (occasionally one of them still holds an old B page that is no longer needed), and these 7 frames are continually replaced.
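To sanity-check the LRU reasoning, here is a small simulation that uses `collections.OrderedDict` as the LRU list. It assumes a loop that touches B[i][j] and then A[j][i] in each iteration (the original loop is not reproduced here), 8 frames shared by both matrices, and a scaled-down n x n size with 128 entries per page; `simulate` and `touch` are hypothetical names.

```python
from collections import OrderedDict

def simulate(n, per_page, frames):
    """LRU demand paging for the assumed loop
         for i: for j: touch B[i][j]; touch A[j][i]
    with both n x n matrices row-major, page-aligned, on separate pages."""
    memory = OrderedDict()          # resident pages; last entry = most recently used
    faults = {"A": 0, "B": 0}

    def touch(name, r, c):
        page = (name, (r * n + c) // per_page)
        if page in memory:
            memory.move_to_end(page)          # hit: mark as most recently used
        else:
            faults[name] += 1                 # miss
            if len(memory) == frames:
                memory.popitem(last=False)    # evict the least recently used page
            memory[page] = None

    for i in range(n):
        for j in range(n):
            touch("B", i, j)   # row-major access to B
            touch("A", j, i)   # column-major access to A
    return faults

faults = simulate(n=256, per_page=128, frames=8)
print(faults)   # {'A': 65536, 'B': 512}
```

With n = 256 the formulas above predict (256/128)*256 = 512 faults for B and 256*256 = 65536 for A, which matches the simulation. Running it at n = 2^16 would take 2^33 references, which is why the real counts are derived analytically.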
Please correct me if I am wrong.