The answers here are a bit confusing, so I want to share my solution, supported by the book "Computer Organization and Embedded Systems" by Hamacher et al.
The book clearly states:
Instruction processing consists of two phases: the fetch phase and the execution phase. It is convenient to divide the processor hardware into two corresponding sections. One section fetches instructions and the other executes them.
The section that fetches instructions is also responsible for decoding them and for generating the control signals that cause appropriate actions to take place in the execution section. The execution section reads the data operands specified in an instruction, performs the required computations, and stores the results.
With this in mind, we can approach the question by taking each stage of the execution cycle to need 1 clock cycle, that is:
Execution Cycle : OF (operand fetch) + Compute + WB (write-back)
OF : 1 clock cycle for both Sin←R0 and Tin←R1 (as the two transfers can be done in parallel)
Compute : 1 clock cycle for ALUout←S+T
WB : 1 clock cycle for writing the result, R0in←ALUout
So in total 3 clock cycles are needed for the Execution cycle.
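The count above can be checked with a small Python sketch. It is only an illustration under this answer's assumptions: each stage takes exactly one clock cycle, and S and T can be loaded in the same cycle. The function name execute_add and the register values are hypothetical and come from neither the question nor the book.

```python
def execute_add(regs):
    """Simulate the execution phase of `add R0, R1` (R0 <- R0 + R1).

    Assumption: every stage costs one clock cycle, and the two operand
    transfers in OF happen in parallel within a single cycle.
    """
    regs = dict(regs)  # work on a copy so the caller's registers are untouched
    cycles = 0

    # OF: Sin <- R0 and Tin <- R1, both in the same clock cycle
    s, t = regs["R0"], regs["R1"]
    cycles += 1

    # Compute: ALUout <- S + T
    alu_out = s + t
    cycles += 1

    # WB: R0in <- ALUout
    regs["R0"] = alu_out
    cycles += 1

    return regs, cycles


# Hypothetical initial register contents, chosen only for illustration.
result, cycles = execute_add({"R0": 5, "R1": 7})
print(result["R0"], cycles)
```

If the two operand transfers could not share a cycle (for example, with only one internal bus), OF would need two cycles and the total would change, so the result depends on that assumption.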
P.S. We should not assume anything that is not explicitly stated in the question. The statement
The instruction “add R0, R1” has the register transfer interpretation
does not mean that only register operations count towards the clock cycles. It only gives the meaning of the instruction, R0 <= R0 + R1, and nothing more, so please don't misinterpret it.