Only the total transmission time needs to be considered, since the propagation delay remains the same irrespective of the packet size.
If each packet carries $x$ bytes of data (excluding the header), we will have $n$ packets, where $n = \left \lceil\dfrac{24}{x}\right \rceil.$
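The first packet needs $4$ transmissions ($1$ transmission from the source followed by $3$ transmissions from the $3$ intermediate nodes), and each of the remaining $n-1$ packets, pipelined behind it, adds one more transmission time.
Total transmission time $ = 4 \times \dfrac{\text{Packet Size}}{BW} + (n-1) \times \dfrac{\text{Packet Size}}{BW}$
$\qquad =\dfrac{1}{BW} \left(4 \times \text{Packet Size} + (n-1) \times \text{Packet Size}\right) $
$\qquad =\dfrac{1}{BW} \left(4 \times (x+8) + (n-1) \times (x+8)\right) $
$\qquad =\dfrac{1}{BW} \left(4 \times (x+8) + \left(\left \lceil \dfrac{24}{x}\right \rceil-1\right) \times (x+8)\right) $
Approximating $\left \lceil \dfrac{24}{x}\right \rceil$ by $\dfrac{24}{x}$ so that the expression can be differentiated:
$\qquad =\dfrac{1}{BW} \left(4 \times x + 32 + 24 -x +192/x -8 \right) $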
$\qquad =\dfrac{1}{BW} (3 x + 48 +192/x ) $
For the optimal packet size, the derivative of the total time w.r.t. $x$ should be $0.$
$\implies 3 - 192/x^2 = 0 \implies x^2 = 64.$
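$\implies x = 8$ (taking the positive root; the second derivative $384/x^3$ is positive for $x > 0,$ so this is a minimum).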
So, optimal packet size including header $ = 8 + 8 = 16 $ bytes.
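As a sanity check on the calculus, the same cost can be evaluated by brute force with the ceiling kept intact. The sketch below is only an illustration: the names `MESSAGE_BYTES`, `HEADER_BYTES`, `HOPS` and `total_time_units` are made up here, the time is measured in units of $1/BW$, and only integer payload sizes from $1$ to $24$ bytes are assumed to be allowed.

```python
from math import ceil

MESSAGE_BYTES = 24  # message size used in the derivation above
HEADER_BYTES = 8    # header size used in the derivation above
HOPS = 4            # 1 transmission from the source + 3 from intermediate nodes


def total_time_units(payload: int) -> int:
    """Total transmission time, in units of 1/BW, for payload size x."""
    packets = ceil(MESSAGE_BYTES / payload)   # n = ceil(24 / x)
    packet_size = payload + HEADER_BYTES      # x + 8
    # First packet takes HOPS transmissions; each later packet adds one more.
    return (HOPS + packets - 1) * packet_size


# Try every integer payload size and keep the one with the smallest total time.
best_payload = min(range(1, MESSAGE_BYTES + 1), key=total_time_units)
print(best_payload, best_payload + HEADER_BYTES, total_time_units(best_payload))
# prints: 8 16 96
```

The search agrees with the derivation: a payload of $8$ bytes (packet size $16$ bytes) gives $96/BW$, which matches $3x + 48 + 192/x$ at $x = 8$ because $24/8$ is an integer and the ceiling changes nothing there.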