Question 1
For IEEE 754 single precision, the value (N) is calculated as follows:
- For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127). These numbers are in the so-called normalized form. The sign bit represents the sign of the number. The fractional part (1.F) is normalized with an implicit leading 1. The exponent is biased (in excess) by 127, so that both positive and negative exponents can be represented. The actual exponent ranges from -126 to +127.
- For E = 0, N = (-1)^S × 0.F × 2^(-126). These numbers are in the so-called denormalized form. The factor 2^-126 is a very small number. The denormalized form is needed to represent zero (with F=0 and E=0). It can also represent very small positive and negative numbers close to zero.
- For E = 255, it represents special values, such as ±INF (positive and negative infinity) and NaN (not a number). This is beyond the scope of this article.
https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html
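The three cases above can be checked with a short Python sketch. The function names `decode_float32` and `float_to_bits` are made up for this illustration; the only library used is the standard `struct` module.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_float32(bits: int) -> float:
    """Decode a 32-bit pattern by hand, following the three cases above."""
    s = (bits >> 31) & 0x1       # 1 sign bit
    e = (bits >> 23) & 0xFF      # 8 exponent bits
    f = bits & 0x7FFFFF          # 23 fraction bits
    sign = -1.0 if s else 1.0
    if 1 <= e <= 254:            # normalized: 1.F x 2^(E-127)
        return sign * (1 + f / 2**23) * 2.0 ** (e - 127)
    if e == 0:                   # denormalized (and zero): 0.F x 2^(-126)
        return sign * (f / 2**23) * 2.0 ** -126
    # e == 255: infinity when F = 0, NaN otherwise
    return sign * float("inf") if f == 0 else float("nan")

# Round trip: encode with struct, decode by hand.
print(decode_float32(float_to_bits(12.625)))
```

Because every single-precision value is exactly representable as a Python float (a double), the hand decoding should agree exactly with what `struct` produces.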
Question 2
IEEE 754 Single Precision (With Normalization)
With Normalization $\Rightarrow (-1)^{S} (1.M)_2 \times 2^{E-\text{bias}} $
$(1.100101)_2 \times 2^{3} $
$E-\text{bias}=\text{True Exponent}$
$E-127=3 \Rightarrow E=130$
$\underbrace{{\color{Red} {\textbf{0}} }}_{\text{sign}} |\underbrace{{\color{Blue} {\textbf{1000 0010}} }}_{\text{Exponent}} |\underbrace{{\color{Green} {\textbf{1001 0100 0000 0000 0000 000}} }}_{\text{Mantissa}}$
IEEE 754 Single Precision (Without Normalization)
Without Normalization $\Rightarrow (-1)^{S} (0.M)_2 \times 2^{E-\text{bias}} $
$(0.1100101)_2 \times 2^{4} $
$E-\text{bias}=\text{True Exponent}$
$E-127=4 \Rightarrow E=131$
$\underbrace{{\color{Red} {\textbf{0}} }}_{\text{sign}} |\underbrace{{\color{Blue} {\textbf{1000 0011}} }}_{\text{Exponent}} |\underbrace{{\color{Green} {\textbf{1100 1010 0000 0000 0000 000}} }}_{\text{Mantissa}}$
- from Hamacher, Computer Organization
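As a sanity check, both bit patterns above should evaluate to the same number, $(1100.101)_2 = 12.625$. The helper `field_value` below is a hypothetical name for this sketch; `hidden_bit` is 1 for the normalized form $(1.M)_2$ and 0 for the form without normalization $(0.M)_2$.

```python
def field_value(sign: int, e: int, mantissa_bits: str, hidden_bit: int, bias: int = 127) -> float:
    """Evaluate (-1)^S x (hidden_bit.M)_2 x 2^(E - bias) from the raw fields."""
    frac = int(mantissa_bits, 2) / 2 ** len(mantissa_bits)  # 0.M as a fraction
    return (-1) ** sign * (hidden_bit + frac) * 2.0 ** (e - bias)

# With normalization: E = 130, mantissa 1001 0100 0000 0000 0000 000
normalized = field_value(0, 0b10000010, "10010100000000000000000", hidden_bit=1)

# Without normalization: E = 131, mantissa 1100 1010 0000 0000 0000 000
unnormalized = field_value(0, 0b10000011, "11001010000000000000000", hidden_bit=0)

print(normalized, unnormalized)
```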
Normalized Minimum $\pm N_{min}$
When $S = 0 \text{ or } 1$ (sign $\pm$), $E=1$, $M=0$
$0 | 0000 0001 | 0000 0000 0000 0000 0000 000|$
$(-1)^{S} \times (1.0)_2 \times 2^{1-127} $
$\pm N_{min}= (-1)^{S} \times (1.0)_2 \times 2^{-126} $
Normalized Maximum $\pm N_{max}$
When $S = 0 \text{ or } 1$ (sign $\pm$), $E=254$, $M=2^{23}-1$
$0 | 1111 1110 | 1111 1111 1111 1111 1111 111|$
$(-1)^{S} \times (1.1111 1111 1111 1111 1111 111)_2 \times 2^{254-127} $
$\pm N_{max}= (-1)^{S} \times (1.1111 1111 1111 1111 1111 111)_2 \times 2^{127} $
Denormalized Minimum $\pm D_{min}$
When $S = 0 \text{ or } 1$ (sign $\pm$), $E=0$, $M=1$
$0 | 0000 0000 | 0000 0000 0000 0000 0000 001|$
$\pm D_{min}= (-1)^{S} \times (0.0000 0000 0000 0000 0000 001)_2 \times 2^{-126} $
Denormalized Maximum $\pm D_{max}$
When $S = 0 \text{ or } 1$ (sign $\pm$), $E=0$, $M=2^{23}-1$
$0 | 0000 0000 | 1111 1111 1111 1111 1111 111|$
$\pm D_{max}= (-1)^{S} \times (0.1111 1111 1111 1111 1111 111)_2 \times 2^{-126} $
https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html
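The four limits can be cross-checked numerically. The sketch below evaluates each formula directly and compares it with the value Python's `struct` module decodes from the corresponding bit pattern; `from_bits` is a helper name chosen for this example.

```python
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Values from the formulas (all-ones mantissa is 1 - 2^-23 as a fraction).
n_min = 1.0 * 2.0 ** -126                 # about 1.18e-38
n_max = (2 - 2.0 ** -23) * 2.0 ** 127     # about 3.40e+38
d_min = 2.0 ** -23 * 2.0 ** -126          # 2^-149, about 1.40e-45
d_max = (1 - 2.0 ** -23) * 2.0 ** -126    # just below n_min

# Matching bit patterns: sign | exponent | mantissa
patterns = {
    "N_min": (0x00800000, n_min),  # 0 | 0000 0001 | 000...0
    "N_max": (0x7F7FFFFF, n_max),  # 0 | 1111 1110 | 111...1
    "D_min": (0x00000001, d_min),  # 0 | 0000 0000 | 000...1
    "D_max": (0x007FFFFF, d_max),  # 0 | 0000 0000 | 111...1
}

for name, (bits, expected) in patterns.items():
    print(name, from_bits(bits), from_bits(bits) == expected)
```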
Similarly
Excess 64 (With Normalization)
With Normalization $\Rightarrow (-1)^{S} (1.M)_2 \times 2^{E-\text{bias}} $
$(1.100101)_2 \times 2^{3} $
$E-\text{bias}=\text{True Exponent}$
$E-64=3 \Rightarrow E=67$
$\underbrace{{\color{Red} {\textbf{0}} }}_{\text{sign}} |\underbrace{{\color{Blue} {\textbf{1000 011}} }}_{\text{Exponent}} |\underbrace{{\color{Green} {\textbf{1001 0100}} }}_{\text{Mantissa}}$
Excess 64 (Without Normalization)
Without Normalization $\Rightarrow (-1)^{S} (0.M)_2 \times 2^{E-\text{bias}} $
$(0.1100101)_2 \times 2^{4} $
$E-\text{bias}=\text{True Exponent}$
$E-64=4 \Rightarrow E=68$
$\underbrace{{\color{Red} {\textbf{0}} }}_{\text{sign}} |\underbrace{{\color{Blue} {\textbf{1000 100}} }}_{\text{Exponent}} |\underbrace{{\color{Green} {\textbf{1100 1010 }} }}_{\text{Mantissa}}$
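The excess-64 encodings above can be reproduced with a small sketch. It assumes the 16-bit layout implied by the examples (1 sign bit, 7 exponent bits, 8 mantissa bits), truncates extra mantissa bits rather than rounding, and does not check for exponent overflow; `encode_excess64` is a hypothetical name, not a standard function.

```python
import math

def encode_excess64(x: float, normalized: bool = True,
                    exp_bits: int = 7, man_bits: int = 8, bias: int = 64):
    """Return (sign, exponent bits, mantissa bits) for a nonzero x."""
    if x == 0:
        raise ValueError("zero has no normalized form in this sketch")
    sign = 0 if x > 0 else 1
    m, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    if normalized:
        m, e = m * 2, e - 1          # rescale to 1 <= m < 2, i.e. (1.M)_2
        frac = m - 1                 # drop the implicit leading 1
    else:
        frac = m                     # keep the (0.M)_2 form as is
    mantissa = int(frac * 2 ** man_bits)  # truncate to man_bits bits
    return sign, format(e + bias, f"0{exp_bits}b"), format(mantissa, f"0{man_bits}b")

print(encode_excess64(12.625))                    # with normalization
print(encode_excess64(12.625, normalized=False))  # without normalization
```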
Rounding Off
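Suppose only 3 significant digits are allowed in the mantissa. Then the result depends on the order in which the additions are performed, i.e. floating-point addition is not associative:
$\begin{align*} & [113. +(-111.)]+7.51 \\ =&[002.]+7.51\\ =&[2.00]+7.51\\ =&9.51 \end{align*}$
$\begin{align*} & 113. +[(-111.)+7.51] \\ =&113.+[-111.+008.] \\ =&113.+(-103.)\\ =&010. \end{align*}$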
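The same effect can be reproduced with Python's `decimal` module limited to 3 significant digits. This is only a sketch: `decimal` rounds the result of each operation, whereas the hand calculation rounds 7.51 to 008. while aligning the operands, but both arrive at the same two answers here.

```python
from decimal import Decimal, localcontext

a, b, c = Decimal("113"), Decimal("-111"), Decimal("7.51")

with localcontext() as ctx:
    ctx.prec = 3            # keep only 3 significant digits per operation
    left = (a + b) + c      # 2 + 7.51, which fits in 3 digits
    right = a + (b + c)     # -111 + 7.51 = -103.49 is rounded to -103 first

print(left, right)
```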