
Consider the values $A = 2.0 \times 10^{30}$, $B = -2.0 \times 10^{30}$, $C = 1.0$, and the sequence

    X := A + B        Y := A + C
    X := X + C        Y := Y + B

executed on a computer where floating-point numbers are represented with $32$ bits. The values of $X$ and $Y$ will be:

1. $X = 1.0, Y = 1.0$
2. $X = 1.0, Y = 0.0$
3. $X = 0.0, Y = 1.0$
4. $X = 0.0, Y = 0.0$

We are given a 32-bit representation, so in the best case all 32 bits are available for precision. (In 32-bit IEEE 754 representation the precision is only 24 bits, but we take the best case here.) 32 bits correspond to approximately 10 decimal digits.

$A = 2.0 \times 10^{30}$, $C = 1.0$

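So $A + C$ would have to set the 31st decimal digit (counting from the leading 2) to 1, which is far outside the precision of A. It is the 31st decimal digit, not merely the 31st bit, so even 32 bits of precision are nowhere near enough. This addition therefore just returns the value of A, which is assigned to Y.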

So $Y + B$ returns 0.0, while $X = A + B$ is exactly 0.0 and $X + C$ returns 1.0.

Hence the answer is the second option, B: $X = 1.0, Y = 0.0$.

Sample program, if anyone wants to try it:

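```c
#include <stdio.h>

int main(void)
{
    float a = 2.0e30f;   /* A */
    float b = -2.0e30f;  /* B */
    float c = 1.0f;      /* C */

    /* Y := A + C; Y := Y + B */
    float y = a + c;     /* 1.0 is far below a's precision, so y == a */
    printf("a = %0.25f y = %0.25f\n", a, y);
    y = y + b;           /* a + b is exactly 0.0 */
    printf("y = %0.25f\n", y);

    /* X := A + B; X := X + C */
    float x = a + b;     /* exactly 0.0 */
    printf("x = %0.25f\n", x);
    x = x + c;           /* 0.0 + 1.0 == 1.0 */
    printf("x = %0.25f\n", x);
    return 0;
}
```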

@Arjun sir, thanks a lot for clarifying.

But one small doubt: you say the precision decreases from the possible 32 bits to 24 bits.

Shouldn't it be 23 bits?

Out of 32 bits:

1 bit for the sign, 8 for the biased exponent, and the remaining 23 for the mantissa.


Why is the precision 10 digits? Shouldn't it be about 8 digits?

With 24 bits, the maximum number is $2^{24} - 1$.

Now set $2^{24} = 10^x$:

$x = \log_{10} 2^{24} = 24 \log_{10} 2 \approx 7.2$ (using $2^{10} \approx 10^3$: $2^{24} = 2^4 \cdot 2^{20} \approx 16 \times 10^6$).

So the maximum precision should be roughly 7 to 8 decimal digits, not 10. Please verify.


Quoting the answer: "the maximum precision can be 32 bits (in 32-bit IEEE representation it is 24 bits, but we take the best case here). This means approximately 10 digits."


@RamSharma1, is this an answer to my comment above? I didn't get you.

@Bikram Sir, Arjun Sir:

$A = 2.0 \times 10^{30}$, $C = 1.0$

When we add 1, it only sets the least significant digit to 1, so we still need only 30 digits. Why would adding one increase the number of digits from 30 to 31?

For example, $2 \times 10^3 + 1 = 2001$ has the same number of digits as $2 \times 10^3$.

Can you clarify?


@rahul

First of all, we are not considering the IEEE representation here, so we use all 32 bits (instead of 24) to represent the mantissa.

And $2 \times 10^{30}$ has 31 digits: a 2 followed by 30 zeros.

OK. Adding one only sets the least significant digit to 1; I am not adding any extra digit. So does it mean that A itself cannot be represented with a precision of 31 digits? Because if A could be represented, then surely A + C could be represented too?
@rahul, read the comments above by me and by Arjun sir; they should clear your doubt.
Thanks a lot, Bikram sir. Very cogent solution.
@vs, can you give a smaller example? It would clarify things more.

Regarding "that means 32 bits are equal to 10 decimal digits":

So this has nothing to do with IEEE 754 in the context of the question, since the question does not say the representation is IEEE 754. Am I right?

It is given in the question that floating-point numbers are represented with $32$ bits.
From 32 bits we get $2^{32} = 4{,}294{,}967{,}296$, which has $10$ decimal digits.
That means 32 bits correspond to about 10 decimal digits.

$A = 2.0 \times 10^{30}$ has 31 digits (a 2 followed by thirty 0s) and $C = 1.0$ has 1 digit.

So in the decimal addition $A + C$, the 1 from C lands in the 31st digit of the result; the result has 31 digits in total, not $31 + 1$.
This 31st digit is outside the precision of A.

So when we compute $Y = A + C$, the value of $C$ is not retained.

Y can hold at most about 10 significant digits; the remaining low-order digits overflow the available precision and are dropped, so there is no room for the one extra digit. This addition returns only the value of A, which is assigned to Y.

So $Y = A + C = A$
and $Y = Y + B = (2.0 \times 10^{30}) + (-2.0 \times 10^{30}) = 0.0$

$X = A + B = (2.0 \times 10^{30}) + (-2.0 \times 10^{30}) = 0.0$
and $X = X + C = 0.0 + 1.0 = 1.0$

$\therefore$ $B$ is the correct option.

In your explanation, the overflow causes the lower-order bits to be truncated (or rounded off, which is implementation-dependent). What if the overflow (truncation) happens in the higher-order bits instead?

Then $Y = A + C$ would keep only the lowest-order 10 bits of A plus the 1, and adding B, whose lowest-order 10 bits cancel those of A, would leave 1, wouldn't it?
How can overflow occur from the most significant bits? I have never heard of this.

This is not a left shift.

I have always seen it that way (higher-order bits getting discarded): the value stored is (actual value) % range, so only the lower-order bits stay and the higher-order bits are discarded. It is not standard behavior, but it is common.

https://en.wikipedia.org/wiki/Integer_overflow

"The most common result of an overflow is that the least significant representable digits of the result are stored; the result is said to wraparound the maximum (i.e. modulo a power of the radix, usually two in modern computers, but sometimes ten or another radix)." - source wiki
