
Consider the values of $A = 2.0 \times 10^{30}, B = -2.0 \times 10^{30}, C = 1.0,$ and the sequence

   X := A + B          Y := A + C
   X := X + C          Y := Y + B

executed on a computer where floating point numbers are represented with $32$ bits. The values for $X$ and $Y$ will be

1. $X = 1.0, Y = 1.0$
2. $X = 1.0, Y = 0.0$
3. $X = 0.0, Y = 1.0$
4. $X = 0.0, Y = 0.0$


We are given a $32$-bit representation, so the precision can be at most $32$ bits (in the $32$-bit IEEE representation the precision is $24$ bits, but we take the best case here). This means approximately $10$ decimal digits.

$A = 2.0 \times 10^{30}, C = 1.0$

So, $A + C$ should set the $31$st digit to $1,$ which is far outside the precision of $A$ (note that it is the $31$st digit and not the $31$st bit). So this addition just returns the value of $A,$ which is assigned to $Y.$
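The same conclusion can be checked under the stricter assumption of IEEE single precision with a 24-bit significand (an assumption, since the question does not name a format), by comparing $C$ with the spacing between adjacent representable values near $A$:

$$2^{100} \approx 1.27 \times 10^{30} \le 2.0 \times 10^{30} < 2^{101} \approx 2.54 \times 10^{30}$$

$$\text{spacing near } A = 2^{100-23} = 2^{77} \approx 1.5 \times 10^{23}$$

Since $C = 1.0$ is far smaller than half of this spacing, $A + C$ rounds back to the stored value of $A.$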

So, $Y + B$ will return $0.0$ while $X + C$ will return $1.0.$

So the answer is option 2 (B): $X = 1.0, Y = 0.0.$

Sample program if anyone wants to try:

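```c
#include <stdio.h>

int main(void)
{
    /* Single-precision operands from the question. */
    float a = 2.0e30f;
    float b = -2.0e30f;
    float c = 1.0f;

    /* Y := A + C; C is lost because it is below the precision of A. */
    float y = a + c;
    printf("a = %0.25f y = %0.25f\n", a, y);

    /* Y := Y + B */
    y = y + b;
    printf("y = %0.25f\n", y);

    /* X := A + B, then X := X + C */
    float x = a + b;
    printf("x = %0.25f\n", x);
    x = x + c;
    printf("x = %0.25f\n", x);
    return 0;
}
```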

@Bikram sir

As we need to do Y = A + C, it does not take the value of C. (Y = A is assigned; at most it takes 10 digits and the rest overflow, which is why this addition only returns the value of A. It cannot take one extra digit.)

Here, when Y = A + C takes place, how is A represented? A is a 31-digit number, whereas a floating point number is represented with 32 bits, i.e. it can represent a 10-digit number. So overflow happens. When overflow happens, how does the result of the addition Y = A + C become just Y = A? (After overflow, A's original value is lost.) Please explain in detail.

$A = 2.0 \times 10^{30}$ represents 31 digits, i.e. one 2 followed by 30 zeros, and $C = 1.0$.

So A + C has 31 digits,

and the 31st digit is 1, which is outside the precision of A and hence not taken.

Hence Y = A is returned.

@Bikram sir

I think he meant that

A is a 31-digit number.

But according to A's precision, it can store only 10 digits.

So Y = A + C = A is fine, but the question is how this final Y = A is stored. I mean:

1) Y = A is a 31-digit number, $2 \times 10^{30}$, or

2) Y = A is a 10-digit number, as A's precision is 10 digits.

It is $A = 2.0 \times 10^{30}$, which represents 31 digits, i.e. one 2 followed by 30 zeros.

Then what about the precision?
I have the same confusion. Probably I'm missing some point.

@Bikram sir, please clarify how we can accommodate a 31-digit number using 32 bits.
@VS It will be stored imprecisely. That is exactly the point of floating point representation. In this question it is not clear which floating point representation is used (though whichever it is, precise storage is not possible). If we assume the IEEE representation, the mantissa and exponent are stored separately, which increases the range of values that can be stored, though the precision decreases from a possible 32 bits to 24 bits.

@Arjun sir Thanks a lot for clarifying

But, one small doubt :

precision decreases from 32 bits possible to 24 bits.

Shouldn't it be 23 bits?

Out of 32:

1 bit for the sign, 8 for the biased exponent and the remaining 23 for the mantissa.

Why is the precision 10 digits? Shouldn't it be 8 digits?

The maximum number in 24 bits is $2^{24}-1$.

Now let $2^{24} = 10^x$.

Then $x = \log_{10}(2^{24}) \approx 7.2$ (using $2^{10} \approx 10^{3}$), and $2^{24}-1 = 16{,}777{,}215$ is an 8-digit number.

So the maximum precision should be approximately 8 digits. Please verify once.

Given 32 bits representation. So, the maximum precision can be 32 bits (In 32-bit IEEE representation, maximum precision is 24 bits but we take best case here). This means approximately 10 digits.

@RamSharma1, is this an answer to my comment above? I didn't get you.

@Bikram sir, Arjun sir:

$A = 2.0 \times 10^{30}, C = 1.0$

When we add 1, it will set the least significant digit to 1, so we will need the same number of digits. Why would adding one increase the number of digits by 1? Why would the number of digits increase from 30 to 31?

If I say $2 \times 10^3 + 1 = 2001$, the number of digits is the same as in $2 \times 10^3$.

Can you clarify?

@rahul

First of all, we are not considering the IEEE representation here, so we are using all 32 bits instead of 24 bits to represent the mantissa.

And $2 \times 10^{30}$ is 31 digits: 2 followed by 30 zeros.

OK. Adding one will set the least significant digit to 1; I am not adding any extra digit. So does it mean that the given A itself will not be represented with a precision of 31 digits? Because if A can be represented, then surely A + C can also be represented?
@rahul

Read the comments above by me and by Arjun sir; your doubt will be cleared.
Thanks a lot, Bikram sir. Very cogent solution.
@vs can you give a small version of the example? It will clarify more.

That means 32 bits are equal to 10 decimal digits.

So this has nothing to do with the context of the question, since it is not mentioned that it is the IEEE 754 representation. Am I right?

Shouldn't both A and C also need to be represented in 32 bits, i.e. we only take the first 10 digits? So:

A = 200...0 (9 zeros)

C = 000...1 (9 zeros)

so A + C = 200...1 (10 digits)

Can anyone please clear this doubt?
@Pratyush Priyam Kuan

Same doubt.
A few questions:

1) What are the values of A and B that will be stored in 32 bits?

2) If we are assuming 32-bit precision, then how are we storing the sign of B, as it itself requires 1 bit?

3) How are things happening at the 32-bit level? If A's precision is 10 digits, then how is A able to store 31 digits? The same goes for B. And if A is truncated after 10 digits, then when we add C to A, why are we not adding C to this truncated value of A?

It is given in the question that "floating point numbers are represented with $32$ bits".
From 32 bits we get $2^{32} = 4{,}294{,}967{,}296$, which has $10$ decimal digits.
That means 32 bits are equivalent to about 10 decimal digits.

$A = 2.0 \times 10^{30}$ has 31 digits (one 2 followed by thirty 0's) and $C = 1.0$ has 1 digit.

So $A + C$ is also a 31-digit number (addition in decimal), with the 1 from $C$ in the 31st digit.
This 31st digit is outside the precision level of $A$.

As we need to do $Y = A + C$, it does not take the value of $C$.

Y = A is assigned: at most it takes 10 digits and the rest overflow, which is why this addition returns only the value of A. It cannot take the one extra digit.
This addition will return the value of A, which will be assigned to Y.

So $Y = A+C = A$
and $Y = Y + B = ( 2.0 \times 10^{30} ) + ( - 2.0 \times 10^{30} ) = 0 .0$

$X = A+B = ( 2.0 \times 10^{30} ) + ( - 2.0 \times 10^{30} ) = 0 .0$
and $X = X+C = 0.0 + 1.0 = 1.0$

$\therefore$ $B$ is the correct option.


In your explanation, the overflow causes the lower-order digits to get truncated (or rounded off, which is implementation dependent). What if the overflow (truncation) happens from the higher-order digits?

Then Y = A + C would keep the 10 low-order digits of A + 1, and adding the 10 low-order digits of B to that would give the value 1, wouldn't it?

I have always seen it that way (higher-order bits getting discarded): the value stored is (actual value) % range, which makes only the lower-order bits stay while the higher-order ones are discarded. Though it is not standardized behaviour, it is common.

https://en.wikipedia.org/wiki/Integer_overflow

"The most common result of an overflow is that the least significant representable digits of the result are stored; the result is said to wraparound the maximum (i.e. modulo a power of the radix, usually two in modern computers, but sometimes ten or another radix)." - source wiki

This 31st digit is outside the precision level of A

What is the precision level of A?

First 10 digits

" floating point numbers are represented with 32 bits " ..

So from 32 bits we can represent a maximum of $2^{32}-1 = 4{,}294{,}967{,}295$, which contains a total of 10 digits in decimal.

That means 32 bits are equivalent to 10 decimal digits.

$A = 2.0 \times 10^{30} = 2000000000000000000000000000000$ (i.e. 2 followed by 30 zeroes); it is not a floating point number.

$B = -2.0 \times 10^{30} = -2000000000000000000000000000000$ (i.e. -2 followed by 30 zeroes); it is not a floating point number.

A,B are represented here in scientific notation but they are not floating point numbers

so they will not be represented by 32 bits but

$C= 1.0$              it is a floating point number

it is represented by 32 bits

X: = A + B

$X = (2.0 - 2.0) \times 10^{30} = 0.0$

$X = X + C = 0.0 + 1.0 = 1.0$

Y := A + C

If A is added to C, it makes A + C a floating point number and assigns it to Y, but Y can only store 32 bits or 10 digits, due to which A + C will return A only, and A will get assigned to Y:

$Y = 2.0 \times 10^{30}$

and $Y = Y + B = (2.0 \times 10^{30}) + (-2.0 \times 10^{30}) = 0.0$

so

$X = 1.0, Y = 0.0$

correct me if I am wrong
