

+23 votes

Consider the values of $A$ = 2.0 x 10$^{30}$, $B$ = -2.0 x 10$^{30}$, $C$ = 1.0, and the sequence

X := A + B; Y := A + C; X := X + C; Y := Y + B;

executed on a computer where floating point numbers are represented with $32$ bits. The values for $X$ and $Y$ will be

- $X = 1.0, Y = 1.0$
- $X = 1.0, Y = 0.0$
- $X = 0.0, Y = 1.0$
- $X = 0.0, Y = 0.0$

+24 votes

Best answer

Given a 32-bit representation, the maximum possible precision is 32 bits (in 32-bit IEEE representation the maximum precision is 24 bits, but we take the best case here). This corresponds to approximately 10 decimal digits.

$A = 2.0 \times 10^{30}$, $C = 1.0$

So, $A + C$ should set the units digit to 1, producing a 31-digit result, which is surely outside the precision level of $A$ (it is the 31st digit, not the 31st bit). So, this addition will just return the value of $A$, which will be assigned to $Y$.

So, Y + B will return 0.0 while X + C will return 1.0.

Correct option: B.

Sample program if any one wants to try:

```c
#include <stdio.h>

int main() {
    float a = 2.0e30;
    float b = -2.0e30;
    float c = 1.0;
    float y = a + c;
    printf("a = %0.25f y = %0.25f\n", a, y);
    y = y + b;
    float x = a + b;
    printf("x = %0.25f\n", x);
    x = x + c;
    printf("x = %0.25f\n", x);
}
```

+1

@Arjun sir Thanks a lot for clarifying

But, one small doubt :

precision decreases from 32 bits possible to 24 bits.

Shouldn't it be 23 bits

Out of 32 :

1 bit for sign, 8 for Biased exponent and remaining 23 for Mantissa

0

Why is the precision 10 digits? Shouldn't it be about 7 digits?

The maximum number in 24 bits is $2^{24}-1$.

Now let $2^{24} = 10^x$

$\Rightarrow x = \log_{10}(2^{24}) = 24\log_{10}2 \approx 7.2$ (using $2^{10} \approx 10^{3}$)

So $x \approx 7$, and the maximum precision should be approximately 7 digits. Please verify once.

+1

Please check the below sentences of the answer.

Given 32 bits representation. So, the maximum precision can be 32 bits (In 32-bit IEEE representation, maximum precision is 24 bits but we take best case here). This means approximately 10 digits.

0

@RamSharma1, is this an answer to my above comment? I didn't get you.

@Bikaram sir, Arjun sir:

In the answer,

$A = 2.0 \times 10^{30}$, $C = 1.0$

When we add 1, it will set the least significant digit to 1, so we will need only 30 digits. Why does adding one number increase the digit count by 1 — why does the number of digits increase from 30 to 31?

If I say $2 \times 10^3 + 1 = 2001$, the number of digits is the same as for $2 \times 10^3$.

Can you clarify?

+1

@rahul

First of all, we are not considering IEEE representation here, so we are using all 32 bits for the mantissa instead of 24.

And $2 \times 10^{30}$ is 31 digits: a 2 followed by 30 zeros.

+3 votes

It is given in the question that "floating point numbers are represented with $32\ bits$"

so with 32 bits we can represent $2^{32} = 4{,}294{,}967{,}296$ values, i.e., a total of $10$ decimal digits.

**That means 32 bits correspond to about 10 decimal digits.**

$A = 2.0 \times 10^{30}$ this represents 31 digits and $C = 1.0$ this is 1 digit.

So $A+C$ still has a total of $31$ digits (decimal addition: the 1-digit $C$ only sets the units digit).

A is one 2 followed by thirty 0's = 31 digits and C is 1 digit.

**This 31st digit is outside the precision level of A**.

As we need to do $Y = A + C$, so it does not take the value of $C$.

**Y = A** is assigned: at most **10 digits** can be kept and the rest are **dropped**, which is why this addition only returns the value of $A$ — **the one extra digit cannot be stored**.

This addition will return the value of A which will be assigned to Y.

So $Y = A+C = A$

and $Y = Y + B = ( 2.0 \times 10^{30} ) + ( - 2.0 \times 10^{30} ) = 0 .0 $

$X = A+B = ( 2.0 \times 10^{30} ) + ( - 2.0 \times 10^{30} ) = 0 .0$

and $X = X+C = 0.0 + 1.0 = 1.0$

$\therefore$ $B$ is the correct option.

0

In your explanation the overflow causes the lower-order bits to get truncated (or rounded off, which is implementation dependent). What if the overflow (truncation) happens from the higher-order bits?

Then Y = A + C — this will fetch us the least-order 10 digits of A, plus 1; when subtracted from the least-order 10 digits of B this will get us the value 1, isn't it?

0

I have always seen it that way (higher-order bits getting discarded): the value stored is (actual value) mod range, which keeps only the lower-order bits and discards the higher-order ones. Though it's not standardized, this behavior is common.

https://en.wikipedia.org/wiki/Integer_overflow

"The most common result of an overflow is that the least significant representable digits of the result are stored; the result is said to *wrap*around the maximum (i.e. modulo a power of the radix, usually two in modern computers, but sometimes ten or another radix)." - source wiki
