
Arjun
asked
in Programming
Oct 19, 2016

1,010 views
8 votes

    int foo1(float a)
    {
        int b = a;
        return b/2;
    }

    int foo2(double a)
    {
        int b = a;
        return b/2;
    }

    int foo3(unsigned int a)
    {
        int b = a;
        return b/2;
    }

Consider the above three C functions and choose the best option given below. (Assume IEEE floating point representation and 4 bytes for int)

- All 3 functions return the same value for all input integer values
- foo1 and foo2 throw compile time error
- foo2 and foo3 return the same value for all input integer values but not foo1
- All 3 functions return different value for some input integer values

@Arjun **How can (C) be the correct option?** foo2 and foo3 will behave differently if I input the maximum possible unsigned value.

For example, suppose unsigned int and signed int take **16 bits**. The maximum possible unsigned value is then $2^{16}-1$. If I pass 65535 (all 1s) as the input, then a = 65535, and int b = a; will store -1, because b is signed and the bit pattern is read as a 2's complement number. So b/2 will be 0,

while foo2(65535) will give a different output. The following is for your reference:


edited
Jun 21, 2020
by Lakshman Patel RJIT

float and double:

**Precision:** There are an infinite number of values between the numbers 0 and 1. It should be unsurprising, then, that when we use a finite number of bits to represent all possible floating point values, some precision will be lost. A float is said to represent *single-precision* floating point whereas a double is said to represent *double-precision* floating point. (Since a double has 64 bits, it can dedicate more bits to both the mantissa and exponent fields, allowing for more precision.)


5 votes

Best answer

The answer is C. It depends on the IEEE floating point representation which, fortunately or not, is also part of the GATE syllabus. So let's see. The IEEE 754 representation for float has

- 1 sign bit
- 8 exponent bits with a bias of 127, i.e., the actual exponent is the stored value minus 127.
- 23 mantissa bits

So for, say, 0.25 we get

- Sign bit 0
- Exponent bits are 01111101
- Mantissa bits are all 0s.

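So how do we get 0.25 from these bits? Just apply the IEEE 754 formula

$(-1)^{\text{sign bit}} \times (1.\text{mantissa bits})_2 \times 2^{\text{exponent bits} - \text{bias}}$

A 1 is added before the "." because IEEE 754 uses the normalized representation whenever the exponent field is nonzero (and not all 1s, which is reserved for infinity and NaN).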

$= 1.0 \times 2^{125 -127} = 1.0 \times 2^{-2} = 0.25$.

Now, say for $2^{31} - 1$, we get a similar representation, but the catch is that the integer representation uses 31 bits for this value, while the float representation has only 24 significant bits (23 stored + 1 implicit from the normalized representation). That is, we lose the information held by the remaining 7 bits, and when we convert that float value back to an integer we get a different one: the nearest integer that float can represent. For example, consider the following code. It uses $2^{30} - 1$ as the input; its float approximation, $2^{30}$, still fits in an int.

    #include <stdio.h>

    /* Print the 8 bits of one byte, most significant bit first. */
    void printbits(char a)
    {
        int i;
        int flag = 1 << 7;
        for (i = 0; i < 8; i++) {
            printf("%d", (a & flag) != 0);
            flag >>= 1;
        }
        printf(" | ");
    }

    /* Each printbinary* walks the object from its highest address down,
       i.e. most significant byte first on a little-endian machine. */
    void printbinary(int a)
    {
        int i;
        char *p = ((char *)&a) + sizeof a - 1;
        for (i = sizeof(a); i > 0; i--) {
            printbits(*p);
            p--;
        }
        printf("\n\n");
    }

    void printbinaryf(float a)
    {
        int i;
        char *p = ((char *)&a) + sizeof a - 1;
        for (i = sizeof(a); i > 0; i--) {
            printbits(*p);
            p--;
        }
        printf("\n\n");
    }

    void printbinaryd(double a)
    {
        int i;
        char *p = ((char *)&a) + sizeof a - 1;
        for (i = sizeof(a); i > 0; i--) {
            printbits(*p);
            p--;
        }
        printf("\n\n");
    }

    int foo1(float a)
    {
        printf("\nFloat: %f\n", a);
        printbinaryf(a);
        int b = a;
        return b/2;
    }

    int foo2(double a)
    {
        printf("\nDouble: %lf\n", a);
        printbinaryd(a);
        int b = a;
        return b/2;
    }

    int foo3(unsigned int a)
    {
        printf("\nUnsigned Int: %u\n", a);
        printbinary(a);
        int b = a;
        return b/2;
    }

    int main()
    {
        int a = (1 << (8 * sizeof(int) - 2)) - 1; /* 2^30 - 1 for a 4-byte int */
        printf("\nInt: %d\n", (int)a);
        printbinary(foo1(a));
        printbinary(foo2(a));
        printbinary(foo3(a));
        return 0;
    }

Output is:

    Int: 1073741823

    Float: 1073741824.000000
    01001110 | 10000000 | 00000000 | 00000000 |
    00100000 | 00000000 | 00000000 | 00000000 |

    Double: 1073741823.000000
    01000001 | 11001111 | 11111111 | 11111111 | 11111111 | 10000000 | 00000000 | 00000000 |
    00011111 | 11111111 | 11111111 | 11111111 |

    Unsigned Int: 1073741823
    00111111 | 11111111 | 11111111 | 11111111 |
    00011111 | 11111111 | 11111111 | 11111111 |


@Arjun sir, how do we solve this in the exam?

Is it feasible to work through a **big example** in the exam? Even with a **small example**, writing out the floating-point representation for each function's return value will take more than 5 minutes.

Please guide, sir.
