int foo1(float a)
{
    int b = a;
    return b / 2;
}

int foo2(double a)
{
    int b = a;
    return b / 2;
}

int foo3(unsigned int a)
{
    int b = a;
    return b / 2;
}

Consider the above three C functions and choose the best option given below. (Assume IEEE floating point representation and 4 bytes for int)

1. All 3 functions return the same value for all input integer values
2. foo1 and foo2 throw compile time error
3. foo2 and foo3 return the same value for all input integer values but not foo1
4. All 3 functions return different values for some input integer values

@Arjun how can (C) be the correct option? Because foo2 and foo3 will behave differently if I input the max possible unsigned value to foo3().

For example, suppose unsigned int and signed int take 16 bits. The max possible unsigned value will be $2^{16}-1$. If I pass 65535 as input, that is all 1s, so a = 65535; then int b = a; will store -1, because b is signed and the bit pattern is read as a two's complement number. So b/2 will be 0.

while foo2(65535) will give a different output. The following is for your reference:

http://tpcg.io/WRqcer


float and double:

Precision: There are an infinite number of values between the numbers 0 and 1. It should be unsurprising, then, that when we use a finite number of bits to represent all possible floating point values, some precision will be lost. A float is said to represent single-precision floating point whereas a double is said to represent double-precision floating point. (Since a double has 64 bits, it can dedicate more bits to both the mantissa and exponent fields, allowing for more precision.)

The answer to this is C (option 3). It depends on the IEEE floating point representation. Fortunately or not, that is also part of the GATE syllabus. So let's see. The IEEE 754 representation for float has

• 1 sign bit
• 8 exponent bits with a bias of 127, i.e., the stored exponent is the actual exponent plus 127.
• 23 mantissa bits

So, for say 0.25 we get

• Sign bit 0
• Exponent bits are 01111101
• Mantissa bits are all 0s.

So, how do we get 0.25 from these bits? Just apply the IEEE 754 formula

$(-1)^{\text{sign bit}} \times (1.\text{mantissa bits}) \times 2^{\text{exponent bits} - \text{bias}}$

1 is added before the "." because, when the exponent field is nonzero, IEEE 754 uses the normalized representation.

$= 1.0 \times 2^{125 -127} = 1.0 \times 2^{-2} = 0.25$.

Now, say for $2^{31} - 1$, we get a similar representation, but the catch is that the integer representation has 31 value bits while the float representation has only 24 (23 stored + 1 implicit from the normalized representation). That is, we lose the information held by those extra 7 bits, and when we convert that float value back to an integer we get a different one, rounded to the nearest value a float can represent. For example, consider the following code

#include <stdio.h>

/* Print the 8 bits of one byte, most significant bit first. */
void printbits(char a)
{
    int i;
    int flag = 1 << 7;
    for (i = 0; i < 8; i++) {
        printf("%d", (a & flag) > 0);
        flag >>= 1;
    }
    printf(" | ");
}

/* Print the bytes of an int from the highest address down; on a
   little-endian machine this shows the most significant byte first. */
void printbinary(int a)
{
    int i;
    char *p = ((char *)&a) + sizeof a - 1;
    for (i = sizeof a; i > 0; i--) {
        printbits(*p);
        p--;
    }
    printf("\n\n");
}

/* Same as printbinary, but for a float. */
void printbinaryf(float a)
{
    int i;
    char *p = ((char *)&a) + sizeof a - 1;
    for (i = sizeof a; i > 0; i--) {
        printbits(*p);
        p--;
    }
    printf("\n\n");
}

/* Same as printbinary, but for a double. */
void printbinaryd(double a)
{
    int i;
    char *p = ((char *)&a) + sizeof a - 1;
    for (i = sizeof a; i > 0; i--) {
        printbits(*p);
        p--;
    }
    printf("\n\n");
}

int foo1(float a)
{
    printf("\nFloat: %f\n", a);
    printbinaryf(a);
    int b = a;
    return b / 2;
}

int foo2(double a)
{
    printf("\nDouble: %lf\n", a);
    printbinaryd(a);
    int b = a;
    return b / 2;
}

int foo3(unsigned int a)
{
    printf("\nUnsigned Int:  %u\n", a);
    printbinary(a);
    int b = a;
    return b / 2;
}

int main()
{
    int a = (1 << (8 * sizeof(int) - 2)) - 1;  /* 2^30 - 1 = 1073741823 */
    //printf("%d %d %d", foo1(a), foo2(a), foo3(a));

    printf("\nInt: %d\n", (int)a);
    printbinary(foo1(a));
    printbinary(foo2(a));
    printbinary(foo3(a));
}

Output is:

Int: 1073741823

Float: 1073741824.000000
01001110 | 10000000 | 00000000 | 00000000 |

00100000 | 00000000 | 00000000 | 00000000 |

Double: 1073741823.000000
01000001 | 11001111 | 11111111 | 11111111 | 11111111 | 10000000 | 00000000 | 00000000 |

00011111 | 11111111 | 11111111 | 11111111 |

Unsigned Int:  1073741823
00111111 | 11111111 | 11111111 | 11111111 |

00011111 | 11111111 | 11111111 | 11111111 |

int i;char *p = ((char*)&a) + sizeof a - 1;

@Arjun sir, what is this line doing?

I understood the solution, but I am not able to understand how the code prints the binary.


@Arjun sir, how to solve this in the exam?

Is it feasible to solve this by taking a big example in the exam? Even if we go for a small example, it will take more than 5 minutes to write the floating-point representation for each and every function's return value.