# GATE2007-IT-65

4.3k views

Consider a selection of the form $\sigma_{A\leq 100} (r)$, where $r$ is a relation with $1000$ tuples. Assume that the attribute values for $A$ among the tuples are uniformly distributed in the interval $[0, 500].$ Which one of the following options is the best estimate of the number of tuples returned by the given selection query ?

1. $50$
2. $100$
3. $150$
4. $200$

recategorized
1
Can the attribute values for A  be given distinctly  like 0,1/2,1,1.5,2,.........??
0

I think actual answer should be 202 because it is $σ_{A≤100}(r)$ not $σ_{A <100}(r)$

ping @Krish__, @rahul sharma 5,  @Red_devil, @Shivam Chauhan, @Tuhin Dutta, @Anu007,  @Ashwin Kulkarni @reena_kandari  and @srestha ji.

1
yes, I too think so

but nearest option of 202 is 200

So, D) :)
1
It should be $\frac{101}{501}*\small1000 = 201.59$ to be precise. And since it is closer to 202 than 201, I'd go for 202.
1
Why is everyone simply ignoring the fact that relational algebra doesn't return repeat values? it will give distinct values only right, the we will get only 100 tuples and 101 if 0 is included.
0

@shaz No where it is given A is primary key...so it can be repeat..

example-

A B

1 0

1 1

2 1

2 2

Etc.(if A is not ck)

0
But there is no "B" mentioned as well. so in that Selection will give unique results..isn't it?
2
@shaz

actually here we are having a selection operator so its not that we are selecting only one column "A" ,there can be many other attributes in a tuple  and all will combine to give unique tuples and hence our answer must be 200 and not 100.
0
condition should be A< 100
0
"Relational Algebra doesn't allow repetition"

Well, it's selection query, not Projection. Selection by default will never have duplicates, since all tuples have to be different.
0
If whole tuple will repeat then it will ignore repeating tuple. But Here may be more attributes apart from A...And here A is repeating but combination of other attribute with A may not repeat...bcz of this 200 is ans instead of 100.

If projection(A) has also given with this question then No doubt 100 would be answer.

$\sigma_{A \leq 100}(r)$
$r$ has $1000$ tuples

Values for A among the tuples are uniformly distributed in the interval $[0, 500].$ This can be split to $5$ mutually exclusive (non-overlapping) and exhaustive (no other intervals) intervals of same width of $100$ $([0-100], [101-200], [201-300], [301-400], [401-500],$ $0$ makes the first interval larger - this must be a typo in question) and we can assume all of them have same number of values due to Uniform distribution. So, number of tuples with A value in first interval should be

$\frac{\text{Total no. of tuples}}{5} = 1000/5 = 200$

Correct Answer: $D$

edited
0
each tupple can choose number from 0 to 500.. So the probability of choosing a number from 0 to 100 for each tupple is 101 / 501..

So number of tupple return is = expected number of tupple choose number from 0 to 100

= 101/501 + 101/501..... 1000 times

around 200 tupples..

(D)
1
Numbers between 0 - 500 so number less than equal to 100 is 101(0- 100) . so answer should be 202.

why we take 1- 500 since closed bracket used so 0 and 500 are inclusive.
3
yes, but that must be a typo. And if we include 0, answer must be 101/501*1000. Also we cant assume values are integers.
0
Ok sir :)
0
how did you spilt [0-500]? why only 5 intervals ? why cant we have more intervals? didn't get it at all. please help!
2
@Arjun Sir selection of A <= 100 means only 100 tuples will be returned right? Because selection operator removes duplicates??
1

I think you are confusing it with projection(\Pi).

–2
yes that person is confusing
0
What's the meaning of typo  question ????
1
is RA allow duplicate ??

if not then how 200 will be the  ans ??
0
how did you spilt [0-500]? why only 5 intervals ? why cant we have more intervals?

0
since in question given...we have only nos. 0 to 500 to distribute among 1000 tuples for attribute A "UNIFORMLY". So if we use each element from 0 to 500 twice, we will have 1002 element which is suffice for distributing among 1000 tuples. we will use any of the 499 elements only and two times each. so, in this way we'll have 1000 values for A of relation r, distributed uniformly on 0 to 500.

so now when we'll select all the A values such that a<=100, we will get min 200(if 0 to 499 selected) and max 202(if 1 to 500 selected) such tuples.

so both the answers are correct, better go with the options now, whichever is given select it. if both given the select 202. bcz its the max possibility.
0

@ashwina did u get how split is made... i am also not getting this?

2

we've numbers [0,500] & tuples 1000. Now we've to uniformly or equally distribute this 501 numbers into 1000 tuples.

i.e each number will get, 1000/501 = 1.996 tuples.

i.e each number will get 2 tuples.( here floating point comes because of typo,if the number range would have [1,500] then we'll get perfectly 2 tuples for each number).

So the RA expression will select 101*2 = 202 tuples technically.

But here answer should be taken as 200 because of typo.

1
but tell me this

but in relational algebra, in the final table ,the tuples are distinct,so if each number is repeated two times....even through number of distinct tuples should be 100
2

It would be if it was $\pi_{A\leq 100} (r)$

but here it's selecting tuple,not a single attribute.

0

What if we choose interval like [0-50],[51-100].....[950-1000] the we would be having 10 intervals of size 50 each and each interval like others can contain a attribute value twice, then in this case we get vatue as 100 instead of 200 As

Total number of tuple /10 =1000/10 =100

0

@MRINMOY_HALDER

Can u tell me one thing, if we can accommodate <=100 values , then why do we need [0,500] interval to accommodate it or vice versa??

0

why do we need [0,500] interval to accommodate it or vice versa

what is the meaning of this line ???

here A attribute can take value in the range (0 - 500), &  these 0 - 500 values will be distributed in 1000 places,that too uniformly(equal frequency of each values).

0
Then what is meaning of <=100??
0

total￼total we have 1000

Now we splitting [0-500]---> 0-250, 251-500

Combinely we have 0-250,251-500,0-1000 in the 200 tuple is <=100

0

Since we have 1000 tuples and values between 0 and 500.So we can distribute 0-499 initially.So by that time 500 tuples would have been completed.

Now the next value is 500 which is 501 st  tuple.Now if we try to wrap around and repeat same interval again,since we have 499 tuples remained with values possible from 0 to 499.

So for values A < 100 we can have initial 101 + final 101 =202 tuples in total.

So the best estimate matches 200 tuples?

Can we consider such a distribution?

option D

total numbers are 1000 and they have said that it is uniformly distrubuted between [0,500] it means every number is 2 times thats the only way we can distribute it uniformaly and as per our condition A<=100 at max 100 tuples can be there and and every one can be repeated 2 times so it sums up to 200 hence it is answer

1
should it not be 202 tupples as 0 is inclusive in the range [0-500]
0
that is considered as a typo in the question....it should be from 1

that is what is assumed  @anmeet
0
What does typo means?????
0
typing error in question :D
2
but in relational alzebra, in the final table ,the tuples are distinct,so if each number is repeated two times....even through number of distinct tuples should be 100
0
yes, I also think so.
0

Here it is not mentioned which attributes we are projecting, so even if the value of A may be same the other attribute's value may be different making the tuples different, hence 200 is the correct answer

0
does select operator removes redundant tuples?
There must be typo in question. As clearly 1000 tuples written in question. And values of A are uniformly distributed.

By taking (0 500] , A values can be 0.5, 1,1.5.... 100, 100.5....499, 499.5, 500

Total 1000 values.

By which values <=100 are 200.

may be this is best and simple method to understand this question

## Related questions

1
8.5k views
A demand paging system takes $100$ time units to service a page fault and $300$ time units to replace a dirty page. Memory access time is $1$ time unit. The probability of a page fault is $p$. In case of a page fault, the probability of page being dirty is also $p$. It is observed that the average access time is $3$ time units. Then the value of $p$ is $0.194$ $0.233$ $0.514$ $0.981$
In a multi-user operating system on an average, $20$ requests are made to use a particular resource per hour. The arrival of requests follows a Poisson distribution. The probability that either one, three or five requests are made in $45$ minutes is given by : $6.9 \times 10^6 \times e^{-20}$ $1.02 \times 10^6 \times e^{-20}$ $6.9 \times 10^3 \times e^{-20}$ $1.02 \times 10^3 \times e^{-20}$
Consider a hash function that distributes keys uniformly. The hash table size is $20$. After hashing of how many keys will the probability that any new key hashed collides with an existing one exceed $0.5$. $5$ $6$ $7$ $10$
Suppose there are two coins. The first coin gives heads with probability $\dfrac{5}{8}$ when tossed, while the second coin gives heads with probability $\dfrac{1}{4}.$ ... $\left(\dfrac{1}{2}\right)$ $\left(\dfrac{7}{16}\right)$ $\left(\dfrac{5}{32}\right)$