Log In
29 votes

Consider a selection of the form $\sigma_{A\leq 100} (r)$, where $r$ is a relation with $1000$ tuples. Assume that the attribute values for $A$ among the tuples are uniformly distributed in the interval $[0, 500].$ Which one of the following options is the best estimate of the number of tuples returned by the given selection query ?

  1. $50$
  2. $100$
  3. $150$
  4. $200$
in Databases
recategorized by
Can the attribute values for A  be given distinctly  like 0,1/2,1,1.5,2,.........??

I think actual answer should be 202 because it is $σ_{A≤100}(r)$ not $σ_{A <100}(r)$

ping @Krish__, @rahul sharma 5,  @Red_devil, @Shivam Chauhan, @Tuhin Dutta, @Anu007,  @Ashwin Kulkarni @reena_kandari  and @srestha ji.

yes, I too think so

but nearest option of 202 is 200

So, D) :)
It should be $\frac{101}{501}*\small1000 = 201.59$ to be precise. And since it is closer to 202 than 201, I'd go for 202.
Why is everyone simply ignoring the fact that relational algebra doesn't return repeat values? it will give distinct values only right, the we will get only 100 tuples and 101 if 0 is included.

@shaz No where it is given A is primary it can be repeat..



1 0

1 1

2 1

2 2

Etc.(if A is not ck)

But there is no "B" mentioned as well. so in that Selection will give unique results..isn't it?

actually here we are having a selection operator so its not that we are selecting only one column "A" ,there can be many other attributes in a tuple  and all will combine to give unique tuples and hence our answer must be 200 and not 100.
condition should be A< 100
"Relational Algebra doesn't allow repetition"

Well, it's selection query, not Projection. Selection by default will never have duplicates, since all tuples have to be different.
If whole tuple will repeat then it will ignore repeating tuple. But Here may be more attributes apart from A...And here A is repeating but combination of other attribute with A may not repeat...bcz of this 200 is ans instead of 100.


If projection(A) has also given with this question then No doubt 100 would be answer.

4 Answers

27 votes
Best answer
$\sigma_{A \leq 100}(r)$
 $r$ has $1000$ tuples

Values for A among the tuples are uniformly distributed in the interval $[0, 500].$ This can be split to $5$ mutually exclusive (non-overlapping) and exhaustive (no other intervals) intervals of same width of $100$ $([0-100], [101-200], [201-300], [301-400], [401-500],$ $0$ makes the first interval larger - this must be a typo in question) and we can assume all of them have same number of values due to Uniform distribution. So, number of tuples with A value in first interval should be

$\frac{\text{Total no. of tuples}}{5} = 1000/5 = 200$

Correct Answer: $D$

edited by
each tupple can choose number from 0 to 500.. So the probability of choosing a number from 0 to 100 for each tupple is 101 / 501..

So number of tupple return is = expected number of tupple choose number from 0 to 100

= 101/501 + 101/501..... 1000 times

around 200 tupples..

Numbers between 0 - 500 so number less than equal to 100 is 101(0- 100) . so answer should be 202.

why we take 1- 500 since closed bracket used so 0 and 500 are inclusive.
yes, but that must be a typo. And if we include 0, answer must be 101/501*1000. Also we cant assume values are integers.
Ok sir :)
how did you spilt [0-500]? why only 5 intervals ? why cant we have more intervals? didn't get it at all. please help!
@Arjun Sir selection of A <= 100 means only 100 tuples will be returned right? Because selection operator removes duplicates??

@ Sasidhar Gelivi  

I think you are confusing it with projection(\Pi).

yes that person is confusing
What's the meaning of typo  question ????
is RA allow duplicate ??

if not then how 200 will be the  ans ??
how did you spilt [0-500]? why only 5 intervals ? why cant we have more intervals?

didn't get it at all. please help!
since in question given...we have only nos. 0 to 500 to distribute among 1000 tuples for attribute A "UNIFORMLY". So if we use each element from 0 to 500 twice, we will have 1002 element which is suffice for distributing among 1000 tuples. we will use any of the 499 elements only and two times each. so, in this way we'll have 1000 values for A of relation r, distributed uniformly on 0 to 500.

so now when we'll select all the A values such that a<=100, we will get min 200(if 0 to 499 selected) and max 202(if 1 to 500 selected) such tuples.

so both the answers are correct, better go with the options now, whichever is given select it. if both given the select 202. bcz its the max possibility.

@ashwina did u get how split is made... i am also not getting this?


we've numbers [0,500] & tuples 1000. Now we've to uniformly or equally distribute this 501 numbers into 1000 tuples.

i.e each number will get, 1000/501 = 1.996 tuples.

i.e each number will get 2 tuples.( here floating point comes because of typo,if the number range would have [1,500] then we'll get perfectly 2 tuples for each number).

So the RA expression will select 101*2 = 202 tuples technically.

But here answer should be taken as 200 because of typo.

but tell me this

but in relational algebra, in the final table ,the tuples are distinct,so if each number is repeated two times....even through number of distinct tuples should be 100

It would be if it was $\pi_{A\leq 100} (r)$

but here it's selecting tuple,not a single attribute.


What if we choose interval like [0-50],[51-100].....[950-1000] the we would be having 10 intervals of size 50 each and each interval like others can contain a attribute value twice, then in this case we get vatue as 100 instead of 200 As

Total number of tuple /10 =1000/10 =100

@Abhinav Rana Sir





Can u tell me one thing, if we can accommodate <=100 values , then why do we need [0,500] interval to accommodate it or vice versa??




why do we need [0,500] interval to accommodate it or vice versa

what is the meaning of this line ???

here A attribute can take value in the range (0 - 500), &  these 0 - 500 values will be distributed in 1000 places,that too uniformly(equal frequency of each values).

Then what is meaning of <=100??

totaltotal we have 1000

 Now we splitting [0-500]---> 0-250, 251-500

Combinely we have 0-250,251-500,0-1000 in the 200 tuple is <=100


Since we have 1000 tuples and values between 0 and 500.So we can distribute 0-499 initially.So by that time 500 tuples would have been completed.

Now the next value is 500 which is 501 st  tuple.Now if we try to wrap around and repeat same interval again,since we have 499 tuples remained with values possible from 0 to 499.

So for values A < 100 we can have initial 101 + final 101 =202 tuples in total.

So the best estimate matches 200 tuples?

Can we consider such a distribution?

17 votes

option D

total numbers are 1000 and they have said that it is uniformly distrubuted between [0,500] it means every number is 2 times thats the only way we can distribute it uniformaly and as per our condition A<=100 at max 100 tuples can be there and and every one can be repeated 2 times so it sums up to 200 hence it is answer

should it not be 202 tupples as 0 is inclusive in the range [0-500]
that is considered as a typo in the should be from 1

that is what is assumed  @anmeet
What does typo means?????
typing error in question :D
but in relational alzebra, in the final table ,the tuples are distinct,so if each number is repeated two times....even through number of distinct tuples should be 100
yes, I also think so.


Here it is not mentioned which attributes we are projecting, so even if the value of A may be same the other attribute's value may be different making the tuples different, hence 200 is the correct answer

does select operator removes redundant tuples?
0 votes
There must be typo in question. As clearly 1000 tuples written in question. And values of A are uniformly distributed.

By taking (0 500] , A values can be 0.5, 1,1.5.... 100, 100.5....499, 499.5, 500

Total 1000 values.

By which values <=100 are 200.
0 votes

may be this is best and simple method to understand this question


Related questions

52 votes
5 answers
A demand paging system takes $100$ time units to service a page fault and $300$ time units to replace a dirty page. Memory access time is $1$ time unit. The probability of a page fault is $p$. In case of a page fault, the probability of page being dirty is also $p$. It is observed that the average access time is $3$ time units. Then the value of $p$ is $0.194$ $0.233$ $0.514$ $0.981$
asked Oct 30, 2014 in Operating System Ishrat Jahan 8.5k views
23 votes
1 answer
In a multi-user operating system on an average, $20$ requests are made to use a particular resource per hour. The arrival of requests follows a Poisson distribution. The probability that either one, three or five requests are made in $45$ minutes is given by : $6.9 \times 10^6 \times e^{-20}$ $1.02 \times 10^6 \times e^{-20}$ $6.9 \times 10^3 \times e^{-20}$ $1.02 \times 10^3 \times e^{-20}$
asked Oct 30, 2014 in Probability Ishrat Jahan 3.5k views
36 votes
11 answers
Consider a hash function that distributes keys uniformly. The hash table size is $20$. After hashing of how many keys will the probability that any new key hashed collides with an existing one exceed $0.5$. $5$ $6$ $7$ $10$
asked Oct 30, 2014 in DS Ishrat Jahan 8.3k views
9 votes
2 answers
Suppose there are two coins. The first coin gives heads with probability $\dfrac{5}{8}$ when tossed, while the second coin gives heads with probability $\dfrac{1}{4}.$ ... $\left(\dfrac{1}{2}\right)$ $\left(\dfrac{7}{16}\right)$ $\left(\dfrac{5}{32}\right)$
asked Oct 30, 2014 in Probability Ishrat Jahan 990 views