Consider a selection of the form $\sigma_{A\leq 100} (r)$, where $r$ is a relation with $1000$ tuples. Assume that the attribute values for $A$ among the tuples are uniformly distributed in the interval $[0, 500].$ Which one of the following options is the best estimate of the number of tuples returned by the given selection query?

1. $50$
2. $100$
3. $150$
4. $200$

"Relational Algebra doesn't allow repetition"

Well, this is a selection query, not a projection. The result of a selection never contains duplicates, because all tuples of the relation are already distinct.
If a whole tuple were repeated, the repeat would be ignored. But here there may be other attributes apart from A, so even though values of A repeat, the combination of A with the other attributes need not repeat. That is why the answer is 200 rather than 100.

If a projection on A had also been given in this question, then the answer would no doubt be 100.
Another way to solve this question

Probability density function $F(x) = \frac{1}{500 - 0} = \frac{1}{500}$

$\therefore P(x \le 100) = \int_{0}^{100} F(x) dx$

$P = \int_{0}^{100} \frac{1}{500} dx$

$P = \frac{1}{500} \ [x]_{0}^{100}$

$P = \frac{100}{500} = \frac{1}{5}$

So, the total number of tuples $= NP$

$= 1000 \ \times \ \frac{1}{5}$

$= 200$
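The same estimate can be checked numerically. This is a minimal sketch, assuming the uniform-distribution formula from the derivation above; the function name `estimate_selection` and its parameters are illustrative and do not come from the question.

```python
def estimate_selection(n_tuples, lo, hi, c):
    """Estimate the size of sigma_{A <= c}(r) for A uniform on [lo, hi].

    Hypothetical helper: selectivity = (c - lo) / (hi - lo), clamped to [0, 1].
    """
    if c <= lo:
        return 0.0
    if c >= hi:
        return float(n_tuples)
    # Multiply before dividing so that integer inputs give an exact result.
    return n_tuples * (c - lo) / (hi - lo)


estimate = estimate_selection(n_tuples=1000, lo=0, hi=500, c=100)
print(estimate)  # 200.0
```

The clamping only matters when the constant falls outside the interval; for the values in the question the result is simply $N \times P$.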

$\sigma_{A \leq 100}(r)$
$r$ has $1000$ tuples

The values of A among the tuples are uniformly distributed in the interval $[0, 500].$ This interval can be split into $5$ mutually exclusive (non-overlapping) and exhaustive (no other intervals) sub-intervals of the same width of $100$: $[0-100], [101-200], [201-300], [301-400], [401-500]$ (including $0$ makes the first interval slightly larger; this must be a typo in the question). Because the distribution is uniform, we can assume each sub-interval holds the same number of values. So the number of tuples with an A value in the first interval should be

$\frac{\text{Total no. of tuples}}{5} = 1000/5 = 200$

Correct Answer: $D$

Then what is the meaning of <=100? In total we have 1000 tuples.

Now we split [0-500] into 0-250 and 251-500.

Combined, we have 0-250, 251-500 and 0-1000, in which 200 tuples are <=100.

Since we have 1000 tuples and values between 0 and 500, we can assign the values 0-499 first. At that point 500 tuples have been used.

The next value is 500, which goes to the 501st tuple. If we now wrap around and repeat the same interval, 499 tuples remain, which take the values 0 to 498.

So for A <= 100 we get the first 101 plus the second 101 = 202 tuples in total.

So the best estimate matches 200 tuples?

Can we consider such a distribution?
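The count for this wrap-around assignment can be checked directly. This is a small sketch, assuming integer values handed out in order 0, 1, ..., 500 and then starting again from 0; the variable names are illustrative.

```python
N_TUPLES = 1000

# Assumed assignment: tuple i gets value i mod 501, i.e. 0..500 and then 0..498.
values = [i % 501 for i in range(N_TUPLES)]

# Count the tuples that satisfy the selection predicate A <= 100.
matching = sum(1 for v in values if v <= 100)
print(matching)  # 202
```

Under this assumption the count is 202, and 200 is the closest of the listed options.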

option D

There are 1000 tuples in total, and the values are said to be uniformly distributed in [0, 500]. That means every value occurs 2 times; that is the only way to distribute them uniformly. Under the condition A <= 100 there can be at most 100 distinct values, and each one can be repeated 2 times, so this sums to 200. Hence that is the answer.

Here it is not mentioned which attributes we are projecting, so even if the value of A is the same, the other attributes' values may differ, making the tuples different. Hence 200 is the correct answer.

Does the select operator remove redundant tuples?

@Jhaiyam A relation is a set, so it has no duplicates. Here, select in the relational algebra sense does eliminate duplicates.

Maybe this is the best and simplest way to understand this question. There must be a typo in the question, since 1000 tuples are clearly stated and the values of A are uniformly distributed.

Taking the interval (0, 500], the values of A can be 0.5, 1, 1.5, ..., 100, 100.5, ..., 499, 499.5, 500.

That is 1000 values in total.

Of these, 200 values are <= 100.
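This half-step enumeration is easy to verify. This is a short sketch, assuming the values are exactly the multiples of 0.5 in (0, 500]; the variable names are illustrative.

```python
# Assumed values of A: 0.5, 1.0, 1.5, ..., 500.0 (multiples of 0.5 in (0, 500]).
values = [0.5 * k for k in range(1, 1001)]

# Count how many of them satisfy A <= 100.
matching = sum(1 for v in values if v <= 100)
print(len(values), matching)  # 1000 200
```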