7,325 views
A relation $r(A, B)$ in a relational database has $1200$ tuples. The attribute $A$ has integer values ranging from $6$ to $20$, and the attribute $B$ has integer values ranging from $1$ to $20$. Assume that the attributes $A$ and $B$ are independently distributed.

The estimated number of tuples in the output of $\sigma _{(A>10)\vee(B=18)}(r)$ is ____________.

• $P(A>10) = \frac{10}{15} = \frac{2}{3}$
• $P(B=18) = \frac{1}{20}$
• $P(A>10 \land B=18) = \frac{2}{3}\times\frac{1}{20} = \frac{1}{30}$

$P(A>10 \lor B=18) = P(A>10) + P(B=18) – P(A>10 \land B=18)$

$\qquad = \frac{2}{3} + \frac{1}{20} – \frac{1}{30} = \frac{40 + 3 – 2}{60} = \frac{41}{60}$

$\text{Estimated number of tuples} = \frac{41}{60}\times1200 = 820$

The above answer is TRUE for SQL SELECT but not for Relational Algebra as by theory relational algebra operates on a set which means all the elements must be distinct. Since we have $15$ distinct possible values for $A$ and $20$ distinct possible values for $B,$ in strict relational algebra we’ll get

$\text{Estimated number of tuples} = \frac{41}{60}\times (15 \times 20) = 205.$

Official Answer: $205$ OR $820.$

by

But since Relational Database is just a Relation, which itself is a Set, then how can we have duplicates?
Anybody going to challenge this question ?
null values are also possible in A or  B

Remember relational algebra operator select is also distinct

By principal of mutual inclusion exclusion

For each A>10 (11-20)

there are 20 distinct B(1-20)

So current total = 200 (20*10)

For each B = 18

There are 15 distinct A

current total = 200 + 15 = 215

Now excluding the tuples which we have counted twice i.e B=18 and A>10

There are 10 such tuples

So current total = 200 +15 -10

Final Answer should be 205 according to me

What is the final answer, 820 or 205?
marks were awarded for both of these answers.
nice explanation