7,324 views
A relation $r(A, B)$ in a relational database has $1200$ tuples. The attribute $A$ has integer values ranging from $6$ to $20$, and the attribute $B$ has integer values ranging from $1$ to $20$. Assume that the attributes $A$ and $B$ are independently distributed.

The estimated number of tuples in the output of $\sigma _{(A>10)\vee(B=18)}(r)$ is ____________.

• $P(A>10) = \frac{10}{15} = \frac{2}{3}$
• $P(B=18) = \frac{1}{20}$
• $P(A>10 \land B=18) = \frac{2}{3}\times\frac{1}{20} = \frac{1}{30}$

$P(A>10 \lor B=18) = P(A>10) + P(B=18) – P(A>10 \land B=18)$

$\qquad = \frac{2}{3} + \frac{1}{20} – \frac{1}{30} = \frac{40 + 3 – 2}{60} = \frac{41}{60}$

$\text{Estimated number of tuples} = \frac{41}{60}\times1200 = 820$

The above answer is TRUE for SQL SELECT but not for Relational Algebra as by theory relational algebra operates on a set which means all the elements must be distinct. Since we have $15$ distinct possible values for $A$ and $20$ distinct possible values for $B,$ in strict relational algebra we’ll get

$\text{Estimated number of tuples} = \frac{41}{60}\times (15 \times 20) = 205.$

Official Answer: $205$ OR $820.$

by

Considering that Relational projections always return distinct tuples. The “estimated number of tuples in the output” will boil down to all the unique tuples that satisfy the given condition.

So the answer will be 15 + 10*19 = 205?
yeah, I think so. I considered a set theory approach.

# tuples with A>10 = 200 (20 B’s * 10 A’s)

# tuples with B==18 = 15 (1 B * 15 A’s)

# tuples with A>10 AND # tuples with B==18 = 10

total = “# tuples with A>10” + “# tuples with B==18” – “# tuples with A>10 AND # tuples with B==18”

= 200 + 15 – 10

= 205
Maybe you are right, but what’s the use of 1200 then?

Not sure. Maybe it is just to throw us off.

One thing I am certain of is “Relational projections always return distinct tuples”

PS. I also got it wrong in the actual exam.

Huh, maybe you got it right in the exam :)

Making this a comment, just in case
Also how can you assume “uniform distribution” to apply probability?
how can there be 1200 tuples in the database if in total 15*20 = 300 unique tuples can exist at most?
it would have duplicate tuples
But since Relational Database is just a Relation, which itself is a Set, then how can we have duplicates?
Anybody going to challenge this question ?
null values are also possible in A or  B

Remember relational algebra operator select is also distinct

By principal of mutual inclusion exclusion

For each A>10 (11-20)

there are 20 distinct B(1-20)

So current total = 200 (20*10)

For each B = 18

There are 15 distinct A

current total = 200 + 15 = 215

Now excluding the tuples which we have counted twice i.e B=18 and A>10

There are 10 such tuples

So current total = 200 +15 -10

Final Answer should be 205 according to me

What is the final answer, 820 or 205?
marks were awarded for both of these answers.
nice explanation