7,139 views
A relation $r(A, B)$ in a relational database has $1200$ tuples. The attribute $A$ has integer values ranging from $6$ to $20$, and the attribute $B$ has integer values ranging from $1$ to $20$. Assume that the attributes $A$ and $B$ are independently distributed.

The estimated number of tuples in the output of $\sigma _{(A>10)\vee(B=18)}(r)$ is ____________.

• $P(A>10) = \frac{10}{15} = \frac{2}{3}$
• $P(B=18) = \frac{1}{20}$
• $P(A>10 \land B=18) = \frac{2}{3}\times\frac{1}{20} = \frac{1}{30}$

$P(A>10 \lor B=18) = P(A>10) + P(B=18) – P(A>10 \land B=18)$

$\qquad = \frac{2}{3} + \frac{1}{20} – \frac{1}{30} = \frac{40 + 3 – 2}{60} = \frac{41}{60}$

$\text{Estimated number of tuples} = \frac{41}{60}\times1200 = 820$

The above answer is TRUE for SQL SELECT but not for Relational Algebra as by theory relational algebra operates on a set which means all the elements must be distinct. Since we have $15$ distinct possible values for $A$ and $20$ distinct possible values for $B,$ in strict relational algebra we’ll get

$\text{Estimated number of tuples} = \frac{41}{60}\times (15 \times 20) = 205.$

Official Answer: $205$ OR $820.$

by

But since Relational Database is just a Relation, which itself is a Set, then how can we have duplicates?
Anybody going to challenge this question ?
null values are also possible in A or  B

Remember relational algebra operator select is also distinct

By principal of mutual inclusion exclusion

For each A>10 (11-20)

there are 20 distinct B(1-20)

So current total = 200 (20*10)

For each B = 18

There are 15 distinct A

current total = 200 + 15 = 215

Now excluding the tuples which we have counted twice i.e B=18 and A>10

There are 10 such tuples

So current total = 200 +15 -10

Final Answer should be 205 according to me

You are correct but it is also true that no tuple in a table have same two rows ,so how can they say that a relation(A,B) with A having atmost 15 different values and B having atmost 20 different values has 1200 tuples.

Don't you think its quite ambiguous?
In question the relation r is in a relational database so it can store duplicate tuples. Also the select operator works individually on each tuple and do the horizontal partition of the relation based on condition thus the result relation may contain duplicates so in my opinion 820 should be the answer.

@Sanskar @Saketp The given relation has only two attributes so it is quite obvious that there will be duplicates

@Sanskar how can you even say no two tuples have same two rows

First of all you mean to say column and second there are only two attributes and 300 different possible combinations of AB tuple

So answer can never be greater than 300

@Saketp

Relational algebra can never give duplicates in the answer

That is the first rule of Relational algebra

@meetdoshi90

For each B = 18

There are 15 distinct A.
How can a relation have duplicate tuple?

I think the question is ambiguous because as per question there can be a maximum of 15x20=300 tuples, but the question says it has 1200 tuples. How can it be possible?

@Meetdoshi90 Refer https://www.cs.wcupa.edu/~zjiang/RDB_table.htm for properties of DBMS relation.

It says every row must be unique

@Hradesh patel you can make this out from the question itself

A= {6,7,8,9,……,19,20}

Therefore for B = 18 we can pair each A

A B

6 18

7 18

8 18

9 18

.

.

.

.

19 18

20 18

Total 15 tuples
@ujju

The question is not ambiguous

Question says 1200 tuples in the relation

It can even be a million but after we apply any relational operation on the relation

The number of tuples can never be greater than 300 in the answer

@Sanskar

Row = Tuple you can use the term interchangeably

You mean to say the table itself is wrong as there cannot be duplicates

But the relational table does not need all rows distinct

You need to bring the table to normal forms to remove duplicates, data redundancy, etc

By itself relational table can have composite values , multi valued attributes, duplicates, null values,etc.

You cannot ever say that each tuple in a table needs to be unique unless it is mentioned there is primary key or it is in a normal form.

@Meetdoshi90 1NF is compulsory for all tables by default .

However other normal forms are optional.
@Meetdoshi90 This is from the slide 6 of link you mentioned.

Relational model is set-based (no duplicate tuples)
• Relation R has no duplicates, therefore selection cannot produce
duplicates.

But what if relation R has duplicate tuples as given in the question and remember that the relation r in the question is from relational database and not a relation based on formal relational model, then from slide 7 of your mentioned document changing the above relational algebra query to sql query we get

SELECT *
FROM Employee
WHERE JobType = 'Faculty';

And select * do not remove duplicates.This is my logic.
@Saketp

the question itself gives a relational query how can you think or consider it equivalent Sql query

and the storage form is a relational table

You're trying to complicate a relatively easy question
Our point is just one that in a relation duplicate rows are not allowed. So they shouldn’t have given 1200 tuples in question.

Everywhere we read this only that a set represent a tuple in a relation and in a set duplicates are not allowed.That’s why we are confused and thinking that the question is ambiguous.
@Sanskar
Recall and tell me on what kind of table do we perform relational algebra
Normal database
edited by

This is a snippet of navathe book. You can read Q 3.3

Hope you will understand what i am trying to say

I am getting marks but you should definitely try to challenge this.

I see many people are confused with this question.

Let me clarify a bit.

Let’s read the question once again:

A relation $r(A,B)$ in a relational database has $1200$ tuples. The attribute $A$ has integer values ranging from $6$ to $20$, and the attribute $B$ has integer values ranging from $1$ to $20$. Assume that the attributes $A$ and $B$ are independently distributed.

The estimated number of tuples in the output of $σ_{(A>10)∨(B=18)}(r)$

is ____________.

Note the had the term been “relational model” instead of “relational database”, then we could have argued that the table is built using the classical set theory concept.

But since they have used the term “relational database” we could think that they have talking about a specific implementation. So there is no harm in considering the table as an SQL table. As such we can say that duplicates are allowed in the table.

We can further confirm this as follows:

$$E[A=a, B=b]=P(A=a,B=b). 1200 = \frac{1}{15} \times \frac{1}{20} \times 1200 = 8 \neq 1$$

So as such we are sure that the actual database has duplicate tuples. And the relational algebra query as such can be thought of as follows:

$\text{SELECT *}$

$\text{FROM r}$

$\text{WHERE A>10 AND B=18}$

I think pi projection eleminates duplicates but not the sigma. Please correct me If im wrong

I think pi projection eliminates duplicates but not the sigma.

The sirs as well have said the same thing. But I could please provide me with a screenshot of the textbook from where you have read this? (I just wanted to verify, that’s all)

What is the final answer, 820 or 205?
marks were awarded for both of these answers.
nice explanation