Dark Mode

7,218 views

19 votes

A relation $r(A, B)$ in a relational database has $1200$ tuples. The attribute $A$ has integer values ranging from $6$ to $20$, and the attribute $B$ has integer values ranging from $1$ to $20$. Assume that the attributes $A$ and $B$ are independently distributed.

The estimated number of tuples in the output of $\sigma _{(A>10)\vee(B=18)}(r)$ is ____________.

The estimated number of tuples in the output of $\sigma _{(A>10)\vee(B=18)}(r)$ is ____________.

34 votes

Best answer

- $P(A>10) = \frac{10}{15} = \frac{2}{3}$
- $P(B=18) = \frac{1}{20}$
- $P(A>10 \land B=18) = \frac{2}{3}\times\frac{1}{20} = \frac{1}{30}$

$P(A>10 \lor B=18) = P(A>10) + P(B=18) – P(A>10 \land B=18)$

$\qquad = \frac{2}{3} + \frac{1}{20} – \frac{1}{30} = \frac{40 + 3 – 2}{60} = \frac{41}{60}$

$\text{Estimated number of tuples} = \frac{41}{60}\times1200 = 820$

The above answer is TRUE for SQL SELECT but not for Relational Algebra as by theory relational algebra operates on a set which means all the elements must be distinct. Since we have $15$ distinct possible values for $A$ and $20$ distinct possible values for $B,$ in strict relational algebra we’ll get

$\text{Estimated number of tuples} = \frac{41}{60}\times (15 \times 20) = 205.$

Official Answer: $205$ OR $820.$

0

10 votes

To all those answering 820

**Remember relational algebra operator select is also distinct**

By principal of mutual inclusion exclusion

For each **A>10 (11-20)**

there are **20 distinct B(1-20)**

So *current total = 200 (20*10)*

For each **B = 18 **

There are **15 distinct A**

*current total = 200 + 15 = 215*

Now excluding the tuples which we have counted twice i.e B=18 and A>10

There are 10 such tuples

So *current total = 200 +15 -10*

*Final Answer should be 205 according to me*

*Sadly I made a silly mistake and entered 210 :)*

1

In question the relation r is in a relational database so it can store duplicate tuples. Also the select operator works individually on each tuple and do the horizontal partition of the relation based on condition thus the result relation may contain duplicates so in my opinion 820 should be the answer.

0

@Sanskar @Saketp The given relation has only two attributes so it is quite obvious that there will be duplicates

@Sanskar how can you even say no two tuples have same two rows

First of all you mean to say column and second there are only two attributes and 300 different possible combinations of AB tuple

So answer can never be greater than 300

@Saketp

Relational algebra can never give duplicates in the answer

That is the first rule of Relational algebra

https://student.cs.uwaterloo.ca/~cs338/slides/6%20Rel%20Algebra.pdf Slide 5 and 6

0

1

@Meetdoshi90 Refer https://www.cs.wcupa.edu/~zjiang/RDB_table.htm for properties of DBMS relation.

It says every row must be unique

0

1

1

@Sanskar

Row = Tuple you can use the term interchangeably

You mean to say the table itself is wrong as there cannot be duplicates

But the relational table does not need all rows distinct

You need to bring the table to normal forms to remove duplicates, data redundancy, etc

**By itself relational table can have composite values , multi valued attributes, duplicates, null values,etc.**

You cannot ever say that each tuple in a table needs to be unique unless it is mentioned there is primary key or it is in a normal form.

0

0

@Meetdoshi90 This is from the slide 6 of link you mentioned.

Relational model is set-based (no duplicate tuples)

• Relation R has no duplicates, therefore selection cannot produce

duplicates.

But what if relation R has duplicate tuples as given in the question and remember that the relation r in the question is from relational database and not a relation based on formal relational model, then from slide 7 of your mentioned document changing the above relational algebra query to sql query we get

SELECT *

FROM Employee

WHERE JobType = 'Faculty';

And select * do not remove duplicates.This is my logic.

Relational model is set-based (no duplicate tuples)

• Relation R has no duplicates, therefore selection cannot produce

duplicates.

But what if relation R has duplicate tuples as given in the question and remember that the relation r in the question is from relational database and not a relation based on formal relational model, then from slide 7 of your mentioned document changing the above relational algebra query to sql query we get

SELECT *

FROM Employee

WHERE JobType = 'Faculty';

And select * do not remove duplicates.This is my logic.

0

0

Our point is just one that in a relation duplicate rows are not allowed. So they shouldn’t have given 1200 tuples in question.

Everywhere we read this only that a set represent a tuple in a relation and in a set duplicates are not allowed.That’s why we are confused and thinking that the question is ambiguous.

Everywhere we read this only that a set represent a tuple in a relation and in a set duplicates are not allowed.That’s why we are confused and thinking that the question is ambiguous.

0

1

I see many people are confused with this question.

Let me clarify a bit.

Let’s read the question once again:

A relation $r(A,B)$ in

has $1200$ tuples. The attribute $A$ has integer values ranging from $6$ to $20$, and the attribute $B$ has integer values ranging from $1$ to $20$. Assume that the attributes $A$ and $B$ are independently distributed.a relational databaseThe estimated number of tuples in the output of $σ_{(A>10)∨(B=18)}(r)$

is ____________.

Note the had the term been **“relational model” **instead of **“relational database”**, then we could have argued that the table is built using the classical set theory concept.

But since they have used the term **“relational database” **we could think that they have talking about a specific implementation. So there is no harm in considering the table as an SQL table. As such we can say that duplicates are allowed in the table.

We can further confirm this as follows:

$$E[A=a, B=b]=P(A=a,B=b). 1200 = \frac{1}{15} \times \frac{1}{20} \times 1200 = 8 \neq 1$$

So as such we are sure that the actual database has duplicate tuples. And the relational algebra query as such can be thought of as follows:

$\text{SELECT *}$

$\text{FROM r}$

$\text{WHERE A>10 AND B=18}$

0