GATE Overflow Test Series | Mixed Subjects | Test 3 | Question: 25

Question

gatecse asked Oct 15, 2020 • recategorized Oct 15, 2020 by Lakshman Bhaiya

105 views

Consider the following SQL query:

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'Bangalore');

Which of the following relational algebra queries gives the most efficient execution strategy for the above SQL query, assuming no indexes and that there are $1000$ tuples in Staff, $100$ tuples in Branch, $100$ Managers (one for each branch), and $10$ Bangalore branches?

A. $\sigma_{ ( position = 'Manager') \wedge ( city ='Bangalore') \wedge ( \text{Staff}.branchNo = \text{Branch}.branchNo )} ( \text{Staff} \times \text{Branch} )$
B. $\sigma_{ ( position ='Manager') \wedge ( city ='Bangalore')} ( \text{Staff} \bowtie_{\text{Staff}.branchNo = \text{Branch}.branchNo} \text{Branch} )$
C. $\sigma_{ ( position ='Manager') } ( \text{Staff} \bowtie_{\text{Staff}.branchNo = \text{Branch}.branchNo} \text{Branch} )$
D. $(\sigma_{ position ='Manager'} ( \text{Staff} )) \bowtie_{\text{Staff}.branchNo = \text{Branch}.branchNo} (\sigma_{ city ='Bangalore'} ( \text{Branch} ))$

1 Answer

Best answer

Query C omits the condition city = 'Bangalore', so it is not equivalent to the given SQL query and can be ignored.
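As a quick sanity check on that claim, the four expressions can be evaluated on synthetic data with the cardinalities from the question. Only the counts come from the question. The concrete layout (branch numbers 0 to 99, the first 10 branches in Bangalore, a placeholder city "Elsewhere", the position "Assistant", and the helper names) is assumed purely for illustration.

```python
from itertools import product

# Synthetic data: only the cardinalities come from the question;
# the concrete values are illustrative assumptions.
branch = [(b, "Bangalore" if b < 10 else "Elsewhere") for b in range(100)]
# (staffNo, position, branchNo): staff 0..99 are the managers, one per branch.
staff = [(s, "Manager" if s < 100 else "Assistant", s % 100) for s in range(1000)]

def is_manager(s):
    return s[1] == "Manager"

def in_bangalore(b):
    return b[1] == "Bangalore"

def same_branch(s, b):
    return s[2] == b[0]

def join(left, right):
    """Naive equi-join on branchNo; good enough for a correctness check."""
    return [(s, b) for s, b in product(left, right) if same_branch(s, b)]

query_a = {(s, b) for s, b in product(staff, branch)
           if is_manager(s) and in_bangalore(b) and same_branch(s, b)}
query_b = {(s, b) for s, b in join(staff, branch)
           if is_manager(s) and in_bangalore(b)}
query_c = {(s, b) for s, b in join(staff, branch) if is_manager(s)}
query_d = set(join([s for s in staff if is_manager(s)],
                   [b for b in branch if in_bangalore(b)]))

print(len(query_a), len(query_b), len(query_c), len(query_d))  # 10 10 100 10
```

On this data A, B and D return the same 10 tuples, while C returns all 100 managers regardless of city.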

We can compare the remaining three queries (A, B and D) based on the number of tuple accesses required (ignoring the tuple size).

The first query (A) calculates the Cartesian product of Staff and Branch, which requires $(1000 + 100)$ tuple accesses to read the relations, and creates an intermediate relation with $(1000 \times 100)$ tuples. We then have to read each of these tuples again to test them against the selection predicate at a cost of another $(1000 \times 100)$ tuple accesses, giving a total cost of $(1000 + 100) + 2\times (1000 \times 100) = 201,100$ tuple accesses.
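The arithmetic for the Cartesian-product strategy can be spelled out term by term. This is only a sketch of the cost model used in this answer, and the variable names are chosen here for readability.

```python
n_staff, n_branch = 1000, 100

read_inputs = n_staff + n_branch     # read both base relations once
write_product = n_staff * n_branch   # materialize Staff x Branch
read_product = n_staff * n_branch    # re-read every product tuple for the selection

cost_a = read_inputs + write_product + read_product
print(f"{cost_a:,}")  # 201,100
```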

The second query (B) joins Staff and Branch on the branch number branchNo, which again requires $(1000 + 100)$ tuple accesses to read each of the relations. We know that the join of the two relations has $1000$ tuples, one for each member of staff (a member of staff can only work at one branch). Writing this intermediate result takes $1000$ tuple accesses, and the Selection operation requires another $1000$ tuple accesses to read it back, giving a total cost of $2\times 1000 + (1000 + 100) = 3,100$ tuple accesses.
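The key step here is that the join result has exactly as many tuples as Staff. A small sketch can confirm that on illustrative data and then add up the cost. The assignment of staff to branches (`s % 100`) is an assumption. Any assignment in which every staff member references exactly one existing branch gives the same count.

```python
# Illustrative data: every staff member references exactly one existing branch.
branch_nos = range(100)
staff_branch = [s % 100 for s in range(1000)]  # branchNo of each Staff tuple (assumed layout)

# Each Staff tuple matches exactly one Branch tuple, so the join has |Staff| tuples.
join_size = sum(1 for b in staff_branch if b in branch_nos)

read_inputs = len(staff_branch) + len(branch_nos)  # 1000 + 100
write_join = join_size                             # materialize the join result
read_join = join_size                              # re-read it for the selection

cost_b = read_inputs + write_join + read_join
print(join_size, cost_b)  # 1000 3100
```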

The final query (D) first reads each Staff tuple to determine the Manager tuples, which requires $1000$ tuple accesses and produces a relation with $100$ tuples (another $100$ accesses to write). The second Selection operation reads each Branch tuple to determine the Bangalore branches, which requires $100$ tuple accesses and produces a relation with $10$ tuples (another $10$ accesses to write). The final operation is the join of the reduced Staff and Branch relations, which requires $(100 + 10)$ tuple accesses to read them, giving a total cost of $1000 + 2\times 100 + 10 + (100 + 10) = 1,320$ tuple accesses.
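Because this total has the most terms, an itemized breakdown may help. The sketch below follows the same cost model. The step labels are descriptive names added here, not terminology from the question.

```python
n_staff, n_branch = 1000, 100
n_managers, n_bangalore = 100, 10

steps = {
    "read Staff for the Manager selection": n_staff,
    "write the Manager tuples": n_managers,
    "read Branch for the Bangalore selection": n_branch,
    "write the Bangalore tuples": n_bangalore,
    "read both reduced relations for the join": n_managers + n_bangalore,
}
for label, accesses in steps.items():
    print(f"{accesses:>5}  {label}")

cost_d = sum(steps.values())
print(f"{cost_d:>5}  total")  # 1320
```

The join step reads $100 + 10 = 110$ tuples. It is a sum of the two reduced relation sizes, not their product.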

Thus, query D is the most efficient of the three.
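To see how the ranking depends on the cardinalities, the three totals can be written as one parameterized function. This is a sketch under the same simplified model. The function name `plan_costs` and its parameters are hypothetical, and it assumes every staff member belongs to exactly one branch.

```python
def plan_costs(n_staff, n_branch, n_managers, n_city_branches):
    """Tuple-access cost of strategies A, B and D under the answer's model.

    Assumes every staff member belongs to exactly one branch, so the
    join of the full relations has n_staff tuples.
    """
    return {
        "A": (n_staff + n_branch) + 2 * (n_staff * n_branch),
        "B": (n_staff + n_branch) + 2 * n_staff,
        "D": (n_staff + n_managers)             # select managers: read + write
             + (n_branch + n_city_branches)     # select city branches: read + write
             + (n_managers + n_city_branches),  # read reduced relations for the join
    }

costs = plan_costs(1000, 100, 100, 10)
ranking = sorted(costs, key=costs.get)  # cheapest first
print(costs)
print(ranking)
```

With the numbers from the question, D comes first, B second and A last. A is roughly 150 times as expensive as D in this model.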

gatecse answered Oct 15, 2020 • selected Oct 9, 2021 by Arjun

@gatecse, @Sachin Mittal 1 sir, @Deepak Poonia sir Please confirm whether the number of tuple accesses involved in the join of relation A (n rows) and relation B (m rows) will only be m+n. I had earlier thought that it was m*n. - looooommmm, Jan 22, 2023
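One possible way to look at the m+n versus m*n question: the answer above does not say which join algorithm it assumes, and its m+n figure only counts tuples read from the two inputs. That count holds when each input is scanned once, as in a hash join. A naive nested-loop join performs m*n pair comparisons instead. The sketch below counts both on the reduced relations of query D. The function names, counters and tuple contents are illustrative assumptions.

```python
def nested_loop_join(left, right):
    """Compare every pair: len(left) * len(right) tuple comparisons."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[0] == r[0]:
                out.append((l, r))
    return out, comparisons

def hash_join(left, right):
    """Read each input once: len(left) + len(right) tuple reads."""
    reads = 0
    buckets = {}
    for r in right:               # build phase: one pass over right
        reads += 1
        buckets.setdefault(r[0], []).append(r)
    out = []
    for l in left:                # probe phase: one pass over left
        reads += 1
        for r in buckets.get(l[0], []):
            out.append((l, r))
    return out, reads

managers = [(b, f"manager-{b}") for b in range(100)]  # key = branchNo
bangalore = [(b, "Bangalore") for b in range(10)]     # key = branchNo

nl_out, nl_count = nested_loop_join(managers, bangalore)
h_out, h_count = hash_join(managers, bangalore)
print(nl_count, h_count, len(nl_out), len(h_out))  # 1000 110 10 10
```

Both joins return the same 10 tuples. Only the amount of work counted differs.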
