recategorized by
105 views
2 votes
2 votes

Consider the following SQL query:

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'Bangalore');


Which of the following relational algebra query gives the most optimal execution strategy for the above SQL query assuming no indexing and that there are $1000$ tuples in Staff, $100$ tuples in Branch, $100$ Managers (one for each branch), and $10$ Bangalore branches?

  1. $\sigma_{ ( position = 'Manager') \wedge ( city ='Bangalore') \wedge ( \text{Staff}.branchNo = \text{Branch}.branchNo )} ( \text{Staff} \times \text{Branch} )$
  2. $\sigma_{ ( position ='Manager') \wedge ( city ='Bangalore')} ( \text{Staff} \bowtie_{\text{Staff}.branchNo = \text{Branch}.branchNo} \text{Branch} )$
  3. $\sigma_{ ( position ='Manager') } ( \text{Staff} \bowtie_{\text{Staff}.branchNo = \text{Branch}.branchNo} \text{Branch} )$
  4. $(\sigma_{ position ='Manager'} ( \text{Staff} )) \bowtie_ {\text{Staff}.branchNo = \text{Branch}.branchNo} (\sigma_{ city ='Bangalore'} ( \text{Branch} ))$
recategorized by

1 Answer

Best answer
3 votes
3 votes
Query C is not equivalent to the given SQL. So, we can ignore it.

We can compare these three queries based on the number of tuple accesses required (ignoring the tuple size).

The first query calculates the Cartesian product of Staff and Branch, which requires
$(1000 + 100)$ tuple accesses to read the relations, and creates a relation with $(1000 \times 100)$ tuples. We then have to read each of these tuples again to test them against the selection predicate at a cost of another $(1000 \times  100)$ tuple accesses, giving a total cost of $(1000 + 100) + 2\times (1000 \times 100)  =  201,100$ tuple accesses.

The second query joins Staff and Branch on the branch number branchNo, which again
requires $(1000 + 100)$ tuple accesses to read each of the relations. We know that the join of the two relations has $1000$ tuples, one for each member of staff (a member of staff
can only work at one branch). Consequently, the Selection operation requires $1000$ tuple
accesses to read the result of the join, giving a total cost of $2\times 1000 + (1000 + 100) = 3,100$ tuple accesses.

The final query first reads each Staff tuple to determine the Manager tuples, which
requires $1000$ tuple accesses and produces a relation with $100$ tuples. The second
Selection operation reads each Branch tuple to determine the Bangalore branches, which
requires $100$ tuple accesses and produces a relation with $10$ tuples. The final operation is the join of the reduced Staff and Branch relations, which requires $(100 \times 10)$ tuple accesses, giving a total cost of $1000 + 2\times 100 + 10 + (100 + 10) = 1,320$ tuple accesses.

Thus query D is the most optimal one.
selected by
Answer:

Related questions

1 votes
1 votes
1 answer
1
gatecse asked Oct 15, 2020
62 views
Given that the relations $R(A,B)$ and $S(C,D)$.$$\overset{R}{\begin{array}{|c|c|c|c|}\hline\textbf{A} & \textbf{B}\\\hline1 & a\\\hline2 & c\\\hline\end{array}}\qquad \ov...
1 votes
1 votes
1 answer
2
gatecse asked Oct 15, 2020
46 views
Given the Sailors table$${\textbf{Sailors}}$$$${\begin{array}{|c|c|c|c|}\hline\textbf{Sid} & \textbf{Sname} & \textbf{Rating} & \textbf{Age}\\\hline1 & \text{Peter} & 10 ...