retagged
12,825 views
60 votes
60 votes

Consider the following relation schemas :

  • b-Schema = (b-name, b-city, assets)
  • a-Schema = (a-num, b-name, bal)
  • d-Schema = (c-name, a-number)

Let branch, account and depositor be respectively instances of the above schemas. Assume that account and depositor relations are much bigger than the branch relation.

Consider the following query:

Пc-nameb-city = "Agra" ⋀ bal < 0 (branch ⋈ (account ⋈ depositor)

Which one of the following queries is the most efficient version of the above query ?

  1. Пc-namebal < 0 b-city = "Agra" branch ⋈ account) ⋈ depositor)
  2. Пc-nameb-city = "Agra" branch ⋈ (σbal < 0 account ⋈ depositor))
  3. Пc-name ((σb-city = "Agra" branch ⋈ σb-city = "Agra" ⋀  bal < 0 account) ⋈ depositor)
  4. Пc-nameb-city = "Agra" branch ⋈ (σb-city = "Agra" ⋀  bal < 0 account ⋈ depositor))
retagged

4 Answers

Best answer
72 votes
72 votes

It should be A. As in B we are doing a join between two massive table whereas in A we are doing join between relatively smaller table and larger one and the output that this inner table gives (which is smaller in comparison to joins that we are doing in B) is used for join with depositer table with the selection condition.

Options C and D are invalid as there is no b-city column in a-Schema.


Lets see in detail. Let there be 100 different branches. Say about $10$% of accounts are below $0$. Also, let there be $10,000$ accounts in a branch amounting to $1,000,000$ total accounts. A customer can have multiple accounts, so let there be on average $2$ accounts per customer. So, this amounts to $2,000,000$ total entries in depositor table. Lets assume these assumptions are true for all the branches. So, now lets evaluate options A and B.

1. All the accounts in Agra branch, filter by positive balance, and then depositor details of them. So, 

  • Get branch name from branch table after processing $100$ records
  • Filter $10,000$ accounts after processing $1,000,000$ accounts belonging to Agra
  • Filter 1000 accounts after processing 10,000 accounts for positive balance
  • Get $500$ depositor details after processing $2,000,000$ entries for the given $1000$ accounts (assuming $1$ customer having $2$ accounts). So, totally this amounts to $2,000,000,000$ record processing.
  • So totally $\approx$ 2 billion records needs processing.

2. All the positive balance accounts are found first, and then those in Agra are found.

  • Filter $100,000$ accounts after processing $1,000,000$ accounts having positive balance
  • Find the deposito details of these accounts. So, $100,000 $*$ 2,000,000$ records need processing and this is a much larger value than for query A. Even if we reduce the percentage of positive balance ($10$ we assumed) the record processing of query A will also get reduced by same rate. So, overall query A is much better than query B.
edited by
10 votes
10 votes

B) as  b  is very small compared to a and d which is  also taken in account also from the fact that there is a condition  b-city = "Agra" branch which already filter  city as agra making it small  and before that σbal < 0 account ⋈ depositor filter  and give selected tables so ⋈  between them will give same result and better one.
Please correct If Not also 3 and 4 are wrong as σ
b-city = "Agra"   bal < 0 account ⋈ depositor selects city from account ⋈ depositor which is not in either Schemas.

5 votes
5 votes
I guess the most efficient one is this one:

$\prod_{c\_name}(((\sigma_{b\_city="Agra"}branch)\bowtie(\sigma_{bal<0}account))\bowtie depositor)$
 

though it is not given in the options.

Am I right @Arjun sir?
edited by
–2 votes
–2 votes
As the conditional expression contains the comparison of attributes with constants so they can be applied independently  for filtering out some tuples  before performing the join operation over the relations because this will reduce the relation to smaller ones and join can be easily on smaller relations.

Given query is inefficient as join is performed before the selecting the tuples from the relation so we have to select the efficient query from options.

Option A have condition ( bal < 0 ) which is performed after performing join of branch and account So, inefficient.

Option C have condition ( b-city = 'agra' ) with account relation but b-city is not there in account. So, inefficient.

Option D is nearly same as C, so inefficient.

Option B is the most efficient one as filtering of tuples is performed before joining.

Therefore, Option B is correct.
Answer:

Related questions

46 votes
46 votes
5 answers
2
53 votes
53 votes
5 answers
4
Ishrat Jahan asked Oct 30, 2014
17,687 views
Consider the following two transactions$: T1$ and $T2.$$\begin{array}{clcl} T1: & \text{read (A);} & T2: & \text{read (B);} \\ & \text{read (B);} & & \text{read (A);} \\ ...