edited by
15,003 views
44 votes
44 votes

Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Assume that amounts $6000, 7000, 8000, 9000$ and $10000$ were each paid by $20\%$ of the students. Consider these query plans (Plan 1 on left, Plan 2 on right) to “list all courses taken by students who have paid more than x”

A disk seek takes $4ms,$ disk data transfer bandwidth is $300$ MB/s and checking a tuple to see if amount is greater than $x$ takes $10µs.$ Which of the following statements is correct?

  1. Plan 1 and Plan 2 will not output identical row sets for all databases
  2. A course may be listed more than once in the output of Plan 1 for some databases
  3. For $x = 5000,$ Plan 1 executes faster than Plan 2 for all databases
  4. For $x = 9000,$ Plan I executes slower than Plan 2 for all databases
edited by

8 Answers

1 votes
1 votes
since, both plans need to load both the tables viz.courses and enrolled in the main memory. So disk access time is same for both plans.
Plan 2 does more number of comparisons compared to plan 1 because:

1) filtering of rows is done before join thus less comparison done because of less no of tuples.
1) Join operation will require more comparisons as the second table will have more rows in plan 2 compared to plan 1.
1 votes
1 votes

Option B is correct.

Example: Consider enrolled tuples (s1,c1) and (s2,c1) & paid tuples (s1, 200) and (s2, 200).

So for the query “list all courses taken by students who have paid more than x”, for x=100 will return c1 twice (once for s1 and s2 each).


For Option C & D, consider the following

let number of rows in enrolled be e and number of rows in paid be p.

Plan 1: 

select p comparisons

nested join e*p comparisons (considering worst case when all p are selected)

Total comparisonsp + e*p

 

Plan 2:

nested join = e*p comparisons

select = e comparisons (as ideally e>=p)

Total comparisonse*p + e

 

So speed depends upon the database as,

Plan 1 will be faster for e>p

Plan 1 and Plan 2 will be same for e=p.

So the assertion that one will be faster than other for all database does not hold true.

0 votes
0 votes
Although plan 1 and plan2 will output same results, but the execution time for each plan will be different
 
Let total tuples in Enrolled = 100 in Paid = 200 
Let x= 5000 and tuples having x= 5000 is 10 
 
Plan 1 first selects the amount> 5000 and then Joins
 So , it searches 200 tuples and that will result in 10 tuples
So 200 times disk accessed
Now join it with 100 tuples of Enrolled and let we get the result of 50 tuples
 
Plan 2 first do join on tables, so let assume 200 tuples matched with 100 tuples and let we get 300 tuples, 
Then we do search for amount>x , 
Here we have to search x in 300 tuples , so disk time is more here as compared to plan 1
 
 
So plan 1 executes faster than plan 2 for all values of x 
 
Ans is C
 
 
 
 
 
Answer:

Related questions