edited by
14,981 views
44 votes
44 votes

Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Assume that amounts $6000, 7000, 8000, 9000$ and $10000$ were each paid by $20\%$ of the students. Consider these query plans (Plan 1 on left, Plan 2 on right) to “list all courses taken by students who have paid more than x”

A disk seek takes $4ms,$ disk data transfer bandwidth is $300$ MB/s and checking a tuple to see if amount is greater than $x$ takes $10µs.$ Which of the following statements is correct?

  1. Plan 1 and Plan 2 will not output identical row sets for all databases
  2. A course may be listed more than once in the output of Plan 1 for some databases
  3. For $x = 5000,$ Plan 1 executes faster than Plan 2 for all databases
  4. For $x = 9000,$ Plan I executes slower than Plan 2 for all databases
edited by

8 Answers

36 votes
36 votes

If we carefully look at both the query plans, we can easily identify which plan runs faster.

Let's first understand how we can efficiently run nested loop join.

Algorithm is :

Two relations R and S are joined as follows:

  For each tuple r in R do
     For each tuple s in S do
        If r and s satisfy the join condition
           Then output the tuple <r,s>

According to https://technet.microsoft.com/en-us/library/ms191318(v=sql.105).aspx article from microsoft

This algorithm runs fast if the number of tuples in outer loop are less.

Since, in plan1 we are reducing the number of tuples by placing where condition for selecting rows of plan relation, the join algorithm of plan1 is guaranteed to run faster than that of plan2 and hence Plan1 will execute faster than plan2.

So, answer is C.

32 votes
32 votes

Answer should be (C)

In all cases plan $1$ is faster than plan $2$ cause in plan $1$ we are reducing the load by doing select amount $>x$ and then the loop

But, in case of plan 2 its in the nested loop so it need to check every time and will take more time to execute .

edited by
5 votes
5 votes

Both B and C should be true.

Obviously applying select operation before the join operation is better than plan2 bcoz it helps in optimising the entire Join-Selection process.


A is false.

B is true.

suppose two students paid more than x opting for same course so both will be selected as per select operation in plan1 and then we will apply Join between result of select and enrolled printing same course twice.

C is true. 

Effect of select operation becomes more evident when x is any value >=6000(min. amount) bcoz then select will remove these tuples at very first place instead of removing them after join.
But even if x=5000 then select operation though becomes useless bcoz all amounts are greater than or equal to 6000.But still Plan1 is faster than Plan2 bcoz it has to apply select(scanning the tuples) on less number of tuples as compared to Plan2 which will apply select after JOIN leading to scanning of more tuples(although none would be eliminated).

D is false
D would have been true ---> "For x = 9000, Plan I executes faster than Plan 2 for all databases"

3 votes
3 votes

Option A is totally false, as both queries output same results.

 

Option B seems right, but projection never allows duplicates(in relational algebra). So, option B is obviously false(The only confusion is if we treat the question as part of SQL, then projection is same as select and it produces duplicates.

 

Option C also seems right.

If we have x=5000, all tuples are already OK. We don't need to filter anything, but still, SQL don't know this. So, the procedure will be same as for other conditions.

For eg suppose enrolled relation contains 1000 tuples and paid relation contains 500 tuples. Then distinct number of students must be atleast 500(because paid relation contains all distinct student and there might be some students in enrolled who haven't paid the fees yet).

Then In Query Ist, we need 500 comparison operations to select tuples where amount > x, and suppose the tuples where x > amount comes out to be 200. Then we get 200 * 1000 = 200,000 entries in output table from which we have to select only the tuples where enrolled.student = paid.student. So total amount of work = 200,000 + 500 = 200,500.

Now, if we go with Query IInd, we first need to do cross product = 500 * 1000 = 500,000, then filter comparison conditions on all 500,000 tuples. So total work done = 500,000 * 2 = 1,000,000

Clearly, we can see option C comes out to be right.

 

This example also counters the Option D, and hence option D is false.

Answer:

Related questions