Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Assume that amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students. Consider these query plans (Plan 1 on left, Plan 2 on right) to “list all courses taken by students who have paid more than x”
A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if amount is greater than x takes 10µs. Which of the following statements is correct?
I think it should be C)
In all cases plan $1$ is faster than plan $2$ cause in plan $1$ we are reducing the load by doing select amount $>x$ and then the loop
But, in case of plan 2 its in the nested loop so it need to check every time and will take more time to execute .
but what about this information : A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if amount is greater than x takes 10µs....... does it make no sense???
for x=5000, all amounts are greater than 5000, bcoz amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students. So i dont think it makes any difference, I think ans should be B.
Both B and C should be true.
Obviously applying select operation before the join operation is better than plan2 bcoz it helps in optimising the entire Join-Selection process.
A is false.
B is true.
suppose two students paid more than x opting for same course so both will be selected as per select operation in plan1 and then we will apply Join between result of select and enrolled printing same course twice.
C is true.
Effect of select operation becomes more evident when x is any value >=6000(min. amount) bcoz then select will remove these tuples at very first place instead of removing them after join.
But even if x=5000 then select operation though becomes useless bcoz all amounts are greater than or equal to 6000.But still Plan1 is faster than Plan2 bcoz it has to apply select(scanning the tuples) on less number of tuples as compared to Plan2 which will apply select after JOIN leading to scanning of more tuples(although none would be eliminated).
D is false
D would have been true ---> "For x = 9000, Plan I executes faster than Plan 2 for all databases"
If we carefully look at both the query plans, we can easily identify which plan runs faster.
Let's first understand how we can efficiently run nested loop join.
Algorithm is :
Two relations R and S are joined as follows:
For each tuple r in R do
For each tuple s in S do
If r and s satisfy the join condition
Then output the tuple <r,s>
According to https://technet.microsoft.com/en-us/library/ms191318(v=sql.105).aspx article from microsoft
This algorithm runs fast if the number of tuples in outer loop are less.
Since, in plan1 we are reducing the number of tuples by placing where condition for selecting rows of plan relation, the join algorithm of plan1 is guaranteed to run faster than that of plan2 and hence Plan1 will execute faster than plan2.
So, answer is C.