2.5k views

Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Assume that amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students. Consider these query plans (Plan 1 on left, Plan 2 on right) to “list all courses taken by students who have paid more than x”

A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if amount is greater than x takes 10µs. Which of the following statements is correct?

1. Plan 1 and Plan 2 will not output identical row sets for all databases
2. A course may be listed more than once in the output of Plan 1 for some databases
3. For x = 5000, Plan 1 executes faster than Plan 2 for all databases
4. For x = 9000, Plan I executes slower than Plan 2 for all databases
0
how the answer would depend on value of x?
+1
Consider both tables 100 each, then given we perform comparison in plan 1 on 100 tuples in paid relation time is 100*10us, where comparsion after join takes 10000*10us in plan 2,  even if it not filters any tuples according to option c,

You also add now other factors similarly

I think it should be C)

In all cases plan $1$ is faster than plan $2$ cause in plan $1$ we are reducing the load by doing select amount $>x$ and then the loop

But, in case of plan 2 its in the nested loop so it need to check every time and will take more time to execute .

edited
+1

but what about this information : A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if amount is greater than x takes 10µs....... does it make no sense???

+5

for x=5000, all amounts are greater than 5000, bcoz amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students. So i dont think it makes any difference, I think ans should be B.

0
Consider both tables 100 each, then given we perform comparison in plan 1 on 100 tuples in paid relation time is 100*10us, where comparsion after join takes 10000*10us in plan 2,  even if it not filters any tuples according to option c,
0
plan 1 is always faster than plan 2 am I right?
0
Yes, I Agree with you that Plan 1 will execute Faster than the Plan 2 because of select Condition, But Here in Option C it is Asking About for X=5000, So in Plan-1 all the entries are Satisfying the condition  because of the Statement "Assume that amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students".So in Both cases Join Condition Applies on all the Entries and takes Equal For this Value of X.

So Option C Should be Correct.

Both B and C should be true.

Obviously applying select operation before the join operation is better than plan2 bcoz it helps in optimising the entire Join-Selection process.

A is false.

B is true.

suppose two students paid more than x opting for same course so both will be selected as per select operation in plan1 and then we will apply Join between result of select and enrolled printing same course twice.

C is true.

Effect of select operation becomes more evident when x is any value >=6000(min. amount) bcoz then select will remove these tuples at very first place instead of removing them after join.
But even if x=5000 then select operation though becomes useless bcoz all amounts are greater than or equal to 6000.But still Plan1 is faster than Plan2 bcoz it has to apply select(scanning the tuples) on less number of tuples as compared to Plan2 which will apply select after JOIN leading to scanning of more tuples(although none would be eliminated).

D is false
D would have been true ---> "For x = 9000, Plan I executes faster than Plan 2 for all databases"

+4
Project operation removes duplicates. B should not be true. Isn't so ?
+1
i think same in plan B. duplicate courses may come in the result of plan B toooo. !!!
0
Consider the database in which every student enrolls for a single course only.

In such case : no. of entries in Enrolled table = no. of entries in Paid table

For this database, count of scan & select will be the same in both plans for x=5000, making both plan execution equivalent in speed.
0
What if Each Tuple Corresponds to Enrolled get join with exactly one tuple of Paid.Then in this case , The time taken by select is also same.
0
B is not true as question says "Project on Course" and it will have no duplicate tuples.

Project operation is "select distinct"

If we carefully look at both the query plans, we can easily identify which plan runs faster.

Let's first understand how we can efficiently run nested loop join.

Algorithm is :

Two relations R and S are joined as follows:

  For each tuple r in R do
For each tuple s in S do
If r and s satisfy the join condition
Then output the tuple <r,s>

According to https://technet.microsoft.com/en-us/library/ms191318(v=sql.105).aspx article from microsoft

This algorithm runs fast if the number of tuples in outer loop are less.

Since, in plan1 we are reducing the number of tuples by placing where condition for selecting rows of plan relation, the join algorithm of plan1 is guaranteed to run faster than that of plan2 and hence Plan1 will execute faster than plan2.

since, both plans need to load both the tables viz.courses and enrolled in the main memory. So disk access time is same for both plans.
Plan 2 does more number of comparisons compared to plan 1 because:

1) filtering of rows is done before join thus less comparison done because of less no of tuples.
1) Join operation will require more comparisons as the second table will have more rows in plan 2 compared to plan 1.

Option B is correct.

Example: Consider enrolled tuples (s1,c1) and (s2,c1) & paid tuples (s1, 200) and (s2, 200).

So for the query “list all courses taken by students who have paid more than x”, for x=100 will return c1 twice (once for s1 and s2 each).

For Option C & D, consider the following

let number of rows in enrolled be e and number of rows in paid be p.

Plan 1:

select p comparisons

nested join e*p comparisons (considering worst case when all p are selected)

Total comparisonsp + e*p

Plan 2:

nested join = e*p comparisons

select = e comparisons (as ideally e>=p)

Total comparisonse*p + e

So speed depends upon the database as,

Plan 1 will be faster for e>p

Plan 1 and Plan 2 will be same for e=p.

So the assertion that one will be faster than other for all database does not hold true.

1
2