The Gateway to Computer Science Excellence
First time here? Checkout the FAQ!
x
+18 votes
2.3k views

Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Assume that amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students. Consider these query plans (Plan 1 on left, Plan 2 on right) to “list all courses taken by students who have paid more than x”

A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if amount is greater than x takes 10µs. Which of the following statements is correct?

  1. Plan 1 and Plan 2 will not output identical row sets for all databases
  2. A course may be listed more than once in the output of Plan 1 for some databases
  3. For x = 5000, Plan 1 executes faster than Plan 2 for all databases
  4. For x = 9000, Plan I executes slower than Plan 2 for all databases
asked in Databases by Active (3.7k points) | 2.3k views
0
how the answer would depend on value of x?
+1
Consider both tables 100 each, then given we perform comparison in plan 1 on 100 tuples in paid relation time is 100*10us, where comparsion after join takes 10000*10us in plan 2,  even if it not filters any tuples according to option c,

You also add now other factors similarly

5 Answers

+18 votes

I think it should be C)

In all cases plan $1$ is faster than plan $2$ cause in plan $1$ we are reducing the load by doing select amount $>x$ and then the loop

But, in case of plan 2 its in the nested loop so it need to check every time and will take more time to execute .

answered by Boss (10.3k points)
edited by
+1

but what about this information : A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to see if amount is greater than x takes 10µs....... does it make no sense???

+5

for x=5000, all amounts are greater than 5000, bcoz amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students. So i dont think it makes any difference, I think ans should be B.

0
Consider both tables 100 each, then given we perform comparison in plan 1 on 100 tuples in paid relation time is 100*10us, where comparsion after join takes 10000*10us in plan 2,  even if it not filters any tuples according to option c,
0
plan 1 is always faster than plan 2 am I right?
0
Yes, I Agree with you that Plan 1 will execute Faster than the Plan 2 because of select Condition, But Here in Option C it is Asking About for X=5000, So in Plan-1 all the entries are Satisfying the condition  because of the Statement "Assume that amounts 6000, 7000, 8000, 9000 and 10000 were each paid by 20% of the students".So in Both cases Join Condition Applies on all the Entries and takes Equal For this Value of X.

So Option C Should be Correct.
+5 votes

Both B and C should be true.

Obviously applying select operation before the join operation is better than plan2 bcoz it helps in optimising the entire Join-Selection process.


A is false.

B is true.

suppose two students paid more than x opting for same course so both will be selected as per select operation in plan1 and then we will apply Join between result of select and enrolled printing same course twice.

C is true. 

Effect of select operation becomes more evident when x is any value >=6000(min. amount) bcoz then select will remove these tuples at very first place instead of removing them after join.
But even if x=5000 then select operation though becomes useless bcoz all amounts are greater than or equal to 6000.But still Plan1 is faster than Plan2 bcoz it has to apply select(scanning the tuples) on less number of tuples as compared to Plan2 which will apply select after JOIN leading to scanning of more tuples(although none would be eliminated).

D is false
D would have been true ---> "For x = 9000, Plan I executes faster than Plan 2 for all databases"

answered by Active (2.7k points)
+4
Project operation removes duplicates. B should not be true. Isn't so ?
0
i think same in plan B. duplicate courses may come in the result of plan B toooo. !!!
+4 votes

If we carefully look at both the query plans, we can easily identify which plan runs faster.

Let's first understand how we can efficiently run nested loop join.

Algorithm is :

Two relations R and S are joined as follows:

  For each tuple r in R do
     For each tuple s in S do
        If r and s satisfy the join condition
           Then output the tuple <r,s>

According to https://technet.microsoft.com/en-us/library/ms191318(v=sql.105).aspx article from microsoft

This algorithm runs fast if the number of tuples in outer loop are less.

Since, in plan1 we are reducing the number of tuples by placing where condition for selecting rows of plan relation, the join algorithm of plan1 is guaranteed to run faster than that of plan2 and hence Plan1 will execute faster than plan2.

So, answer is C.

answered by Boss (13.9k points)
0 votes
I think answer is B.
answered by Active (1.4k points)
0 votes
since, both plans need to load both the tables viz.courses and enrolled in the main memory. So disk access time is same for both plans.
Plan 2 does more number of comparisons compared to plan 1 because:

1) filtering of rows is done before join thus less comparison done because of less no of tuples.
1) Join operation will require more comparisons as the second table will have more rows in plan 2 compared to plan 1.
answered by Junior (895 points)
Answer:

Related questions



Quick search syntax
tags tag:apple
author user:martin
title title:apple
content content:apple
exclude -tag:apple
force match +apple
views views:100
score score:10
answers answers:2
is accepted isaccepted:true
is closed isclosed:true

40,874 questions
47,534 answers
146,072 comments
62,300 users