# GATE2014-2-30

9.2k views

Consider a join (relation algebra) between relations $r(R)$ and $s(S)$ using the nested loop method. There are 3 buffers each of size equal to disk block size, out of which one buffer is reserved for intermediate results. Assuming size(r(R))<size(s(S)) , the join will have fewer number of disk block accesses if

1. relation $r(R)$ is in the outer loop.
2. relation $s(S)$ is in the outer loop.
3. join selection factor between $r(R)$ and $s(S)$ is more than 0.5.
4. join selection factor between $r(R)$ and $s(S)$ is less than 0.5.

edited
17
A join between r(R) and s (S) using nested loop method will be as follows.
For each tuple r in R do
For each tuple s in S do
If r and s satisfy the join condition then output the tuple <r,s>
This algorithm will involve Nr*Bs+Br block transfers and Nr+Br seeks, where Br and Bs are
number of blocks in relations R and S respectively and Nr is number of tuple in relation R.
Now to have less block accesses, Nr should be less and it is already given that |R|<|S|. Relation
r(R) should be in the outer loop to have fewer number of disk block accesses.

1
are join algorithms in syllabus?
0
Yes.
2

Quoting from wiki

join selection factor is"the percentage (or fraction) of records in one file that will be joined with records of another file". This can be calculated when two database tables are to be joined. It is primarily concerned with query optimisation.

4

First solve https://gateoverflow.in/3847/gate2005-it-82a

first .Then it becomes very easy to understand

0
number of records grows faster than number of blocks - one of the factor of choosing r over S.
0

Can someone explain option C and D

2

@Vikrant Singh Here in this NPTEL video, the professor states the no. of seeks as  $2n_{r}$ . Which one is correct?

@4:44

1
Join selection = (no. of selected tuple/total no. of tuples in cartesian product)
how Join selection affects the no. of block access???
0

here in question we certainly don't know what the JOIN-condition is, thereby no of tuples selected in the resultant join is quite uncertain.Hence we cannot figure out the the join-selection factor basically. Is that so?

1

What does it mean by number of seeks? Although, it is not required for this Question but I went through the Korth and found that -

• We need only one seek for each scan on the inner relation s since it is read sequentially, and a total of br seeks to read r, leading to a total of nr + br seeks.
• In the best case, there is enough space for both relations to fit simultaneously in memory, so each block would have to be read only once; hence, only br + bs block transfers would be required, along with 2 seeks.

No. of Block Transfers, I can calculate but not getting what does Seeks mean. Assume, that Outer Relation is R, having br blocks and nr total tuples and Inner Relation is S, having bs blocks and ns total tuples

0

@ayushsomani

Seek time is

" When anything is read or written to a disc drive, the read/write head of the disc needs to move to the right position"

isnot it??

0

Can you help me out how no. of Seeks are Calculated in nested Loop (nr + br, in worst case)?

0

@ayushsomani

seek time is $2*n_{r}$

One unit for read that relation and another unit is brought the next relation

right?

0

@ayushsomani

Can u tell me , what is meaning of "join selection factor between r(R) and s(S) is more than 0.5.​​​​​​​"

0

Join Selection Factor

The percentage (or fraction) of records in one file that will be joined with records of another file.

Source: Wikipedia

For eg:

Let say we have 2 tables.

A is:

 AID Name Age 12 Arun 60 15 Shreya 24 25 Hari 40 98 Rohit 20 99 Rohit 11

B is:

 BID Phone Area 10 2200 02 99 2100 01

If meet condition is: A ⋈(AId>40∨BId<15) B

Then we have to take 2 tuples from A and 1 tuple from B.

And Join Selection factor is $\frac{1}{2}=0.5$ (Always less than or equal to 1)

4

@ayushsomani

The total number of seeks will be ($b_r+n_r$) - Wikipedia Link Because -

Lets assume we have two memory blocks available in the main memory, now seek time is the time taken to move the read/write head of the disk to the position from where a block is to be read and written to the main memory from the disk.

Now all the blocks of a relation are contiguously stored in the disk so if you seek to a certain block of a relation, then you don't need further seeks to access the subsequent blocks as all are contiguous.

Coming to nested loop join, For every tuple in all the tuples of has to be scanned.

So as said earlier we have two blocks(B1,B2 (Suppose)) available in main memory, then first seek to a block of and transfer it to B1.So number of seeks = 1

Now for the first tuple of in B1 seek to a block of S and bring the block to B2. number of seeks = 1+1=2.Then compare the $1^{st}$ tuple of with the first tuple of S.

Next we have to compare $1^{st}$ tuple of with second tuple of S, so transfer the same block again (assuming the second tuple of S is in that block only) in B2 (This is the disadvantage of nested loop join.Check Block Nested Loop Join) .But now no more seek is required, in-fact for comparing the $1^{st}$ tuple of with all the other tuples of S , we don't need anymore seeks as all blocks are contiguous and can be accessed directly.

After we are done comparing all records of S to the first record of R .We have to compare second record of with all the records of .No seek needed as 2nd tuple of R already present in B1.But now the read/write head of disk is at last block of S,hence a seek is needed to move it to the first block of S. Thus number of seeks=2+1=3.

Thus we can see we need a seek for every tuple of R and we also need a seek if we have to access a block of R, like here if we are done comparing all the records of in B1 , a seek is needed to move the read/write head to next block of as now it is at the last block of S.

Therefore total no. of seeks needed= $n_r+b_r$

0
yes these are part of query optimisation

and yes, they are in syllabus

optimisations in DBMS are as important as

Reducing time complexity in Algorithms

In joining B and B using nested loop method, with A in outer loop two factors are involved.

i>No. of  blocks containing all rows in A should be fetched

ii> No. of Rows A times no of Blocks containing all Rows of B

(in worst case all rows of B are matched with all rows of A).

In above ques, $|R|<|S|$

i> will be less when number of rows in outer table is less since less no of rows will take less no. of blocks

ii> if we keep $R$ in outer loop, no. of rows in $R$ are less and no. of blocks in $S$ are more

If we keep $S$ in outer loop, no of rows in $S$ are more and no. of blocks in $R$ are less.

In ii> block accesses will be multiplication and will come same in both cases.

So, i> will determine no of block accesses

edited
1
Is "join" in question, the only reason for option A?
1
It is  due to join using nested loop  that a> is correct.Is this u were asking or something else?
13

size(r(R)) < size(r(S))  means blocks occupied by R are less?
And if yes then what about the case if blocks are less but rows are more

Consider

r(R) #blocks (br) = 10 and #tuples (nr) = 100

r(S) #blocks (bs) = 20 and #tuples (ns) = 50

Now

1 If R is in outer loop #block Accesss = nr * bs + br = 100*20 + 10 = 2010

2 If S is in outer loop #block Accesss = ns * br + bs = 50*10 + 20 = 520

1
Sir, is there any significance of the 3rd buffer which is for holding intermediate results...
0

Hi @ਜਗਮੀ ji, Very good Point.

ping @Krish__,  @Anu007,  @Ashwin Kulkarni and @reena_kandari ji.

0
II) is only true if we are considering both relations to have same record size.We can't say that expression 2 is equal in general case.
0
Can anyone explain option C and D???
0
Can anyone explain what is the use of 3 buffers of equal size mentioned in question?
0

@ I think size(s(S)) means size of the relation in terms of number of rows, considering the concept that they want to test in this question is - what is the most efficient way of performing a nested loop join? (when relation with lesser number of rows is put in the outer loop).

But it does come off as ambiguous.

0

@ਜਗਮੀ

r(R) #blocks (br) = 10 and #tuples (nr) = 100

r(S) #blocks (bs) = 20 and #tuples (ns) = 50

With this I am getting, 10 tuples/block for relation R and 2.5 tuples/block (could assume spanned strategy) per block. How No. of tuples/block could be different?

We just have to think which table would be in outer loop.
To minimize block accesses, we have to put that table outside having less records coz for each outer record one block access inside will be required

https://gateoverflow.in/3847/gate2005-it-82a

1 vote

In case of nested loop join the no. of block access is - $B_{r} + n_{r}.B_{s}$ where Br is the no. of blocks in relation r & nr is the no. of records or rows in relation r & Bs is the no. of blocks in relation s.

so, it's obvious that to minimize no . of block access we've to minimize the outer table block numbers which is Br in this case.

Ans is option A

edited by
0
Can you explain Options C and D??

The join between r(R) and s(S) which is using nested loop will be as follows,
For each tuple r in R do
For each tuple s in S do
If r and s satisfy the join condition then output the tuple

The above algorithm involves (nr*bs+br ) block transfer and (nr+br ) seeks.
where,
br → no. of blocks in R
bs → no. of blocks in S
nr → no. of tuples in R
To have less block accesses in nr should be less and in question it is given that size(r(R)) < size(s(S)).
To have fewer no. of disk block accesses the relation r(R) should be in outer loop.

Nested loop join Algo: https://en.wikipedia.org/wiki/Nested_loop_join

No. of block transfers, N(B) = $n_r*b_s+b_r$ ; when R is in outer loop and S is in inner loop

Where, $n_r$ are number of tuples in relation R.

$b_s$ and $b_r$ are number of blocks in relation S and R respectively.

size(r(R))<size(s(S)) blocks occupied by R are less ($b_r<b_s$). Also, ($n_r<n_s$). As, less no of blocks acquire less no. of rows.

For minimizing N(B), product term must be minimum and it will be minimum when R is in the outer loop.

For Example: let $b_r=10 , b_s=20; n_r=50, n_s=100$ since size(r(R))<size(s(S))

1. when R is in the Outer loop: N(B) = $n_r*b_s+b_r$ = 50*20+10 = 1010.

2. when S is in the outer loop: N(B) = $n_s*b_r+b_s$ = 100*10+20 = 1020.

So, relation R should be in outer loop.

Option(A).

–1 vote

(D) join selection factor between $r(R)$ and $s(S)$ is less than 0.5.

it does not matter if 10 x 20 or 20 x 10 answer is still the same. If the jon selection factor is less that means we will fewer record in the result relation

## Related questions

1
9.6k views
SQL allows duplicate tuples in relations, and correspondingly defines the multiplicity of tuples in the result of joins. Which one of the following queries always gives the same answer as the nested query shown below: select * from R where a in (select S.a from S) select R.* from R, S where R.a ... R,(select distinct a from S) as S1 where R.a=S1.a select R.* from R,S where R.a=S.a and is unique R
Consider the following schedule S of transactions $T1, T2, T3, T4:$ ... serializable but not recoverable S is not conflict-serializable but is recoverable S is both conflict-serializable and recoverable S is neither conflict-serializable not is it recoverable
Given an instance of the STUDENTS relation as shown as below ... $(\text{StudentName, StudentAge})$ to be a key for this instance, the value $X$ should NOT be equal to______.
The maximum number of superkeys for the relation schema $R(E,F,G,H)$ with $E$ as the key is _____.