MSSQLCity.Com - All about MS SQL
     
About Us  
SSWUG Articles  
Articles  
Administering  
Comparison  
General  
Know How  
Replication  
Tuning  
Undocumented  
UDF  
SQL 6.5  
FAQ  
Scripts  
Tips  
Test Exams  
Advertise  
Download  
History  
Search  
Traffic  
Related Links  
     
Your button logo
Add to Favorites
 
     
 

Microsoft SQL Server Hash Joins

Alexander Chigrik
chigrik@mssqlcity.com


Microsoft SQL Server 7.0/2000 supports three types of join operations:

  • Nested-Loop joins
  • Merge joins
  • Hash joins


  • In this article, I want to tell you about Hash joins, what kinds of Hash joins exist, and when SQL Server uses this kind of join operation.

    The Hash join will be used, if there are no adequate indexes on the joined columns. This is a worst situation. In this case, hash table will be created. Hash join is most efficient when one of the tables is significantly differ in size than another one.

    The query optimizer makes a Hash join in two phases: build and probe. So, Hash join has two inputs: the build input and the probe input.

    On the build phase, hash table will be created by scanning each value in the build input and applying the hashing algorithm to the key.

    Let me to describe Hash join on the example with two tables.
    Look at this example:

    if object_id('dbo.Table1') is not null drop table Table1
    GO
    CREATE TABLE Table1 (id int, Name char(10))
    GO
    if object_id('dbo.Table2') is not null drop table Table2
    GO
    CREATE TABLE Table1 (id int, Name char(20))
    GO
    
    DECLARE @i int
    SELECT @i = 1
    WHILE @i < 1000
      BEGIN
        INSERT INTO Table1 VALUES (@i, LTRIM(str(@i)))
        SELECT @i = @i + 1
      END
    GO
    
    DECLARE @i int
    SELECT @i = 1
    WHILE @i < 1000
      BEGIN
        INSERT INTO Table2 VALUES (@i, LTRIM(str(@i)))
        SELECT @i = @i + 1
      END
    GO
    
    SET SHOWPLAN_TEXT ON
    GO
    SELECT a.Name FROM Table1 a INNER JOIN Table2 b
      ON a.Name = b.Name
    GO
    SET SHOWPLAN_TEXT OFF
    GO
    
    This is the result:

    
    StmtText
    -------------------------------------------------------------------------------------------
    |--Hash Match(Inner Join, HASH:([a].[Name])=([b].[Name]), RESIDUAL:([a].[Name]=[b].[Name]))
         |--Table Scan(OBJECT:([pubs].[dbo].[Table1] AS [a]))
         |--Table Scan(OBJECT:([pubs].[dbo].[Table2] AS [b]))
    
    The smaller table will be build input, the other - probe input. Field Name (column that joins the tables) is called hash key. The hash table consists of linked lists called hash buckets. The result of using a hash function on a hash key is called hash value. Hash value and RID (row identifier) will be placed into hash table.

    Hash value must be smaller than hash key. So, query processor economies on the size of the hash table. The real example of hashing is notebook. You can open notebook on the appropriate letter and scan all surnames on this letter to find necessary ones. So, notebook is the example of hash table, and pages on the appropriate letter are the example of hash bucket.

    During the probe phase, the entire probe input is scanned, and for each probe row computes the same hash value on the hash key to find any matches in the corresponding hash bucket.

    There are two main kinds of Hash join:

  • In-memory Hash join
  • Grace Hash join


  • In-memory Hash Join will be used if entire build input can be placed into memory.

    Grace Hash join will be used if your server has not enough memory to hold the entire build input. In this case, query processor proceeds Hash Join in several steps (hash table will be divided into multiple partitions and relevant partition will be loaded as need).

    Because the query optimizer usually selects the best execution plan for a given select statement, it is not necessary to enforce the desirable join type, but sometimes it can be useful. You can enforce the desirable join type by using the OPTION clause.

    This is the example to enforce Hash join:

    USE pubs
    GO
    SET SHOWPLAN_TEXT ON
    GO
    SELECT a.au_id FROM authors a JOIN titleauthor b
       ON a.au_id = b.au_id OPTION (HASH JOIN)
    GO
    SET SHOWPLAN_TEXT OFF
    GO
    
    This is the result:

    
    StmtText
    -----------------------------------------------------------------------------------------------
    SELECT a.au_id FROM authors a JOIN titleauthor b
       ON a.au_id = b.au_id OPTION (HASH JOIN)
    
    (1 row(s) affected)
    
    StmtText
    -----------------------------------------------------------------------------------------------
    |--Hash Match(Inner Join, HASH:([a].[au_id])=([b].[au_id]), RESIDUAL:([a].[au_id]=[b].[au_id]))
         |--Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind] AS [a]))
         |--Index Scan(OBJECT:([pubs].[dbo].[titleauthor].[titleidind] AS [b]))
    
    (3 row(s) affected)
    

     

     
    Visit The SQL Server Worldwide User's Group for all the latest news and information about SQL Server, Oracle, DB2 and XML for developers and administrators.

    (c) 1997, 2005 Bits on the Wire, Inc