Kinesia Online Course
Data Structures
Kinesia LLC, 2003

    1. Basic Concepts
    2. The Standard Library Container Class
    3. Vectors and Component Reuse
    4. Lists: A Dynamic Data Structure
    5. Stacks and Queues
     
    6. Sets and Multisets
    7. Trees
    8. Hashing

    
    
    
    

    Hashing

    1. Introduction

      number of possible keys > amount of space available for table

      e.g. keys are words with 8 letters
      # of possible keys = 268 ~ 2 x 1011 ~ 200 Giga

      names → table sparse

      SMITH
      CHEN

      hash function which transforms a key to an index may map different keys to same index → collision

    2. Hash Functions

      Hashing functions must be

      1. fast to compute
      2. evenely distributed the keys across the range of indices
      a) Truncation ignore part of the key
                                                                 

      e.g.
      6 2 5 3 8 1 9 4
      yeilds 3 1 4

      fast but may not be even

      b) folding

      divide and combine

      e.g.     625 | 381 | 94

        625
        381
        94
        --------
        1100
        may need to truncate to 100

      spread the keys better

      c) Modular Arithmetic

    3. convert key to integer X

    4. take the modulo of it with respect to table size n ( i.e. range of indices )
        i.e. i = x mod n ( e.g. if x = 20, n = 17, then i = 20 mod 17 = 3 )

    5. modules need to be a prime in order to spread the keys evenly
    6. An example: The calendar problem

      R -- list contains 3-letter month name and number of days

    7. R = { ( JAN, 31 ), ( FEB, 28 ), ( MAR, 31 ), ( APR, 30 ),
        ( MAY, 31 ), ( JUN, 30 ), ( JUL, 31 ), ( AUG, 31 ),
        ( SEP, 30 ), ( OCT, 31 ), ( NOV, 30 ), ( DEC, 31 ) }

    8. month -- key

    9. given a key K, we want to find the # of days in the month
      e.g. K = 'AUG', &rarr 31

    10. Searching methods

      • linear search
          average # of comparisons N = ( 12 + 1 ) / 2 = 6.5

      • binary search
          N 3.1
          but R must be sorted ( time consuming ).
          insert, delete, update -- expensive

      • Using hash functions

        h (KEY ) -- convert first and second letters to ASCII codes, add them, and mod 13

        h(JAN) = ( ASCII ( J ) + ASCII ( A ) ) mod 13
          = ( 74 + 65 ) mod 13
          = 139 mod 13
          = 9
        Thus record with key JAN goes to location 9

        h(AUG) = ( 65 + 85 ) mod 13
          = 150 mod 13
          = 7
        Thus record goes to location 7

        h(FEB) = ( 70 + 69 ) mod 13
          = 139 mod 13
          = 9
        collide with key JAN
        so put that into next available empty position

        keys map to same integer are called synonyms
        ( JUN, JUL, OCT ), ( AUG, DEC ), ( JAN, FEB, SEP )

        index iKEYh(KEY) # comparisons
        0MAY122
        1NOV11
        2APR21
        3JUN31
        4JUL32
        5OCT33
        6
        7AUG71
        8DEC72
        9JAN91
        10FEB92
        11SEP93
        12MAR121

        Average # of comparisons 1.67

        it depends on table density and quality of hashing function rather than size