Kinesia Online Course
Data Structures
Kinesia LLC, 2003

    1. Basic Concepts
    2. The Standard Library Container Class
    3. Vectors and Component Reuse
    4. Lists: A Dynamic Data Structure
    5. Stacks and Queues
     
    6. Sets and Multisets
    7. Trees
    8. Hashing

    
    The world would take people out of the slums.
    Christ takes the slums out of people, and then
    they take themselves out of the slums.
    				Ezra Taft Benson
    
    

    Trees

    1. Introduction

      Definition

      A tree is a finite set which consists of either

    2. nothing at all, which we call the empty tree,

      or

    3. a specially designated node called root, along with a ( possibly empty ) collection {T1,T2, ..., Tn} subtrees, each of which itself is a tree.
    4. A node that has non-empty subtrees is the parent of the roots of the subtrees; the roots of the subtrees are the children of the node

    5. A node without any child is a leaf ( or terminal node ); others are called internal nodes

    6. children of same parent are called siblings

    7. degree of a node is the number of subtrees associated with the node

    8. degree of a tree is the maximum allowed degree of any node in the tree

    9. ancestors of a node are all the nodes along the path from the root to the node

    10. descendants of a node are all the nodes that are in its subtrees

    11. height ( or depth ) of a tree is the maximum level of any node in the tree
    12. representations

    13. use parenthesis
      root nodes first
      e.g. ( A ( B ( E ( K, L ), F ), C ( G ), D ( H ( M ), I, J ) ) )

    14. use links

        data     link1     link2     ....     linkn  

      n siblings

    15. k-ary tree -- tree with degree k

      max # of nodes n ≤ 1 + k + k2 + ... + kh-1 = kh - 1
      –––––
      k - 1
      kh - 1

      But   h ≤ n

      Thus   h ≤ n ≤ kh - 1

      ordered tree

      b is left child of a
      c is right sibling of b
      d is right sibling of c

      T1 = T2 = T3 if we consider general tree

      T1 ≠ T2 ≠ T3 if we consider ordered tree

    16. Binary Trees

      each node has at most 2 children → 2-ary tree

      2-ary ordered tree = binary tree

      Definition

      A binary tree is made up of nodes, which are elements of some set. A binary tree consists of either

    17. notheing at all ( the empty tree )

      or

    18. a node, called the root of the binary tree, along with a left subtree and a right subtree, both of which are binary trees
    19. max. possible # of nodes of a binary tree of height h

      nmax = 1 + 2 + 22 + ... + 2h-1 = 2h - 1 Thus h ≤ n < 2h

      A complete binary tree is a special case of a binary tree, in which all the levels, except perhaps the last, are full; while on the last level, any missing nodes are to the right of all the nodes that are present.


      A complete binary tree: the only ``missing'' entries can be on the last row.

      A full binary tree is a binary tree in which all of the leaves are on the same level and every non leaf node has 2 children.

    20. Huffman Tree

      Huffman tree is a binary tree with a weight associated with each node.

      Used in data encoding

      Huffman codes

      ASCII code uses 8 bits to represent each character; unicode uses 16 bits; they are fixed-length codes; simple but inefficient

      A code has the prefix property if no character code is the prefix, or start of the the code for another character.

    21. do codes with the prefix property exist; and if so
    22. is there a ``best'' one to use?
    23. Example

      Symbol	Code 1 	Code 2
      a 	000 	000
      b 	001 	11
      c 	010 	01
      d 	011 	001
      e 	100 	10
      

      Code 1 is fixed-length, code 2 has the prefix property ( note that a fixed-length code always has the prefix property )

      Code 2 decodes the string "0000100111" uniquely to "abcd"

      Huffman codes

      Consider

      Letter   Frequency   Code 1   Code 2
      a 0.35 000 00
      b 0.20 001 10
      c 0.20 010 011
      d 0.15 011 010
      e 0.10 100 110
      Average lengths: Code 1 : 3 bits

      Code 2 : L = 0.35 x 2 + 0.20 x 2 + 0.20 x 3 + 0.15 x 3 + 0.10 x 3 = 2.45 ( bits )

      Can we do better?

      Huffman tree:

      merge two trees whose roots have lowest frequencies



      symbol     Codeword Length
      a     102
      b     002
      c     012
      d     1103
      e     1113

      L = 2.25 ( bits )

      Another Example:

      Encode

      A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS

      Symbol frequencies:

      I A B	D M E	O C F	G S T	L R N	P U	
      6 3 3	2 4 5	3 1 1	2 4 3	2 2 5	1 2 11
      

    24. Binary Tree properties

      Suppose

      n0 = number of leaves ( nodes of degree 0 )
      n1 = number of nodes of degree 1
      n2 = number of nodes of degree 2

      Lemma:

      For a non-empty binary tree,
        n0 = n2 + 1

      Proof:

      Total number of nodes is
        n = n0 + n1 + n2

      Except the root, a node always has a branch leading to it, thus

      Total number of branches is

        nB = n - 1

      But all branches stem from nodes of degree 1 or 2, so

        nB = n1 + 2n2
      or
        n - 1 = n1 + 2n2
      yeilding
        n0 + n1 + n2 - 1 = n1 + 2n2

      From which we have

        n0 = n2 + 1

      Height-balanced binary tree
    25. the difference in heights of right and left subtrees is never larger than 1
    26. Largest number of nodes in a balanced binary tree of height h is 2h - 1
    27. What is the minimum number M?
    28. Note here height ( h ) ← height + 1
      For h = 1, M1 = 1

      For h = 2, M2 = 2

      For h = 3, M3 = M1 + M2 + 1 = 4

      For h = 4, M4 = M2 + M3 + 1 = 7

      In general,

        Mh = Mh-1 + Mh-2 + 1

      Basically, this is a Fibonacci sequence ( fn = fn-1 + fn-2 )

       

      When h becomes large,
        Mh  1 
        5
        1
        2
        ( 1 + 5) h   ---------- ( 1 )

      Taking log of equation (1), we have

        h 1.44 log Mh
      which is the 'worst' case

      Given n nodes, the 'worst' h 1.44 log n

      or the height of a balanced binary tree is O( log n )

    29. Binary tree representations

      Array Representations

      A binary tree with n nodes is complete iff its nodes corresponding to nodes numbered from 1 to n in the full binary tree

      When labeled nodes according to positions of a full binary tree, we have

    30. left_child( i ) is at 2i
    31. right_child( i ) is at 2i + 1
    32. parent of node i = ?
      • parent of ( 2i ) is i ( = 2i / 2 )
      • parent of ( 2i + 1 ) is i ( = int ( ( 2i + 1 ) / 2) )
      • thus parent of ( i ) is at |_i/2 _|
    33. ....

      Linked Representation

    34. Binary Tree Traversal

      preorder

    35. visit the root
    36. traverse left subtree
    37. traverse right subtree
    38. inorder

    39. traverse left subtree
    40. visit the root
    41. traverse right subtree
    42. postorder

    43. traverse left subtree
    44. traverse right subtree
    45. visit the root
    46. Binary Search Trees

    47. binary tree is an ordered tree

    48. key in data → BST

    49. left < right;

      keys in left subtree ≤ root ≤ right subtree

    50. The rightmost node of the left subtree is smaller than the leftmost node of the right subtree

    51. Delete a node from a BST and retain the resulting tree as a BST
      • delete a node with no child -- simply remove the node

      • delete a node with a single child -- let parent point to the "grandchild" and remove the node

      • delete a node with two children -- we replace the node with the largest ( rightmost ) element in its left subtree or the smallest ( leftmost ) element in its right subtree
      ....
    52. analysis
      h ≤ n ≤ 2h - 1

      Search is O ( h )

      Given n, if elements are random, tree is balanced

        n 2h     or   h log n
      and serach is O ( log n )
      If tree is already sorted, the binary search tree is very unbalanced,
      and

        h = n

        search is O( n )

      Search becomes a sequential search

      insert : O( n ) or O( log n )

      Application: tree sort

      Given a list ( e.g. 5, 7, 8 , 2 )

    53. build BST by insert
    54. sort it by inorder traversal ( left, root, right )
    55. Suppose data are quite random with size n

    56. build: each insert is O( log n ), totally n inserts
      So build is O ( n log n )
    57. traversal is O ( n )
    58. thus, treesort is O ( n log n )
    59. if data are nearly sorted, tree is unbalanced

      tree sort is O ( n 2 )

    60. AVL Trees

    61. an AVL tree is a balanced binary search tree
    62. insertion or deletion may make the tree unbalanced; we need to rebalance it
    63. rebalance by rotations
    64. Two cases:

      Case 1

    65. right subtree of y has height larger than or equal to that of the left subtree of y
    66. can be restored by "rotate left"

    67. A < x < B < y < C

      A < x < B < y < C

      //psuedo-code
      void rotate_left ( node *x )
      {
        node *y;
      
        if ( p->right != NULL ) {
          y = x->right;
          x->right = y->left;
          y -> left = x;	//x becomes left child of y
          x = y;		//Y becomes new root of whole subtree
        }
      }
      

      ( If mirror is considered, need to "rotate-right" )

      Case 2

    68. left subtree of y is being too high

      A < x < B < y < C

    69. expand B into subtrees B' and B"

    70. Rotate-right ( y ), we get

    71. Then rotat-left ( x )
    72. p>
    73. Red/Black Tree
      1. A node is either red or black.
      2. The root is black.
      3. All leaves are black. (This includes the NIL children)
      4. Both children of every red node are black. (i.e. Every red node must have a black parent.)
      5. All paths from any given node to its leaf nodes contain the same number of black nodes.

    74. Priority Queues and heaps

      priority queue -- optimal, leftist binary tree satisfying the heap condition

    75. optimal tree -- all nodes except possibly the lowest ones have two children
    76. leftist tree -- nodes on a level are all as far to the left as possible
    77. heap condition -- the key value of each node is less than or equal to the values of its children ( min tree )
    78. .....

      can be represented by array ( note: parent of i = |_i/2_| )

    79. last element in array is the rightmost element at the lowest level of tree
    80. which element has the minimal key?
    81. insert node with key 1

      delete_min

    82. root has minimum element
    83. delete root, the replace it by "last" ( rightmost element in tree )
    84. swap to retain order
    85. e.g. delete node with key 2 in the above figure

      insert or delete ~ O ( log n )

      heap sort

      delete_min continuously

      O( n log n )

    86. Huffman Tree Implementation using Priority Queue

      1. each node associates with a weight; the lower the weight, the higher the priority
      2. insert the roots ( nodes ) of all trees of the forest into a priority queue
      3. delete two nodes from the priority queue; the two deleted nodes always have the least weights ( highest priorities )
      4. merge the nodes to form a new root with combined weights; insert the new root back to the priority queue
      5. repeat 3 - 4 until the priority queue is emplty