Program#7: Average Height of Binary Search Trees

)

The height of the binary search tree:

Given a binary search tree (referred to as BST in the following), let’s define the number of nodes on a longest path from the root to a node in the BST as the height of the BST. On average, the height of random BSTs of n nodes is O(log n). However, in the worst case, the BST may degenerate into a sorted linked list. In such a case, the height is n.

The maximal amount of time for search, deletion, and insertion in a BST of n nodes is proportional to the height of the BST. Thus the average time for search, deletion, and insertion in a BST of n nodes is of the order of O(log n). However, in the worst case, when a BST degenerates into a sorted linked list, the performance of search, insertion, and deletion, degrades to O(n). In this lab, we would like to conduct experiments to see empirically how well the binary trees actually work.

 

Requirements:

Part I (Coding the height-related method and testing it extensively): Based on the binary search tree class code and the C++ project you have for Programming#6, do the following implementation:

·       Method to recursively calculate the height of a given branch: Add a new private member function, int BranchHeight(TreeNode * ptrNode ), into the binary search tree class you have for the previous programming assignment. This method should return the height of a given branch within the binary search tree rooted at the node pointed to by ptrNode. To implement this method, you should check whether preNode is NULL or not. If it is NULL, it is an empty branch and you return 0 as the height. Otherwise, you recursively call BranchHeight(ptrNode-Left) and BranchHeight(ptrNode->Right) to determine the heights of the two immediately branches, and return one plus the larger one of the heights of the two branches. Note that you should just call BranchHeight(ptrNode-Left) and BranchHeight(ptrNode->Right) once respectively and remember the results for later use to minimize the computational cost; otherwise, your program will perform very slowly in the experiments in Part 2.

·       Method to calculate the height of a tree: Add a new public member function, int Height( ), to the binary search tree class calculate the current height of the tree. For any concrete binary search tree object myTree, a member function call myTree.Height( ) should return the current height of the tree. To implement this method, you can simply call BranchHeight(Root), where Root  is simply private data member of myTree pointing to the root of the underlying binary search tree.

·       Clear method: You should have implemented the Clear method in Programming #6 already. If so, skip this step.  If not yet done, add a new public member function, void Clear( ), to the binary search tree class. To implement this method, you should delete everything from the tree by calling DeleteCompleteTree(Root) and also remember to set Root to NULL. For any concrete binary search tree object myTree, a member function call myTree.Clear( ) will free all the tree nodes used in the tree and make it an empty tree.

·       Additional options for testing tree height and clearing the tree: Enhance the menu provided by the main function by implementing the following additional options: (i) add an option that will calculate and determine the height of the binary search tree by calling the Hight method, (ii) add an option that will clear the tree to an empty tree.

·       Extensive test to make sure the implementation is correct: You should then extensively test the Height method to make sure it works. For example, after clearing the tree and then manually inserting the following series of dates (1, 1, 2006), (1, 2, 2006), (1, 3, 2006), (1, 4, 2006), (1, 5, 2006), (1, 6, 2006), (1, 7 2006) one by one into the binary search tree, the height should be 7.  And after clearing the tree and then manually inserting the following series of dates (1, 4, 2006), (1, 1, 2006), (1, 7, 2006), (1, 2, 2006), (1, 5, 2006), (1, 6, 2006), (1, 3 2006) one by one into the binary search tree, the height should be 4.

 

 

 

Part II (Adding an option of running experiments for measuring the heights of random binary search trees of 16,000 nodes)

First, declare an integer array heightArray of 1000 integers in the main function, i.e. “int heightArray[1000;”. We’ll use it to store the height information up to the 1000 random binary search trees. Enhance the menu provided by the main function by implementing the following additional option of running experiments for measuring the heights of random binary search trees:

·       Set up a loop to through 1000 iterations. On  iteration i (from 0 to 999), we’ll randomly generate a binary search tree of 4000 nodes by (i) calling Clear ( ) first to empty the tree, say, myTree, (ii) set up an inner loop to  go through 4000 iterations and on each iteration randomly set and insert a random date into the myTree, (iii) call myTree.Height( ) to calculate the height of the tree, and (iv) store the height information in the corresponding element of the heightArray[i].

·       Calculate to report the average height of these 1000 trees according to the information now stored in heightArray[i] and store it in an integer variable averageHeight.

·       Calculate and report the deviation from averageHeight according to the information now stored in heightArray[i], i.e. the square root of ( (the sum of (heightArray[i]– averageHeight)2 over all i  in the range of [0,1000-1]) divided by 1000).  Let’s store this number in an integer variable deviation. Note that you should have “#include <cmath> “ in order to call the sqrt function for calculating the square root.

·       Calculate and report the percentage of the 1000 trees that have a height within the range of [averageHeight – 2*deviation, averageHeight + 2*deviation].

·       Calculate and report the percentage of the 1000 trees that have a height within the range of [averageHeight – 3*deviation, averageHeight + 3*deviation].

 

 

Part III (Run experiments and report the results)

Run the experiment option a couple of times and report your findings in the self-evaluation report. Are your findings consistent with the claim that on average, the height of random BSTs of n nodes is O(log n).

 

Submission:

·       Send your source code and self-evaluation to the TA and report in your progress report your empirical findings from the experiments in Part III.