1. Home
Data Structure

Data Structure Tutorial: Everything You Need to Know

Learn all about data structures with our comprehensive tutorial. Master the fundamentals and advance your skills in organizing and managing data efficiently.

  • 60
  • 14
right-top-arrow

Tutorial Playlist

58 Lessons
19

Mastering the Maze: An All-Encompassing Guide to the Optimal Binary Search Trees

Updated on 02/08/2024448 Views

Introduction

Consider a huge library with a multitude of shelves, each one holding a treasure chest of books. You’re on a search mission for the exact book, and time is not on your side. What are the strategies you use to narrow down your search? Enter the Optimal Binary Search Trees realm!

What is a Binary Search Tree?

BSTs are data structures that are used for ordered keys (usually numbers or strings) and are organized in a hierarchical manner. Each node in a BST has at most two children: the left one (with a smaller key) and the right one (with a larger key). BSTs enable fast searching, insertion, and deletion, which makes them perfect for ordered data.

An OBST is an advanced version of a BST where the obstacles are more complex. It’s not just about properly arranging keys; it’s about lowering the total cost of searching. Think about the process of calculating the probability for each key such as how often it is used. An OBST is a construction of a tree that minimizes the expected search time on the basis of these probabilities.

Main Features of Binary Search Trees (BSTs)

A Binary Search Tree is a tree-based structure where each node holds a value (like a book title) and adheres to two fundamental properties:

  • Ordered Structure: Each node has value larger than any value in the left subtree and smaller than any value in the right subtree. This built-in logical structure is what our search is based on.
  • Left and Right Subtrees: Each node has a maximum of two child nodes, which are the left child and the right child. The latter can also be BSTs, and the structure becomes hierarchical.

Python Implementation of Binary Search Tree (BST)

Here’s a simple implementation of a BST in Python:

Python

class Node:

    def __init__(self, key):

        self.left = None

        self.right = None

        self.value = key

class BinarySearchTree:

    def __init__(self):

        self.root = None

    def insert(self, key):

        if self.root is None:

            self.root = Node(key)

        else:

            self._insert(self.root, key)

    def _insert(self, root, key):

        if key < root.value:

            if root.left is None:

                root.left = Node(key)

            else:

                self._insert(root.left, key)

        else:

            if root.right is None:

                root.right = Node(key)

            else:

                self._insert(root.right, key)

    def search(self, key):

        return self._search(self.root, key)

    def _search(self, root, key):

        if root is None or root.value == key:

            return root

        if key < root.value:

            return self._search(root.left, key)

        return self._search(root.right, key)

    def delete(self, key):

        self.root = self._delete(self.root, key)

    def _delete(self, root, key):

        if root is None:

            return root

        if key < root.value:

            root.left = self._delete(root.left, key)

        elif key > root.value:

            root.right = self._delete(root.right, key)

        else:

            if root.left is None:

                return root.right

            elif root.right is None:

                return root.left

            temp = self._min_value_node(root.right)

            root.value = temp.value

            root.right = self._delete(root.right, temp.value)

        return root

    def _min_value_node(self, node):

        current = node

        while current.left is not None:

            current = current.left

        return current

# Example Usage

bst = BinarySearchTree()

bst.insert(50)

bst.insert(30)

bst.insert(70)

bst.insert(20)

bst.insert(40)

bst.insert(60)

bst.insert(80)

# Search for a value

print(bst.search(40))  # Output: <__main__.Node object at ...>

# Delete a value

bst.delete(20)

Time and Space Complexity

Insertion:

  • Time Complexity: O(h). Har, h denotes the height of the tree. It can be O(n) in the worst case.
  • Space Complexity: O(1) for iterative and O(h) for recursive because of the call stack.

Search:

  • Time Complexity: O(h); Here, h is the height of the tree. It can be O(n) in the worst case.
  • Space Complexity: O(1) for iterative and O(h) for recursive because of the call stack.

Deletion:

  • Time Complexity: O(h); Here, h is the height of the tree.  It can be O(n) in the worst case.
  • Space Complexity: O(1) for iterative and O(h) for recursive because of the call stack.

BSTs are highly efficient when balanced, but their performance can degrade if they become skewed. These problems can be avoided by using balanced BSTs such as AVL trees or Red-Black trees.

In-order Traversal and Sorted order

In-order traversal is a technique that enables us to visit all nodes in a BST in an ascending order, mimicking how we would traverse a sorted list. It works by recursively visiting the left subtree, the current node, and the right subtree. This fact is ensured by the natural order of BST.

Let us imagine that the BST has the values 5, 10, 13, 2, and 7. In-order traversal would visit: 2, 5, 7, 10, 13, which is the order of the data in the sorted form.

Search Operation in a Binary Search Tree

The best thing about BSTs is that they have a fast search operation. Here's how it works:

  • Beginning with the root node (the highest node in the tree), move downward.
  • Compare the search value with the root's value.
    • If they are equal to each other, it means that you have reached the target element!
    • If the search value is less than the root's value, then move to the left subtree.
    • If the search value is larger than the root's value, continue with the right subtree.
  • Keep doing steps 1 and 2 until you either find the element or you reach a null node (that means the element is not present).

Insertion and Deletion of a Node in a BST

The BSTs are characterized by the fact that the operations of insertion and deletion involve the maintenance of the tree's ordered structure. While a specific optimal binary search tree algorithm can be intricate, here's a simplified overview:

  • Insertion: We steer through the tree, looking for the new element's value as compared with existing nodes. When we are in the right place (according to the ordering rules), we create a new node for the element and insert it into the tree.
  • Deletion: We then identify the node to be removed. Following that, we will perform rearrangements based on the configuration of the node (number of children) to ensure that BST properties are not violated and the node is removed from the tree.

The order complexity for both insertion and deletion operations in an optimal BST is also O(log n).

Search Cost in a BST

The search cost in a BST stands for the number of nodes compared during the operation of the search. In an ideal scenario, the cost of the search should be kept as low as possible to speed up the process. The organization of the BST plays a very important role in the cost of search.

The BST, with a balanced structure, where each level is almost the same length, minimizes the search cost. In an optimal BST, the search operation usually goes along a logarithmic path (log n) to reach the desired element.

Dynamic Programming Approach

We can form the optimal binary search tree using dynamic programming. Here’s a high-level overview:

  • Subproblem Definition: Resolve the problem by dividing it into smaller subproblems. For each subinterval of keys, look for an optimal subtree that is rooted at certain key.
  • Cost Calculation: Find out the anticipated search cost for every subinterval. This implies that we will calculate the probabilities of keys being within that time span.
  • Optimal Choice: Select the root key that gives you the least expected cost for the entire time period.

Consider the case that a set of keys {A, B, C, D} with their use frequencies is given to you. The dynamic programming technique enables us to solve the problem of finding the best possible BST for the whole set of nodes by gradually solving problems for the smaller subsets like {A, B}, {B, C}, and so on. The solutions of these subsolutions are then smartly combined to build the optimal BST for the whole set.

Algorithmic Approaches for OBST Generation

Here, we'll explore two popular optimal binary search tree algorithms: Rodger's formula and the chain matrix order algorithm. We will dissect their usability, evaluate their efficiency, and give tips on choosing the appropriate approach for your requirements.

Rodger's Formula: A Recursive Journey

According to Rodger’s formula, the problem is solved by a recursive approach to building the OBSTs based on the principles of dynamic programming. It calculates the minimum search cost of every possible subproblem and then carefully composes them to form the optimal tree.

Imagine we have a set of keys {A, B, C, D} with their access frequency {3, 2, 2, 1}. Rodger's formula breaks down the problem of finding the optimal BST into smaller subproblems:

A good BST for subsets {A}, {A, B}, . .. , {A, B, C, D} must be found.

The formula uses a dynamic programming table c to record the minimum search cost for each sub-problem. Here's a simplified representation:

       c[1, 1]  c[1, 2]  c[1, 3]  c[1, 4]

        /         /         /         /

       c[2, 1]  c[2, 2]  c[2, 3]  c[2, 4]

      /         /         /         /

     ...        ...        ...        ...

An entry [i, j] indicates the minimum search cost for the optimal BST that contains the keys from the i'th to the j'th elements in the original set. The algorithm iteratively calculates the costs of these using the frequency of access and the subproblem solutions.

Matrix Chain Order Algorithm

The matrix chain order algorithm that is commonly used in dynamic programming problems can also be utilized to generate OBSTs. This method interprets the access frequencies as a matrix, with each element corresponding to the frequency of accessing a particular key.

Use the same set of keys {A, B, C, D} with the frequency of access {3, 2, 2, 1}. We can construct a matrix as follows:

  A    B    C    D

A   3    -    -    -

B   -    2    -    -

C   -    -    2    -

D   -    -    -    1

The matrix chain order algorithm efficiently implements the grouping of keys to reduce search costs. It goes through the diagonal of the matrix and calculates the cost of the amalgamation of the adjacent keys into subtrees.

For example, take the case of grouping keys B and C. Their access frequency would be 2 + 2 = 4. This continues until the entire matrix is inspected and the most suitable groups and the structure of the OBST are found.

Optimization Techniques for Constructing OBSTs

Although there are well-known methods, such as Rodger’s formula and the matrix chain order algorithm, there are more complex methods that can be used with different approaches and optimizations.

1. Hu-Tucker Algorithm

The Hu-Tucker algorithm is one of the less popular but quite effective methods of constructing OBSTs. It aims at reducing the external path length, which is the sum of the depths of all the external nodes or leaves.

Advantages:

  • Efficiency: The Hu-Tucker algorithm builds an OBST in 𝑂(𝑛log⁡𝑛)O(nlogn) time, which is faster than the traditional dynamic programming approaches that take 𝑂(𝑛3)O(n3) time.
  • Greedy Approach: It employs a greedy approach of merging the two nodes with the least weight, similar to Huffman coding, to arrive at a near-optimal solution in a relatively short time.

Limitations:

  • Complex Implementation: The algorithm can be more difficult to implement than other dynamic programming methods due to its complexity.
  • Applicability: It is especially useful for reducing the external path length but can be less efficient for other types of weighted search trees.

2. Weighted Interval Scheduling Algorithm

The weighted interval scheduling algorithm is another advanced technique that can be used to construct OBSTs. This algorithm is generally used for solving scheduling problems but can be used in OBSTs by considering the weights of the nodes as intervals.

Advantages:

  • Dynamic Programming Approach: This algorithm also employs the concept of dynamic programming to solve the problem in the most efficient manner possible by breaking it down into subproblems that have already been solved.
  • Flexibility: It can work with any weighted scenario, which means that it can be used with different input distributions and weight settings.

Limitations:

  • Higher Complexity: In general, the weighted interval scheduling algorithm is faster than the Hu-Tucker algorithm, although it has a higher time complexity.
  • Specialized Use: It may need some changes to be applied to OBST construction because it was developed for interval scheduling problems.

Recap: Your Voyage Through the Labyrinth Unraveled

Congratulations! You have successfully overcome the hardships of optimal binary search trees (OBSTs). We have covered the basic principles of BSTs and also considered search cost, and now we have revealed the mystery of OBSTs—the BSTs with the lowest search cost.

Note, OBSTs use the access frequencies to strategically place data close to the root, which allows the most frequently accessed items to be retrieved faster.

FAQs

1. What is the optimal binary search tree?

The most efficient binary search tree is a data structure used to search for keys that are arranged in a sorted manner. It reduces the average search time by assigning frequently used keys near the root and keys that are used less frequently farther away.

2. What is a perfect binary search tree?

A binary search tree with perfect balance is a special case of an optimal binary search tree where each internal node has two children and all leaves appear at the same depth or level. It is the fastest in lookups, but as a rule, it is impracticable due to its inflexible structure.

3. What is the optimal binary tree frequency?

The optimal probability of accessing every key in the binary search tree is represented by the binary tree frequency. This data is fundamental to the process of developing an efficient binary search tree that results in the least amount of search time.

4. What is the difference between an optimal binary search tree and an AVL tree?

An optimal binary search tree involves minimizing the average search time by arranging keys in an optimal manner, taking into account their access frequencies. On the one hand, an AVL tree is a self-balancing binary search tree that guarantees the tree remains balanced after inserts or deletes, aiming at the height of the tree being logarithmic for fast operations.

5. What is the difference between an optimal binary search tree and a binary tree?

In a binary tree ( a hierarchical data structure ) a node may have at most two children, left and right. It doesn't have any rules for the keys. In contrast, an optimal binary search tree is a particular type of binary tree constructed for searching purposes, where the keys are sorted to minimize search time.

6. Why is binary search the best?

Binary search is optimum as it cuts the search space in half with every comparison. Hence, the time complexity is O(log n) for searching a sorted array or a binary search tree. This efficiency makes it appropriate for use with large datasets.

7. What are the benefits of an optimal binary search tree?

Here are some of the benefits of an optimal binary search tree:

  • Efficient searching: Maintains quick average search times through the efficient organization of keys.
  • Space efficiency: Keys and their respective probabilities are not required to be stored in memory, which reduces the memory space requirement.
  • Versatility: This can be applied in different areas in which searching is the key function.

8. What is the drawback of an optimal binary search tree?

Here are some of the drawbacks of an optimal binary search tree:

  • Construction complexity: The optimal binary search tree construction needs to know key access frequencies, which may not always be available in advance.
  • Sensitivity to changes: Small shifts in the frequencies of the keys may have a considerable impact on the tree, but they can result in poor performance.

9. What are the types of binary search trees?

Types of an optimal binary search tree include:

  1. Binary Search Tree (BST): A binary tree that has a left child node that contains keys that are smaller than the parent node and a right child node that contains keys that are larger than the parent node.
  2. AVL Tree: A self-balancing binary search tree such that the heights of its two child subtrees differ by at most one.
  3. Red-Black Tree: Another balanced binary search tree that has additional properties to maintain balance and improve performance.
  4. Splay Tree: A self-adjusting binary search tree that restructures itself, taking into account the recent access patterns, to improve the search times.
Rohan Vats

Rohan Vats

Passionate about building large scale web apps with delightful experiences. In pursuit of transforming engineers into leaders.

Get Free Career Counselling
form image
+91
*
By clicking, I accept theT&Cand
Privacy Policy
image
right-top-arrowleft-top-arrow

upGrad Learner Support

Talk to our experts. We’re available 24/7.

text

Indian Nationals

1800 210 2020

text

Foreign Nationals

+918045604032

Disclaimer

upGrad does not grant credit; credits are granted, accepted or transferred at the sole discretion of the relevant educational institution offering the diploma or degree. We advise you to enquire further regarding the suitability of this program for your academic, professional requirements and job prospects before enr...