Binary Search Trees: Self-Balancing Data Structures
Master binary search trees through interactive visualizations of insertions, deletions, rotations, and self-balancing algorithms like AVL and Red-Black trees.
Understanding Binary Search Trees
Binary Search Trees (BSTs) are fundamental data structures that maintain sorted data in a hierarchical structure, enabling efficient search, insertion, and deletion operations. Self-balancing variants like AVL and Red-Black trees ensure O(log n) performance even in the worst case by maintaining balanced heights through rotations.
The elegance of BSTs lies in their recursive structure and the invariant that for each node, all values in the left subtree are smaller, and all values in the right subtree are larger. This property enables binary search without requiring a contiguous array.
Interactive Binary Search Tree Explorer
Visualize BST operations, rotations, and self-balancing algorithms in action:
Binary Search Tree
- No self-balancing
- O(n) worst case
- Simple implementation
- Can degenerate to a list
AVL Tree
- Height balanced
- O(log n) guaranteed
- More rotations
- |Balance factor| ≤ 1
Red-Black Tree
- Color-based balancing
- O(log n) guaranteed
- Fewer rotations
- Used in std::map
Binary Search Tree Fundamentals
BST Property
For every node in a BST:
- All nodes in the left subtree have values < node's value
- All nodes in the right subtree have values > node's value
- Both subtrees are also BSTs
Basic Node Structure
```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.height = 1      # For AVL trees
        self.color = 'RED'   # For Red-Black trees
        self.parent = None   # For easier rotations
```
Core Operations
1. Search
```python
def search(root, key):
    # Base cases: root is null or key is at root
    if root is None or root.val == key:
        return root
    # Key is greater than root's value
    if key > root.val:
        return search(root.right, key)
    # Key is smaller than root's value
    return search(root.left, key)

# Iterative version (often more efficient)
def search_iterative(root, key):
    current = root
    while current and current.val != key:
        if key < current.val:
            current = current.left
        else:
            current = current.right
    return current
```
Time Complexity:
- Average: O(log n)
- Worst (unbalanced): O(n)
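As a quick sanity check, here is a tiny hand-built tree searched with both versions (a minimal usage sketch; the values are arbitrary):

```python
# Build a small BST by hand:
#       8
#      / \
#     3   10
root = TreeNode(8)
root.left = TreeNode(3)
root.right = TreeNode(10)

print(search(root, 10).val)        # 10
print(search_iterative(root, 4))   # None -- 4 is not in the tree
```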
2. Insertion
```python
def insert(root, key):
    # Base case: empty tree
    if not root:
        return TreeNode(key)
    # Recursive insertion
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    return root

# With parent pointers
def insert_with_parent(root, key):
    if not root:
        return TreeNode(key)
    parent = None
    current = root
    # Find insertion position
    while current:
        parent = current
        if key < current.val:
            current = current.left
        elif key > current.val:
            current = current.right
        else:
            return root  # Duplicate
    # Insert new node
    new_node = TreeNode(key)
    new_node.parent = parent
    if key < parent.val:
        parent.left = new_node
    else:
        parent.right = new_node
    return root
```
3. Deletion
Three cases to handle:
- Leaf node: Simply remove
- One child: Replace with child
- Two children: Replace with inorder successor/predecessor
```python
def delete(root, key):
    if not root:
        return root
    # Find node to delete
    if key < root.val:
        root.left = delete(root.left, key)
    elif key > root.val:
        root.right = delete(root.right, key)
    else:
        # Node with only one child or no child
        if not root.left:
            return root.right
        elif not root.right:
            return root.left
        # Node with two children:
        # get inorder successor (smallest in right subtree)
        successor = find_min(root.right)
        # Copy successor's value to this node
        root.val = successor.val
        # Delete the successor
        root.right = delete(root.right, successor.val)
    return root

def find_min(node):
    while node.left:
        node = node.left
    return node
```
Tree Traversals
1. Inorder (Left, Root, Right)
Visits the keys of a BST in sorted order
```python
def inorder(root):
    if root:
        inorder(root.left)
        print(root.val)
        inorder(root.right)

# Iterative using stack
def inorder_iterative(root):
    result = []
    stack = []
    current = root
    while stack or current:
        if current:
            stack.append(current)
            current = current.left
        else:
            current = stack.pop()
            result.append(current.val)
            current = current.right
    return result
```
2. Preorder (Root, Left, Right)
Useful for tree copying
```python
def preorder(root):
    if root:
        print(root.val)
        preorder(root.left)
        preorder(root.right)
```
3. Postorder (Left, Right, Root)
Useful for deleting a tree, since children are processed before their parent
```python
def postorder(root):
    if root:
        postorder(root.left)
        postorder(root.right)
        print(root.val)
```
4. Level Order (BFS)
```python
from collections import deque

def level_order(root):
    if not root:
        return []
    result = []
    queue = deque([root])
    while queue:
        level_size = len(queue)
        current_level = []
        for _ in range(level_size):
            node = queue.popleft()
            current_level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(current_level)
    return result
```
Tree Rotations
Rotations are fundamental for maintaining balance in self-balancing trees.
Right Rotation
```python
def height(node):
    # Height of an empty subtree is 0; nodes store their own height
    return node.height if node else 0

def rotate_right(y):
    """
        y                x
       / \              / \
      x   C    -->     A   y
     / \                  / \
    A   B                B   C
    """
    x = y.left
    B = x.right
    # Perform rotation
    x.right = y
    y.left = B
    # Update heights (for AVL)
    y.height = 1 + max(height(y.left), height(y.right))
    x.height = 1 + max(height(x.left), height(x.right))
    return x  # New root
```
Left Rotation
```python
def rotate_left(x):
    """
        x                    y
       / \                  / \
      A   y      -->       x   C
         / \              / \
        B   C            A   B
    """
    y = x.right
    B = y.left
    # Perform rotation
    y.left = x
    x.right = B
    # Update heights
    x.height = 1 + max(height(x.left), height(x.right))
    y.height = 1 + max(height(y.left), height(y.right))
    return y  # New root
```
AVL Trees
AVL trees maintain, for every node, a balance factor = height(left) - height(right) ∈ {-1, 0, 1}.
AVL Insertion
```python
class AVLTree:
    def insert(self, root, key):
        # Step 1: Standard BST insertion
        if not root:
            return TreeNode(key)
        if key < root.val:
            root.left = self.insert(root.left, key)
        elif key > root.val:
            root.right = self.insert(root.right, key)
        else:
            return root  # Duplicate

        # Step 2: Update height
        root.height = 1 + max(self.height(root.left), self.height(root.right))

        # Step 3: Get balance factor
        balance = self.get_balance(root)

        # Step 4: If unbalanced, there are 4 cases
        # Left Left Case
        if balance > 1 and key < root.left.val:
            return self.rotate_right(root)
        # Right Right Case
        if balance < -1 and key > root.right.val:
            return self.rotate_left(root)
        # Left Right Case
        if balance > 1 and key > root.left.val:
            root.left = self.rotate_left(root.left)
            return self.rotate_right(root)
        # Right Left Case
        if balance < -1 and key < root.right.val:
            root.right = self.rotate_right(root.right)
            return self.rotate_left(root)

        return root

    def height(self, node):
        return node.height if node else 0

    def get_balance(self, node):
        return self.height(node.left) - self.height(node.right) if node else 0
```
AVL Deletion
```python
def delete_avl(self, root, key):
    # Step 1: Standard BST deletion
    if not root:
        return root
    if key < root.val:
        root.left = self.delete_avl(root.left, key)
    elif key > root.val:
        root.right = self.delete_avl(root.right, key)
    else:
        # Node with one child or no child
        if not root.left:
            return root.right
        elif not root.right:
            return root.left
        # Node with two children
        temp = self.find_min(root.right)
        root.val = temp.val
        root.right = self.delete_avl(root.right, temp.val)

    if not root:
        return root

    # Step 2: Update height
    root.height = 1 + max(self.height(root.left), self.height(root.right))

    # Step 3: Rebalance
    balance = self.get_balance(root)

    # Left Left
    if balance > 1 and self.get_balance(root.left) >= 0:
        return self.rotate_right(root)
    # Left Right
    if balance > 1 and self.get_balance(root.left) < 0:
        root.left = self.rotate_left(root.left)
        return self.rotate_right(root)
    # Right Right
    if balance < -1 and self.get_balance(root.right) <= 0:
        return self.rotate_left(root)
    # Right Left
    if balance < -1 and self.get_balance(root.right) > 0:
        root.right = self.rotate_right(root.right)
        return self.rotate_left(root)

    return root
```
Red-Black Trees
Red-Black trees ensure balance through coloring rules:
- Every node is either RED or BLACK
- Root is BLACK
- All leaves (NIL) are BLACK
- RED nodes have BLACK children
- All paths from root to leaves have same number of BLACK nodes
Red-Black Properties
```python
class RBNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.parent = None
        self.color = 'RED'  # New nodes are RED

class RedBlackTree:
    def __init__(self):
        self.NIL = RBNode(None)
        self.NIL.color = 'BLACK'
        self.root = self.NIL
```
Red-Black Insertion
```python
def insert_rb(self, key):
    node = RBNode(key)
    node.left = self.NIL
    node.right = self.NIL

    parent = None
    current = self.root
    # Find insertion position
    while current != self.NIL:
        parent = current
        if node.val < current.val:
            current = current.left
        else:
            current = current.right

    node.parent = parent
    # Insert node
    if parent is None:
        self.root = node
    elif node.val < parent.val:
        parent.left = node
    else:
        parent.right = node

    # Fix Red-Black properties
    self.insert_fixup(node)

def insert_fixup(self, node):
    while node.parent and node.parent.color == 'RED':
        if node.parent == node.parent.parent.left:
            uncle = node.parent.parent.right
            # Case 1: Uncle is RED
            if uncle.color == 'RED':
                node.parent.color = 'BLACK'
                uncle.color = 'BLACK'
                node.parent.parent.color = 'RED'
                node = node.parent.parent
            else:
                # Case 2: Uncle is BLACK, node is right child
                if node == node.parent.right:
                    node = node.parent
                    self.rotate_left(node)
                # Case 3: Uncle is BLACK, node is left child
                node.parent.color = 'BLACK'
                node.parent.parent.color = 'RED'
                self.rotate_right(node.parent.parent)
        else:
            # Mirror cases
            uncle = node.parent.parent.left
            if uncle.color == 'RED':
                node.parent.color = 'BLACK'
                uncle.color = 'BLACK'
                node.parent.parent.color = 'RED'
                node = node.parent.parent
            else:
                if node == node.parent.left:
                    node = node.parent
                    self.rotate_right(node)
                node.parent.color = 'BLACK'
                node.parent.parent.color = 'RED'
                self.rotate_left(node.parent.parent)

    self.root.color = 'BLACK'
```
Performance Comparison
| Operation | Array (sorted) | Linked List | BST (avg) | BST (worst) | AVL/RB Tree |
|---|---|---|---|---|---|
| Search | O(log n) | O(n) | O(log n) | O(n) | O(log n) |
| Insert | O(n) | O(1)* | O(log n) | O(n) | O(log n) |
| Delete | O(n) | O(n) | O(log n) | O(n) | O(log n) |
| Min/Max | O(1) | O(n) | O(log n) | O(n) | O(log n) |
| Predecessor | O(1) | O(n) | O(log n) | O(n) | O(log n) |
*At known position
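The BST worst case in the table is easy to reproduce: feeding already-sorted keys to the plain `insert` from earlier produces a right-leaning chain, so the tree effectively becomes a linked list (a small illustrative sketch):

```python
# Inserting sorted keys degenerates a plain BST into a chain
root = None
for key in range(1, 8):
    root = insert(root, key)

# Every node has only a right child, so search degrades to O(n)
node, depth = root, 0
while node:
    depth += 1
    node = node.right
print(depth)  # 7 -- equal to the number of keys
```

Self-balancing trees exist precisely to prevent this shape.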
Advanced Operations
1. Finding kth Smallest Element
Augment nodes with subtree size:
```python
class NodeWithSize:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.size = 1  # Size of subtree rooted here

def kth_smallest(root, k):
    if not root:
        return None
    left_size = root.left.size if root.left else 0
    if k <= left_size:
        return kth_smallest(root.left, k)
    elif k == left_size + 1:
        return root.val
    else:
        return kth_smallest(root.right, k - left_size - 1)
```
2. Range Queries
Find all elements in range [low, high]:
```python
def range_query(root, low, high, result=None):
    # Avoid a mutable default argument; create the list on first call
    if result is None:
        result = []
    if not root:
        return result
    # If root is in range, explore both subtrees
    if low <= root.val <= high:
        range_query(root.left, low, high, result)
        result.append(root.val)
        range_query(root.right, low, high, result)
    # If root is greater than high, only the left subtree can contain values in range
    elif root.val > high:
        range_query(root.left, low, high, result)
    # If root is less than low, only the right subtree can contain values in range
    else:
        range_query(root.right, low, high, result)
    return result
```
3. Lowest Common Ancestor
```python
def lca(root, p, q):
    # Both values are in the left subtree
    if p < root.val and q < root.val:
        return lca(root.left, p, q)
    # Both values are in the right subtree
    if p > root.val and q > root.val:
        return lca(root.right, p, q)
    # Values are on different sides, or one equals the root
    return root
```
Tree Validation
Is Valid BST?
```python
def is_valid_bst(root, min_val=float('-inf'), max_val=float('inf')):
    if not root:
        return True
    if root.val <= min_val or root.val >= max_val:
        return False
    return (is_valid_bst(root.left, min_val, root.val) and
            is_valid_bst(root.right, root.val, max_val))
```
Is Balanced?
```python
def is_balanced(root):
    def check_height(node):
        if not node:
            return 0
        left_height = check_height(node.left)
        if left_height == -1:
            return -1
        right_height = check_height(node.right)
        if right_height == -1:
            return -1
        if abs(left_height - right_height) > 1:
            return -1
        return max(left_height, right_height) + 1

    return check_height(root) != -1
```
Real-World Applications
1. Database Indexes
B-trees and B+ trees (generalizations of BSTs):
```sql
-- B-tree index for range queries
CREATE INDEX idx_price ON products(price);

-- Composite index (like a multi-dimensional BST)
CREATE INDEX idx_category_price ON products(category, price);
```
2. File Systems
Directory structures and file allocation:
```python
import time

class FileNode:
    def __init__(self, name, is_directory=False):
        self.name = name
        self.is_directory = is_directory
        self.children = {}  # For directories
        self.size = 0
        self.modified_time = time.time()
```
3. Expression Trees
Evaluate mathematical expressions:
```python
class ExprNode:
    def __init__(self, val, op=None):
        self.val = val
        self.op = op
        self.left = None
        self.right = None

    def evaluate(self):
        if not self.op:  # Leaf node
            return self.val
        left_val = self.left.evaluate()
        right_val = self.right.evaluate()
        if self.op == '+':
            return left_val + right_val
        elif self.op == '-':
            return left_val - right_val
        elif self.op == '*':
            return left_val * right_val
        elif self.op == '/':
            return left_val / right_val
```
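For example, the expression (3 + 4) * 2 can be built and evaluated with the class above (a small usage sketch):

```python
# Build the tree for (3 + 4) * 2 and evaluate it
plus = ExprNode(None, op='+')
plus.left = ExprNode(3)
plus.right = ExprNode(4)

expr = ExprNode(None, op='*')
expr.left = plus
expr.right = ExprNode(2)

print(expr.evaluate())  # 14
```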
Common Pitfalls
- Not handling duplicates: Decide up front whether duplicates are allowed and how they are stored (see the sketch after this list)
- Forgetting to update parent pointers: Critical for some operations
- Incorrect rotation logic: Test thoroughly with edge cases
- Not maintaining invariants: Always verify BST/AVL/RB properties
- Memory leaks: Properly deallocate deleted nodes
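For the first pitfall, one common policy is to keep a per-node counter instead of inserting duplicate nodes; a minimal sketch, assuming a `count` field that is not part of the node classes above:

```python
class CountedNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.count = 1  # How many times this value has been inserted

def insert_with_count(root, key):
    if not root:
        return CountedNode(key)
    if key < root.val:
        root.left = insert_with_count(root.left, key)
    elif key > root.val:
        root.right = insert_with_count(root.right, key)
    else:
        root.count += 1  # Duplicate: bump the counter instead of adding a node
    return root
```

Alternative policies (for example, always sending duplicates to the right subtree) also work, as long as the choice is made consistently.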
Optimization Techniques
1. Path Compression
Cache frequently accessed paths:
```python
class CachedBST:
    def __init__(self):
        self.root = None
        self.cache = {}  # key -> node mapping

    def search_with_cache(self, key):
        if key in self.cache:
            return self.cache[key]
        node = self.search(self.root, key)  # standard BST search
        if node:
            self.cache[key] = node
        return node
```
2. Lazy Deletion
Mark nodes as deleted instead of removing:
```python
class LazyNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.deleted = False

def lazy_delete(root, key):
    node = search(root, key)
    if node:
        node.deleted = True
```
3. Threaded Trees
Add threads for efficient traversal:
```python
class ThreadedNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.left_thread = False   # True if left points to predecessor
        self.right_thread = False  # True if right points to successor
```
Related Concepts
Understanding BSTs connects to:
- Hash Tables: Alternative for unordered data
- B-trees: Generalization for external storage
- Heaps: Tree-based priority queues
- Tries: Prefix trees for strings
- Segment Trees: Range query optimization
Conclusion
Binary Search Trees elegantly combine the efficiency of binary search with the flexibility of linked structures. While basic BSTs can degenerate to linked lists, self-balancing variants like AVL and Red-Black trees guarantee logarithmic performance through careful maintenance of tree properties. Understanding BSTs and their rotations provides the foundation for many advanced data structures and algorithms in computer science.