Python Garbage Collection
How Python handles memory cleanup with reference counting and cyclic GC
Best viewed on desktop for optimal interactive experience
Python's Dual Approach
Python uses two complementary mechanisms for memory management:
- Reference Counting (primary, immediate)
- Generational Garbage Collection (secondary, for cycles)
Garbage Collection
Generational Garbage Collector
Generation 0
New objectsGeneration 1
Survived 1 collectionGeneration 2
Long-lived objectsCircular Reference Detection
A
→B
B
→C
C
→A
D
→E
E
→D
Objects A-B-C form a reachable cycle. Objects D-E form an unreachable cycle.
GC Configuration
import gc
gc.get_threshold()
# (700, 10, 10)
gc.set_threshold(800, 20, 20)
gc.collect() # Manual collection
Best Practices
- • Break circular references explicitly
- • Use weak references when appropriate
- • Monitor with gc.get_stats()
- • Profile with gc.set_debug()
Reference Counting
How It Works
import sys # Object created, refcount = 1 x = [] # Reference added, refcount = 2 y = x # Check refcount (adds temporary reference) print(sys.getrefcount(x)) # 3 # Remove reference, refcount = 1 del y # Remove last reference, object freed del x # Object deallocated immediately
Advantages
- Immediate cleanup
- Predictable
- Simple implementation
Disadvantages
- Cannot handle circular references
- Overhead on every operation
- Not thread-safe (needs GIL)
Circular References Problem
Creating Circular References
# Simple circular reference class Node: def __init__(self, value): self.value = value self.next = None # Create cycle a = Node(1) b = Node(2) a.next = b b.next = a # Circular reference! # Even after deleting references del a, b # Objects still exist in memory (refcount never reaches 0)
Real-World Examples
# Parent-child relationships class Parent: def __init__(self): self.children = [] def add_child(self, child): self.children.append(child) child.parent = self # Circular reference # Event handlers class EventEmitter: def __init__(self): self.handlers = [] def on(self, handler): self.handlers.append(handler) class Handler: def __init__(self, emitter): self.emitter = emitter emitter.on(self.handle) # Circular reference def handle(self, event): pass
Generational Garbage Collection
The Three Generations
import gc # Get current thresholds print(gc.get_threshold()) # (700, 10, 10) # Generation 0: New objects, collected frequently # Generation 1: Survived 1 collection # Generation 2: Long-lived objects, collected rarely
Collection Algorithm
# Simplified collection process def collect_generation(generation): # 1. Mark phase reachable = set() for root in get_root_objects(): mark_reachable(root, reachable) # 2. Sweep phase for obj in generation: if obj not in reachable: free_object(obj) # 3. Promote survivors if generation < 2: promote_to_next_generation(survivors)
Using the gc Module
Manual Collection
import gc # Disable automatic collection gc.disable() # Create circular references for i in range(1000): a = [] b = [] a.append(b) b.append(a) # Manual collection collected = gc.collect() # Returns number of objects collected print(f"Collected {collected} objects") # Re-enable automatic collection gc.enable()
Debugging with gc
import gc # Enable debug flags gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK) # Create objects with circular references class MyClass: def __init__(self): self.ref = self obj = MyClass() del obj # Force collection with debug output gc.collect() # Get garbage objects (uncollectable) garbage = gc.garbage print(f"Uncollectable objects: {len(garbage)}")
GC Statistics
import gc # Get collection stats stats = gc.get_stats() for i, gen_stats in enumerate(stats): print(f"Generation {i}:") print(f" Collections: {gen_stats['collections']}") print(f" Collected: {gen_stats['collected']}") print(f" Uncollectable: {gen_stats['uncollectable']}")
Weak References
Breaking Cycles with weakref
import weakref class Node: def __init__(self, value): self.value = value self.parent = None self.children = [] def add_child(self, child): self.children.append(child) # Use weak reference to avoid cycle child.parent = weakref.ref(self) def get_parent(self): if self.parent: return self.parent() # Call weak reference return None # No circular reference problem! parent = Node("parent") child = Node("child") parent.add_child(child) del parent # Parent can be collected # child.get_parent() returns None
WeakValueDictionary
import weakref # Cache that doesn't prevent garbage collection class Cache: def __init__(self): self._cache = weakref.WeakValueDictionary() def get(self, key): return self._cache.get(key) def set(self, key, value): self._cache[key] = value cache = Cache() obj = MyExpensiveObject() cache.set("key", obj) # Object exists in cache assert cache.get("key") is obj # Delete only reference del obj # Object automatically removed from cache import gc gc.collect() assert cache.get("key") is None
Performance Tuning
Adjusting Thresholds
import gc # Default: (700, 10, 10) # Format: (threshold0, threshold1, threshold2) # More aggressive collection gc.set_threshold(500, 5, 5) # Less frequent collection (better performance) gc.set_threshold(1000, 20, 20) # Disable automatic collection entirely gc.set_threshold(0) # Must call gc.collect() manually
Monitoring GC Impact
import gc import time # Measure GC overhead gc_time = 0 for gen_stats in gc.get_stats(): gc_time += gen_stats.get('gc_time', 0) print(f"Total GC time: {gc_time:.3f} seconds") # Benchmark with/without GC gc.disable() start = time.time() # Your code here no_gc_time = time.time() - start gc.enable() start = time.time() # Your code here with_gc_time = time.time() - start print(f"Without GC: {no_gc_time:.3f}s") print(f"With GC: {with_gc_time:.3f}s")
Best Practices
1. Avoid Creating Cycles
# Bad: Circular reference class BadNode: def __init__(self): self.parent = None self.children = [] # Good: Use weak references class GoodNode: def __init__(self): self.parent = None # Set as weakref.ref() self.children = []
2. Explicitly Break Cycles
class Resource: def __init__(self): self.circular_ref = self def cleanup(self): self.circular_ref = None # Break cycle # Use context managers class ManagedResource: def __enter__(self): return self def __exit__(self, *args): self.cleanup() # Automatic cleanup
3. Use slots for Large Numbers of Objects
# Reduces memory and GC overhead class Point: __slots__ = ('x', 'y') def __init__(self, x, y): self.x = x self.y = y
4. Profile Before Optimizing
import gc import tracemalloc # Start tracing tracemalloc.start() gc.collect() # Clean slate # Your code here create_many_objects() # Check memory and GC stats snapshot = tracemalloc.take_snapshot() top_stats = snapshot.statistics('lineno') for stat in top_stats[:10]: print(stat) print(f"GC collected: {gc.collect()} objects")
Common Pitfalls
1. Relying on del
# Problematic: __del__ with circular references class Bad: def __del__(self): print("Cleaning up") # May never be called! # Better: Use context managers class Good: def __enter__(self): return self def __exit__(self, *args): print("Cleaning up") # Guaranteed to be called
2. Large Default Arguments
# Bad: Keeps large object alive def bad_function(data=large_default_object): pass # Good: Create when needed def good_function(data=None): if data is None: data = create_large_object()
Key Takeaways
- Reference counting handles most cleanup immediately
- Cyclic GC handles circular references in generations
- Weak references can break cycles
- gc module provides control and debugging
- Profile first before adjusting GC settings
- Context managers ensure cleanup
- slots reduces GC overhead