Python Garbage Collection

8 min

How Python handles memory cleanup with reference counting and cyclic GC

Best viewed on desktop for optimal interactive experience

Python's Dual Approach

Python uses two complementary mechanisms for memory management:

  1. Reference Counting (primary, immediate)
  2. Generational Garbage Collection (secondary, for cycles)

Garbage Collection

Generational Garbage Collector

Generation 0
New objects
700/700
Generation 1
Survived 1 collection
10/10
Generation 2
Long-lived objects
10/10

Circular Reference Detection

A
B
B
C
C
A
D
E
E
D

Objects A-B-C form a reachable cycle. Objects D-E form an unreachable cycle.

GC Configuration
import gc gc.get_threshold() # (700, 10, 10) gc.set_threshold(800, 20, 20) gc.collect() # Manual collection
Best Practices
  • • Break circular references explicitly
  • • Use weak references when appropriate
  • • Monitor with gc.get_stats()
  • • Profile with gc.set_debug()

Reference Counting

How It Works

import sys # Object created, refcount = 1 x = [] # Reference added, refcount = 2 y = x # Check refcount (adds temporary reference) print(sys.getrefcount(x)) # 3 # Remove reference, refcount = 1 del y # Remove last reference, object freed del x # Object deallocated immediately

Advantages

  • Immediate cleanup
  • Predictable
  • Simple implementation

Disadvantages

  • Cannot handle circular references
  • Overhead on every operation
  • Not thread-safe (needs GIL)

Circular References Problem

Creating Circular References

# Simple circular reference class Node: def __init__(self, value): self.value = value self.next = None # Create cycle a = Node(1) b = Node(2) a.next = b b.next = a # Circular reference! # Even after deleting references del a, b # Objects still exist in memory (refcount never reaches 0)

Real-World Examples

# Parent-child relationships class Parent: def __init__(self): self.children = [] def add_child(self, child): self.children.append(child) child.parent = self # Circular reference # Event handlers class EventEmitter: def __init__(self): self.handlers = [] def on(self, handler): self.handlers.append(handler) class Handler: def __init__(self, emitter): self.emitter = emitter emitter.on(self.handle) # Circular reference def handle(self, event): pass

Generational Garbage Collection

The Three Generations

import gc # Get current thresholds print(gc.get_threshold()) # (700, 10, 10) # Generation 0: New objects, collected frequently # Generation 1: Survived 1 collection # Generation 2: Long-lived objects, collected rarely

Collection Algorithm

# Simplified collection process def collect_generation(generation): # 1. Mark phase reachable = set() for root in get_root_objects(): mark_reachable(root, reachable) # 2. Sweep phase for obj in generation: if obj not in reachable: free_object(obj) # 3. Promote survivors if generation < 2: promote_to_next_generation(survivors)

Using the gc Module

Manual Collection

import gc # Disable automatic collection gc.disable() # Create circular references for i in range(1000): a = [] b = [] a.append(b) b.append(a) # Manual collection collected = gc.collect() # Returns number of objects collected print(f"Collected {collected} objects") # Re-enable automatic collection gc.enable()

Debugging with gc

import gc # Enable debug flags gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK) # Create objects with circular references class MyClass: def __init__(self): self.ref = self obj = MyClass() del obj # Force collection with debug output gc.collect() # Get garbage objects (uncollectable) garbage = gc.garbage print(f"Uncollectable objects: {len(garbage)}")

GC Statistics

import gc # Get collection stats stats = gc.get_stats() for i, gen_stats in enumerate(stats): print(f"Generation {i}:") print(f" Collections: {gen_stats['collections']}") print(f" Collected: {gen_stats['collected']}") print(f" Uncollectable: {gen_stats['uncollectable']}")

Weak References

Breaking Cycles with weakref

import weakref class Node: def __init__(self, value): self.value = value self.parent = None self.children = [] def add_child(self, child): self.children.append(child) # Use weak reference to avoid cycle child.parent = weakref.ref(self) def get_parent(self): if self.parent: return self.parent() # Call weak reference return None # No circular reference problem! parent = Node("parent") child = Node("child") parent.add_child(child) del parent # Parent can be collected # child.get_parent() returns None

WeakValueDictionary

import weakref # Cache that doesn't prevent garbage collection class Cache: def __init__(self): self._cache = weakref.WeakValueDictionary() def get(self, key): return self._cache.get(key) def set(self, key, value): self._cache[key] = value cache = Cache() obj = MyExpensiveObject() cache.set("key", obj) # Object exists in cache assert cache.get("key") is obj # Delete only reference del obj # Object automatically removed from cache import gc gc.collect() assert cache.get("key") is None

Performance Tuning

Adjusting Thresholds

import gc # Default: (700, 10, 10) # Format: (threshold0, threshold1, threshold2) # More aggressive collection gc.set_threshold(500, 5, 5) # Less frequent collection (better performance) gc.set_threshold(1000, 20, 20) # Disable automatic collection entirely gc.set_threshold(0) # Must call gc.collect() manually

Monitoring GC Impact

import gc import time # Measure GC overhead gc_time = 0 for gen_stats in gc.get_stats(): gc_time += gen_stats.get('gc_time', 0) print(f"Total GC time: {gc_time:.3f} seconds") # Benchmark with/without GC gc.disable() start = time.time() # Your code here no_gc_time = time.time() - start gc.enable() start = time.time() # Your code here with_gc_time = time.time() - start print(f"Without GC: {no_gc_time:.3f}s") print(f"With GC: {with_gc_time:.3f}s")

Best Practices

1. Avoid Creating Cycles

# Bad: Circular reference class BadNode: def __init__(self): self.parent = None self.children = [] # Good: Use weak references class GoodNode: def __init__(self): self.parent = None # Set as weakref.ref() self.children = []

2. Explicitly Break Cycles

class Resource: def __init__(self): self.circular_ref = self def cleanup(self): self.circular_ref = None # Break cycle # Use context managers class ManagedResource: def __enter__(self): return self def __exit__(self, *args): self.cleanup() # Automatic cleanup

3. Use slots for Large Numbers of Objects

# Reduces memory and GC overhead class Point: __slots__ = ('x', 'y') def __init__(self, x, y): self.x = x self.y = y

4. Profile Before Optimizing

import gc import tracemalloc # Start tracing tracemalloc.start() gc.collect() # Clean slate # Your code here create_many_objects() # Check memory and GC stats snapshot = tracemalloc.take_snapshot() top_stats = snapshot.statistics('lineno') for stat in top_stats[:10]: print(stat) print(f"GC collected: {gc.collect()} objects")

Common Pitfalls

1. Relying on del

# Problematic: __del__ with circular references class Bad: def __del__(self): print("Cleaning up") # May never be called! # Better: Use context managers class Good: def __enter__(self): return self def __exit__(self, *args): print("Cleaning up") # Guaranteed to be called

2. Large Default Arguments

# Bad: Keeps large object alive def bad_function(data=large_default_object): pass # Good: Create when needed def good_function(data=None): if data is None: data = create_large_object()

Key Takeaways

  1. Reference counting handles most cleanup immediately
  2. Cyclic GC handles circular references in generations
  3. Weak references can break cycles
  4. gc module provides control and debugging
  5. Profile first before adjusting GC settings
  6. Context managers ensure cleanup
  7. slots reduces GC overhead

If you found this explanation helpful, consider sharing it with others.

Mastodon