Python Memory Management


How CPython manages memory with PyMalloc, object pools, and reference counting


Python Memory Architecture

CPython uses a hierarchical memory management system optimized for the allocation patterns of typical Python programs.

PyMalloc Memory Pools

[Interactive visualization: PyMalloc pools by size class, with live block usage and a "Simulate Allocation" control]

Each size class serves a characteristic kind of object:

  • 8 B — int, bool
  • 16 B — float
  • 24 B — small str
  • 32 B — tuple
  • 40 B — small list
  • 48 B — small dict
  • 64 B — object
  • 128 B — medium str
  • 256 B — large list
  • 512 B — large dict
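In place of the interactive allocation demo, here is a toy sketch of the same idea, assuming the 4 KB pools and 8-byte size-class steps described in the architecture section below. `Pool` and `round_up_to_size_class` are illustrative names, not CPython internals:

class Pool:
    """Toy model of a pymalloc pool: fixed-size blocks of one size class."""
    def __init__(self, block_size, pool_size=4096):
        self.block_size = block_size
        self.capacity = pool_size // block_size
        self.used = 0

    def allocate(self):
        if self.used < self.capacity:
            self.used += 1
            return True
        return False  # pool full; pymalloc would fall back to another pool

def round_up_to_size_class(size):
    """Round a request up to the next multiple of 8 (the size-class step)."""
    return (size + 7) // 8 * 8

pools = {size: Pool(size) for size in range(8, 513, 8)}

request = 20                                   # e.g. a 20-byte object
size_class = round_up_to_size_class(request)   # -> 24
pools[size_class].allocate()
print(f"{request}B request -> {size_class}B pool "
      f"({pools[size_class].used}/{pools[size_class].capacity} blocks used)")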

Memory Optimization Tips

  • Use `__slots__` to reduce memory overhead for classes
  • Reuse objects when possible (especially small integers and strings)
  • Use generators for large datasets to avoid loading everything into memory
  • Profile memory usage with tools like memory_profiler or tracemalloc

Memory Management Layers

1. System Allocator

  • Used for large objects (greater than 512 bytes)
  • Direct calls to system malloc()/free()
  • No Python-specific optimizations

2. PyMalloc (Object Allocator)

  • Handles small objects (512 bytes or less)
  • Reduces fragmentation
  • Faster than system malloc for small allocations

3. Object-Specific Allocators

  • Specialized allocators for ints, lists, dicts
  • Free lists for common types
  • Object pools for frequently created/destroyed objects
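To peek at these layers on CPython, there is a private, CPython-only helper: `sys._debugmallocstats()` dumps pymalloc's arena, pool, and block statistics to stderr (the output format varies between versions):

import sys

# CPython-only private helper: prints pymalloc arena/pool/block
# statistics to stderr. Output format varies across versions.
junk = [object() for _ in range(10_000)]  # populate some small-object pools
sys._debugmallocstats()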

PyMalloc Architecture

Memory Hierarchy

Arena (256 KB)
├── Pool 1 (4 KB) - 8-byte blocks
├── Pool 2 (4 KB) - 16-byte blocks
├── Pool 3 (4 KB) - 24-byte blocks
└── ... up to 512-byte blocks
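A quick back-of-the-envelope check of what this hierarchy implies for capacity (real pools hold slightly fewer blocks, because pymalloc reserves a small pool header):

POOL_SIZE = 4 * 1024  # 4 KB per pool, as in the hierarchy above

for block_size in (8, 16, 24, 32, 64, 128, 256, 512):
    print(f"{block_size:>3}-byte blocks: up to {POOL_SIZE // block_size} per pool")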

Allocation Strategy

# Small allocation (<= 512 bytes)
def allocate_small(size):
    size_class = round_up_to_multiple_of_8(size)
    pool = find_pool_for_size_class(size_class)
    if pool.has_free_block():
        return pool.allocate_block()
    else:
        return create_new_pool(size_class).allocate_block()

# Large allocation (> 512 bytes)
def allocate_large(size):
    return system_malloc(size)

Reference Counting

How It Works

Every Python object has a reference count:

a = []        # refcount = 1
b = a         # refcount = 2
c = [a, a]    # refcount = 4
del b         # refcount = 3
c = None      # refcount = 1
del a         # refcount = 0 → object freed

Checking Reference Counts

import sys

obj = "hello"
print(sys.getrefcount(obj))  # Shows count + 1 (temporary ref)

# Reference count changes
x = [1, 2, 3]
print(sys.getrefcount(x))    # 2 (x + temporary)
y = x
print(sys.getrefcount(x))    # 3 (x + y + temporary)
container = [x, x, x]
print(sys.getrefcount(x))    # 6 (x + y + 3 in container + temporary)

Object Caching

Small Integer Cache

# Integers from -5 to 256 are cached
a = 256
b = 256
print(a is b)  # True - same cached object

c = 257
d = 257
print(c is d)  # Often False - outside the cache; but the compiler may
               # fold equal constants in one code object, so don't rely on it

e = 100
f = 100
print(e is f)  # True - within the cached range (-5 to 256)

String Interning

# Short strings are often interned
a = "hello"
b = "hello"
print(a is b)  # True - interned

# Longer strings may not be
c = "hello world this is a long string"
d = "hello world this is a long string"
print(c is d)  # May be False

# Force interning
import sys
e = sys.intern("long string to intern")
f = sys.intern("long string to intern")
print(e is f)  # True - forced interning

Free Lists

CPython maintains free lists for common types:

# Lists reuse memory
lists = []
for i in range(1000):
    lists.append([1, 2, 3])

# Delete all - memory goes to the free list
del lists

# New lists reuse the freed memory
new_lists = []
for i in range(1000):
    new_lists.append([4, 5, 6])  # Faster allocation
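You can sometimes observe free-list reuse directly: deleting a list and then creating a new one often yields the same address. This is a CPython implementation detail, not a guarantee:

a = [1, 2, 3]
old_addr = id(a)
del a                     # the list object goes back on the free list

b = [4, 5, 6]             # often reuses the block 'a' occupied
print(id(b) == old_addr)  # frequently True on CPython, but not guaranteed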

Memory Profiling

Using tracemalloc

import tracemalloc

# Start tracing
tracemalloc.start()

# Your code here
data = [i for i in range(1000000)]

# Get current memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")

# Get top memory users
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:3]:
    print(stat)

tracemalloc.stop()

Using memory_profiler

# Install: pip install memory-profiler
from memory_profiler import profile

@profile
def memory_hungry_function():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

# Run with: python -m memory_profiler script.py

Memory Optimization Techniques

1. Use `__slots__`

# Without __slots__ - uses dict
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Memory per instance: ~296 bytes

# With __slots__ - fixed attributes
class PointOptimized:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Memory per instance: ~56 bytes (5x less!)
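The exact figures vary by Python version; a rough way to compare the two classes above yourself, keeping in mind that `sys.getsizeof` is shallow, so the instance's `__dict__` must be counted separately:

import sys

p = Point(1, 2)
po = PointOptimized(1, 2)

# Plain instance plus its attribute dict (shallow sizes; version-dependent)
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
# The slotted instance has no __dict__ at all
print(sys.getsizeof(po))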

2. Use Generators

# Bad: Creates entire list in memory
def get_squares(n):
    return [x**2 for x in range(n)]

# Good: Generates values on demand
def get_squares_gen(n):
    return (x**2 for x in range(n))

# Memory comparison
import sys

list_squares = get_squares(1000000)
print(sys.getsizeof(list_squares))  # ~8.5 MB (the list itself; element ints add more)

gen_squares = get_squares_gen(1000000)
print(sys.getsizeof(gen_squares))   # ~120 bytes, regardless of n

3. String Operations

items = [1, 2, 3]  # any iterable of values

# Bad: Creates many intermediate strings
result = ""
for item in items:
    result += str(item) + ", "

# Good: Single allocation
result = ", ".join(str(item) for item in items)

4. Reuse Objects

# 'data' and 'transform' are assumed to be defined elsewhere

# Bad: Creates new list each time
def process_data():
    temp = []
    for item in data:
        temp.append(transform(item))
    return temp

# Good: Reuse with clear()
temp_buffer = []

def process_data_optimized():
    temp_buffer.clear()
    for item in data:
        temp_buffer.append(transform(item))
    return temp_buffer.copy()

Memory Leaks in Python

Common Causes

  1. Circular References (handled by GC)
  2. Global Caches that grow indefinitely
  3. Unclosed Resources (files, connections)
  4. Large or Mutable Default Arguments (evaluated once and kept alive for the life of the function; see the example below)
# Memory leak example
cache = {}

def cached_computation(x):
    if x not in cache:
        # expensive_computation: assumed to be defined elsewhere
        cache[x] = expensive_computation(x)
    return cache[x]
# Cache grows without bound!

# Fixed version with LRU cache
from functools import lru_cache

@lru_cache(maxsize=128)
def cached_computation_fixed(x):
    return expensive_computation(x)
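The fourth cause deserves its own illustration: a mutable default argument is created once, at function definition time, and then shared (and grown) across every call:

# Bad: the default list is created once and shared across calls
def append_log(entry, log=[]):
    log.append(entry)
    return log

append_log("a")
print(append_log("b"))        # ['a', 'b'] - the default keeps growing

# Good: use None as a sentinel and allocate per call
def append_log_fixed(entry, log=None):
    if log is None:
        log = []
    log.append(entry)
    return log

print(append_log_fixed("b"))  # ['b']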

Best Practices

  1. Profile Before Optimizing: Use tools to find actual bottlenecks
  2. Prefer Built-in Types: They're optimized in C
  3. Use Context Managers: Ensure cleanup via `with` statements
  4. Limit Cache Sizes: Use lru_cache or similar
  5. Consider Data Types: array.array for homogeneous data (see the sketch below)
  6. Lazy Loading: Don't load data until needed
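For practice 5, a quick comparison shows why array.array helps with homogeneous data: it stores packed machine values instead of pointers to full Python int objects.

import array
import sys

n = 1_000_000
as_list = list(range(n))
as_array = array.array('i', range(n))  # 'i': signed 32-bit C ints

print(sys.getsizeof(as_list))   # ~8 MB of pointers (the int objects cost extra)
print(sys.getsizeof(as_array))  # ~4 MB of packed 32-bit values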

Key Takeaways

  • PyMalloc optimizes small object allocation
  • Reference counting is the primary memory management mechanism
  • Object caching improves performance for common values
  • Free lists reduce allocation overhead
  • `__slots__` can significantly reduce memory usage
  • Generators provide memory-efficient iteration
  • Profile to find real memory issues

If you found this explanation helpful, consider sharing it with others.
