Global Interpreter Lock (GIL)


Understanding Python's GIL, its impact on multithreading, and workarounds


What is the GIL?

The Global Interpreter Lock (GIL) is a mutex in CPython that protects access to Python objects, preventing multiple threads from executing Python bytecode simultaneously. Only one thread can hold the GIL at any given time.

[Interactive demo: four threads (three CPU-bound, one I/O-bound) take turns acquiring the GIL while a bytecode-tick counter advances]

When the GIL is Released

  • I/O operations (file.read(), socket.recv()): released while the call blocks
  • time.sleep() (e.g., time.sleep(1)): released for the entire sleep
  • C extensions (e.g., NumPy operations): often released around heavy native computation
  • Pure Python code (for i in range(1000000)): not released; the GIL stays held
  • Threading locks (lock.acquire()): released while blocked waiting for the lock
  • Periodic check: automatic; every 100 bytecodes in Python 2, every 5 ms by default in Python 3

CPU-Bound Tasks

  • No true parallelism: threads take turns holding the GIL
  • Use multiprocessing instead

I/O-Bound Tasks

  • GIL released during I/O, so concurrency is good
  • Threading works well

Working Around the GIL: At a Glance

  • Use multiprocessing for CPU-bound parallelism
  • Use asyncio for I/O-bound concurrency
  • Write performance-critical code in C extensions
  • Consider alternative Python implementations (PyPy, Jython)
  • Use concurrent.futures for high-level parallelism (see the sketch below)
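
That last item is worth a quick illustration. Here is a minimal sketch of concurrent.futures (the square and slow_double helpers are illustrative, not library code): ProcessPoolExecutor sidesteps the GIL for CPU-bound work, while ThreadPoolExecutor suffices for I/O-bound work.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(n):
    return n * n  # Stand-in for CPU-bound work

def slow_double(n):
    time.sleep(0.1)  # Stand-in for blocking I/O
    return n * 2

if __name__ == "__main__":
    # Processes sidestep the GIL for CPU-bound tasks
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(square, range(8))))

    # Threads are fine for I/O-bound tasks: the GIL is released while blocked
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(slow_double, range(8))))
```

Both executors share the same submit/map API, so switching between processes and threads is a one-line change.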

Why Does the GIL Exist?

1. Reference Counting Safety

```python
# Without the GIL, this would be unsafe
import sys

obj = []
# Thread 1:
sys.getrefcount(obj)  # Reads the refcount
# Thread 2:
del obj               # Modifies the refcount
# Race condition!
```

2. C Extension Compatibility

Many C extensions assume they have exclusive access to Python objects.

3. Simplicity

Single lock is simpler than fine-grained locking throughout the interpreter.

How the GIL Works

Thread Switching

```python
# Simplified GIL behavior (conceptual pseudocode)
while True:
    acquire_gil()
    # Execute a batch of bytecode instructions
    for _ in range(100):
        execute_one_instruction()
    # Check whether other threads are waiting
    if other_threads_waiting():
        release_gil()  # Give other threads a chance
        thread_yield()
```

Since Python 3.2 the real check is time-based rather than instruction-counted: a running thread is asked to drop the GIL once the switch interval (5 ms by default) expires.

GIL Release Points

The GIL is released:

  • During I/O operations
  • When calling time.sleep()
  • Periodically (every 100 bytecodes in Python 2; every 5 ms by default since Python 3.2, as shown below)
  • In some C extensions
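
That periodic check is observable from Python. A small sketch using sys.getswitchinterval() and sys.setswitchinterval(), both in the standard library since Python 3.2:

```python
import sys

# The default switch interval in Python 3 is 0.005 s (5 ms): a running
# thread is asked to release the GIL roughly this often.
print(sys.getswitchinterval())  # 0.005

# The interval is tunable: longer means fewer context switches
# (less overhead) but worse latency for the other threads.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())  # 0.01
```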

Impact on Different Workloads

CPU-Bound Tasks (Poor Performance)

```python
import threading
import time

def cpu_intensive():
    total = 0
    for i in range(100_000_000):
        total += i
    return total

# Single thread
start = time.time()
cpu_intensive()
print(f"Single thread: {time.time() - start:.2f}s")

# Multiple threads - NO SPEEDUP!
start = time.time()
threads = []
for _ in range(4):
    t = threading.Thread(target=cpu_intensive)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(f"4 threads: {time.time() - start:.2f}s")
# Often actually SLOWER due to GIL contention and context switching!
```

I/O-Bound Tasks (Good Performance)

```python
import threading
import time

import requests

def io_task(url):
    response = requests.get(url)
    return len(response.content)

urls = ["http://example.com"] * 10

# Single thread
start = time.time()
for url in urls:
    io_task(url)
print(f"Sequential: {time.time() - start:.2f}s")

# Multiple threads - MUCH FASTER!
start = time.time()
threads = []
for url in urls:
    t = threading.Thread(target=io_task, args=(url,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(f"Threaded: {time.time() - start:.2f}s")
```

Working Around the GIL

1. Multiprocessing

```python
import time
from multiprocessing import Pool

def cpu_task(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Use multiple processes instead of threads
if __name__ == "__main__":  # Required on platforms that spawn workers (Windows, macOS)
    with Pool(4) as pool:
        start = time.time()
        results = pool.map(cpu_task, [25_000_000] * 4)
        print(f"Multiprocessing: {time.time() - start:.2f}s")
```

2. Asyncio for I/O

```python
import asyncio
import time

import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ["http://example.com"] * 10
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Run async tasks
start = time.time()
asyncio.run(main())
print(f"Async: {time.time() - start:.2f}s")
```

3. C Extensions

```python
# NumPy releases the GIL for many operations
import threading

import numpy as np

def numpy_operation():
    # GIL released during the underlying C computation
    a = np.random.random((1000, 1000))
    b = np.random.random((1000, 1000))
    return np.dot(a, b)

# This can achieve real parallelism across threads
threads = []
for _ in range(4):
    t = threading.Thread(target=numpy_operation)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

4. Alternative Python Implementations

  • PyPy: JIT compiler, still has GIL but faster
  • Jython: Runs on JVM, no GIL
  • IronPython: Runs on .NET, no GIL
  • CPython 3.13+: Experimental free-threaded (no-GIL) build; see the check below
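
On Python 3.13+ you can check at runtime whether you are on a free-threaded build and whether the GIL is actually off. A hedged sketch; sys._is_gil_enabled() is a private CPython 3.13+ API, so the call is guarded:

```python
import sys
import sysconfig

# Was this interpreter compiled with free-threading support (PEP 703)?
print(bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# Is the GIL actually disabled right now? Private CPython 3.13+ API;
# guard it so the check degrades gracefully on older versions.
if hasattr(sys, "_is_gil_enabled"):
    print(sys._is_gil_enabled())
```

The two checks can disagree: even on a free-threaded build, the GIL may be re-enabled at startup, for example by extension modules that do not yet declare free-threading support.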

GIL Behavior Examples

Example 1: CPU vs I/O

```python
import threading
import time

# CPU-bound function
def count(n):
    while n > 0:
        n -= 1

# I/O-bound function
def sleep_task():
    time.sleep(1)

# CPU-bound: no parallelism
start = time.time()
t1 = threading.Thread(target=count, args=(100_000_000,))
t2 = threading.Thread(target=count, args=(100_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"CPU-bound threads: {time.time() - start:.2f}s")

# I/O-bound: true concurrency
start = time.time()
t1 = threading.Thread(target=sleep_task)
t2 = threading.Thread(target=sleep_task)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"I/O-bound threads: {time.time() - start:.2f}s")  # ~1s, not 2s!
```

Example 2: GIL Battle

```python
import threading
import time

counter = 0
iterations = 100_000_000

def increment():
    global counter
    for _ in range(iterations):
        counter += 1

def decrement():
    global counter
    for _ in range(iterations):
        counter -= 1

# Threads fight for the GIL
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=decrement)
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Time: {time.time() - start:.2f}s")
print(f"Counter: {counter}")  # Should be 0, but usually isn't: a race condition!
```
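
The race is real because counter += 1 compiles to separate load, add, and store bytecodes, so the GIL alone does not make it atomic. A minimal fix with threading.Lock, shown here with smaller iteration counts so it finishes quickly:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # Only one thread mutates counter at a time
            counter += 1

t1 = threading.Thread(target=safe_increment, args=(1_000_000,))
t2 = threading.Thread(target=safe_increment, args=(1_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # Always 2000000
```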

Best Practices

1. Choose the Right Tool

```python
# CPU-bound: use multiprocessing
from multiprocessing import Process

# I/O-bound: use threading or asyncio
from threading import Thread
import asyncio

# Mixed workloads: combine ProcessPoolExecutor with ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
```

2. Profile First

```python
import cProfile
import threading

def profile_threading():
    # Put the threaded workload you want to inspect here;
    # profile first to see whether the GIL is even the bottleneck
    pass

cProfile.run('profile_threading()')
```

3. Use Queue for Communication

```python
import threading
from queue import Queue

def process(item):
    print(f"processing {item}")  # Placeholder for real work

def worker(queue):
    while True:
        item = queue.get()
        if item is None:  # Sentinel: shut this worker down
            break
        process(item)
        queue.task_done()

# Thread-safe communication
q = Queue()
threads = []
for _ in range(4):
    t = threading.Thread(target=worker, args=(q,))
    t.start()
    threads.append(t)

for item in range(10):
    q.put(item)
q.join()           # Wait until every queued item is processed
for _ in threads:
    q.put(None)    # One sentinel per worker
for t in threads:
    t.join()
```

Common Misconceptions

❌ "Python can't do parallelism"

✅ Python can, via multiprocessing; threads just can't parallelize CPU-bound work

❌ "The GIL makes Python slow"

✅ The GIL only affects multi-threaded CPU-bound code

❌ "Threading is useless in Python"

✅ Threading works great for I/O-bound tasks

❌ "Remove the GIL to fix everything"

✅ Removing GIL has tradeoffs (complexity, single-thread performance)

Future of the GIL

PEP 703: Making the GIL Optional

  • Experimental no-GIL build in Python 3.13
  • Gradual migration path
  • Performance implications being evaluated

Subinterpreters

  • PEP 554 (since superseded by PEP 734): multiple interpreters in one process
  • Each interpreter gets its own GIL
  • Better isolation than threads; see the sketch below
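
A heavily hedged sketch, assuming Python 3.14's concurrent.interpreters module (PEP 734); earlier versions expose only private variants of this API:

```python
# Assumes Python 3.14+ with concurrent.interpreters (PEP 734).
# Each interpreter owns its own GIL, so CPU-bound work in separate
# interpreters can run in parallel inside a single process.
from concurrent import interpreters

interp = interpreters.create()
interp.exec("print('hello from a subinterpreter')")
interp.close()
```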

Key Takeaways

  1. GIL prevents true parallelism in threads for CPU-bound tasks
  2. I/O-bound tasks work well with threading despite GIL
  3. Use multiprocessing for CPU-bound parallelism
  4. Use asyncio for high-concurrency I/O
  5. Profile first to identify if GIL is actually your bottleneck
  6. Alternative implementations exist without GIL
  7. Future Python may make GIL optional

If you found this explanation helpful, consider sharing it with others.
