Introduction
When you type ./program, a complex dance begins. The kernel loads your executable, maps it into memory, and starts the dynamic linker, which resolves shared libraries and then transfers control to your code. This article explores the journey from executable file to running process with interactive visualizations.
The Loading Process
The loading process involves several key steps:
- Kernel reads the executable file
- Creates a new process
- Maps executable into memory
- Loads dynamic linker
- Dynamic linker loads libraries
- Transfers control to the program's entry point (_start), which eventually calls main()
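From user space, the steps above are typically triggered by a fork followed by an exec. The following is a minimal sketch of that pattern on Linux; the helper name run_and_wait is hypothetical and is not part of any library.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>
#include <vector>

// Hypothetical helper: run a program in a child process and return its exit
// status. Returns -1 if the child could not be created or did not exit
// normally, and 127 if exec itself failed (the usual shell convention).
int run_and_wait(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // execvp expects a null-terminated array of char*.
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();                // the kernel creates a new process
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());  // the kernel replaces the process image
        _exit(127);                    // only reached if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_and_wait({"./program"}) would go through every step in the list: the kernel reads the file, maps it, starts the dynamic linker, and the parent only sees the final exit status.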
ELF File Structure
The Executable and Linkable Format (ELF) describes how an executable is laid out on disk, so knowing its structure is the first step to understanding loading.
ELF Components
// Simplified ELF structure
typedef struct {
    unsigned char e_ident[16]; // Magic number and other info
    uint16_t e_type;           // Object file type
    uint16_t e_machine;        // Architecture
    uint32_t e_version;        // Object file version
    uint64_t e_entry;          // Entry point virtual address
    uint64_t e_phoff;          // Program header table offset
    uint64_t e_shoff;          // Section header table offset
    // ... more fields
} Elf64_Ehdr;

typedef struct {
    uint32_t p_type;   // Segment type
    uint32_t p_flags;  // Segment flags
    uint64_t p_offset; // Segment file offset
    uint64_t p_vaddr;  // Segment virtual address
    uint64_t p_paddr;  // Segment physical address
    uint64_t p_filesz; // Segment size in file
    uint64_t p_memsz;  // Segment size in memory
    uint64_t p_align;  // Segment alignment
} Elf64_Phdr;
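To make the header layout concrete, here is a small sketch that reads the ELF header of a file using the real definitions from the system's elf.h. The function name read_elf64_header is hypothetical, and the sketch assumes a 64-bit ELF file on Linux.

```cpp
#include <elf.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: read the ELF header of `path` into `out`.
// Returns false if the file cannot be read or is not a 64-bit ELF file.
bool read_elf64_header(const char* path, Elf64_Ehdr& out) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::size_t n = std::fread(&out, 1, sizeof(out), f);
    std::fclose(f);
    if (n != sizeof(out)) return false;
    // The first four bytes of e_ident must be 0x7f 'E' 'L' 'F'.
    if (std::memcmp(out.e_ident, ELFMAG, SELFMAG) != 0) return false;
    return out.e_ident[EI_CLASS] == ELFCLASS64;
}
```

Passing "/proc/self/exe" inspects the running program itself; the e_entry and e_phoff fields of the result correspond to the values printed by readelf -h.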
Examining ELF Files
# View ELF header
readelf -h program
# View program headers (for loading)
readelf -l program
# View section headers (for linking)
readelf -S program
# Hex dump of specific section
objdump -s -j .text program
# Disassemble
objdump -d program
Important Segments
LOAD       # Loadable segment (code/data)
DYNAMIC    # Dynamic linking information
INTERP     # Path to dynamic linker
GNU_STACK  # Stack permissions
GNU_RELRO  # Read-only after relocation
Process Memory Layout
Once loaded, a process has a well-defined memory layout.
Memory Regions
// High addresses
// Kernel space (not accessible from user mode)
// =====================================
// Top of user space (0x7FFFFFFFFFFF on x86-64 with 48-bit addressing)
// Stack (grows downward ↓)
//   - Function parameters
//   - Return addresses
//   - Local variables
//
// Memory Mapping Region
//   - Shared libraries
//   - mmap allocations
//   - Thread stacks
//
// Heap (grows upward ↑)
//   - Dynamic allocations (new/malloc)
//
// BSS Segment
//   - Uninitialized global variables
//
// Data Segment
//   - Initialized global variables
//
// Text Segment
//   - Program code (read-only)
// Low addresses (0x400000 is the traditional start for non-PIE x86-64
// executables; position-independent executables get a randomized base)
Viewing Process Memory
# View memory mappings
cat /proc/<pid>/maps
# Or for the current process
cat /proc/self/maps
# Example output (illustrative and abbreviated; real addresses vary):
# 00400000-00401000 r-xp /path/to/program         # Text
# 00600000-00601000 r--p /path/to/program         # Read-only data
# 00601000-00602000 rw-p /path/to/program         # Data
# 7fff00000000-7fff00021000 rw-p [heap]
# 7ffff7a00000-7ffff7c00000 r-xp /lib/libc.so.6
# 7ffffffde000-7ffffffff000 rw-p [stack]
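The same information can be read programmatically. The sketch below, which assumes Linux and the documented /proc/self/maps line format, looks up the permission string of the mapping that contains a given address. The helper name perms_for_address is hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: return the permission string (for example "r-xp") of
// the mapping in /proc/self/maps that contains `addr`, or "" if none matches.
std::string perms_for_address(const void* addr) {
    const auto target = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        unsigned long start = 0, end = 0;
        char perms[5] = {0};
        // Each line starts with "start-end perms ..." in hexadecimal.
        if (std::sscanf(line.c_str(), "%lx-%lx %4s", &start, &end, perms) != 3) continue;
        if (target >= start && target < end) return perms;
    }
    return "";
}
```

Passing the address of a local variable should report a readable and writable mapping (the stack), while the address of a function should report an executable one (the text segment).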
Memory Permissions
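// Permission flags
// r = read
// w = write
// x = execute
// p = private (copy-on-write)
// s = shared

// Changing permissions at runtime
#include <sys/mman.h>
#include <cstdio>

// addr must be page-aligned; returns true on success
bool make_executable(void* addr, size_t len) {
    if (mprotect(addr, len, PROT_READ | PROT_EXEC) != 0) {
        std::perror("mprotect");
        return false;
    }
    return true;
}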
Virtual Memory Management
Modern systems use virtual memory to provide isolation and flexibility.
Virtual to Physical Mapping
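// Each process has its own virtual address space
// Virtual addresses are translated to physical addresses
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Page size (typically 4KB)
size_t page_size = sysconf(_SC_PAGESIZE);
// Example size; aligned_alloc requires a multiple of the alignment
size_t size = 16 * page_size;
// Allocate aligned memory
void* aligned = aligned_alloc(page_size, size);
// Memory-mapped files
int fd = open("file.dat", O_RDWR);
void* mapped = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// mmap returns MAP_FAILED (not NULL) on error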
Page Faults and Demand Paging
// Pages are loaded on-demand
char* large_array = new char[1000000000]; // 1GB
// Address space is reserved, but physical memory is not actually allocated yet!
large_array[0] = 'A';         // Page fault, page allocated
large_array[999999999] = 'Z'; // Another page fault

// View page faults
// cat /proc/<pid>/stat | awk '{print $10, $12}'
// Field 10: minor faults, Field 12: major faults
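Demand paging can be observed directly with mincore, which reports which pages of a mapping are resident in memory. The following is a sketch for Linux; the helper names resident_pages and touch_and_count are hypothetical, and the exact counts may differ on systems that prefault or use huge pages.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: count resident pages in [addr, addr + len).
// `addr` must be page-aligned. Returns -1 on error.
long resident_pages(void* addr, std::size_t len) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0) return -1;
    long count = 0;
    for (unsigned char v : vec) count += (v & 1);  // low bit set = resident
    return count;
}

// Hypothetical helper: map `pages` anonymous pages, write to the first
// `touched` of them, and return how many pages are resident afterwards.
long touch_and_count(std::size_t pages, std::size_t touched) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = pages * page;
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    volatile char* p = static_cast<char*>(mem);
    for (std::size_t i = 0; i < touched && i < pages; ++i) {
        p[i * page] = 1;  // first write to a page triggers a page fault
    }
    long resident = resident_pages(mem, len);
    munmap(mem, len);
    return resident;
}
```

On a typical Linux system, touch_and_count(16, 4) is expected to report close to 4 resident pages, since untouched pages of an anonymous mapping have no physical memory behind them yet.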
Program Startup Sequence
Before main() runs, extensive initialization occurs.
The Real Entry Point
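// Not main(), but _start
// Simplified startup sequence (glibc; details vary by version):
_start:                  // Entry point from ELF header (e_entry)
→ __libc_start_main:     // glibc initialization
→ __libc_csu_init:       // C startup (glibc before 2.34; newer versions do this inside libc)
→ _init / .init_array:   // Global constructors
→ main:                  // Your code
→ exit:                  // Called with main's return value; runs atexit handlers
→ _fini / .fini_array:   // Global destructors
→ _exit:                 // Terminate the process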
Before main()
#include <cstdio>
#include <iostream>

// Global constructors run before main
class GlobalObject {
public:
    GlobalObject() { std::cout << "Before main!\n"; }
    ~GlobalObject() { std::cout << "After main!\n"; }
};
GlobalObject obj; // Constructor runs before main()

// Using attributes (GCC/Clang extension). printf is used here because
// std::cout is not guaranteed to be initialized when these functions run.
__attribute__((constructor)) void before_main() {
    std::printf("Also before main!\n");
}
__attribute__((destructor)) void after_main() {
    std::printf("After main returns!\n");
}
Initialization Order
// Within a translation unit: top to bottom
int a = 1;     // Initialized first
int b = a + 1; // Initialized second

// Across translation units: the order is unspecified!
// file1.cpp
int x = get_value(); // When does this run?
// file2.cpp
int y = x + 1; // Dangerous! x might not be initialized yet (it would still be zero)

// Solution: Use functions
int& get_x() {
    static int x = get_value(); // Initialized on first use
    return x;
}
Dynamic Linker/Loader
The dynamic linker (ld.so) handles runtime library loading.
How Dynamic Loading Works
# The kernel loads the program and notices the INTERP segment
readelf -l program | grep INTERP
# [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

# Kernel loads ld.so, which then:
# 1. Loads required shared libraries
# 2. Performs relocations
# 3. Runs initialization functions
# 4. Transfers control to program
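The lookup the kernel performs for the INTERP segment can be approximated in user space by walking the program header table. This is a sketch under the assumption of a 64-bit ELF file; the function name interpreter_path is hypothetical.

```cpp
#include <elf.h>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: return the PT_INTERP path stored in a 64-bit ELF
// file, or "" if the file is unreadable, not ELF64, or statically linked.
std::string interpreter_path(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return "";
    std::string result;
    Elf64_Ehdr eh;
    if (std::fread(&eh, 1, sizeof(eh), f) == sizeof(eh) &&
        std::memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        for (unsigned i = 0; i < eh.e_phnum; ++i) {
            Elf64_Phdr ph;
            long entry = static_cast<long>(eh.e_phoff + i * sizeof(ph));
            if (std::fseek(f, entry, SEEK_SET) != 0) break;
            if (std::fread(&ph, 1, sizeof(ph), f) != sizeof(ph)) break;
            if (ph.p_type != PT_INTERP) continue;
            if (ph.p_filesz == 0 || ph.p_filesz > 4096) break;  // sanity limit
            std::string buf(ph.p_filesz, '\0');
            if (std::fseek(f, static_cast<long>(ph.p_offset), SEEK_SET) == 0 &&
                std::fread(&buf[0], 1, buf.size(), f) == buf.size()) {
                result = buf.c_str();  // stop at the terminating NUL
            }
            break;
        }
    }
    std::fclose(f);
    return result;
}
```

For a dynamically linked program this should return the same path that readelf prints in the "Requesting program interpreter" line.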
Runtime Library Loading
#include <dlfcn.h>
#include <iostream>

// Link with -ldl on glibc versions older than 2.34
int call_plugin() {
    // Load library at runtime
    void* handle = dlopen("libplugin.so", RTLD_LAZY);
    if (!handle) {
        std::cerr << "Error: " << dlerror() << std::endl;
        return -1;
    }
    // Get function pointer
    using func_t = int (*)(int);
    func_t func = reinterpret_cast<func_t>(dlsym(handle, "function_name"));
    if (!func) {
        std::cerr << "Symbol not found: " << dlerror() << std::endl;
        dlclose(handle);
        return -1;
    }
    // Call function
    int result = func(42);
    // Unload library
    dlclose(handle);
    return result;
}
LD Environment Variables
# Library search path
LD_LIBRARY_PATH=/custom/lib:$LD_LIBRARY_PATH ./program
# Preload libraries
LD_PRELOAD=./my_malloc.so ./program
# Debug dynamic linker
LD_DEBUG=all ./program 2>&1 | head -50
# LD_DEBUG options:
#   bindings - Symbol binding
#   libs     - Library searching
#   reloc    - Relocations
#   symbols  - Symbol table processing
#   all      - Everything
# Disable lazy binding
LD_BIND_NOW=1 ./program
# Show library dependencies
LD_TRACE_LOADED_OBJECTS=1 ./program # Same as ldd
Stack and Heap
Understanding stack and heap behavior is crucial for C++ developers.
Stack Organization
#include <execinfo.h>
#include <cstdlib>
#include <iostream>

void function(int param) {
    int local1 = 10;
    int local2 = 20;
    char buffer[100];
    // Typical stack frame (the compiler decides the exact order and padding):
    // [return address]
    // [saved rbp]
    // [local1]
    // [local2]
    // [buffer[0..99]]
    // [padding]
}

// View stack (backtrace is a glibc extension)
void print_backtrace() {
    void* array[10];
    int size = backtrace(array, 10);
    char** strings = backtrace_symbols(array, size);
    if (!strings) return;
    for (int i = 0; i < size; i++) {
        std::cout << strings[i] << std::endl;
    }
    free(strings);
}
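The downward growth of the stack can be checked by comparing the address of a local variable in a caller with one in a callee. This is a sketch that relies on GCC/Clang's noinline attribute to keep the two frames separate; both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: return the address of a local variable in this
// (deeper) frame as an integer. noinline keeps it a real call.
__attribute__((noinline)) std::uintptr_t callee_frame_address() {
    volatile char marker = 0;
    return reinterpret_cast<std::uintptr_t>(&marker);
}

// Returns true if a callee's locals live at lower addresses than the
// caller's, which is the case on x86-64 and most other common platforms.
__attribute__((noinline)) bool stack_grows_down() {
    volatile char marker = 0;
    std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
    std::uintptr_t deeper = callee_frame_address();
    return deeper < here;
}
```

The addresses are compared as integers rather than as pointers, because relational comparison of pointers to unrelated objects has an unspecified result in C++.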
Stack Overflow Detection
// Stack size limits
#include <sys/resource.h>
#include <iostream>

void check_stack_limit() {
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    std::cout << "Stack limit: " << rl.rlim_cur << " bytes\n";
}

// Increase stack size (the soft limit cannot exceed the hard limit rl.rlim_max)
bool increase_stack() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return false;
    rl.rlim_cur = 16 * 1024 * 1024; // 16MB
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max) rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_STACK, &rl) == 0;
}

// Guard pages detect overflow
// Accessing guard page triggers SIGSEGV
Heap Management
// Multiple heap allocation strategies
// 1. First-fit: Use first suitable block
// 2. Best-fit: Use smallest suitable block
// 3. Worst-fit: Use largest block (fragmentation)

// Modern allocators (ptmalloc, jemalloc, tcmalloc)
// use sophisticated strategies:
// - Thread-local caches
// - Size classes
// - Arena allocation

// Custom allocator (a simple bump allocator: memory is never reused)
#include <cstddef>
#include <new>

class PoolAllocator {
    alignas(std::max_align_t) char pool[1024 * 1024]; // 1MB pool
    std::size_t offset = 0;
public:
    void* allocate(std::size_t size) {
        // Round up so every returned pointer is suitably aligned
        constexpr std::size_t align = alignof(std::max_align_t);
        std::size_t rounded = (size + align - 1) & ~(align - 1);
        if (rounded < size || rounded > sizeof(pool) - offset) throw std::bad_alloc();
        void* ptr = pool + offset;
        offset += rounded;
        return ptr;
    }
    void deallocate(void* ptr) {
        // No-op for pool allocator
    }
};
Runtime Linking Mechanics
This section looks at how symbols are resolved at runtime.
PLT/GOT in Action
; First call to printf
call printf@plt

; PLT stub for printf:
printf@plt:
    jmp *printf@got   ; Jump through GOT entry (initially points to the next instruction)
    push $index       ; Push relocation index
    jmp .plt0         ; Jump to resolver

; After resolution, the GOT entry contains printf's actual address.
; Subsequent calls still enter the PLT stub, but its first jmp goes directly to printf.
Symbol Interposition
// Override library functions
// interpose.cpp
#include <cstdio>

extern "C" int puts(const char* s) {
    printf("[INTERCEPTED] %s\n", s);
    return 0;
}

// Compile and use:
// g++ -shared -fPIC interpose.cpp -o interpose.so
// LD_PRELOAD=./interpose.so ./program
Weak Symbols and Runtime
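#include <cstdio>

// A weak definition acts as a default: a strong definition of the same
// symbol elsewhere in the link replaces it without a duplicate-symbol error
__attribute__((weak)) void optional_feature() {
    printf("Default implementation\n");
}
// At runtime, a library loaded with LD_PRELOAD can also interpose the symbol
// when the call is resolved through the dynamic linker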
Thread Local Storage (TLS)
Each thread gets its own copy of TLS variables.
#include <iostream>
#include <thread>

// Thread-local variable
thread_local int tls_var = 0;
// Or using __thread (C-style, GCC/Clang extension)
__thread int c_tls_var = 0;

void thread_function() {
    tls_var++; // Each thread has its own copy
    std::cout << "Thread " << std::this_thread::get_id()
              << ": tls_var = " << tls_var << std::endl;
}

// TLS implementation:
// - Static TLS: Allocated at thread creation
// - Dynamic TLS: Allocated on first access
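The per-thread isolation can be verified with a small experiment. In this sketch, each worker thread increments its own copy of a thread_local counter and records the final value; the names tls_counter and run_tls_demo are illustrative only.

```cpp
#include <thread>
#include <vector>

thread_local int tls_counter = 0;  // illustrative name; one copy per thread

// Hypothetical helper: start `threads` workers, have each increment its own
// copy of tls_counter `increments` times, and return the value each one saw.
std::vector<int> run_tls_demo(int threads, int increments) {
    std::vector<int> seen(threads, -1);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&seen, t, increments] {
            for (int i = 0; i < increments; ++i) ++tls_counter;
            seen[t] = tls_counter;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    return seen;
}
```

Every worker should report exactly `increments`, and the calling thread's copy of tls_counter stays unchanged, because no thread ever observes another thread's copy. On older toolchains the program may need to be built with -pthread.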
TLS Internals
// TLS is typically implemented using:
// 1. Thread pointer (FS segment on x86-64)
// 2. TLS descriptor
// 3. DTV (Dynamic Thread Vector)

// Access TLS variable (simplified):
mov %fs:tls_var@tpoff, %eax // Read TLS variable
Security Features
Modern loaders implement various security features.
ASLR (Address Space Layout Randomization)
# Check if ASLR is enabled
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled
# 1 = randomize stack, mmap base (shared libraries), and vDSO
# 2 = full randomization (also randomizes the heap)

# Disable for debugging (system-wide, until reboot)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Or for single execution
setarch x86_64 -R ./program
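A program can read the same setting itself, for example to log it alongside a crash report. This is a minimal sketch for Linux; the function name aslr_mode is hypothetical.

```cpp
#include <fstream>

// Hypothetical helper: return the kernel's randomize_va_space setting
// (0, 1, or 2), or -1 if the file cannot be read or parsed.
int aslr_mode() {
    std::ifstream f("/proc/sys/kernel/randomize_va_space");
    int mode = -1;
    if (!(f >> mode)) return -1;
    return mode;
}
```

Note that this reports the system-wide setting only; a process started with setarch -R has randomization disabled even when the file reads 2.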
Stack Canaries
#include <cstring>

// Compiler adds canaries to detect buffer overflow
void vulnerable(char* input) {
    char buffer[64];
    strcpy(buffer, input); // Potential overflow
    // Canary checked before return; the process aborts on a mismatch
}

// Compile with stack protection
// g++ -fstack-protector-all program.cpp
RELRO (Relocation Read-Only)
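# Partial RELRO (a common default; some distributions enable full RELRO)
g++ program.cpp
# Full RELRO (GOT is read-only after relocation)
g++ -Wl,-z,relro,-z,now program.cpp
# Check RELRO status (checksec is a separate tool that may need to be installed)
checksec --file=program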
NX Bit (No-Execute)
# Modern systems mark stack/heap as non-executable
# Prevents code injection attacks

# Check NX status
readelf -l program | grep GNU_STACK
# RW  = NX enabled (good)
# RWE = NX disabled (bad)
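The same check can be made from inside a running program by inspecting its own program headers with dl_iterate_phdr, which glibc and musl provide in link.h. This is a sketch; the names StackInfo, inspect_main_program, and requests_executable_stack are hypothetical, and treating a missing GNU_STACK header as executable is a conservative assumption.

```cpp
#include <link.h>   // dl_iterate_phdr, ElfW
#include <elf.h>
#include <cstddef>

namespace {

struct StackInfo {
    bool found = false;       // PT_GNU_STACK header present
    bool executable = false;  // PF_X set on that header
};

// Callback for dl_iterate_phdr. The first object visited is the main program.
int inspect_main_program(dl_phdr_info* info, std::size_t, void* data) {
    auto* out = static_cast<StackInfo*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type == PT_GNU_STACK) {
            out->found = true;
            out->executable = (ph.p_flags & PF_X) != 0;
        }
    }
    return 1;  // non-zero stops the iteration after the main program
}

}  // namespace

// Returns true if the running executable asks for an executable stack.
// Assumption: a missing PT_GNU_STACK header is treated as executable.
bool requests_executable_stack() {
    StackInfo info;
    dl_iterate_phdr(inspect_main_program, &info);
    return !info.found || info.executable;
}
```

For a program built with default GCC or Clang settings this is expected to return false, matching the RW flags shown by readelf.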
Debugging Runtime Issues
Core Dumps
# Enable core dumps
ulimit -c unlimited
# Set core pattern
echo "core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern
# Analyze core dump
gdb program core
(gdb) bt             # Backtrace
(gdb) info registers
(gdb) x/10x $rsp     # Examine stack
Memory Debugging
// Valgrind
// valgrind --leak-check=full ./program

// AddressSanitizer
// g++ -fsanitize=address -g program.cpp
// ./program

// Custom memory tracking
#include <cstdio>
#include <cstdlib>
#include <new>

void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    // fprintf instead of std::cout: iostreams may allocate and re-enter operator new
    std::fprintf(stderr, "Allocated %zu bytes at %p\n", size, p);
    return p;
}

void operator delete(void* p) noexcept {
    std::fprintf(stderr, "Freed memory at %p\n", p);
    std::free(p);
}
Runtime Profiling
# Performance profiling
perf record ./program
perf report
# Memory profiling
valgrind --tool=massif ./program
ms_print massif.out.*
# Call graph profiling
valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*
Optimizing Load Time
Preloading and Prelinking
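# LD_PRELOAD forces a library to be loaded before all others. It is mainly
# useful for interposition and does not by itself reduce load time.
LD_PRELOAD=/lib/x86_64-linux-gnu/libc.so.6 ./program
# Use prelinking (deprecated but educational)
sudo prelink -a
# Measure startup time
time ./program
perf stat ./program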
Reducing Symbol Resolution
// Use -fvisibility=hidden by default
// Only export necessary symbols
class __attribute__((visibility("default"))) PublicAPI {
    // Exported
};

class InternalClass {
    // Hidden
};
Lazy Loading
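#include <dlfcn.h>

// Load libraries only when needed
class PluginManager {
    void* handle = nullptr;

    // Returns true if the plugin is loaded
    bool load_if_needed() {
        if (!handle) {
            handle = dlopen("plugin.so", RTLD_LAZY);
        }
        return handle != nullptr;
    }

public:
    ~PluginManager() {
        if (handle) dlclose(handle);
    }
};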
Platform-Specific Details
Linux-Specific
// Auxiliary vector (passed by kernel)
#include <sys/auxv.h>
#include <iostream>

void print_auxv() {
    std::cout << "Page size: " << getauxval(AT_PAGESZ) << std::endl;
    std::cout << "Entry point: " << std::hex << getauxval(AT_ENTRY) << std::endl;
}
Windows Differences
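// Windows uses PE format instead of ELF
// DLLs instead of .so files
// Different loader (ntdll.dll)
#ifdef _WIN32
#include <windows.h>

void load_example() {
    HMODULE handle = LoadLibraryA("library.dll");
    if (!handle) return; // GetLastError() gives the reason
    FARPROC func = GetProcAddress(handle, "function");
    // Cast func to the correct signature before calling it
    (void)func;
    FreeLibrary(handle);
}
#endif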
Best Practices
- Minimize startup overhead
  - Reduce library dependencies
  - Use lazy initialization
  - Consider static linking for small utilities
- Optimize memory layout
  - Group related data
  - Consider cache lines
  - Use appropriate allocators
- Handle initialization order
  - Avoid global constructors
  - Use lazy initialization
  - Document dependencies
- Security considerations
  - Enable the available hardening features (ASLR, stack canaries, full RELRO, NX)
  - Validate input in constructors
  - Be careful with SUID programs
- Debug systematically
  - Use proper tools (gdb, valgrind)
  - Enable debug symbols
  - Understand the startup sequence
Conclusion
The journey from executable file to running process involves sophisticated kernel mechanisms, dynamic linking, memory management, and security features. Understanding these concepts helps you:
- Debug runtime issues effectively
- Optimize program startup and memory usage
- Write more secure code
- Understand system behavior
The loading and runtime phase is where your compiled and linked code finally comes to life, transformed from static bytes on disk to a dynamic process in memory.