C++ Loading and Runtime: From Executable to Process

Explore how C++ programs are loaded into memory, how dynamic linking works at runtime, memory layout, and the complete startup sequence with interactive visualizations.

Abhik Sarkar

Introduction

When you type ./program, a complex dance begins. The kernel loads your executable, maps it into memory, resolves dynamic libraries, and transfers control to your code. This article explores the journey from executable file to running process with interactive visualizations.

The Loading Process

The loading process involves several key steps:

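  1. The shell creates a new process (fork) and asks the kernel to run the program (execve)
  2. The kernel reads the executable file and validates its ELF header
  3. The kernel maps the executable's loadable segments into memory
  4. The kernel loads the dynamic linker named in the INTERP segment
  5. The dynamic linker loads the shared libraries and performs relocations
  6. Control passes to the entry point (_start), which eventually calls main()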

ELF File Structure

Understanding the Executable and Linkable Format (ELF) is crucial for understanding loading.

ELF Components

    // Simplified ELF structures (the full definitions live in <elf.h>)
    typedef struct {
        unsigned char e_ident[16]; // Magic number and other info
        uint16_t e_type;           // Object file type
        uint16_t e_machine;        // Architecture
        uint32_t e_version;        // Object file version
        uint64_t e_entry;          // Entry point virtual address
        uint64_t e_phoff;          // Program header table offset
        uint64_t e_shoff;          // Section header table offset
        // ... more fields
    } Elf64_Ehdr;

    typedef struct {
        uint32_t p_type;   // Segment type
        uint32_t p_flags;  // Segment flags
        uint64_t p_offset; // Segment file offset
        uint64_t p_vaddr;  // Segment virtual address
        uint64_t p_paddr;  // Segment physical address
        uint64_t p_filesz; // Segment size in file
        uint64_t p_memsz;  // Segment size in memory
        uint64_t p_align;  // Segment alignment
    } Elf64_Phdr;

Examining ELF Files

    # View ELF header
    readelf -h program

    # View program headers (for loading)
    readelf -l program

    # View section headers (for linking)
    readelf -S program

    # Hex dump of specific section
    objdump -s -j .text program

    # Disassemble
    objdump -d program

Important Segments

    LOAD       # Loadable segment (code/data)
    DYNAMIC    # Dynamic linking information
    INTERP     # Path to dynamic linker
    GNU_STACK  # Stack permissions
    GNU_RELRO  # Read-only after relocation

Process Memory Layout

Once loaded, a process has a well-defined memory layout.

Memory Regions

    // High address
    // Kernel space (not accessible from user mode)
    // =====================================
    // 0x7FFFFFFFFFFF: top of user space on x86-64
    // Stack (grows downward ↓)
    //   - Function arguments that do not fit in registers
    //   - Return addresses
    //   - Local variables
    //
    // Memory mapping region
    //   - Shared libraries
    //   - mmap allocations
    //   - Thread stacks
    //
    // Heap (grows upward ↑)
    //   - Dynamic allocations (new/malloc)
    //
    // BSS segment
    //   - Uninitialized (zero-filled) global variables
    //
    // Data segment
    //   - Initialized global variables
    //
    // Text segment
    //   - Program code (read-only, executable)
    // Low address (0x400000 is the typical start for non-PIE x86-64 executables)

Viewing Process Memory

    # View memory mappings
    cat /proc/<pid>/maps

    # Or for the process doing the reading (here, cat itself)
    cat /proc/self/maps

    # Illustrative output for a non-PIE executable
    # (offset, device, and inode columns omitted):
    # 00400000-00401000 r-xp /path/to/program   # Text
    # 00600000-00601000 r--p /path/to/program   # Read-only after relocation
    # 00601000-00602000 rw-p /path/to/program   # Data and BSS
    # 01c3b000-01c5c000 rw-p [heap]
    # 7ffff7a00000-7ffff7c00000 r-xp /lib/libc.so.6
    # 7ffffffde000-7ffffffff000 rw-p [stack]

Memory Permissions

    // Permission flags
    // r = read
    // w = write
    // x = execute
    // p = private (copy-on-write)
    // s = shared

    // Changing permissions at runtime
    #include <sys/mman.h>

    // addr must be page-aligned; returns false if mprotect fails
    bool make_executable(void* addr, size_t len) {
        return mprotect(addr, len, PROT_READ | PROT_EXEC) == 0;
    }

Virtual Memory Management

Modern systems use virtual memory to provide isolation and flexibility.

Virtual to Physical Mapping

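    // Each process has its own virtual address space
    // Virtual addresses are translated to physical addresses
    #include <cstdlib>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    // Page size (typically 4 KB)
    size_t page_size = sysconf(_SC_PAGESIZE);

    // Allocate page-aligned memory; size must be a multiple of the alignment
    size_t size = 4 * page_size;
    void* aligned = aligned_alloc(page_size, size);

    // Memory-mapped files (file.dat must be at least `size` bytes long)
    int fd = open("file.dat", O_RDWR);
    void* mapped = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mapped == MAP_FAILED) { /* handle the error */ }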

Page Faults and Demand Paging

    // Pages are loaded on demand
    char* large_array = new char[1000000000]; // ~1 GB of virtual address space
    // No physical memory is committed yet
    large_array[0] = 'A';         // Page fault: the first page is allocated
    large_array[999999999] = 'Z'; // Another page fault for the last page

    // View page faults
    // cat /proc/<pid>/stat | awk '{print $10, $12}'
    // Field 10: minor faults, Field 12: major faults

Program Startup Sequence

Before main() runs, extensive initialization occurs.

The Real Entry Point

    // Not main(), but _start
    // Simplified startup sequence (glibc on Linux):
    _start:                  // Entry point from ELF header
      → __libc_start_main:   // glibc initialization
      → __libc_csu_init:     // C startup (moved inside glibc in recent releases)
      → _init / .init_array: // Global constructors
      → main:                // Your code
      → exit:                // Called with main's return value
      → atexit handlers, global destructors, .fini_array / _fini
      → _exit:               // Terminate the process

Before main()

    #include <cstdio>
    #include <iostream>

    // Global constructors run before main
    class GlobalObject {
    public:
        GlobalObject() { std::cout << "Before main!\n"; }
        ~GlobalObject() { std::cout << "After main!\n"; }
    };

    GlobalObject obj; // Constructor runs before main()

    // Using GCC/Clang attributes; printf is used here because std::cout
    // is not guaranteed to be initialized when these functions run
    __attribute__((constructor))
    void before_main() { std::printf("Also before main!\n"); }

    __attribute__((destructor))
    void after_main() { std::printf("After main returns!\n"); }

Initialization Order

    // Within a translation unit: top to bottom
    int a = 1;     // Initialized first
    int b = a + 1; // Initialized second

    // Across translation units: the order is unspecified
    // (the "static initialization order fiasco")
    // file1.cpp
    int x = get_value(); // When does this run?

    // file2.cpp
    extern int x;
    int y = x + 1; // Dangerous! x may still be zero here

    // Solution: wrap the variable in a function
    int& get_x() {
        static int x = get_value(); // Initialized on first use (thread-safe since C++11)
        return x;
    }

Dynamic Linker/Loader

The dynamic linker (ld.so) handles runtime library loading.

How Dynamic Loading Works

    # The kernel loads the program and notices the INTERP segment
    readelf -l program | grep interpreter
    # [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

    # Kernel loads ld.so, which then:
    # 1. Loads required shared libraries
    # 2. Performs relocations
    # 3. Runs initialization functions
    # 4. Transfers control to the program's entry point (_start)

Runtime Library Loading

    #include <dlfcn.h>
    #include <iostream>

    // Link with -ldl on older glibc versions
    void run_plugin() {
        // Load library at runtime
        void* handle = dlopen("libplugin.so", RTLD_LAZY);
        if (!handle) {
            std::cerr << "Error: " << dlerror() << std::endl;
            return;
        }

        // Get function pointer
        typedef int (*func_t)(int);
        func_t func = (func_t)dlsym(handle, "function_name");
        if (!func) {
            std::cerr << "Symbol not found: " << dlerror() << std::endl;
            dlclose(handle);
            return;
        }

        // Call function
        int result = func(42);
        std::cout << "Result: " << result << std::endl;

        // Unload library
        dlclose(handle);
    }

LD Environment Variables

    # Library search path
    LD_LIBRARY_PATH=/custom/lib:$LD_LIBRARY_PATH ./program

    # Preload libraries
    LD_PRELOAD=./my_malloc.so ./program

    # Debug dynamic linker
    LD_DEBUG=all ./program 2>&1 | head -50
    # LD_DEBUG options:
    #   bindings - Symbol binding
    #   libs     - Library searching
    #   reloc    - Relocations
    #   symbols  - Symbol table processing
    #   all      - Everything

    # Disable lazy binding
    LD_BIND_NOW=1 ./program

    # Show library dependencies
    LD_TRACE_LOADED_OBJECTS=1 ./program # Same as ldd

Stack and Heap

Understanding stack and heap behavior is crucial for C++ developers.

Stack Organization

    void function(int param) {
        int local1 = 10;
        int local2 = 20;
        char buffer[100];
        // Typical stack frame, from high to low addresses
        // (the exact layout is up to the compiler):
        // [return address]
        // [saved rbp]
        // [local1]
        // [local2]
        // [buffer[0..99]]
        // [padding]
    }

    // View stack
    #include <execinfo.h>
    #include <cstdlib>
    #include <iostream>

    void print_backtrace() {
        void* array[10];
        int size = backtrace(array, 10);
        char** strings = backtrace_symbols(array, size);
        if (!strings) return;
        for (int i = 0; i < size; i++) {
            std::cout << strings[i] << std::endl;
        }
        free(strings);
    }

Stack Overflow Detection

    // Stack size limits
    #include <sys/resource.h>
    #include <iostream>

    void check_stack_limit() {
        struct rlimit rl;
        getrlimit(RLIMIT_STACK, &rl);
        std::cout << "Stack limit: " << rl.rlim_cur << " bytes\n";
    }

    // Raise the soft stack limit (it cannot exceed the hard limit rl.rlim_max)
    bool increase_stack() {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) != 0) return false;
        rl.rlim_cur = 16 * 1024 * 1024; // 16 MB
        return setrlimit(RLIMIT_STACK, &rl) == 0;
    }

    // Guard pages detect overflow
    // Accessing a guard page triggers SIGSEGV

Heap Management

    // Multiple heap allocation strategies
    // 1. First-fit: Use first suitable block
    // 2. Best-fit: Use smallest suitable block
    // 3. Worst-fit: Use largest block (fragmentation)

    // Modern allocators (ptmalloc, jemalloc, tcmalloc)
    // use sophisticated strategies:
    // - Thread-local caches
    // - Size classes
    // - Arena allocation

    #include <cstddef>
    #include <new>

    // Custom bump (pool) allocator
    class PoolAllocator {
        alignas(std::max_align_t) char pool[1024 * 1024]; // 1 MB pool
        size_t offset = 0;

    public:
        void* allocate(size_t size) {
            // Round up so every returned pointer stays suitably aligned
            constexpr size_t align = alignof(std::max_align_t);
            size = (size + align - 1) & ~(align - 1);
            if (size > sizeof(pool) - offset) throw std::bad_alloc();
            void* ptr = pool + offset;
            offset += size;
            return ptr;
        }

        void deallocate(void*) {
            // No-op: memory is reclaimed when the pool itself is destroyed
        }
    };

Runtime Linking Mechanics

Understanding how symbols are resolved at runtime.

PLT/GOT in Action

    ; First call to printf
    call printf@plt

    ; PLT stub for printf:
    printf@plt:
        jmp *printf@got   ; Jump through the GOT entry (initially the next instruction)
        push $index       ; Push relocation index
        jmp .plt0         ; Jump to resolver

    ; After resolution, the GOT entry holds printf's real address.
    ; Later calls still enter printf@plt, but its first jmp lands directly in printf.

Symbol Interposition

    // Override library functions
    // interpose.cpp
    #include <cstdio>

    extern "C" int puts(const char* s) {
        printf("[INTERCEPTED] %s\n", s);
        return 0;
    }

    // Compile and use:
    // g++ -shared -fPIC interpose.cpp -o interpose.so
    // LD_PRELOAD=./interpose.so ./program

Weak Symbols and Runtime

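    // A weak definition is used unless a strong definition of the same symbol is linked in
    __attribute__((weak))
    void optional_feature() { printf("Default implementation\n"); }

    // A strong definition in another object file replaces it at link time.
    // LD_PRELOAD interposition happens at load time and does not depend on
    // the symbol being weak.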

Thread Local Storage (TLS)

Each thread gets its own copy of TLS variables.

    #include <iostream>
    #include <thread>

    // Thread-local variable
    thread_local int tls_var = 0;

    // Or using __thread (GCC/Clang extension, C-style)
    __thread int c_tls_var = 0;

    void thread_function() {
        tls_var++; // Each thread has its own copy
        std::cout << "Thread " << std::this_thread::get_id()
                  << ": tls_var = " << tls_var << std::endl;
    }

    // TLS implementation:
    // - Static TLS: Allocated at thread creation
    // - Dynamic TLS: Allocated on first access (for example, in libraries loaded with dlopen)

TLS Internals

    // TLS is typically implemented using:
    // 1. Thread pointer (FS segment on x86-64)
    // 2. TLS descriptor
    // 3. DTV (Dynamic Thread Vector)

    // Access TLS variable (simplified):
    mov %fs:tls_var@tpoff, %eax // Read TLS variable

Security Features

Modern loaders implement various security features.

ASLR (Address Space Layout Randomization)

    # Check if ASLR is enabled
    cat /proc/sys/kernel/randomize_va_space
    # 0 = disabled
    # 1 = randomize stack, mmap base (shared libraries), and vDSO
    # 2 = additionally randomize the heap (brk)

    # Disable for debugging
    echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

    # Or for single execution
    setarch x86_64 -R ./program

Stack Canaries

    #include <cstring>

    // Compiler adds canaries to detect stack buffer overflows
    void vulnerable(char* input) {
        char buffer[64];
        strcpy(buffer, input); // Overflows if input is longer than 63 characters
        // Canary checked before return; a mismatch aborts the program
    }

    // Compile with stack protection
    // g++ -fstack-protector-all program.cpp

RELRO (Relocation Read-Only)

    # Partial RELRO (the default on many toolchains; some distributions default to full)
    g++ program.cpp

    # Full RELRO (GOT is read-only after relocation)
    g++ -Wl,-z,relro,-z,now program.cpp

    # Check RELRO status
    checksec --file=program

NX Bit (No-Execute)

    # Modern systems mark the stack and heap as non-executable
    # This prevents injected data from being run as code

    # Check NX status (-W keeps each program header on one line)
    readelf -lW program | grep GNU_STACK
    # RW  = NX enabled (good)
    # RWE = NX disabled (bad)

Debugging Runtime Issues

Core Dumps

    # Enable core dumps
    ulimit -c unlimited

    # Set core pattern (%e = executable name, %p = PID)
    echo "core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern

    # Analyze core dump (the file name follows the pattern above)
    gdb program core.program.<pid>
    (gdb) bt              # Backtrace
    (gdb) info registers
    (gdb) x/10x $rsp      # Examine stack

Memory Debugging

    // Valgrind
    // valgrind --leak-check=full ./program

    // AddressSanitizer
    // g++ -fsanitize=address -g program.cpp
    // ./program

    // Custom memory tracking
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // fprintf is used instead of std::cout so that logging
    // does not itself call operator new
    void* operator new(size_t size) {
        void* p = malloc(size ? size : 1);
        if (!p) throw std::bad_alloc();
        std::fprintf(stderr, "Allocated %zu bytes at %p\n", size, p);
        return p;
    }

    void operator delete(void* p) noexcept {
        std::fprintf(stderr, "Freed memory at %p\n", p);
        free(p);
    }

Runtime Profiling

    # Performance profiling
    perf record ./program
    perf report

    # Memory profiling
    valgrind --tool=massif ./program
    ms_print massif.out.*

    # Call graph profiling
    valgrind --tool=callgrind ./program
    kcachegrind callgrind.out.*

Optimizing Load Time

Preloading and Prelinking

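    # LD_PRELOAD loads a library ahead of all others; it changes symbol
    # lookup order but does not by itself shorten startup
    LD_PRELOAD=/lib/x86_64-linux-gnu/libc.so.6 ./program

    # Use prelinking (deprecated and absent from most current distributions, but educational)
    sudo prelink -a

    # Measure startup time
    time ./program
    perf stat ./program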

Reducing Symbol Resolution

    // Compile with -fvisibility=hidden so symbols are hidden by default
    // Only export necessary symbols
    class __attribute__((visibility("default"))) PublicAPI {
        // Exported
    };

    class InternalClass {
        // Hidden
    };

Lazy Loading

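    #include <dlfcn.h>

    // Load libraries only when needed
    class PluginManager {
        void* handle = nullptr;

        void load_if_needed() {
            if (!handle) {
                handle = dlopen("plugin.so", RTLD_LAZY); // nullptr on failure; see dlerror()
            }
        }

    public:
        ~PluginManager() {
            if (handle) dlclose(handle);
        }
    };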

Platform-Specific Details

Linux-Specific

    // Auxiliary vector (passed by kernel)
    #include <sys/auxv.h>
    #include <iostream>

    void print_auxv() {
        std::cout << "Page size: " << getauxval(AT_PAGESZ) << std::endl;
        std::cout << "Entry point: " << std::hex << getauxval(AT_ENTRY) << std::endl;
    }

Windows Differences

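    // Windows uses PE format instead of ELF
    // DLLs instead of .so files
    // Different loader (implemented in ntdll.dll)
    #ifdef _WIN32
    #include <windows.h>

    void call_from_dll() {
        HMODULE handle = LoadLibraryA("library.dll");
        if (!handle) return;
        FARPROC func = GetProcAddress(handle, "function");
        // Cast func to the real function type before calling it
        FreeLibrary(handle);
    }
    #endif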

Best Practices

  1. Minimize startup overhead

    • Reduce library dependencies
    • Use lazy initialization
    • Consider static linking for small utilities
  2. Optimize memory layout

    • Group related data
    • Consider cache lines
    • Use appropriate allocators
  3. Handle initialization order

    • Avoid global constructors
    • Use lazy initialization
    • Document dependencies
  4. Security considerations

    • Enable the hardening features covered above (ASLR, stack canaries, full RELRO, NX)
    • Validate input in constructors
    • Be careful with SUID programs
  5. Debug systematically

    • Use proper tools (gdb, valgrind)
    • Enable debug symbols
    • Understand the startup sequence

Conclusion

The journey from executable file to running process involves sophisticated kernel mechanisms, dynamic linking, memory management, and security features. Understanding these concepts helps you:

  • Debug runtime issues effectively
  • Optimize program startup and memory usage
  • Write more secure code
  • Understand system behavior

The loading and runtime phase is where your compiled and linked code finally comes to life, transformed from static bytes on disk to a dynamic process in memory.

Abhik Sarkar

Machine Learning Consultant specializing in Computer Vision and Deep Learning. Leading ML teams and building innovative solutions.
