C++ Loading and Runtime: From Executable to Process

Introduction

When you type ./program, a complex dance begins. The kernel creates a process and maps your executable into memory, the dynamic linker resolves shared libraries, and control finally reaches your code. This article follows that journey from executable file to running process.

The Loading Process

The loading process involves several key steps:

Kernel reads the executable file
Creates a new process
Maps executable into memory
Loads dynamic linker
Dynamic linker loads libraries
Transfers control to the program's entry point (_start), which eventually calls main()
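
The first steps in this list are what a shell triggers when it runs a command. As a rough sketch (the helper name run_program is hypothetical, not from any library), the fork/exec/wait pattern below asks the kernel to create a process and replace its image with an executable found on PATH:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Launch `name` (looked up on PATH) roughly the way a shell would.
// Returns the child's exit status, or -1 if the child could not be
// created or did not exit normally.
int run_program(const char* name) {
    pid_t pid = fork();                  // Kernel creates a new process
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: ask the kernel to replace this process image with the
        // executable. On success, execvp never returns.
        char* const argv[] = {const_cast<char*>(name), nullptr};
        execvp(name, argv);
        _exit(127);                      // Only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("true") should return 0 on a typical Linux system. Everything after the exec call (mapping segments, loading the dynamic linker, jumping to the entry point) happens inside the kernel and ld.so, which is what the rest of this article covers.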

ELF File Structure

Understanding the Executable and Linkable Format (ELF) is crucial for understanding loading.

ELF Components

// Simplified ELF structure
typedef struct {
    unsigned char e_ident[16];  // Magic number and other info
    uint16_t      e_type;       // Object file type
    uint16_t      e_machine;    // Architecture
    uint32_t      e_version;    // Object file version
    uint64_t      e_entry;      // Entry point virtual address
    uint64_t      e_phoff;      // Program header table offset
    uint64_t      e_shoff;      // Section header table offset
    // ... more fields
} Elf64_Ehdr;

typedef struct {
    uint32_t   p_type;     // Segment type
    uint32_t   p_flags;    // Segment flags
    uint64_t   p_offset;   // Segment file offset
    uint64_t   p_vaddr;    // Segment virtual address
    uint64_t   p_paddr;    // Segment physical address
    uint64_t   p_filesz;   // Segment size in file
    uint64_t   p_memsz;    // Segment size in memory
    uint64_t   p_align;    // Segment alignment
} Elf64_Phdr;
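
To make these fields concrete, here is a minimal sketch that reads an ELF header from disk. It assumes a Linux system where <elf.h> supplies the full Elf64_Ehdr definition (the struct above is simplified) and a 64-bit ELF file; the function names are hypothetical.

```cpp
#include <elf.h>
#include <cstdio>
#include <cstring>

// Read the ELF header of `path` into `out`. Returns true only if the file
// could be read and starts with the ELF magic bytes (0x7f 'E' 'L' 'F').
bool read_elf_header(const char* path, Elf64_Ehdr& out) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&out, sizeof(out), 1, f) == 1 &&
              std::memcmp(out.e_ident, ELFMAG, SELFMAG) == 0;
    std::fclose(f);
    return ok;
}

// Print a few of the header fields (assumes a 64-bit ELF file).
void print_elf_summary(const Elf64_Ehdr& h) {
    std::printf("class:   %s\n",
                h.e_ident[EI_CLASS] == ELFCLASS64 ? "ELF64" : "other");
    std::printf("type:    %u, machine: %u\n",
                static_cast<unsigned>(h.e_type),
                static_cast<unsigned>(h.e_machine));
    std::printf("entry:   0x%llx\n",
                static_cast<unsigned long long>(h.e_entry));
    std::printf("phdrs:   %u entries at file offset %llu\n",
                static_cast<unsigned>(h.e_phnum),
                static_cast<unsigned long long>(h.e_phoff));
}
```

Passing "/proc/self/exe" as the path inspects the running program itself; the values printed should match what readelf -h reports for the same file.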

Examining ELF Files

# View ELF header
readelf -h program

# View program headers (for loading)
readelf -l program

# View section headers (for linking)
readelf -S program

# Hex dump of specific section
objdump -s -j .text program

# Disassemble
objdump -d program

Important Segments

LOAD           # Loadable segment (code/data)
DYNAMIC        # Dynamic linking information
INTERP         # Path to dynamic linker
GNU_STACK      # Stack permissions
GNU_RELRO      # Read-only after relocation
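
A running program can also inspect its own program headers. The sketch below assumes glibc, whose dl_iterate_phdr (from <link.h>) visits the main program first; the SegmentSummary type and function names are hypothetical.

```cpp
#include <link.h>    // dl_iterate_phdr, ElfW (glibc)
#include <elf.h>
#include <cstddef>

struct SegmentSummary {
    int  load_count  = 0;      // Number of PT_LOAD segments
    bool has_interp  = false;  // PT_INTERP present (dynamically linked)
    bool has_dynamic = false;  // PT_DYNAMIC present
    bool has_relro   = false;  // PT_GNU_RELRO present
    bool exec_stack  = false;  // PT_GNU_STACK present with PF_X set
};

// Called once per loaded object; the first object is the main program.
static int summarize_cb(dl_phdr_info* info, std::size_t, void* data) {
    auto* s = static_cast<SegmentSummary*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        switch (ph.p_type) {
            case PT_LOAD:      ++s->load_count;                      break;
            case PT_INTERP:    s->has_interp = true;                 break;
            case PT_DYNAMIC:   s->has_dynamic = true;                break;
            case PT_GNU_RELRO: s->has_relro = true;                  break;
            case PT_GNU_STACK: s->exec_stack = (ph.p_flags & PF_X);  break;
            default: break;
        }
    }
    return 1;  // Non-zero stops the iteration after the main program
}

SegmentSummary summarize_main_program() {
    SegmentSummary s;
    dl_iterate_phdr(summarize_cb, &s);
    return s;
}
```

The result should agree with readelf -l on the same binary: at least one LOAD segment, and INTERP and DYNAMIC present for a dynamically linked executable.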

Process Memory Layout

Once loaded, a process has a well-defined memory layout.

Memory Regions

// Kernel Space (above user space, not accessible from user mode)
// =====================================
// High Address (user space ends at 0x7FFFFFFFFFFF on x86-64)
// ↓
// Stack (grows downward ↓)
//   - Function parameters
//   - Return addresses
//   - Local variables
// 
// Memory Mapping Region
//   - Shared libraries
//   - mmap allocations
//   - Thread stacks
//
// Heap (grows upward ↑)
//   - Dynamic allocations (new/malloc)
//
// BSS Segment
//   - Uninitialized global variables
//
// Data Segment
//   - Initialized global variables
//
// Text Segment
//   - Program code (read-only, executable)
// ↓
// Low Address (0x400000 is the traditional base for non-PIE x86-64
// executables; PIE binaries load at a randomized base)

Viewing Process Memory

# View memory mappings
cat /proc/<pid>/maps

# Or for current process
cat /proc/self/maps

# Example output (simplified; offset, device, and inode columns omitted):
# 00400000-00401000 r-xp  /path/to/program  # Text
# 00600000-00601000 r--p  /path/to/program  # Read-only data (after RELRO)
# 00601000-00602000 rw-p  /path/to/program  # Data and BSS
# 01c3f000-01c60000 rw-p  [heap]            # Just above the program's data
# 7ffff7a00000-7ffff7c00000 r-xp  /lib/libc.so.6
# 7ffffffde000-7ffffffff000 rw-p  [stack]
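
The same information can be read from inside the process. This is a minimal sketch, assuming the usual Linux /proc/self/maps line format (address range, permissions, offset, device, inode, optional path); the Mapping struct and function names are hypothetical.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Mapping {
    std::uintptr_t start = 0, end = 0;
    std::string perms;  // e.g. "r-xp"
    std::string path;   // Empty for anonymous mappings
};

// Parse /proc/self/maps into a list of mappings; malformed lines are skipped.
std::vector<Mapping> read_self_maps() {
    std::vector<Mapping> result;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream in(line);
        Mapping m;
        char dash = 0;
        std::string offset, dev, inode;
        in >> std::hex >> m.start >> dash >> m.end
           >> m.perms >> offset >> dev >> inode;
        if (!in || dash != '-') continue;
        std::getline(in >> std::ws, m.path);  // Path is optional
        result.push_back(m);
    }
    return result;
}

// Return the mapping that contains address `p`, or nullptr if none does.
const Mapping* find_mapping(const std::vector<Mapping>& maps, const void* p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    for (const auto& m : maps)
        if (addr >= m.start && addr < m.end) return &m;
    return nullptr;
}
```

Looking up the address of a local variable should land in a readable, writable mapping (the [stack] entry when called from the main thread), while the address of a function should land in an executable mapping of the program.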

Memory Permissions

// Permission flags
// r = read
// w = write  
// x = execute
// p = private (copy-on-write)
// s = shared

// Changing permissions at runtime
// addr must be page-aligned; mprotect returns -1 and sets errno on failure
#include <sys/mman.h>

bool make_executable(void* addr, size_t len) {
    return mprotect(addr, len, PROT_READ | PROT_EXEC) == 0;
}
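
As a small illustration of permissions changing at run time, the sketch below maps one anonymous page, fills it, and then drops write permission. The function name demo_mprotect is hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

// Map one anonymous page, fill it, then make it read-only.
// Returns true if every step succeeded and the data is still readable.
bool demo_mprotect() {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;

    void* p = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;

    std::memset(p, 0x5A, page);

    // mmap returns page-aligned memory, so this address is valid for mprotect
    bool ok = mprotect(p, page, PROT_READ) == 0;

    // Reads still work; a write at this point would raise SIGSEGV
    ok = ok && static_cast<unsigned char*>(p)[0] == 0x5A;

    munmap(p, page);
    return ok;
}
```

After the mprotect call, the page would show up as r--p in /proc/self/maps instead of rw-p.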

Virtual Memory Management

Modern systems use virtual memory to provide isolation and flexibility.

Virtual to Physical Mapping

// Each process has its own virtual address space
// Virtual addresses are translated to physical addresses

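// Requires <unistd.h>, <fcntl.h>, <sys/mman.h>, <cstdlib>, <cstdio>

// Page size (typically 4KB)
size_t page_size = sysconf(_SC_PAGESIZE);

// Allocate page-aligned memory; size must be a multiple of the alignment
size_t size = 16 * page_size;  // Example size
void* aligned = aligned_alloc(page_size, size);

// Memory-mapped files (the file must be at least `size` bytes long)
int fd = open("file.dat", O_RDWR);
void* mapped = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
if (mapped == MAP_FAILED) {
    perror("mmap");  // Also fails if open() returned -1
}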

Page Faults and Demand Paging

// Pages are loaded on-demand
char* large_array = new char[1000000000];  // ~1GB of virtual address space
// No physical memory is committed yet; pages are backed on first touch

large_array[0] = 'A';      // Page fault, page allocated
large_array[999999999] = 'Z';  // Another page fault

// View page faults
// cat /proc/<pid>/stat | awk '{print $10, $12}'
// Field 10: minor faults, Field 12: major faults
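
A process can also observe its own fault count. The sketch below assumes Linux, where getrusage reports minor faults in ru_minflt, and uses an anonymous mmap so that no page is backed until it is written. The function names are hypothetical, and the exact count depends on the kernel (for example, transparent huge pages can reduce it).

```cpp
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>
#include <cstddef>

// Minor page faults taken by this process so far.
long minor_faults() {
    struct rusage ru {};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

// Map `pages` anonymous pages, write one byte into each, and return how
// many minor faults that caused. Returns -1 if the mapping failed.
long faults_when_touching(std::size_t pages) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    const std::size_t len = pages * static_cast<std::size_t>(page);

    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    const long before = minor_faults();
    volatile char* p = static_cast<volatile char*>(mem);
    for (std::size_t i = 0; i < pages; ++i)
        p[i * page] = 1;                 // First touch backs the page
    const long after = minor_faults();

    munmap(mem, len);
    return after - before;
}
```

On a typical Linux system, faults_when_touching(64) is expected to be close to 64, one fault per page touched for the first time.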

Program Startup Sequence

Before main() runs, extensive initialization occurs.

The Real Entry Point

// Not main(), but _start
// Simplified startup sequence:

_start:                        // Entry point from ELF header
    → __libc_start_main:       // glibc initialization
        → __libc_csu_init:     // Runs _init and .init_array (global constructors);
                               // glibc 2.34+ does this inside __libc_start_main
        → main:                // Your code
        → exit:                // Runs atexit handlers, global destructors, _fini,
                               // then terminates

Before main()

// Global constructors run before main
#include <iostream>
class GlobalObject {
public:
    GlobalObject() {
        std::cout << "Before main!\n";
    }
    ~GlobalObject() {
        std::cout << "After main!\n";
    }
};

GlobalObject obj;  // Constructor runs before main()

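// Using attributes (GCC/Clang extension)
// std::cout may not be initialized yet when these run, so use C stdio
// (requires <cstdio>)
__attribute__((constructor))
void before_main() {
    std::puts("Also before main!");
}

__attribute__((destructor))
void after_main() {
    std::puts("After main returns!");
}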

Initialization Order

// Within a translation unit: top to bottom
int a = 1;        // Initialized first
int b = a + 1;    // Initialized second

// Across translation units: order is unspecified!
// file1.cpp
int x = get_value();  // When does this run?

// file2.cpp
extern int x;
int y = x + 1;        // Dangerous! x may still be zero-initialized here

// Solution: Use functions
int& get_x() {
    static int x = get_value();  // Initialized on first use (thread-safe since C++11)
    return x;
}

Dynamic Linker/Loader

The dynamic linker (ld.so) handles runtime library loading.

How Dynamic Loading Works

# The kernel loads the program and notices INTERP segment
readelf -l program | grep INTERP
# [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

# Kernel loads ld.so, which then:
# 1. Loads required shared libraries
# 2. Performs relocations
# 3. Runs initialization functions
# 4. Transfers control to program

Runtime Library Loading

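#include <dlfcn.h>
#include <iostream>

// Link with -ldl on glibc older than 2.34
void run_plugin() {
    // Load library at runtime
    void* handle = dlopen("libplugin.so", RTLD_LAZY);
    if (!handle) {
        std::cerr << "Error: " << dlerror() << std::endl;
        return;
    }

    // Get function pointer
    using func_t = int (*)(int);
    dlerror();  // Clear any stale error state
    func_t func = reinterpret_cast<func_t>(dlsym(handle, "function_name"));

    if (!func) {
        const char* err = dlerror();  // May be null if the symbol's value is null
        std::cerr << "Symbol not found: " << (err ? err : "null symbol") << std::endl;
        dlclose(handle);
        return;
    }

    // Call function
    int result = func(42);
    std::cout << "Result: " << result << std::endl;

    // Unload library
    dlclose(handle);
}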

LD Environment Variables

# Library search path
LD_LIBRARY_PATH=/custom/lib:$LD_LIBRARY_PATH ./program

# Preload libraries
LD_PRELOAD=./my_malloc.so ./program

# Debug dynamic linker
LD_DEBUG=all ./program 2>&1 | head -50

# LD_DEBUG options:
# bindings  - Symbol binding
# libs      - Library searching
# reloc     - Relocations
# symbols   - Symbol table processing
# all       - Everything

# Disable lazy binding
LD_BIND_NOW=1 ./program

# Show library dependencies
LD_TRACE_LOADED_OBJECTS=1 ./program  # Same as ldd

Stack and Heap

Understanding stack and heap behavior is crucial for C++ developers.

Stack Organization

void function(int param) {
    int local1 = 10;
    int local2 = 20;
    char buffer[100];
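    // Typical unoptimized x86-64 stack frame, high to low address
    // (the compiler is free to reorder, merge, or drop slots):
    // [return address]
    // [saved rbp]
    // [local1]
    // [local2]
    // [buffer[0..99]]
    // [padding]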
}

// View stack
#include <execinfo.h>
#include <cstdlib>
#include <iostream>

void print_backtrace() {
    void* array[10];
    int size = backtrace(array, 10);  // Returns the number of frames stored
    char** strings = backtrace_symbols(array, size);
    if (!strings) return;  // Allocation failed
    
    for (int i = 0; i < size; i++) {
        std::cout << strings[i] << std::endl;
    }
    free(strings);
}

Stack Overflow Detection

// Stack size limits
#include <sys/resource.h>

void check_stack_limit() {
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    std::cout << "Stack limit: " << rl.rlim_cur << " bytes\n";
}

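// Increase stack size (the soft limit cannot exceed the hard limit rl.rlim_max)
bool increase_stack() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return false;
    const rlim_t wanted = 16 * 1024 * 1024;  // 16MB
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max) return false;
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rl) == 0;
}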

// Guard pages detect overflow
// Accessing guard page triggers SIGSEGV

Heap Management

// Multiple heap allocation strategies
// 1. First-fit: Use first suitable block
// 2. Best-fit: Use smallest suitable block
// 3. Worst-fit: Use largest block (fragmentation)

// Modern allocators (ptmalloc, jemalloc, tcmalloc)
// use sophisticated strategies:
// - Thread-local caches
// - Size classes
// - Arena allocation

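// Custom allocator (a simple bump allocator over a fixed pool)
// Requires <cstddef> and <new>
class PoolAllocator {
    alignas(std::max_align_t) char pool[1024 * 1024];  // 1MB pool
    size_t offset = 0;  // Always a multiple of the alignment
    
public:
    void* allocate(size_t size) {
        constexpr size_t align = alignof(std::max_align_t);
        if (size > sizeof(pool) - offset)  // Overflow-safe bounds check
            throw std::bad_alloc();
        void* ptr = pool + offset;
        // Round up so the next allocation stays suitably aligned
        offset += (size + align - 1) & ~(align - 1);
        return ptr;
    }
    
    void deallocate(void* ptr) {
        (void)ptr;  // No-op: memory is reclaimed when the pool is destroyed
    }
};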

Runtime Linking Mechanics

Understanding how symbols are resolved at runtime.

PLT/GOT in Action

; First call to printf
call printf@plt

; PLT stub for printf:
printf@plt:
    jmp *printf@got    ; Jump through GOT entry (initially points to the next instruction)
    push $index        ; Push relocation index
    jmp .plt0          ; Jump to resolver, which patches the GOT entry

; After resolution, GOT contains actual address
; Subsequent calls still enter printf@plt, but the first jmp lands directly in printf

Symbol Interposition

// Override library functions
// interpose.cpp
#include <cstdio>

extern "C" int puts(const char* s) {
    printf("[INTERCEPTED] %s\n", s);
    return 0;
}

// Compile and use:
// g++ -shared -fPIC interpose.cpp -o interpose.so
// LD_PRELOAD=./interpose.so ./program
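
An interposer often wants to forward to the original function rather than replace it. One common approach, sketched here under the assumption of glibc and a dynamically linked program, is to look up the next definition with dlsym(RTLD_NEXT, ...). The counter and the intercepted_calls helper are hypothetical additions for illustration.

```cpp
#include <dlfcn.h>   // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)
#include <cstdio>

static int g_intercepted = 0;  // Hypothetical counter of intercepted calls

// Our puts() is found before libc's. It counts the call, then forwards to
// the next definition of puts in the lookup order (normally libc's).
extern "C" int puts(const char* s) {
    using puts_fn = int (*)(const char*);
    static puts_fn real_puts =
        reinterpret_cast<puts_fn>(dlsym(RTLD_NEXT, "puts"));

    ++g_intercepted;
    if (!real_puts) return EOF;  // No further definition was found
    return real_puts(s);
}

int intercepted_calls() { return g_intercepted; }
```

Built as a shared object and loaded with LD_PRELOAD, this keeps the program's output unchanged while counting calls. On glibc older than 2.34, linking needs -ldl.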

Weak Symbols and Runtime

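// Weak symbol: a default that a strong definition can replace at link time
__attribute__((weak))
void optional_feature() {
    printf("Default implementation\n");
}

// A strong definition of optional_feature() in another object file wins
// at static link time. At run time, glibc's dynamic linker does not prefer
// strong over weak by default; LD_PRELOAD overrides by lookup order instead.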

Thread Local Storage (TLS)

Each thread gets its own copy of TLS variables.

// Thread-local variable (this snippet requires <iostream> and <thread>)
thread_local int tls_var = 0;

// Or using __thread (C-style)
__thread int c_tls_var = 0;

void thread_function() {
    tls_var++;  // Each thread has its own copy
    std::cout << "Thread " << std::this_thread::get_id()
              << ": tls_var = " << tls_var << std::endl;
}

// TLS implementation:
// - Static TLS: Allocated at thread creation
// - Dynamic TLS: Allocated on first access
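
To see the per-thread copies in action, the sketch below starts several threads that each increment their own copy of a thread_local counter. The names tls_counter, run_tls_demo, and callers_counter are hypothetical; on older toolchains, build with -pthread.

```cpp
#include <thread>
#include <vector>

thread_local int tls_counter = 0;  // One independent copy per thread

// Start `threads` threads; each increments its own tls_counter
// `increments` times and records the final value it observed.
std::vector<int> run_tls_demo(int threads, int increments) {
    std::vector<int> finals(threads, -1);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&finals, t, increments] {
            for (int i = 0; i < increments; ++i) ++tls_counter;
            finals[t] = tls_counter;  // Each thread writes only its own slot
        });
    }
    for (auto& th : pool) th.join();
    return finals;
}

// The calling thread's copy, which the worker threads never touch.
int callers_counter() { return tls_counter; }
```

No locking is needed even though every thread appears to modify the same variable: each worker ends with exactly `increments`, and the caller's copy keeps its initial value.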

TLS Internals

// TLS is typically implemented using:
// 1. Thread pointer (FS segment on x86-64)
// 2. TLS descriptor
// 3. DTV (Dynamic Thread Vector)

// Access TLS variable (simplified; local-exec model, offset from the thread pointer):
mov %fs:tls_var@tpoff, %eax  // Read TLS variable

Security Features

Modern loaders implement various security features.

ASLR (Address Space Layout Randomization)

# Check if ASLR is enabled
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled
# 1 = randomize stack, mmap base (shared libraries), and vDSO
# 2 = full randomization (also randomizes the heap; the usual default)

# Disable for debugging
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# Or for single execution
setarch x86_64 -R ./program
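
A program can check the same setting itself, for instance to warn when it is running with ASLR turned off. This is a minimal sketch; the function name aslr_mode is hypothetical.

```cpp
#include <fstream>

// Returns the system-wide ASLR mode (0, 1, or 2 as described above),
// or -1 if the setting cannot be read.
int aslr_mode() {
    std::ifstream f("/proc/sys/kernel/randomize_va_space");
    int mode = -1;
    if (!(f >> mode)) return -1;
    return mode;
}
```

Note that this reports the system-wide setting only; a process started under setarch -R has randomization disabled regardless of the value read here.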

Stack Canaries

// Compiler adds canaries to detect buffer overflow (requires <cstring>)
void vulnerable(char* input) {
    char buffer[64];
    strcpy(buffer, input);  // Potential overflow
    // Canary checked before return; on mismatch the process aborts
    // via __stack_chk_fail
}

// Compile with stack protection
// g++ -fstack-protector-all program.cpp

RELRO (Relocation Read-Only)

# Partial RELRO (the default on many toolchains; some distributions default to full RELRO)
g++ program.cpp

# Full RELRO (GOT is read-only after relocation)
g++ -Wl,-z,relro,-z,now program.cpp

# Check RELRO status
checksec --file=program

NX Bit (No-Execute)

// Modern systems mark stack/heap as non-executable
// Prevents code injection attacks

// Check NX status
readelf -l program | grep GNU_STACK
# RW = NX enabled (good)
# RWE = NX disabled (bad)

Debugging Runtime Issues

Core Dumps

# Enable core dumps
ulimit -c unlimited

# Set core pattern
echo "core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern

# Analyze core dump
gdb program core
(gdb) bt           # Backtrace
(gdb) info registers
(gdb) x/10x $rsp   # Examine stack

Memory Debugging

// Valgrind
// valgrind --leak-check=full ./program

// AddressSanitizer
// g++ -fsanitize=address -g program.cpp
// ./program

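// Custom memory tracking (requires <cstdio>, <cstdlib>, <new>)
// Uses fprintf rather than std::cout, which may itself allocate and recurse
void* operator new(size_t size) {
    void* p = malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();  // operator new must not return null
    std::fprintf(stderr, "Allocated %zu bytes at %p\n", size, p);
    return p;
}

void operator delete(void* p) noexcept {
    std::fprintf(stderr, "Freed memory at %p\n", p);
    free(p);
}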

Runtime Profiling

# Performance profiling
perf record ./program
perf report

# Memory profiling
valgrind --tool=massif ./program
ms_print massif.out.*

# Call graph profiling
valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*

Optimizing Load Time

Preloading and Prelinking

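# LD_PRELOAD forces a library to load first (useful for interposition);
# it does not make a library the program already links against load faster
LD_PRELOAD=/lib/x86_64-linux-gnu/libc.so.6 ./program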

# Use prelinking (deprecated but educational)
sudo prelink -a

# Measure startup time
time ./program
perf stat ./program

Reducing Symbol Resolution

// Compile with -fvisibility=hidden so symbols are hidden by default
// Only export necessary symbols

class __attribute__((visibility("default"))) PublicAPI {
    // Exported
};

class InternalClass {
    // Hidden
};

Lazy Loading

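// Load libraries only when needed (requires <dlfcn.h>)
class PluginManager {
    void* handle = nullptr;
    
public:
    // Returns true if the plugin is loaded (loading it on the first call)
    bool load_if_needed() {
        if (!handle) {
            handle = dlopen("plugin.so", RTLD_LAZY);
        }
        return handle != nullptr;
    }

    ~PluginManager() {
        if (handle) dlclose(handle);
    }
};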

Platform-Specific Details

Linux-Specific

// Auxiliary vector (passed by kernel)
#include <sys/auxv.h>

void print_auxv() {
    std::cout << "Page size: " << getauxval(AT_PAGESZ) << std::endl;
    std::cout << "Entry point: " << std::hex 
              << getauxval(AT_ENTRY) << std::endl;
}

Windows Differences

// Windows uses PE format instead of ELF
// DLLs instead of .so files
// Different loader (ntdll.dll)

#ifdef _WIN32
    #include <windows.h>

    HMODULE handle = LoadLibraryA("library.dll");
    if (handle) {
        FARPROC func = GetProcAddress(handle, "function");
        // Cast func to the real function type before calling it
        FreeLibrary(handle);
    }
#endif

Best Practices

Minimize startup overhead
- Reduce library dependencies
- Use lazy initialization
- Consider static linking for small utilities
Optimize memory layout
- Group related data
- Consider cache lines
- Use appropriate allocators
Handle initialization order
- Avoid global constructors
- Use lazy initialization
- Document dependencies
Security considerations
- Enable the hardening features above (ASLR, stack canaries, full RELRO, NX)
- Validate input in constructors
- Be careful with SUID programs
Debug systematically
- Use proper tools (gdb, valgrind)
- Enable debug symbols
- Understand the startup sequence

Conclusion

The journey from executable file to running process involves sophisticated kernel mechanisms, dynamic linking, memory management, and security features. Understanding these concepts helps you:

Debug runtime issues effectively
Optimize program startup and memory usage
Write more secure code
Understand system behavior

The loading and runtime phase is where your compiled and linked code finally comes to life, transformed from static bytes on disk to a dynamic process in memory.