Linux Networking Stack: From Packets to Applications
Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.
Best viewed on desktop for optimal interactive experience
The Network Highway of Linux
Every time you browse a website, SSH into a server, or stream a video, data flows through the intricate Linux networking stack. This isn't just about moving bytes from A to B - it's a sophisticated system of layers, protocols, filters, and routing decisions that happen millions of times per second.
Imagine the networking stack as a multi-level postal system. Your application writes a letter (data), which gets wrapped in envelopes (headers) at each level - TCP envelope, IP envelope, Ethernet envelope. Each post office (network layer) adds routing information, and security checkpoints (iptables) inspect and filter packages. Finally, the network card (physical layer) converts everything to electrical signals or radio waves.
Let's explore this fascinating journey from application to wire and back.
Interactive Networking Visualization
Explore the complete Linux networking stack, from layers to sockets to packet filtering:
[Interactive visualization: Network Stack Layer Traversal, with OSI Model, TCP/IP Model, and Packet Structure views]
The TCP/IP Stack
Layer Architecture
Linux implements the TCP/IP model, splitting the layers between userspace and the kernel:
Application Layer    [Userspace]
      ↓ System Calls
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Transport Layer      [Kernel]
      ↓
Network Layer        [Kernel]
      ↓
Data Link Layer      [Kernel]
      ↓
Physical Layer       [Hardware]
Packet Journey Through the Stack
// Sending data (top-down)
Application: write(socket, "Hello", 5)
    ↓
Transport:   Add TCP header (port, seq, ack)
    ↓
Network:     Add IP header (src/dst IP)
    ↓
Data Link:   Add Ethernet header (MAC addresses)
    ↓
Physical:    Convert to electrical signals

// Receiving data (bottom-up)
Physical:    Electrical signals → bits
    ↓
Data Link:   Verify Ethernet checksum, extract payload
    ↓
Network:     Check IP address, route if needed
    ↓
Transport:   Verify TCP checksum, reassemble segments
    ↓
Application: read(socket, buffer, size)
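To make the encapsulation concrete, here is a small userspace sketch that walks the headers of a captured IPv4/TCP frame in the same order the receive path does. It assumes the raw bytes are already sitting in a buffer (for example from a packet capture); the function name and buffer handling are illustrative, not part of any specific API, and length checks beyond the Ethernet header are omitted for brevity.

#include <stdio.h>
#include <arpa/inet.h>        // ntohs(), inet_ntop()
#include <net/ethernet.h>     // struct ether_header, ETHERTYPE_IP
#include <netinet/ip.h>       // struct iphdr
#include <netinet/tcp.h>      // struct tcphdr

// Hypothetical helper: peel one header per layer, outermost first.
void dump_headers(const unsigned char *frame, size_t len)
{
    // Data Link: the Ethernet header comes first on the wire
    const struct ether_header *eth = (const struct ether_header *)frame;
    if (len < sizeof(*eth) || ntohs(eth->ether_type) != ETHERTYPE_IP)
        return;                                    // not an IPv4 frame

    // Network: the IP header follows the Ethernet header
    const struct iphdr *ip = (const struct iphdr *)(frame + sizeof(*eth));
    char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &ip->saddr, src, sizeof(src));
    inet_ntop(AF_INET, &ip->daddr, dst, sizeof(dst));
    printf("IP  %s -> %s (protocol %u)\n", src, dst, ip->protocol);

    // Transport: the TCP header follows the (variable-length) IP header
    if (ip->protocol == IPPROTO_TCP) {
        const struct tcphdr *tcp =
            (const struct tcphdr *)((const unsigned char *)ip + ip->ihl * 4);
        printf("TCP %u -> %u\n", ntohs(tcp->source), ntohs(tcp->dest));
        // Everything after the TCP header is the application payload
    }
}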
Socket Programming
Socket Types
// TCP Socket (reliable, stream-oriented)
int tcp_sock = socket(AF_INET, SOCK_STREAM, 0);

// UDP Socket (unreliable, datagram-oriented)
int udp_sock = socket(AF_INET, SOCK_DGRAM, 0);

// Raw Socket (direct access to IP layer)
int raw_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

// Unix Domain Socket (local IPC)
int unix_sock = socket(AF_UNIX, SOCK_STREAM, 0);

// Netlink Socket (kernel communication)
int netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
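For contrast with the stream-oriented TCP examples below, a datagram socket needs no connection setup: each sendto() carries a complete, self-contained message. A minimal sketch of a UDP sender (the destination address and port are placeholders, and error handling is omitted):

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int send_udp_datagram(void)
{
    // UDP: no connect(), no handshake - just address each datagram
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(9999);                        // placeholder port
    inet_pton(AF_INET, "192.168.1.10", &dst.sin_addr);   // placeholder address

    const char *msg = "Hello";
    sendto(sock, msg, strlen(msg), 0,
           (struct sockaddr *)&dst, sizeof(dst));

    close(sock);
    return 0;
}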
TCP Server Implementation
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

int create_tcp_server(int port)
{
    int server_fd, client_fd;
    struct sockaddr_in addr;

    // 1. Create socket
    server_fd = socket(AF_INET, SOCK_STREAM, 0);

    // 2. Set socket options
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // 3. Bind to address
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));

    // 4. Listen for connections
    listen(server_fd, 128);   // Backlog of 128

    // 5. Accept connections
    while (1) {
        client_fd = accept(server_fd, NULL, NULL);
        handle_client(client_fd);   // application-defined handler
        close(client_fd);
    }
}
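On the other side of the connection, a client performs the mirror-image sequence: socket(), connect(), then read()/write(). A minimal sketch, with placeholder address handling and most error checking omitted:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int connect_to_server(const char *ip, int port)
{
    // 1. Create socket
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    // 2. Fill in the server's address
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    // 3. Connect (this triggers the TCP three-way handshake)
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    // 4. Exchange data over the established stream
    write(fd, "Hello", 5);
    char buf[256];
    read(fd, buf, sizeof(buf));

    return fd;   // caller closes with close(fd)
}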
High-Performance I/O
// epoll for scalable I/O multiplexing
int epfd = epoll_create1(0);

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;   // Edge-triggered
ev.data.fd = socket_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, socket_fd, &ev);

// Event loop
struct epoll_event events[MAX_EVENTS];
while (1) {
    int nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].events & EPOLLIN) {
            handle_input(events[i].data.fd);
        }
    }
}

// io_uring for async I/O (Linux 5.1+), using the liburing helpers
struct io_uring ring;
io_uring_queue_init(256, &ring, 0);

struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buffer, size, offset);
io_uring_submit(&ring);
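The io_uring snippet submits a read but never reaps its result. A short sketch of the completion side, again using liburing and assuming the same ring with a single outstanding request:

#include <liburing.h>

// Reap the completion for the read submitted above.
void reap_one_completion(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;

    // Block until at least one completion is available
    if (io_uring_wait_cqe(ring, &cqe) < 0)
        return;

    if (cqe->res < 0) {
        // Negative res is a -errno value for the failed request
    } else {
        // cqe->res is the number of bytes read
    }

    // Mark the CQE as consumed so the kernel can reuse the slot
    io_uring_cqe_seen(ring, cqe);
}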
Netfilter and iptables
Netfilter Hooks
// Kernel netfilter hook points
enum nf_inet_hooks {
    NF_INET_PRE_ROUTING,   // After packet reception
    NF_INET_LOCAL_IN,      // Before local delivery
    NF_INET_FORWARD,       // Forwarded packets
    NF_INET_LOCAL_OUT,     // Local packets going out
    NF_INET_POST_ROUTING   // Before transmission
};

// Register a netfilter hook
static struct nf_hook_ops my_hook = {
    .hook     = my_packet_filter,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST
};

nf_register_net_hook(&init_net, &my_hook);
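The .hook field above points at a callback the kernel invokes for every packet that reaches the hook point. A minimal sketch of what my_packet_filter might look like inside a kernel module; the policy it enforces (dropping ICMP) is purely illustrative:

#include <linux/in.h>
#include <linux/ip.h>
#include <linux/netfilter.h>
#include <linux/skbuff.h>

// Hypothetical callback for the hook registered above.
// Runs in the kernel for every IPv4 packet hitting NF_INET_PRE_ROUTING.
static unsigned int my_packet_filter(void *priv,
                                     struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
    struct iphdr *iph;

    if (!skb)
        return NF_ACCEPT;

    iph = ip_hdr(skb);                  // network-layer header of this packet
    if (iph->protocol == IPPROTO_ICMP)
        return NF_DROP;                 // illustrative policy: drop ICMP

    return NF_ACCEPT;                   // let everything else continue
}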
iptables Rules
# Basic firewall rules
# Drop all incoming by default
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow SSH from specific network
iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT

# Rate limiting
iptables -A INPUT -p tcp --dport 22 -m recent --name ssh --set
iptables -A INPUT -p tcp --dport 22 -m recent --name ssh \
    --update --seconds 60 --hitcount 4 -j DROP

# NAT/Masquerading
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A PREROUTING -p tcp --dport 80 \
    -j DNAT --to-destination 192.168.1.100:8080

# Connection tracking
iptables -A INPUT -m conntrack --ctstate NEW -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Logging
iptables -A INPUT -j LOG --log-prefix "iptables-dropped: " --log-level 4
nftables (Modern Replacement)
# nftables - the modern replacement for iptables, with a single unified ruleset
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0 ; }'

# Add rules
nft add rule inet filter input tcp dport 22 accept
nft add rule inet filter input tcp dport '{ 80, 443 }' accept
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input drop

# View rules
nft list ruleset
Routing and Forwarding
Routing Tables
# View routing table
ip route show
# or traditional
route -n

# Add static route
ip route add 10.0.0.0/8 via 192.168.1.1 dev eth0

# Add default gateway
ip route add default via 192.168.1.1

# Policy-based routing
# Create new routing table
echo "200 custom" >> /etc/iproute2/rt_tables

# Add rules to use custom table
ip rule add from 192.168.2.0/24 table custom
ip route add default via 10.0.0.1 table custom

# Set the preferred source address for a route
ip route add 10.0.0.0/8 via 192.168.1.1 src 192.168.1.100

# Multipath routing
ip route add default \
    nexthop via 192.168.1.1 weight 1 \
    nexthop via 192.168.2.1 weight 2
IP Forwarding
# Enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward
# or
sysctl -w net.ipv4.ip_forward=1

# Enable forwarding on a specific interface only
sysctl -w net.ipv4.conf.eth0.forwarding=1

# IPv6 forwarding and router advertisements
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.eth0.accept_ra=2   # accept RAs even when forwarding
Network Namespaces
Creating Isolated Networks
# Create namespace
ip netns add myns

# Execute in namespace
ip netns exec myns ip link show

# Create veth pair
ip link add veth0 type veth peer name veth1

# Move one end to namespace
ip link set veth1 netns myns

# Configure interfaces
ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up
ip netns exec myns ip addr add 10.0.0.2/24 dev veth1
ip netns exec myns ip link set veth1 up

# Test connectivity
ping 10.0.0.2
Container Networking
# Docker-style networking
# Create bridge
ip link add br0 type bridge
ip addr add 172.17.0.1/16 dev br0
ip link set br0 up

# For each container:
# 0. Create the container's namespace
ip netns add container1

# 1. Create veth pair (the peer will become the container's eth0)
ip link add veth0 type veth peer name ceth0

# 2. Attach the host end to the bridge
ip link set veth0 master br0
ip link set veth0 up

# 3. Move the peer into the container namespace and rename it eth0
ip link set ceth0 netns container1
ip netns exec container1 ip link set ceth0 name eth0
ip netns exec container1 ip addr add 172.17.0.2/16 dev eth0
ip netns exec container1 ip link set eth0 up
ip netns exec container1 ip route add default via 172.17.0.1

# Enable NAT for containers
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o br0 -j MASQUERADE
Performance Tuning
TCP Tuning
# TCP buffer sizes
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

# TCP congestion control
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq

# Connection tracking
sysctl -w net.netfilter.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400

# TCP keepalive
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=30

# Fast recovery
sysctl -w net.ipv4.tcp_early_retrans=1
sysctl -w net.ipv4.tcp_thin_linear_timeouts=1
Network Interface Tuning
# Increase ring buffer
ethtool -G eth0 rx 4096 tx 4096

# Enable offloading
ethtool -K eth0 gso on gro on tso on

# CPU affinity for interrupts (value is a CPU bitmask: 0x2 = CPU 1)
echo 2 > /proc/irq/24/smp_affinity

# Receive Packet Steering (RPS)
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

# Transmit Packet Steering (XPS)
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
Network Monitoring
Essential Tools
# Connection monitoring
ss -tunap          # All connections
ss -lt             # Listening TCP
netstat -tunlp     # Traditional alternative

# Packet capture
tcpdump -i eth0 -w capture.pcap
tcpdump -i any 'tcp port 80'
tcpdump -i eth0 'host 192.168.1.1'

# Traffic monitoring
iftop -i eth0      # Real-time bandwidth
nethogs            # Per-process bandwidth
iptraf-ng          # Detailed statistics

# Network discovery
nmap -sn 192.168.1.0/24   # Ping scan
arp-scan -l               # ARP scan

# Performance testing
iperf3 -s          # Server mode
iperf3 -c server   # Client mode
BPF and XDP
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

// XDP program for high-speed packet filtering
SEC("xdp")
int xdp_filter(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data     = (void *)(long)ctx->data;
    struct ethhdr *eth = data;

    // Bounds check required by the BPF verifier
    if ((void *)eth + sizeof(*eth) > data_end)
        return XDP_DROP;

    // Drop non-IP packets
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
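To run the program it has to be compiled to BPF bytecode (typically with clang -target bpf) and attached to an interface. A hedged sketch of a userspace loader using libbpf; it assumes libbpf 0.8 or newer for bpf_xdp_attach(), and the object file name xdp_filter.o and interface eth0 are placeholders:

#include <stdio.h>
#include <net/if.h>            // if_nametoindex()
#include <bpf/libbpf.h>
#include <linux/if_link.h>     // XDP_FLAGS_*

int main(void)
{
    // Assumed object file, built with:
    //   clang -O2 -g -target bpf -c xdp_filter.c -o xdp_filter.o
    struct bpf_object *obj = bpf_object__open_file("xdp_filter.o", NULL);
    if (libbpf_get_error(obj) || bpf_object__load(obj))
        return 1;

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "xdp_filter");
    int prog_fd = bpf_program__fd(prog);
    int ifindex = if_nametoindex("eth0");    // placeholder interface

    // Attach in generic (SKB) mode so it works without driver XDP support
    if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_SKB_MODE, NULL))
        return 1;

    printf("XDP program attached to eth0\n");
    return 0;
}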
Security Considerations
Network Hardening
# Disable source routing
sysctl -w net.ipv4.conf.all.accept_source_route=0
sysctl -w net.ipv6.conf.all.accept_source_route=0

# Enable SYN cookies
sysctl -w net.ipv4.tcp_syncookies=1

# Ignore ICMP redirects
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv6.conf.all.accept_redirects=0

# Enable reverse path filtering
sysctl -w net.ipv4.conf.all.rp_filter=1

# Log martians
sysctl -w net.ipv4.conf.all.log_martians=1
Common Issues and Solutions
Connection Refused
# Check if service is listening
ss -tlnp | grep :80

# Check firewall rules
iptables -L -n -v

# Check SELinux
getenforce
semanage port -l | grep http
Slow Network
# Check for packet loss
ping -c 100 google.com | grep loss

# Check MTU issues
ping -M do -s 1472 google.com

# Check DNS resolution
dig google.com
nslookup google.com

# Check route
traceroute google.com
mtr google.com     # Better alternative
Best Practices
- Use connection pooling - Reuse TCP connections
- Enable TCP keepalive - Detect dead connections
- Tune buffer sizes - Based on bandwidth-delay product
- Use appropriate socket options - SO_REUSEADDR, TCP_NODELAY (see the sketch after this list)
- Implement proper error handling - Check all return values
- Monitor connection states - Watch for TIME_WAIT accumulation
- Use modern APIs - epoll, io_uring for scalability
- Secure by default - Whitelist approach for firewall rules
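Several of these practices map directly onto setsockopt() calls. The sketch below, referenced from the socket-options item above, shows the commonly used knobs on a TCP socket; the numeric values are illustrative defaults, not recommendations for every workload.

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>       // TCP_NODELAY, TCP_KEEP*

// Apply common production socket options to an already-created TCP socket.
void tune_tcp_socket(int fd)
{
    int on = 1;

    // Allow fast restart after a server shutdown (avoids bind() failing with
    // EADDRINUSE while old connections linger in TIME_WAIT) - set before bind()
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    // Disable Nagle's algorithm for latency-sensitive request/response traffic
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));

    // Detect dead peers: probe after 120s idle, every 30s, give up after 3 probes
    int idle = 120, intvl = 30, cnt = 3;
    setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
}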
Conclusion
The Linux networking stack is a marvel of engineering, efficiently handling everything from simple pings to massive data center traffic. Through our interactive visualizations, you've seen how packets flow through layers, how iptables filters traffic, and how sockets connect applications to the network.
Understanding this stack empowers you to build high-performance network applications, debug complex connectivity issues, and secure your systems against network attacks. Remember: every web request, every SSH session, and every packet traverses this intricate system.
Next: Boot Process →
← Back to System Calls