Linux Networking Stack: From Packets to Applications


Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.


The Network Highway of Linux

Every time you browse a website, SSH into a server, or stream a video, data flows through the intricate Linux networking stack. This isn't just about moving bytes from A to B - it's a sophisticated system of layers, protocols, filters, and routing decisions that happen millions of times per second.

Imagine the networking stack as a multi-level postal system. Your application writes a letter (data), which gets wrapped in envelopes (headers) at each level - TCP envelope, IP envelope, Ethernet envelope. Each post office (network layer) adds routing information, and security checkpoints (iptables) inspect and filter packages. Finally, the network card (physical layer) converts everything to electrical signals or radio waves.

Let's explore this fascinating journey from application to wire and back.

Interactive Networking Visualization

Explore the complete Linux networking stack, from layers to sockets to packet filtering:

Network Stack Layer Traversal

OSI Model

Layer  Name          Examples
7      Application   HTTP, FTP, SSH
6      Presentation  SSL/TLS, Compression
5      Session       NetBIOS, SQL
4      Transport     TCP, UDP
3      Network       IP, ICMP, ARP
2      Data Link     Ethernet, WiFi
1      Physical      Cables, Radio

TCP/IP Model

Layer  Name          Examples
4      Application   HTTP, FTP, SSH, DNS
3      Transport     TCP, UDP
2      Internet      IP, ICMP, ARP
1      Link          Ethernet, WiFi, PPP

Packet Structure

Eth | IP  | TCP | HTTP Data
14B + 20B + 20B + Payload
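The 14B + 20B + 20B figure can be checked against the kernel's own header structs. The following is a minimal sketch, assuming no IP or TCP options and ignoring the Ethernet preamble and frame check sequence; `frame_size` and `mss_for_mtu` are hypothetical helper names used only for illustration.

```c
#include <stddef.h>
#include <linux/if_ether.h>  /* struct ethhdr: 14 bytes */
#include <linux/ip.h>        /* struct iphdr: 20 bytes without options */
#include <linux/tcp.h>       /* struct tcphdr: 20 bytes without options */

/* Bytes in one Ethernet frame carrying `payload` bytes of TCP data
 * (assumption: no IP/TCP options, preamble and FCS not counted). */
static size_t frame_size(size_t payload)
{
    return sizeof(struct ethhdr) + sizeof(struct iphdr)
         + sizeof(struct tcphdr) + payload;
}

/* Largest TCP payload that fits in one IP packet of size `mtu`
 * (assumption: mtu is at least 40 bytes). */
static size_t mss_for_mtu(size_t mtu)
{
    return mtu - sizeof(struct iphdr) - sizeof(struct tcphdr);
}
```

Under these assumptions, a 5-byte payload such as "Hello" occupies 59 bytes on the link, and a 1500-byte MTU leaves 1460 bytes of TCP payload per packet.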

The TCP/IP Stack

Layer Architecture

Linux implements a hybrid TCP/IP model:

Application Layer    [Userspace]
        ↓ System Calls
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Transport Layer      [Kernel]
Network Layer        [Kernel]
Data Link Layer      [Kernel]
Physical Layer       [Hardware]

Packet Journey Through the Stack

// Sending data (top-down)
Application: write(socket, "Hello", 5)
Transport:   Add TCP header (port, seq, ack)
Network:     Add IP header (src/dst IP)
Data Link:   Add Ethernet header (MAC addresses)
Physical:    Convert to electrical signals

// Receiving data (bottom-up)
Physical:    Electrical signals → bits
Data Link:   Verify Ethernet checksum, extract payload
Network:     Check IP address, route if needed
Transport:   Verify TCP checksum, reassemble segments
Application: read(socket, buffer, size)
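The checksum verification steps on the receive path can be illustrated with the Internet checksum from RFC 1071, which IP, ICMP, TCP, and UDP all use. This is a sketch under stated assumptions: `inet_checksum` is a hypothetical name, the buffer is assumed to be at most 64 KiB so the 32-bit accumulator cannot overflow, and the TCP pseudo-header (which a real TCP checksum also covers) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: add the data as 16-bit big-endian words
 * using one's-complement arithmetic, then invert the result.
 * Assumption: len <= 65535, so the 32-bit sum cannot overflow. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;

    while (sum >> 16)                   /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver can run the same function over the data with the checksum field included; a result of 0 means the data passed verification.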

Socket Programming

Socket Types

// TCP Socket (reliable, stream-oriented)
int tcp_sock = socket(AF_INET, SOCK_STREAM, 0);

// UDP Socket (unreliable, datagram-oriented)
int udp_sock = socket(AF_INET, SOCK_DGRAM, 0);

// Raw Socket (direct access to IP layer)
int raw_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

// Unix Domain Socket (local IPC)
int unix_sock = socket(AF_UNIX, SOCK_STREAM, 0);

// Netlink Socket (kernel communication)
int netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

TCP Server Implementation

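#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Application-specific handler, defined elsewhere
void handle_client(int client_fd);

// Serves one client at a time; returns -1 if setup fails
int create_tcp_server(int port) {
    int server_fd, client_fd;
    struct sockaddr_in addr;

    // 1. Create socket
    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) {
        perror("socket");
        return -1;
    }

    // 2. Set socket options
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // 3. Bind to address
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(server_fd);
        return -1;
    }

    // 4. Listen for connections
    if (listen(server_fd, 128) < 0) {  // Backlog of 128
        perror("listen");
        close(server_fd);
        return -1;
    }

    // 5. Accept connections
    while (1) {
        client_fd = accept(server_fd, NULL, NULL);
        if (client_fd < 0) {
            perror("accept");
            continue;
        }
        handle_client(client_fd);
        close(client_fd);
    }
}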

High-Performance I/O

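#include <sys/epoll.h>
#include <liburing.h>  // io_uring helper library (link with -luring)

#define MAX_EVENTS 64

// epoll for scalable I/O multiplexing
int epfd = epoll_create1(0);
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;  // Edge-triggered: socket_fd should be non-blocking
ev.data.fd = socket_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, socket_fd, &ev);

// Event loop
struct epoll_event events[MAX_EVENTS];
while (1) {
    int nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].events & EPOLLIN) {
            // With EPOLLET, handle_input must read until EAGAIN,
            // otherwise leftover data produces no further event
            handle_input(events[i].data.fd);
        }
    }
}

// io_uring for async I/O (Linux 5.1+)
// io_uring_prep_read needs Linux 5.6+; io_uring_prep_readv works on 5.1
struct io_uring ring;
io_uring_queue_init(256, &ring, 0);
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buffer, size, offset);
io_uring_submit(&ring);

// Wait for the completion; cqe->res is the byte count or a negative errno
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);
io_uring_cqe_seen(&ring, cqe);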

Netfilter and iptables

Netfilter Hooks

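// Kernel netfilter hook points (from the kernel's linux/netfilter.h)
enum nf_inet_hooks {
    NF_INET_PRE_ROUTING,   // After packet reception
    NF_INET_LOCAL_IN,      // Before local delivery
    NF_INET_FORWARD,       // Forwarded packets
    NF_INET_LOCAL_OUT,     // Local packets going out
    NF_INET_POST_ROUTING   // Before transmission
};

// Hook callback: returns a verdict such as NF_ACCEPT or NF_DROP per packet
static unsigned int my_packet_filter(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
    return NF_ACCEPT;
}

// Register a netfilter hook
static struct nf_hook_ops my_hook = {
    .hook = my_packet_filter,
    .pf = NFPROTO_IPV4,
    .hooknum = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST
};

// Call from the module's init function;
// pair with nf_unregister_net_hook() in the exit function
nf_register_net_hook(&init_net, &my_hook);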

iptables Rules

# Basic firewall rules
# Rules in a chain are evaluated in order; the first matching rule decides

# Drop all incoming by default
# (over SSH, add the ACCEPT rules first so the DROP policy cannot lock you out)
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow SSH from specific network
iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT

# Rate limiting
# (only affects packets that no earlier rule has accepted, so place these
# before any rule that accepts port 22)
iptables -A INPUT -p tcp --dport 22 -m recent --name ssh --set
iptables -A INPUT -p tcp --dport 22 -m recent --name ssh \
    --update --seconds 60 --hitcount 4 -j DROP

# NAT/Masquerading
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A PREROUTING -p tcp --dport 80 \
    -j DNAT --to-destination 192.168.1.100:8080

# Connection tracking
iptables -A INPUT -m conntrack --ctstate NEW -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Logging (log whatever reaches the end of the chain before the DROP policy applies)
iptables -A INPUT -j LOG --log-prefix "iptables-dropped: " --log-level 4

nftables (Modern Replacement)

# nftables - more efficient than iptables
# Quote braces and semicolons so the shell passes them to nft unchanged
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; }'

# Add rules
nft add rule inet filter input tcp dport 22 accept
nft add rule inet filter input tcp dport '{ 80, 443 }' accept
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input drop

# View rules
nft list ruleset

Routing and Forwarding

Routing Tables

# View routing table
ip route show
# or traditional
route -n

# Add static route
ip route add 10.0.0.0/8 via 192.168.1.1 dev eth0

# Add default gateway
ip route add default via 192.168.1.1

# Policy-based routing
# Create new routing table
echo "200 custom" >> /etc/iproute2/rt_tables
# Add rules to use custom table
ip rule add from 192.168.2.0/24 table custom
ip route add default via 10.0.0.1 table custom

# Preferred source address
# (src selects the local address used for traffic on this route;
# it is not IP source routing)
ip route replace 10.0.0.0/8 via 192.168.1.1 dev eth0 src 192.168.1.100

# Multipath routing
ip route add default \
    nexthop via 192.168.1.1 weight 1 \
    nexthop via 192.168.2.1 weight 2
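When several routes match a destination, the most specific one (the longest prefix) wins, and the default route matches only when nothing else does. The sketch below models that rule with a linear scan over a small table. The `struct route`, `ipv4`, and `lookup` names are hypothetical, and the kernel uses a trie rather than a scan, so this illustrates the selection rule and not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical routing entry, addresses in host byte order. */
struct route {
    uint32_t prefix;
    int prefix_len;          /* 0..32; 0 is the default route */
    const char *gateway;
};

/* Parse dotted-quad text into a host-order address (0 on parse failure). */
static uint32_t ipv4(const char *text)
{
    struct in_addr addr;
    return inet_pton(AF_INET, text, &addr) == 1 ? ntohl(addr.s_addr) : 0;
}

/* Return the matching route with the longest prefix, or NULL if none match. */
static const struct route *lookup(const struct route *table, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        int len = table[i].prefix_len;
        uint32_t mask = (len == 0) ? 0 : (~(uint32_t)0 << (32 - len));

        if ((dst & mask) == (table[i].prefix & mask) &&
            (best == NULL || len > best->prefix_len))
            best = &table[i];
    }
    return best;
}
```

With a default route, a 10.0.0.0/8 route, and a 10.1.0.0/16 route in the table, a packet to 10.1.2.3 takes the /16 entry even though all three match.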

IP Forwarding

# Enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward
# or
sysctl -w net.ipv4.ip_forward=1

# Enable forwarding on a single interface
sysctl -w net.ipv4.conf.eth0.forwarding=1

# IPv6 forwarding with router advertisements
# (accept_ra=2 keeps accepting RAs on eth0 even though forwarding is on)
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.eth0.accept_ra=2

Network Namespaces

Creating Isolated Networks

# Create namespace
ip netns add myns

# Execute in namespace
ip netns exec myns ip link show

# Create veth pair
ip link add veth0 type veth peer name veth1

# Move one end to namespace
ip link set veth1 netns myns

# Configure interfaces
ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up
ip netns exec myns ip addr add 10.0.0.2/24 dev veth1
ip netns exec myns ip link set veth1 up

# Test connectivity
ping 10.0.0.2

Container Networking

# Docker-style networking
# Create bridge
ip link add br0 type bridge
ip addr add 172.17.0.1/16 dev br0
ip link set br0 up

# For each container:
# 1. Create veth pair
ip link add veth0 type veth peer name eth0

# 2. Attach to bridge
ip link set veth0 master br0
ip link set veth0 up

# 3. Move to container namespace
ip link set eth0 netns container1
ip netns exec container1 ip addr add 172.17.0.2/16 dev eth0
ip netns exec container1 ip link set eth0 up
ip netns exec container1 ip route add default via 172.17.0.1

# Enable NAT for containers
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o br0 -j MASQUERADE

Performance Tuning

TCP Tuning

# TCP buffer sizes (134217728 bytes = 128 MiB maximum)
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

# TCP congestion control (bbr requires the tcp_bbr module)
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq

# Connection tracking (available once the nf_conntrack module is loaded)
sysctl -w net.netfilter.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400

# TCP keepalive
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=30

# Fast recovery
# (the meaning of tcp_early_retrans values differs between kernel versions;
# check the ip-sysctl documentation for your kernel before changing it)
sysctl -w net.ipv4.tcp_early_retrans=1
sysctl -w net.ipv4.tcp_thin_linear_timeouts=1
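A useful way to sanity-check the buffer maximums is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full. The helper below is a small sketch; `bdp_bytes` is a hypothetical name, and the integer arithmetic assumes the bandwidth is given in bits per second and the round-trip time in whole milliseconds.

```c
#include <stdint.h>

/* Bandwidth-delay product in bytes.
 * bandwidth_bps: link speed in bits per second
 * rtt_ms:        round-trip time in milliseconds */
static uint64_t bdp_bytes(uint64_t bandwidth_bps, uint64_t rtt_ms)
{
    return bandwidth_bps / 8 * rtt_ms / 1000;
}
```

For a 10 Gbit/s path with a 100 ms round trip this gives 125,000,000 bytes (about 119 MiB), which fits under the 128 MiB maximum set above. A 1 Gbit/s path with a 20 ms round trip needs only 2,500,000 bytes.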

Network Interface Tuning

# Increase ring buffer
ethtool -G eth0 rx 4096 tx 4096

# Enable offloading
ethtool -K eth0 gso on gro on tso on

# CPU affinity for interrupts
# (smp_affinity is a hex bitmask: bit N selects CPU N, so CPU 2 is 0x4)
echo 4 > /proc/irq/24/smp_affinity  # Bind to CPU 2

# Receive Packet Steering (RPS): mask f = CPUs 0-3
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

# Transmit Packet Steering (XPS): mask 1 = CPU 0
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
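The `smp_affinity`, `rps_cpus`, and `xps_cpus` files all take a hexadecimal CPU bitmask in which bit N stands for CPU N. Computing the mask by hand is error-prone, so here is a minimal sketch of the conversion; `cpu_mask` is a hypothetical helper, and it assumes every CPU index is smaller than the number of bits in an `unsigned long`.

```c
#include <stddef.h>

/* Build a CPU bitmask from a list of CPU indices (bit N = CPU N).
 * Assumption: every index is below the bit width of unsigned long. */
static unsigned long cpu_mask(const int *cpus, size_t n)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < n; i++)
        mask |= 1UL << cpus[i];
    return mask;
}
```

CPU 2 alone yields 0x4, and CPUs 0 through 3 yield 0xf. Printing the result with `printf("%lx\n", mask)` gives the string to write into the file.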

Network Monitoring

Essential Tools

# Connection monitoring
ss -tunap                  # All connections
ss -lt                     # Listening TCP
netstat -tunlp             # Traditional alternative

# Packet capture
tcpdump -i eth0 -w capture.pcap
tcpdump -i any 'tcp port 80'
tcpdump -i eth0 'host 192.168.1.1'

# Traffic monitoring
iftop -i eth0              # Real-time bandwidth
nethogs                    # Per-process bandwidth
iptraf-ng                  # Detailed statistics

# Network discovery
nmap -sn 192.168.1.0/24    # Ping scan
arp-scan -l                # ARP scan

# Performance testing
iperf3 -s                  # Server mode
iperf3 -c server           # Client mode

BPF and XDP

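// XDP program for high-speed packet filtering
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_filter(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    struct ethhdr *eth = data;

    // Bounds check: the BPF verifier rejects reads past data_end
    if ((void *)eth + sizeof(*eth) > data_end)
        return XDP_DROP;

    // Drop non-IP packets (note: this also drops ARP and IPv6)
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";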

Security Considerations

Network Hardening

# Disable source routing
sysctl -w net.ipv4.conf.all.accept_source_route=0
sysctl -w net.ipv6.conf.all.accept_source_route=0

# Enable SYN cookies
sysctl -w net.ipv4.tcp_syncookies=1

# Ignore ICMP redirects
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv6.conf.all.accept_redirects=0

# Enable reverse path filtering
sysctl -w net.ipv4.conf.all.rp_filter=1

# Log martians
sysctl -w net.ipv4.conf.all.log_martians=1

Common Issues and Solutions

Connection Refused

# Check if service is listening
ss -tlnp | grep :80

# Check firewall rules
iptables -L -n -v

# Check SELinux
getenforce
semanage port -l | grep http

Slow Network

# Check for packet loss
ping -c 100 google.com | grep loss

# Check MTU issues
# (1472 bytes of payload + 8 ICMP + 20 IP = 1500; -M do forbids fragmentation)
ping -M do -s 1472 google.com

# Check DNS resolution
dig google.com
nslookup google.com

# Check route
traceroute google.com
mtr google.com  # Better alternative

Best Practices

  1. Use connection pooling - Reuse TCP connections
  2. Enable TCP keepalive - Detect dead connections
  3. Tune buffer sizes - Based on bandwidth-delay product
  4. Use appropriate socket options - SO_REUSEADDR, TCP_NODELAY
  5. Implement proper error handling - Check all return values
  6. Monitor connection states - Watch for TIME_WAIT accumulation
  7. Use modern APIs - epoll, io_uring for scalability
  8. Secure by default - Whitelist approach for firewall rules
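Several of these practices come down to a few `setsockopt` calls on each connection. The sketch below enables `TCP_NODELAY` and per-socket keepalive; `tune_socket` is a hypothetical helper, and the timing values (120 s idle, 30 s interval, 3 probes) are illustrative choices that mirror the keepalive sysctls shown earlier, not recommendations for every workload.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Apply low-latency and keepalive options to a TCP socket.
 * Returns 0 on success, -1 if any option could not be set. */
static int tune_socket(int fd)
{
    int on = 1;
    int idle = 120;   /* seconds of silence before the first probe */
    int intvl = 30;   /* seconds between probes */
    int cnt = 3;      /* unanswered probes before the connection is dropped */

    /* Send small writes immediately instead of waiting to coalesce them */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;

    /* Keepalive is off unless the application asks for it per socket */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;

    return 0;
}
```

The system-wide `tcp_keepalive_*` sysctls only supply defaults; they have no effect on a socket until `SO_KEEPALIVE` is enabled on it, and the per-socket options above override them.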

Conclusion

The Linux networking stack is a marvel of engineering, efficiently handling everything from simple pings to massive data center traffic. Through our interactive visualizations, you've seen how packets flow through layers, how iptables filters traffic, and how sockets connect applications to the network.

Understanding this stack empowers you to build high-performance network applications, debug complex connectivity issues, and secure your systems against network attacks. Remember: every web request, every SSH session, and every packet traverses this intricate system.
