Linux Networking Stack: From Packets to Applications
Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.
Best viewed on desktop for optimal interactive experience
The Network Highway of Linux
Every time you browse a website, SSH into a server, or stream a video, data flows through the intricate Linux networking stack. This isn't just about moving bytes from A to B - it's a sophisticated system of layers, protocols, filters, and routing decisions that happen millions of times per second.
Imagine the networking stack as a multi-level postal system. Your application writes a letter (data), which gets wrapped in envelopes (headers) at each level - TCP envelope, IP envelope, Ethernet envelope. Each post office (network layer) adds routing information, and security checkpoints (iptables) inspect and filter packages. Finally, the network card (physical layer) converts everything to electrical signals or radio waves.
Let's explore this fascinating journey from application to wire and back.
Interactive Networking Visualization
Explore the complete Linux networking stack, from layers to sockets to packet filtering:
[Interactive visualization: Network Stack Layer Traversal, with OSI Model, TCP/IP Model, and Packet Structure views]
The TCP/IP Stack
Layer Architecture
Linux implements the TCP/IP model, splitting the layers between userspace and the kernel:
Application Layer    [Userspace]
      ↓ System Calls
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Transport Layer      [Kernel]
      ↓
Network Layer        [Kernel]
      ↓
Data Link Layer      [Kernel]
      ↓
Physical Layer       [Hardware]
Packet Journey Through the Stack
// Sending data (top-down)
Application: write(socket, "Hello", 5)
    ↓
Transport:   Add TCP header (port, seq, ack)
    ↓
Network:     Add IP header (src/dst IP)
    ↓
Data Link:   Add Ethernet header (MAC addresses)
    ↓
Physical:    Convert to electrical signals

// Receiving data (bottom-up)
Physical:    Electrical signals → bits
    ↓
Data Link:   Verify Ethernet checksum, extract payload
    ↓
Network:     Check IP address, route if needed
    ↓
Transport:   Verify TCP checksum, reassemble segments
    ↓
Application: read(socket, buffer, size)
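To make the encapsulation concrete, here is a small userspace sketch that walks the headers of a captured IPv4/TCP frame in the same order the receive path does. It assumes the raw bytes are already sitting in a buffer (for example from a packet capture); the function name and buffer handling are illustrative, not part of any specific API, and length checks beyond the Ethernet header are omitted for brevity.

#include <stdio.h>
#include <arpa/inet.h>        // ntohs(), inet_ntop()
#include <net/ethernet.h>     // struct ether_header, ETHERTYPE_IP
#include <netinet/ip.h>       // struct iphdr
#include <netinet/tcp.h>      // struct tcphdr

// Hypothetical helper: peel one header per layer, outermost first.
void dump_headers(const unsigned char *frame, size_t len)
{
    // Data Link: the Ethernet header comes first on the wire
    const struct ether_header *eth = (const struct ether_header *)frame;
    if (len < sizeof(*eth) || ntohs(eth->ether_type) != ETHERTYPE_IP)
        return;                                    // not an IPv4 frame

    // Network: the IP header follows the Ethernet header
    const struct iphdr *ip = (const struct iphdr *)(frame + sizeof(*eth));
    char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &ip->saddr, src, sizeof(src));
    inet_ntop(AF_INET, &ip->daddr, dst, sizeof(dst));
    printf("IP  %s -> %s (protocol %u)\n", src, dst, ip->protocol);

    // Transport: the TCP header follows the (variable-length) IP header
    if (ip->protocol == IPPROTO_TCP) {
        const struct tcphdr *tcp =
            (const struct tcphdr *)((const unsigned char *)ip + ip->ihl * 4);
        printf("TCP %u -> %u\n", ntohs(tcp->source), ntohs(tcp->dest));
        // Everything after the TCP header is the application payload
    }
}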
Socket Programming
Socket Types
// TCP Socket (reliable, stream-oriented)
int tcp_sock = socket(AF_INET, SOCK_STREAM, 0);

// UDP Socket (unreliable, datagram-oriented)
int udp_sock = socket(AF_INET, SOCK_DGRAM, 0);

// Raw Socket (direct access to IP layer)
int raw_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

// Unix Domain Socket (local IPC)
int unix_sock = socket(AF_UNIX, SOCK_STREAM, 0);

// Netlink Socket (kernel communication)
int netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
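For contrast with the stream-oriented TCP examples below, a datagram socket needs no connection setup: each sendto() carries a complete, self-contained message. A minimal sketch of a UDP sender (the destination address and port are placeholders, and error handling is omitted):

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int send_udp_datagram(void)
{
    // UDP: no connect(), no handshake - just address each datagram
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(9999);                        // placeholder port
    inet_pton(AF_INET, "192.168.1.10", &dst.sin_addr);   // placeholder address

    const char *msg = "Hello";
    sendto(sock, msg, strlen(msg), 0,
           (struct sockaddr *)&dst, sizeof(dst));

    close(sock);
    return 0;
}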
TCP Server Implementation
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

int create_tcp_server(int port)
{
    int server_fd, client_fd;
    struct sockaddr_in addr;

    // 1. Create socket
    server_fd = socket(AF_INET, SOCK_STREAM, 0);

    // 2. Set socket options
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // 3. Bind to address
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);
    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));

    // 4. Listen for connections
    listen(server_fd, 128);   // Backlog of 128

    // 5. Accept connections
    while (1) {
        client_fd = accept(server_fd, NULL, NULL);
        handle_client(client_fd);   // application-defined handler
        close(client_fd);
    }
}
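On the other side of the connection, a client performs the mirror-image sequence: socket(), connect(), then read()/write(). A minimal sketch, with placeholder address handling and most error checking omitted:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int connect_to_server(const char *ip, int port)
{
    // 1. Create socket
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    // 2. Fill in the server's address
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    // 3. Connect (this triggers the TCP three-way handshake)
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    // 4. Exchange data over the established stream
    write(fd, "Hello", 5);
    char buf[256];
    read(fd, buf, sizeof(buf));

    return fd;   // caller closes with close(fd)
}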
High-Performance I/O
// epoll for scalable I/O multiplexing
int epfd = epoll_create1(0);

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;   // Edge-triggered
ev.data.fd = socket_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, socket_fd, &ev);

// Event loop
struct epoll_event events[MAX_EVENTS];
while (1) {
    int nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].events & EPOLLIN) {
            handle_input(events[i].data.fd);
        }
    }
}

// io_uring for async I/O (Linux 5.1+), using the liburing helpers
struct io_uring ring;
io_uring_queue_init(256, &ring, 0);

struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buffer, size, offset);
io_uring_submit(&ring);
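The io_uring snippet submits a read but never reaps its result. A short sketch of the completion side, again using liburing and assuming the same ring with a single outstanding request:

#include <liburing.h>

// Reap the completion for the read submitted above.
void reap_one_completion(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;

    // Block until at least one completion is available
    if (io_uring_wait_cqe(ring, &cqe) < 0)
        return;

    if (cqe->res < 0) {
        // Negative res is a -errno value for the failed request
    } else {
        // cqe->res is the number of bytes read
    }

    // Mark the CQE as consumed so the kernel can reuse the slot
    io_uring_cqe_seen(ring, cqe);
}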
Netfilter and iptables
Netfilter Hooks
// Kernel netfilter hook points
enum nf_inet_hooks {
    NF_INET_PRE_ROUTING,   // After packet reception
    NF_INET_LOCAL_IN,      // Before local delivery
    NF_INET_FORWARD,       // Forwarded packets
    NF_INET_LOCAL_OUT,     // Local packets going out
    NF_INET_POST_ROUTING   // Before transmission
};

// Register a netfilter hook
static struct nf_hook_ops my_hook = {
    .hook     = my_packet_filter,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST
};

nf_register_net_hook(&init_net, &my_hook);
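The .hook field above points at a callback the kernel invokes for every packet that reaches the hook point. A minimal sketch of what my_packet_filter might look like inside a kernel module; the policy it enforces (dropping ICMP) is purely illustrative:

#include <linux/in.h>
#include <linux/ip.h>
#include <linux/netfilter.h>
#include <linux/skbuff.h>

// Hypothetical callback for the hook registered above.
// Runs in the kernel for every IPv4 packet hitting NF_INET_PRE_ROUTING.
static unsigned int my_packet_filter(void *priv,
                                     struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
    struct iphdr *iph;

    if (!skb)
        return NF_ACCEPT;

    iph = ip_hdr(skb);                  // network-layer header of this packet
    if (iph->protocol == IPPROTO_ICMP)
        return NF_DROP;                 // illustrative policy: drop ICMP

    return NF_ACCEPT;                   // let everything else continue
}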
iptables Rules
# Basic firewall rules
# Drop all incoming by default
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow established connections
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow SSH from specific network
iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT

# Allow HTTP/HTTPS
iptables -A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT

# Rate limiting
iptables -A INPUT -p tcp --dport 22 -m recent --name ssh --set
iptables -A INPUT -p tcp --dport 22 -m recent --name ssh \
    --update --seconds 60 --hitcount 4 -j DROP

# NAT/Masquerading
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A PREROUTING -p tcp --dport 80 \
    -j DNAT --to-destination 192.168.1.100:8080

# Connection tracking
iptables -A INPUT -m conntrack --ctstate NEW -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Logging
iptables -A INPUT -j LOG --log-prefix "iptables-dropped: " --log-level 4
nftables (Modern Replacement)
# nftables - the modern replacement for iptables, with a single unified ruleset
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0 ; }'

# Add rules
nft add rule inet filter input tcp dport 22 accept
nft add rule inet filter input tcp dport '{ 80, 443 }' accept
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input drop

# View rules
nft list ruleset
Routing and Forwarding
Routing Tables
# View routing table
ip route show
# or traditional
route -n

# Add static route
ip route add 10.0.0.0/8 via 192.168.1.1 dev eth0

# Add default gateway
ip route add default via 192.168.1.1

# Policy-based routing
# Create new routing table
echo "200 custom" >> /etc/iproute2/rt_tables

# Add rules to use custom table
ip rule add from 192.168.2.0/24 table custom
ip route add default via 10.0.0.1 table custom

# Set the preferred source address for a route
ip route add 10.0.0.0/8 via 192.168.1.1 src 192.168.1.100

# Multipath routing
ip route add default \
    nexthop via 192.168.1.1 weight 1 \
    nexthop via 192.168.2.1 weight 2
IP Forwarding
# Enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward
# or
sysctl -w net.ipv4.ip_forward=1

# Enable forwarding on a specific interface only
sysctl -w net.ipv4.conf.eth0.forwarding=1

# IPv6 forwarding and router advertisements
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.eth0.accept_ra=2   # accept RAs even when forwarding
Network Namespaces
Creating Isolated Networks
# Create namespace
ip netns add myns

# Execute in namespace
ip netns exec myns ip link show

# Create veth pair
ip link add veth0 type veth peer name veth1

# Move one end to namespace
ip link set veth1 netns myns

# Configure interfaces
ip addr add 10.0.0.1/24 dev veth0
ip link set veth0 up
ip netns exec myns ip addr add 10.0.0.2/24 dev veth1
ip netns exec myns ip link set veth1 up

# Test connectivity
ping 10.0.0.2
Container Networking
# Docker-style networking
# Create bridge
ip link add br0 type bridge
ip addr add 172.17.0.1/16 dev br0
ip link set br0 up

# For each container:
# 0. Create the container's namespace
ip netns add container1

# 1. Create veth pair (the peer will become the container's eth0)
ip link add veth0 type veth peer name ceth0

# 2. Attach the host end to the bridge
ip link set veth0 master br0
ip link set veth0 up

# 3. Move the peer into the container namespace and rename it eth0
ip link set ceth0 netns container1
ip netns exec container1 ip link set ceth0 name eth0
ip netns exec container1 ip addr add 172.17.0.2/16 dev eth0
ip netns exec container1 ip link set eth0 up
ip netns exec container1 ip route add default via 172.17.0.1

# Enable NAT for containers
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o br0 -j MASQUERADE
Performance Tuning
TCP Tuning
# TCP buffer sizes
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

# TCP congestion control
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq

# Connection tracking
sysctl -w net.netfilter.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400

# TCP keepalive
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=30

# Fast recovery
sysctl -w net.ipv4.tcp_early_retrans=1
sysctl -w net.ipv4.tcp_thin_linear_timeouts=1
Network Interface Tuning
# Increase ring buffer
ethtool -G eth0 rx 4096 tx 4096

# Enable offloading
ethtool -K eth0 gso on gro on tso on

# CPU affinity for interrupts (value is a CPU bitmask: 0x2 = CPU 1)
echo 2 > /proc/irq/24/smp_affinity

# Receive Packet Steering (RPS)
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

# Transmit Packet Steering (XPS)
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
Network Monitoring
Essential Tools
# Connection monitoring
ss -tunap          # All connections
ss -lt             # Listening TCP
netstat -tunlp     # Traditional alternative

# Packet capture
tcpdump -i eth0 -w capture.pcap
tcpdump -i any 'tcp port 80'
tcpdump -i eth0 'host 192.168.1.1'

# Traffic monitoring
iftop -i eth0      # Real-time bandwidth
nethogs            # Per-process bandwidth
iptraf-ng          # Detailed statistics

# Network discovery
nmap -sn 192.168.1.0/24   # Ping scan
arp-scan -l               # ARP scan

# Performance testing
iperf3 -s          # Server mode
iperf3 -c server   # Client mode
BPF and XDP
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

// XDP program for high-speed packet filtering
SEC("xdp")
int xdp_filter(struct xdp_md *ctx)
{
    void *data_end = (void *)(long)ctx->data_end;
    void *data     = (void *)(long)ctx->data;
    struct ethhdr *eth = data;

    // Bounds check required by the BPF verifier
    if ((void *)eth + sizeof(*eth) > data_end)
        return XDP_DROP;

    // Drop non-IP packets
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
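To run the program it has to be compiled to BPF bytecode (typically with clang -target bpf) and attached to an interface. A hedged sketch of a userspace loader using libbpf; it assumes libbpf 0.8 or newer for bpf_xdp_attach(), and the object file name xdp_filter.o and interface eth0 are placeholders:

#include <stdio.h>
#include <net/if.h>            // if_nametoindex()
#include <bpf/libbpf.h>
#include <linux/if_link.h>     // XDP_FLAGS_*

int main(void)
{
    // Assumed object file, built with:
    //   clang -O2 -g -target bpf -c xdp_filter.c -o xdp_filter.o
    struct bpf_object *obj = bpf_object__open_file("xdp_filter.o", NULL);
    if (libbpf_get_error(obj) || bpf_object__load(obj))
        return 1;

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "xdp_filter");
    int prog_fd = bpf_program__fd(prog);
    int ifindex = if_nametoindex("eth0");    // placeholder interface

    // Attach in generic (SKB) mode so it works without driver XDP support
    if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_SKB_MODE, NULL))
        return 1;

    printf("XDP program attached to eth0\n");
    return 0;
}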
Security Considerations
Network Hardening
# Disable source routing
sysctl -w net.ipv4.conf.all.accept_source_route=0
sysctl -w net.ipv6.conf.all.accept_source_route=0

# Enable SYN cookies
sysctl -w net.ipv4.tcp_syncookies=1

# Ignore ICMP redirects
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv6.conf.all.accept_redirects=0

# Enable reverse path filtering
sysctl -w net.ipv4.conf.all.rp_filter=1

# Log martians
sysctl -w net.ipv4.conf.all.log_martians=1
Common Issues and Solutions
Connection Refused
# Check if service is listening
ss -tlnp | grep :80

# Check firewall rules
iptables -L -n -v

# Check SELinux
getenforce
semanage port -l | grep http
Slow Network
# Check for packet loss
ping -c 100 google.com | grep loss

# Check MTU issues
ping -M do -s 1472 google.com

# Check DNS resolution
dig google.com
nslookup google.com

# Check route
traceroute google.com
mtr google.com     # Better alternative
Best Practices
- Use connection pooling - Reuse TCP connections
- Enable TCP keepalive - Detect dead connections
- Tune buffer sizes - Based on bandwidth-delay product
- Use appropriate socket options - SO_REUSEADDR, TCP_NODELAY (see the sketch after this list)
- Implement proper error handling - Check all return values
- Monitor connection states - Watch for TIME_WAIT accumulation
- Use modern APIs - epoll, io_uring for scalability
- Secure by default - Whitelist approach for firewall rules
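Several of these practices map directly onto setsockopt() calls. The sketch below, referenced from the socket-options item above, shows the commonly used knobs on a TCP socket; the numeric values are illustrative defaults, not recommendations for every workload.

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>       // TCP_NODELAY, TCP_KEEP*

// Apply common production socket options to an already-created TCP socket.
void tune_tcp_socket(int fd)
{
    int on = 1;

    // Allow fast restart after a server shutdown (avoids bind() failing with
    // EADDRINUSE while old connections linger in TIME_WAIT) - set before bind()
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    // Disable Nagle's algorithm for latency-sensitive request/response traffic
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));

    // Detect dead peers: probe after 120s idle, every 30s, give up after 3 probes
    int idle = 120, intvl = 30, cnt = 3;
    setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
}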
Conclusion
The Linux networking stack is a marvel of engineering, efficiently handling everything from simple pings to massive data center traffic. Through our interactive visualizations, you've seen how packets flow through layers, how iptables filters traffic, and how sockets connect applications to the network.
Understanding this stack empowers you to build high-performance network applications, debug complex connectivity issues, and secure your systems against network attacks. Remember: every web request, every SSH session, and every packet traverses this intricate system.
Next: Boot Process →
← Back to System Calls