Sitemap
A visual representation of the site structure to help you navigate through the content.
Site Structure
Main landing page with introduction and recent articles
Learn more about me, my background, and expertise
My talks, presentations, and speaking engagements
Collection of articles I've written on various topics
Article content
Research papers and publications
Paper content
Interactive explanations of machine learning concepts
Deep dive into initramfs (initial RAM filesystem), understanding the Linux boot process, early userspace, and how the kernel transitions from boot to the real root filesystem.
Master the Linux kernel architecture through interactive visualizations. Explore kernel layers, memory management, process scheduling, VFS, and the complete boot process.
Explore the inner workings of RAM through beautiful animations and interactive visualizations. Understand memory cells, addressing, and the memory hierarchy.
How Python compiles source code to bytecode and executes it
3D-stacked DRAM architecture providing massive bandwidth for GPUs and AI accelerators
Master GPU memory hierarchy from registers to global memory, understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance
Explore how filesystems work through beautiful interactive visualizations. From the VFS abstraction layer to modern CoW filesystems like Btrfs and ZFS. Understand the magic behind storing and retrieving your data.
How CPython manages memory with PyMalloc, object pools, and reference counting
Unified virtual address space enabling seamless CPU-GPU memory sharing with automatic page migration
Discover inodes through interactive visualizations - the invisible data structures that track everything about your files except their names. Learn why running out of inodes means no new files, even with free space!
Understanding Python's GIL, its impact on multithreading, and workarounds
Understanding virtual memory page migration, fault handling, and TLB management in CPU-GPU systems
Understand FUSE (Filesystem in Userspace) - the framework that lets you implement filesystems without writing kernel code. Learn how NTFS, SSHFS, and cloud storage work on Linux.
Understanding PyObject, type system, and how Python objects work internally
Deep dive into ext4 (fourth extended filesystem) - the default filesystem for many Linux distributions. Learn about journaling, extents, and why ext4 remains a reliable choice.
How Python handles memory cleanup with reference counting and cyclic GC
Understand NTFS (New Technology File System) and how Linux provides support through NTFS-3G FUSE driver. Learn about MFT, alternate data streams, and cross-platform challenges.
Comprehensive exploration of CPU pipeline stages, hazards, superscalar design, and out-of-order execution
Performance optimization strategies and CPython optimizations
Learn representations by pulling similar samples together and pushing dissimilar ones apart
Explore Btrfs (B-tree filesystem) - the modern Linux filesystem with built-in snapshots, RAID, compression, and advanced features for data integrity and flexibility.
Align embeddings across languages for multilingual understanding
Deep dive into ZFS (Zettabyte File System) - an advanced filesystem with end-to-end checksumming for data integrity, pooled storage, snapshots, and enterprise features.
Deep dive into the differences between green threads (user-space threads) and OS threads (kernel threads), with interactive visualizations showing scheduling, context switching, and performance implications.
Adapt embeddings from source to target domains while preserving knowledge
Explore XFS - the high-performance 64-bit journaling filesystem optimized for large files and parallel I/O. Learn why it excels at handling massive data workloads.
Deep dive into Python's asyncio library, understanding event loops, coroutines, tasks, and async/await patterns with interactive visualizations.
Ultra-compact 1-bit representations for massive-scale retrieval
Understand FAT32 and exFAT - the universal filesystems for cross-platform compatibility. Learn their limitations, use cases, and why they remain essential for removable media.
Deep dive into Translation Lookaside Buffers - the critical cache that makes virtual memory fast. Interactive visualizations of address translation, page walks, and TLB management.
Combining sparse and dense retrieval for optimal search performance
Master RAID storage through interactive visualizations. Understand RAID 0, 1, 5, 6, and 10 - how they work, when to use them, and what happens during disk failures.
Explore how memory controllers orchestrate data flow between CPU and RAM. Interactive visualizations of channels, ranks, banks, and the complex scheduling that maximizes memory bandwidth.
Probabilistic ranking function for information retrieval with term frequency saturation
Master Linux process management through interactive visualizations. Understand process lifecycle, fork/exec operations, zombies, orphans, and CPU scheduling.
Explore Linux memory management through interactive visualizations. Understand virtual memory, page tables, TLB, swapping, and memory allocation.
Understand Linux system calls through interactive visualizations. Learn how user programs communicate with the kernel, protection rings, and syscall performance.
Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.
Understand the Linux boot process through interactive visualizations. Learn about BIOS/UEFI, bootloaders, kernel initialization, and the journey to userspace.
Compare Linux init systems through interactive visualizations. Understand the evolution from SysV Init to systemd, service management, and boot orchestration.
Master Linux kernel modules through interactive visualizations. Learn how to load, unload, develop, and debug kernel modules that extend Linux functionality.
Understanding Python's __slots__ for memory optimization and faster attribute access
Essential calculus concepts for understanding gradients, optimization, and backpropagation
Explore Flynn's Classification of computer architectures through interactive visualizations of SISD, SIMD, MISD, and MIMD systems.
Master thread safety concepts through interactive visualizations of race conditions, mutexes, atomic operations, and deadlock scenarios.
Master binary search trees through interactive visualizations of insertions, deletions, rotations, and self-balancing algorithms like AVL and Red-Black trees.
Master hash tables through interactive visualizations of hash functions, collision resolution strategies, load factors, and performance characteristics.
Master the convolution operation through interactive visualizations of sliding windows, feature detection, and the mathematical mechanics behind convolutional neural networks.
Understand cross-entropy loss through interactive visualizations of probability distributions, gradient flow, and its connection to maximum likelihood estimation.
Master dilated (atrous) convolutions through interactive visualizations of dilation rates, receptive field expansion, gridding artifacts, and applications in segmentation.
Understand Feature Pyramid Networks (FPN) through interactive visualizations of top-down pathways, lateral connections, and multi-scale object detection.
Explore how receptive fields grow through CNN layers with interactive visualizations of effective vs theoretical fields, architecture comparisons, and pixel contributions.
Explore the latent space of Variational Autoencoders through interactive visualizations of encoding, decoding, interpolation, and the reparameterization trick.
Explore virtual memory management through interactive visualizations of page tables, TLB operations, page faults, and memory mapping.
Explore CPU pipeline stages, instruction-level parallelism, pipeline hazards, and branch prediction through interactive visualizations.
Master pipeline hazards through interactive visualizations of data dependencies, control hazards, structural conflicts, and advanced detection mechanisms.
Explore how CPU cache lines work, understand spatial locality, and see why memory access patterns dramatically impact performance through interactive visualizations.
Understand how different memory access patterns impact cache performance, prefetcher efficiency, and overall application speed through interactive visualizations.
Explore NUMA (Non-Uniform Memory Access) architecture, understanding how modern multi-socket systems manage memory locality and the performance implications of local vs remote memory access.
Explore the concept of CUDA contexts, their role in managing GPU resources, and how they enable parallel execution across multiple CPU threads.
Learn how the CLS token acts as a global information aggregator in Vision Transformers, enabling whole-image classification through attention mechanisms.
Explore how hierarchical attention lets Vision Transformers (ViT) build multi-scale representations of an image.
Explore how multi-head attention lets Vision Transformers (ViT) attend to several representation subspaces in parallel.
Explore how positional embeddings enable Vision Transformers (ViT) to process an image as a sequence of patches by encoding each patch's position.
Interactively explore how self-attention allows Vision Transformers (ViT) to understand images by capturing global context. Click, explore, and see how it differs from CNNs.
Understand cross-attention, the mechanism that enables transformers to align and fuse information from different sources, sequences, or modalities.
Learn how masked attention enables autoregressive generation and prevents information leakage in transformers, essential for language models and sequential generation.
Master the fundamental building block of transformers - scaled dot-product attention. Learn why scaling is crucial and how the mechanism enables parallel computation.
Compare popular approximate nearest neighbor algorithms side-by-side: HNSW, IVF-PQ, LSH, Annoy, and ScaNN. Find the best approach for your use case.
Interactive visualization of HNSW - the graph-based algorithm that powers modern vector search with logarithmic complexity.
Explore the fundamental data structures powering vector databases: trees, graphs, hash tables, and hybrid approaches for efficient similarity search.
Learn how IVF-PQ combines clustering and compression to enable billion-scale vector search with minimal memory footprint.
Explore how LSH uses probabilistic hash functions to find similar vectors in sub-linear time, perfect for streaming and high-dimensional data.
Master vector compression techniques from scalar to product quantization. Learn how to reduce memory usage by 10-100× while preserving search quality.
Understanding adaptive tiling in vision transformers - a technique that dynamically adjusts image partitioning based on complexity to optimize token usage while preserving detail.
Understanding emergent abilities in large language models - sudden capabilities that appear at scale thresholds, from arithmetic to reasoning and self-reflection.
Master the art of prompt engineering - from basic composition to advanced techniques like Chain-of-Thought and Tree-of-Thoughts.
Deep dive into how different prompt components influence model behavior across transformer layers, from surface patterns to abstract reasoning.
Understanding neural scaling laws - the power law relationships between model size, data, compute, and performance that govern AI capabilities and guide development decisions.
Understanding how AI models analyze visual complexity to optimize processing - measuring entropy, edge density, saliency, and texture for intelligent resource allocation.
Understand the fundamental differences between independent and joint encoding architectures for neural retrieval systems.
Interactive visualization of high-dimensional vector spaces, word relationships, and semantic arithmetic operations.
Learn about nested representations that enable flexible dimension reduction without retraining models.
Explore ColBERT and other multi-vector retrieval models that use fine-grained token-level matching for superior search quality.
Explore memory-accuracy trade-offs in embedding quantization from float32 to binary representations.
Compare lexical (BM25/TF-IDF) and semantic (BERT) retrieval approaches, understanding their trade-offs and hybrid strategies.
Interactive visualization of context window mechanisms in LLMs - sliding windows, expanding contexts, and attention patterns that define what models can "remember".
Interactive visualization of Flash Attention - the breakthrough algorithm that makes attention memory-efficient through tiling, recomputation, and kernel fusion.
Interactive visualization of key-value caching in LLMs - how caching transformer attention states enables efficient text generation without quadratic recomputation.
Interactive exploration of tokenization methods in LLMs - BPE, SentencePiece, and WordPiece. Understand how text becomes tokens that models can process.
Exploring LoRA, adapters, and other parameter-efficient methods for fine-tuning large vision-language models.
Understanding long polling - an efficient approach where the server holds requests open until data is available.
Understanding short polling - a simple but inefficient approach to fetching data at regular intervals.
Visualize eigenvalues and eigenvectors - key concepts for PCA, spectral methods, and matrix analysis.
Visualize gradient descent optimization - how neural networks learn by following gradients.
Essential linear algebra concepts for machine learning with interactive visualizations
Understand how memory interleaving distributes addresses across multiple banks to enable parallel access, dramatically improving memory bandwidth in modern systems from DDR5 to GPU memory.
Exploring the challenge of aligning visual and textual representations in multimodal AI systems.
Understanding the fundamental separation between visual and textual representations in multimodal models.
Understanding how vision-language models scale with data, parameters, and compute following empirical power laws.
Understanding different client-server communication patterns - from simple polling to real-time WebSocket connections.
Side-by-side comparison of Short Polling, Long Polling, and WebSockets to help you choose the right protocol for your application.
Understanding WebSockets - the protocol that enables full-duplex communication channels over a single TCP connection.
Explore how C++ code is parsed into an Abstract Syntax Tree with interactive visualizations.
Understand the complete C++ compilation pipeline from source code to object files.
Learn classic design patterns implemented in modern C++. Explore Singleton, Observer, Factory, and Strategy patterns with interactive examples.
Master dynamic linking and runtime library loading with interactive visualizations.
Understand how object files are linked together to create executables.
Understand how C++ programs are loaded and executed by the operating system.
Learn Resource Acquisition Is Initialization (RAII) - the cornerstone of C++ memory management. Understand automatic resource cleanup and exception safety.
Explore modern C++ features including auto, lambdas, ranges, and coroutines. Learn how C++11/14/17/20 transformed the language.
Master C++ OOP concepts including inheritance, polymorphism, virtual functions, and modern object-oriented design principles with interactive examples.
Discover how compilers optimize your C++ code through various transformation techniques with interactive demos.
Master C++ pointers and references through interactive visualizations. Learn memory addressing, dereferencing, smart pointers, and avoid common pitfalls.
Master the C++ preprocessor with interactive visualizations of macros, includes, and conditional compilation.
Master C++11 smart pointers through interactive examples. Learn unique_ptr, shared_ptr, and weak_ptr with reference counting visualizations.
Understand stack and heap memory allocation with interactive visualizations.
Master C++ templates and the Standard Template Library. Learn generic programming, template metaprogramming, and STL containers and algorithms.
Learn how the linker resolves symbols and fixes undefined references with interactive visualizations.
Understanding how gradients propagate through deep neural networks and the vanishing/exploding gradient problems.
Understand vectors and matrices - the fundamental data structures in machine learning.
Advanced framework for intelligent token allocation in vision transformers based on visual complexity metrics
Understanding NVIDIA's specialized matrix multiplication hardware for AI workloads
Understanding layer normalization, a technique that normalizes inputs across features, making it well suited to sequence models and transformers.
Understanding the distribution shift problem in deep neural networks that batch normalization was designed to address.
Understanding batch normalization, a technique that normalizes layer inputs to accelerate training and improve neural network performance.
Understanding skip connections, residual blocks, and their crucial role in training deep neural networks.
Deep dive into C++ virtual tables (vtables), virtual dispatch mechanism, inheritance types, and object memory layout
Understanding CPU cycles, memory hierarchy, cache optimization, and performance analysis techniques
Adaptive attention-based aggregation for graph neural networks - multi-head attention, learned weights, and interpretable graph learning
Understanding node importance through centrality measures, shortest paths, hop distances, clustering coefficients, and fundamental graph metrics
Deep dive into Graph Convolutional Networks - spectral graph theory, message passing, aggregation mechanisms, and applications in node classification and graph learning
Learning low-dimensional vector representations of graphs through random walks, DeepWalk, Node2Vec, and skip-gram models
Hierarchical graph coarsening techniques - TopK, SAGPool, DiffPool, and readout operations for graph-level representations
Understanding sparse mixture of experts models - architecture, routing mechanisms, load balancing, and efficient scaling strategies for large language models
Deep dive into the fundamental processing unit of modern GPUs - the Streaming Multiprocessor architecture, execution model, and memory hierarchy
Tools, software, and hardware I use
My professional experience and qualifications
A curated collection of articles and resources I find valuable
Services and consulting offerings
Confirmation page after form submissions
Visual representation of the site structure