Sitemap

A visual representation of the site structure to help you navigate through the content.

Site Structure

Main landing page with introduction and recent articles

About/about

Learn more about me, my background, and expertise

Speaking/speaking

My talks, presentations, and speaking engagements

Articles/articles

Collection of articles I've written on various topics

H264 implementation applications/articles/h264-implementation-applications

Article content

H264 transform quantization/articles/h264-transform-quantization

Article content

H264 fundamentals/articles/h264-fundamentals

Article content

Zettel/articles/zettel

Article content

Compiling PyTorch kernel/articles/compiling-pytorch-kernel

Article content

View size not compatible/articles/view-size-not-compatible

Article content

GPU boot errors/articles/gpu-boot-errors

Article content

H264 interactive guide/articles/h264-interactive-guide

Article content

GGML structure/articles/ggml-structure

Article content

Quantization deep dive/articles/quantization-deep-dive

Article content

How TensorRT works/articles/how-tensorrt-works

Article content

Kernel fusion/articles/kernel-fusion

Article content

Visualizing YOLOv5/articles/visualizing-yolov5

Article content

CPython internals/articles/cpython-internals

Article content

Cpp compilation process/articles/cpp-compilation-process

Article content

Cpp linking in depth/articles/cpp-linking-in-depth

Article content

Cpp loading runtime/articles/cpp-loading-runtime

Article content

Registry pattern/articles/registry-pattern

Article content

Magic numbers/articles/magic-numbers

Article content

Image encoding/articles/image-encoding

Article content

Text encoding/articles/text-encoding

Article content

Papers/papers

Research papers and publications

Visual instruction tuning/papers/visual-instruction-tuning

Paper content

Vit object detection/papers/vit-object-detection

Paper content

Yolo/papers/yolo

Paper content

Efficientnet/papers/efficientnet

Paper content

Faster rcnn/papers/faster-rcnn

Paper content

Sam/papers/sam

Paper content

DETR/papers/DETR

Paper content

Blip2/papers/blip2

Paper content

Image worth 16x16/papers/image-worth-16x16

Paper content

Optimizing transformer inference/papers/optimizing-transformer-inference

Paper content

Surf/papers/surf

Paper content

Swin transformer/papers/swin-transformer

Paper content

Clip/papers/clip

Paper content

Deeplearning go brr/papers/deeplearning-go-brr

Paper content

Attention is all you need/papers/attention-is-all-you-need

Paper content

Data movement transformer/papers/data-movement-transformer

Paper content

Deep residual learning/papers/deep-residual-learning

Paper content

Concepts/concepts

Interactive explanations of machine learning concepts

initramfs: The Initial RAM Filesystem Explained/concepts/linux/initramfs-boot-process

Deep dive into initramfs (initial RAM filesystem), understanding the Linux boot process, early userspace, and how the kernel transitions from boot to the real root filesystem.

Explore the inner workings of RAM through beautiful animations and interactive visualizations. Understand memory cells, addressing, and the memory hierarchy.

Python Bytecode Compilation/concepts/python/bytecode-compilation

How Python compiles source code to bytecode and executes it

High Bandwidth Memory (HBM)/concepts/gpu/hbm-memory

3D-stacked DRAM architecture providing massive bandwidth for GPUs and AI accelerators

Filesystems: The Digital DNA of Data Storage/concepts/linux/filesystems-overview

Explore how filesystems work through beautiful interactive visualizations. From the VFS abstraction layer to modern CoW filesystems like Btrfs and ZFS. Understand the magic behind storing and retrieving your data.

Python Memory Management/concepts/python/memory-management

How CPython manages memory with PyMalloc, object pools, and reference counting

CUDA Unified Memory/concepts/gpu/unified-memory

Unified virtual address space enabling seamless CPU-GPU memory sharing with automatic page migration

Discover inodes through interactive visualizations—the invisible data structures that track everything about your files except their names. Learn why running out of inodes means no new files, even with free space!

Global Interpreter Lock (GIL)/concepts/python/global-interpreter-lock

Understanding Python's GIL, its impact on multithreading, and workarounds

Page Migration & Fault Handling/concepts/gpu/page-migration

Understanding virtual memory page migration, fault handling, and TLB management in CPU-GPU systems

FUSE: Filesystem in Userspace Explained/concepts/linux/fuse-filesystem

Understand FUSE (Filesystem in Userspace) - the framework that lets you implement filesystems without writing kernel code. Learn how NTFS, SSHFS, and cloud storage work on Linux.

Python Object Model/concepts/python/object-model

Understanding PyObject, type system, and how Python objects work internally

ext4: The Linux Workhorse Filesystem/concepts/linux/ext4-filesystem

Deep dive into ext4 (fourth extended filesystem) - the default filesystem for most Linux distributions. Learn about journaling, extents, and why ext4 remains the reliable choice.

Python Garbage Collection/concepts/python/garbage-collection

How Python handles memory cleanup with reference counting and cyclic GC

NTFS on Linux: Windows Filesystem Support/concepts/linux/ntfs-filesystem

Understand NTFS (New Technology File System) and how Linux supports it through the NTFS-3G FUSE driver. Learn about the MFT, alternate data streams, and cross-platform challenges.

Python Optimization Techniques/concepts/python/python-optimization

Performance optimization strategies and CPython optimizations

Contrastive Learning/concepts/embeddings/contrastive-learning

Learn representations by pulling similar samples together and pushing dissimilar ones apart

Btrfs: Modern Copy-on-Write Filesystem/concepts/linux/btrfs-filesystem

Explore Btrfs (B-tree filesystem) - the modern Linux filesystem with built-in snapshots, RAID, compression, and advanced features for data integrity and flexibility.

Cross-Lingual Alignment/concepts/embeddings/cross-lingual-alignment

Align embeddings across languages for multilingual understanding

ZFS: The Ultimate Filesystem/concepts/linux/zfs-filesystem

Deep dive into ZFS (Zettabyte File System) - the most advanced filesystem with unmatched data integrity, pooled storage, snapshots, and enterprise features.

Green Threads vs OS Threads: Understanding Concurrency Models/concepts/python/green-threads-vs-os-threads

Deep dive into the differences between green threads (user-space threads) and OS threads (kernel threads), with interactive visualizations showing scheduling, context switching, and performance implications.

Domain Adaptation/concepts/embeddings/domain-adaptation

Adapt embeddings from source to target domains while preserving knowledge

XFS: High-Performance Filesystem/concepts/linux/xfs-filesystem

Explore XFS - the high-performance 64-bit journaling filesystem optimized for large files and parallel I/O. Learn why it excels at handling massive data workloads.

Python asyncio: Mastering Asynchronous Programming/concepts/python/asyncio-event-loop

Deep dive into Python's asyncio library, understanding event loops, coroutines, tasks, and async/await patterns with interactive visualizations.

Binary Embeddings/concepts/embeddings/binary-embeddings

Ultra-compact 1-bit representations for massive-scale retrieval

FAT32 & exFAT: Universal Filesystems/concepts/linux/fat-filesystems

Understand FAT32 and exFAT - the universal filesystems for cross-platform compatibility. Learn their limitations, use cases, and why they remain essential for removable media.

TLB: How CPUs Translate Virtual to Physical Memory/concepts/memory/tlb-translation-lookaside-buffer

Deep dive into Translation Lookaside Buffers - the critical cache that makes virtual memory fast. Interactive visualizations of address translation, page walks, and TLB management.

Hybrid Retrieval Systems/concepts/embeddings/hybrid-retrieval-systems

Combining sparse and dense retrieval for optimal search performance

Master RAID storage through interactive visualizations. Understand RAID 0, 1, 5, 6, and 10 - how they work, when to use them, and what happens during disk failures.

Memory Controllers: The Brain Behind RAM Management/concepts/memory/memory-controllers

Explore how memory controllers orchestrate data flow between CPU and RAM. Interactive visualizations of channels, ranks, banks, and the complex scheduling that maximizes memory bandwidth.

BM25 Algorithm/concepts/embeddings/bm25-algorithm

Probabilistic ranking function for information retrieval with term frequency saturation

Linux Process Management: Fork, Exec, and Beyond/concepts/linux/process-management

Master Linux process management through interactive visualizations. Understand process lifecycle, fork/exec operations, zombies, orphans, and CPU scheduling.

Explore Linux memory management through interactive visualizations. Understand virtual memory, page tables, TLB, swapping, and memory allocation.

Understand Linux system calls through interactive visualizations. Learn how user programs communicate with the kernel, protection rings, and syscall performance.

Master the Linux networking stack through interactive visualizations. Understand TCP/IP layers, sockets, iptables, routing, and network namespaces.

Linux Boot Process: From Power-On to Login/concepts/linux/boot-process

Understand the Linux boot process through interactive visualizations. Learn about BIOS/UEFI, bootloaders, kernel initialization, and the journey to userspace.

Linux Init Systems: From SysV to systemd/concepts/linux/init-systems

Compare Linux init systems through interactive visualizations. Understand the evolution from SysV Init to systemd, service management, and boot orchestration.

Master Linux kernel modules through interactive visualizations. Learn how to load, unload, develop, and debug kernel modules that extend Linux functionality.

__slots__ Optimization/concepts/python/slots-optimization

Understanding Python's __slots__ for memory optimization and faster attribute access

Calculus for Machine Learning/concepts/math-for-ml/calculus-basics

Essential calculus concepts for understanding gradients, optimization, and backpropagation

Flynn's Classification: Taxonomy of Computer Architectures/concepts/performance/flynns-classification

Explore Flynn's Classification of computer architectures through interactive visualizations of SISD, SIMD, MISD, and MIMD systems.

Master thread safety concepts through interactive visualizations of race conditions, mutexes, atomic operations, and deadlock scenarios.

Binary Search Trees: Self-Balancing Data Structures/concepts/data-structures/binary-search-trees

Master binary search trees through interactive visualizations of insertions, deletions, rotations, and self-balancing algorithms like AVL and Red-Black trees.

Hash Tables: Fast Lookups with Collision Resolution/concepts/data-structures/hash-tables

Master hash tables through interactive visualizations of hash functions, collision resolution strategies, load factors, and performance characteristics.

Convolution Operation: The Foundation of CNNs/concepts/deep-learning/convolution-operation

Master the convolution operation through interactive visualizations of sliding windows, feature detection, and the mathematical mechanics behind convolutional neural networks.

Cross-Entropy Loss: The Foundation of Classification/concepts/deep-learning/cross-entropy-loss

Understand cross-entropy loss through interactive visualizations of probability distributions, gradient flow, and its connection to maximum likelihood estimation.

Dilated Convolutions: Expanding Receptive Fields Efficiently/concepts/deep-learning/dilated-convolutions

Master dilated (atrous) convolutions through interactive visualizations of dilation rates, receptive field expansion, gridding artifacts, and applications in segmentation.

Feature Pyramid Networks: Multi-Scale Feature Fusion/concepts/deep-learning/feature-pyramid-networks

Understand Feature Pyramid Networks (FPN) through interactive visualizations of top-down pathways, lateral connections, and multi-scale object detection.

Receptive Field: Understanding CNN Vision/concepts/deep-learning/receptive-field

Explore how receptive fields grow through CNN layers with interactive visualizations of effective vs theoretical fields, architecture comparisons, and pixel contributions.

VAE Latent Space: Understanding Variational Autoencoders/concepts/deep-learning/vae-latent-space

Explore the latent space of Variational Autoencoders through interactive visualizations of encoding, decoding, interpolation, and the reparameterization trick.

Explore virtual memory management through interactive visualizations of page tables, TLB operations, page faults, and memory mapping.

Explore CPU pipeline stages, instruction-level parallelism, pipeline hazards, and branch prediction through interactive visualizations.

Hazard Detection: Pipeline Dependencies and Solutions/concepts/performance/hazard-detection

Master pipeline hazards through interactive visualizations of data dependencies, control hazards, structural conflicts, and advanced detection mechanisms.

CPU Cache Lines: The Unit of Memory Transfer/concepts/memory/cpu-cache-lines

Explore how CPU cache lines work, understand spatial locality, and see why memory access patterns dramatically impact performance through interactive visualizations.

Understanding CUDA Contexts/concepts/gpu/cuda-context

Explore the concept of CUDA contexts, their role in managing GPU resources, and how they enable parallel execution across multiple CPU threads.

Hierarchical Attention in Vision Transformers/concepts/attention/hierarchical-attention

Explore how hierarchical attention enables Vision Transformers (ViT) to process images at multiple scales.

Multi-Head Attention in Vision Transformers/concepts/attention/multihead-attention

Explore how multi-head attention enables Vision Transformers (ViT) to attend to several kinds of relationships between image patches in parallel.

Positional Embeddings in Vision Transformers/concepts/attention/positional-embeddings-vit

Explore how positional embeddings enable Vision Transformers (ViT) to process sequential data by encoding relative positions.

Interactive Look: Self-Attention in Vision Transformers/concepts/attention/self-attention-vit

Interactively explore how self-attention allows Vision Transformers (ViT) to understand images by capturing global context. Click, explore, and see how it differs from CNNs.

Adaptive Tiling: Efficient Visual Token Generation/concepts/deep-learning/adaptive-tiling

Understanding adaptive tiling in vision transformers - a technique that dynamically adjusts image partitioning based on complexity to optimize token usage while preserving detail.

Emergent Abilities: When AI Suddenly "Gets It"/concepts/deep-learning/emergent-abilities

Understanding emergent abilities in large language models - sudden capabilities that appear at scale thresholds, from arithmetic to reasoning and self-reflection.

Prompt Engineering: Guiding AI Through Language/concepts/deep-learning/prompt-engineering

Master the art of prompt engineering - from basic composition to advanced techniques like Chain-of-Thought and Tree-of-Thoughts.

Deep dive into how different prompt components influence model behavior across transformer layers, from surface patterns to abstract reasoning.

Understanding neural scaling laws - the power law relationships between model size, data, compute, and performance that govern AI capabilities and guide development decisions.

Visual Complexity Analysis: Smart Image Processing/concepts/deep-learning/visual-complexity-analysis

Understanding how AI models analyze visual complexity to optimize processing - measuring entropy, edge density, saliency, and texture for intelligent resource allocation.

Cross-Encoder vs Bi-Encoder/concepts/embeddings/cross-encoder-vs-bi-encoder

Understand the fundamental differences between independent and joint encoding architectures for neural retrieval systems.

Dense Embeddings Space Explorer/concepts/embeddings/dense-embeddings

Interactive visualization of high-dimensional vector spaces, word relationships, and semantic arithmetic operations.

Matryoshka Embeddings/concepts/embeddings/matryoshka-embeddings

Learn about nested representations that enable flexible dimension reduction without retraining models.

Multi-Vector Late Interaction/concepts/embeddings/multi-vector-late-interaction

Explore ColBERT and other multi-vector retrieval models that use fine-grained token-level matching for superior search quality.

Quantization Effects Simulator/concepts/embeddings/quantization-effects

Explore memory-accuracy trade-offs in embedding quantization from float32 to binary representations.

Sparse vs Dense Embeddings/concepts/embeddings/sparse-vs-dense

Compare lexical (BM25/TF-IDF) and semantic (BERT) retrieval approaches, understanding their trade-offs and hybrid strategies.

The Vision-Language Alignment Problem/concepts/multimodal/alignment-problem

Exploring the challenge of aligning visual and textual representations in multimodal AI systems.

Multimodal Scaling Laws/concepts/multimodal/scaling-laws

Understanding how vision-language models scale with data, parameters, and compute following empirical power laws.

Exploring LoRA, adapters, and other parameter-efficient methods for fine-tuning large vision-language models.

Client-Server Communication: Polling vs WebSockets/concepts/networking/client-server-communication

Understanding different client-server communication patterns - from simple polling to real-time WebSocket connections.

Side-by-side comparison of Short Polling, Long Polling, and WebSockets to help you choose the right protocol for your application.

Short Polling: The Impatient Client Pattern/concepts/networking/short-polling

Understanding short polling - a simple but inefficient approach to fetching data at regular intervals.

Understanding WebSockets - the protocol that enables full-duplex communication channels over a single TCP connection.

C++ AST & Parsing/concepts/cpp/ast-parsing

Explore how C++ code is parsed into an Abstract Syntax Tree with interactive visualizations.

C++ Compilation Overview/concepts/cpp/compilation

Understand the complete C++ compilation pipeline from source code to object files.

Design Patterns in C++/concepts/cpp/design-patterns

Learn classic design patterns implemented in modern C++. Explore Singleton, Observer, Factory, and Strategy patterns with interactive examples.

C++ Dynamic Linking/concepts/cpp/dynamic-linking

Master dynamic linking and runtime library loading with interactive visualizations.

C++ Linking Overview/concepts/cpp/linking

Understand how object files are linked together to create executables.

C++ Program Loading/concepts/cpp/loading

Understand how C++ programs are loaded and executed by the operating system.

Memory Management & RAII in C++/concepts/cpp/memory-raii

Learn Resource Acquisition Is Initialization (RAII) - the cornerstone of C++ memory management. Understand automatic resource cleanup and exception safety.

Modern C++ Features (C++11 and Beyond)/concepts/cpp/modern-cpp-features

Explore modern C++ features including auto, lambdas, ranges, and coroutines. Learn how C++11/14/17/20 transformed the language.

Object-Oriented Programming in C++/concepts/cpp/oop-inheritance

Master C++ OOP concepts including inheritance, polymorphism, virtual functions, and modern object-oriented design principles with interactive examples.

C++ Compiler Optimization/concepts/cpp/optimization

Discover how compilers optimize your C++ code through various transformation techniques with interactive demos.

Pointers & References in C++/concepts/cpp/pointers-references

Master C++ pointers and references through interactive visualizations. Learn memory addressing, dereferencing, smart pointers, and avoid common pitfalls.

C++ Preprocessor/concepts/cpp/preprocessor

Master the C++ preprocessor with interactive visualizations of macros, includes, and conditional compilation.

Smart Pointers in Modern C++/concepts/cpp/smart-pointers

Master C++11 smart pointers through interactive examples. Learn unique_ptr, shared_ptr, and weak_ptr with reference counting visualizations.

C++ Stack vs Heap/concepts/cpp/stack-heap

Understand stack and heap memory allocation with interactive visualizations.

C++ Symbol Resolution/concepts/cpp/symbol-resolution

Learn how the linker resolves symbols and fixes undefined references with interactive visualizations.

Templates & STL in C++/concepts/cpp/templates-stl

Master C++ templates and the Standard Template Library. Learn generic programming, template metaprogramming, and STL containers and algorithms.

Gradient Flow in Deep Networks/concepts/deep-learning/gradient-flow

Understanding how gradients propagate through deep neural networks and the vanishing/exploding gradient problems.

Eigenvalues & Eigenvectors/concepts/math-for-ml/eigenvalues-eigenvectors

Visualize eigenvalues and eigenvectors - key concepts for PCA, spectral methods, and matrix analysis.

Gradient Descent/concepts/math-for-ml/gradient-descent

Visualize gradient descent optimization - how neural networks learn by following gradients.

Layer Normalization/concepts/deep-learning/layer-normalization

Understanding layer normalization, a technique that normalizes inputs across features, making it well suited to sequence models and transformers.

Internal Covariate Shift/concepts/deep-learning/internal-covariate-shift

Understanding the distribution shift problem in deep neural networks that batch normalization solves.

Batch Normalization/concepts/deep-learning/batch-normalization

Understanding batch normalization, a technique that normalizes layer inputs to accelerate training and improve neural network performance.

Linear Algebra Fundamentals/concepts/math-for-ml/linear-algebra

Essential linear algebra concepts for machine learning with interactive visualizations

Memory Access Patterns: Sequential vs Strided/concepts/memory/memory-access-patterns

Understand how different memory access patterns impact cache performance, prefetcher efficiency, and overall application speed through interactive visualizations.

Memory Interleaving: Parallel Memory Access/concepts/memory/memory-interleaving

Understand how memory interleaving distributes addresses across multiple banks to enable parallel access, dramatically improving memory bandwidth in modern systems from DDR5 to GPU memory.

NUMA Architecture: Non-Uniform Memory Access/concepts/memory/numa-architecture

Explore NUMA (Non-Uniform Memory Access) architecture, understanding how modern multi-socket systems manage memory locality and the performance implications of local vs remote memory access.

The Modality Gap/concepts/multimodal/modality-gap

Understanding the fundamental separation between visual and textual representations in multimodal models.

Long Polling: The Patient Connection/concepts/networking/long-polling

Understanding long polling - an efficient approach where the server holds requests open until data is available.

Probability Distributions/concepts/math-for-ml/probability-distributions

Interactive visualizations of probability distributions used in machine learning.

Vectors & Matrices/concepts/math-for-ml/vectors-matrices

Understand vectors and matrices - the fundamental data structures in machine learning.

Skip Connections/concepts/deep-learning/skip-connections

Understanding skip connections, residual blocks, and their crucial role in training deep neural networks.

Uses/uses

Tools, software, and hardware I use

Resume/resume

My professional experience and qualifications

Bookmarks/bookmarks

A curated collection of articles and resources I find valuable

Consulting/consulting

Services and consulting offerings

Thank You/thank-you

Confirmation page after form submissions

Sitemap/sitemap

Visual representation of the site structure
