Step 1: Adding the CLS Token
[CLS] (special learnable token) + (sequence of image patches)

Inspired by BERT, a special learnable [CLS] token is prepended to the sequence of image patch embeddings. Its role is to aggregate information from all patches so that its final representation can stand in for the entire image during classification.
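The prepending step can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `prepend_cls_token`, the batch size, and the 196 x 768 patch shape are all assumptions chosen for the example, and the token is initialised to zeros here where a real model would learn it during training.

```python
import numpy as np


def prepend_cls_token(patch_embeddings: np.ndarray, cls_token: np.ndarray) -> np.ndarray:
    """Prepend one shared [CLS] vector to every sequence in the batch.

    patch_embeddings: shape (batch, num_patches, dim)
    cls_token:        shape (1, 1, dim); a learnable parameter in a real model
    returns:          shape (batch, num_patches + 1, dim)
    """
    batch_size = patch_embeddings.shape[0]
    dim = patch_embeddings.shape[-1]
    # The same token is reused for every image in the batch.
    cls = np.broadcast_to(cls_token, (batch_size, 1, dim))
    # Concatenate along the sequence axis so [CLS] sits at position 0.
    return np.concatenate([cls, patch_embeddings], axis=1)


rng = np.random.default_rng(0)
batch_size, num_patches, dim = 2, 196, 768  # illustrative sizes (assumed)
patches = rng.normal(size=(batch_size, num_patches, dim))
cls_token = np.zeros((1, 1, dim))  # placeholder for a trained parameter

tokens = prepend_cls_token(patches, cls_token)
print(tokens.shape)  # (2, 197, 768)
```

The sequence grows by exactly one position, and the patch embeddings themselves are left unchanged; only index 0 along the sequence axis holds the new token.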