HOW ROBOT DATA BECOMES TOKENS
SO-ARM101 Bimanual Observation
🤖 Same Tokenization Pattern!
Camera images become ViT patch tokens

Joint angles become proprioceptive tokens

One unified sequence → Same transformer encoder!
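
A minimal sketch of the pattern above, in plain Python with no ML library. Everything specific here is an assumption made for illustration, not the actual SO-ARM101 pipeline: a 32x32 RGB frame, 16x16 patches, an embedding width of 8, 12 joint values (taken here as 6 per arm), a single proprioceptive token for the whole joint vector, and random weights standing in for learned projections. Position embeddings and the encoder itself are left out.

```python
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patches, row-major."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must divide into patches"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                channel
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for channel in pixel
            ]
            patches.append(flat)
    return patches


def make_linear(in_dim, out_dim, rng):
    """Random in_dim x out_dim matrix standing in for a learned projection."""
    return [[rng.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]


def project(vec, weights):
    """Multiply a vector of length in_dim by an in_dim x out_dim matrix."""
    out_dim = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(out_dim)]


def tokenize_observation(image, joints, patch, w_img, w_joint):
    """Return one token sequence: image patch tokens followed by a joint token."""
    image_tokens = [project(p, w_img) for p in patchify(image, patch)]
    joint_token = project(joints, w_joint)  # assumption: one token for all joints
    return image_tokens + [joint_token]


# Hypothetical sizes, chosen only to keep the example small.
SIZE, CHANNELS, PATCH, EMBED_DIM, NUM_JOINTS = 32, 3, 16, 8, 12

rng = random.Random(0)
image = [[[rng.random() for _ in range(CHANNELS)] for _ in range(SIZE)]
         for _ in range(SIZE)]
joints = [rng.uniform(-1.0, 1.0) for _ in range(NUM_JOINTS)]

w_img = make_linear(PATCH * PATCH * CHANNELS, EMBED_DIM, rng)
w_joint = make_linear(NUM_JOINTS, EMBED_DIM, rng)

tokens = tokenize_observation(image, joints, PATCH, w_img, w_joint)
print(len(tokens), len(tokens[0]))  # 4 patch tokens + 1 joint token, each of width 8
```

Because both modalities are projected to the same width, the resulting list can be fed to a single encoder without the encoder needing to know which token came from a camera and which from the joints.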