This can noticeably lower the number of tokens for the subsequent diffusion transformer model, enabling us to train videos at the first resolution and frame level. A clear, turquoise river flows through a rocky canyon...A transparent, turquoise river flows via a rocky canyon, cascading in excess of a small waterfall https://www.youtube.com/@antonchakma9384