Attention Is All You Need, But All You Can't Afford | Hybrid Attention

r/artificial

Repo:

I've been building a small Rust-focused language model from scratch in PyTorch. Not a finetune - byte-level, trained from random init on a Rust-heavy corpus assembled in this repo.

The run: 25.6M parameters, 512 context length, 173.5M-byte corpus, 30k