Attention Is All You Need, But All You Can't Afford | Hybrid Attention
r/artificial
•
Machine Learning
Generative AI
AI Tools
Repo: I've been building a small Rust-focused language model from scratch in PyTorch. Not a finetune - byte-level, trained from random init on a Rust-heavy corpus assembled in this repo. The run: 25.6M parameters 512 context length 173.5M-byte corpus 30k