235M param LLM from scratch on a single RTX 5080

Hey everyone, I've been working on this for a while and figured I'd share it here too. I built a small transformer language model completely from scratch in PyTorch: no pretrained weights, no HuggingFace downloads. Every parameter was trained from raw text on a single consumer GPU.

The current release is Plasma 1.0 (235M params, 18 layers, hidden size 1024). It's LLaMA-style: GQA with 16 query heads and 4 KV heads (head_dim 64), a SwiGLU FFN with intermediate dim 2816, RoPE with theta 10000, pre-norm RMSNorm, and tied embeddings. The tokenizer is a 32k SentencePiece BPE vocab.
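For anyone wanting to sanity-check the 235M figure, here's a rough parameter count from the hyperparameters above. It's only a sketch and rests on a few assumptions: bias-free linear layers (as in LLaMA), two RMSNorm weight vectors per block plus one final norm, and "32k" read as 32,000 tokens. `PlasmaConfig` and `count_params` are hypothetical names for illustration, not taken from the actual codebase.

```python
from dataclasses import dataclass


@dataclass
class PlasmaConfig:
    # Hypothetical config mirroring the numbers in the post.
    vocab_size: int = 32_000  # assumption: "32k" taken as 32,000
    hidden: int = 1024
    layers: int = 18
    n_q_heads: int = 16
    n_kv_heads: int = 4
    head_dim: int = 64
    ffn_dim: int = 2816


def count_params(cfg: PlasmaConfig) -> int:
    # Attention: Q and O are full width; GQA shrinks K and V to n_kv_heads.
    q = cfg.hidden * cfg.n_q_heads * cfg.head_dim
    kv = 2 * cfg.hidden * cfg.n_kv_heads * cfg.head_dim
    o = cfg.n_q_heads * cfg.head_dim * cfg.hidden
    attn = q + kv + o
    # SwiGLU FFN has three matrices: gate, up, and down projections.
    ffn = 3 * cfg.hidden * cfg.ffn_dim
    # Assumption: two RMSNorm weight vectors per block (pre-attn, pre-FFN).
    norms = 2 * cfg.hidden
    per_layer = attn + ffn + norms
    # Tied embeddings: the input embedding doubles as the output head,
    # so the vocab matrix is counted only once.
    embed = cfg.vocab_size * cfg.hidden
    final_norm = cfg.hidden
    return per_layer * cfg.layers + embed + final_norm


if __name__ == "__main__":
    total = count_params(PlasmaConfig())
    print(f"{total:,} params (~{total / 1e6:.1f}M)")
```

Under these assumptions it prints 235,705,344 params (~235.7M). With a 32,768-entry vocab the same formula gives 236,491,776, so either reading of "32k" lands right around the stated size. Note that the embedding matrix alone is about 14% of the total, which is why tying it matters at this scale.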