Here's how my LLM's decoder block changed while training on 5B tokens
r/LocalLLaMA
I'm monitoring an experimental model's ongoing