Here's how my LLM's decoder block changed while training on 5B tokens

r/LocalLLaMA

I'm monitoring an experimental model's ongoing