I built a 5M model to see if it outperforms my 350M model...

r/LocalLLaMA

Hi r/LocalLLaMA! I built a 5M-parameter Llama model with HF Transformers on 2x T4 GPUs in Kaggle to see whether it can be as good as my previous Apex 350M model. Link to the research site: It turned out that if you optimize the model enough and train it on a lot of data, it can be nearly as good as a model 70 times larger (like Apex 350M, which uses the GPT-2 architecture). Tell me what you think about it! Spark v5 is coming soon. Expect it to be good 😃

submitted by /u/LH-Tech_AI
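
For anyone wondering what a roughly 5M-parameter Llama-style model looks like on paper, here is a rough sketch that counts the weights from the config values. The hyperparameters below (vocab size, hidden size, layer count and so on) are made up for illustration and are not the actual settings of the model in this post. The count assumes the standard Llama layout: no biases, gated MLP with three projections, two RMSNorms per layer, and tied input/output embeddings.

```python
def llama_param_count(vocab_size, hidden_size, num_layers, num_heads,
                      num_kv_heads, intermediate_size, tie_embeddings=True):
    """Approximate parameter count of a Llama-style decoder (no biases)."""
    head_dim = hidden_size // num_heads
    kv_dim = head_dim * num_kv_heads  # smaller than hidden_size when using GQA

    embed = vocab_size * hidden_size
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj, down_proj
    mlp = 3 * hidden_size * intermediate_size
    # input and post-attention RMSNorm weights
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms

    final_norm = hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head


# Hypothetical config, chosen only to land near 5M parameters.
small = llama_param_count(
    vocab_size=8192,
    hidden_size=192,
    num_layers=8,
    num_heads=6,
    num_kv_heads=6,
    intermediate_size=512,
)

# Size ratio between the two nominal model sizes mentioned in the post.
size_ratio = 350_000_000 / 5_000_000

print(f"hypothetical small model: {small / 1e6:.2f}M parameters")
print(f"350M vs 5M size ratio: {size_ratio:.0f}x")
```

With a vocabulary this small, the embedding table is still close to a third of the total, so at this scale the tokenizer size matters about as much as depth or width.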