Trying to train tiny LLMs on a length-constrained Reddit post summarization task using GRPO on 3x Mac Minis - updates!

r/LocalLLaMA

So, here's an update to my GRPO