Training LFM-2.5-350M on Reddit post summarization with GRPO on my 3x Mac Minis — final evals and t-test evals are here

r/LocalLLaMA

So, with this project I want to see whether tiny LLMs can do quality summarization under a tight length constraint (64 tokens only) using GRPO! I trained two variants of this task:

- using just a length penalty
- using a single quality reward, or a combination of those, together with the length penalty

I then ran an LLM-as-a-judge eval with DeepEval tools to check the summarization quality.
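To make the two reward setups a bit more concrete, here is a minimal sketch of how a length penalty and an optional quality reward could be combined into one scalar for GRPO. The exact reward shapes are not spelled out in this post, so everything here is an assumption for illustration: the linear overshoot penalty, the names `length_penalty`, `unigram_overlap`, and `total_reward`, and the toy overlap score standing in for a real quality metric.

```python
from typing import Callable, Sequence

MAX_TOKENS = 64  # length budget mentioned in the post

# A quality reward takes (summary_tokens, reference_tokens) and returns a score.
QualityFn = Callable[[Sequence[str], Sequence[str]], float]


def length_penalty(num_tokens: int, max_tokens: int = MAX_TOKENS) -> float:
    """Assumed shape: 0.0 within budget, linearly more negative past it."""
    overshoot = max(0, num_tokens - max_tokens)
    return -overshoot / max_tokens


def unigram_overlap(summary: Sequence[str], reference: Sequence[str]) -> float:
    """Toy stand-in for a quality reward: share of reference tokens found in the summary."""
    if not reference:
        return 0.0
    summary_set = set(summary)
    return sum(tok in summary_set for tok in reference) / len(reference)


def total_reward(
    summary: Sequence[str],
    reference: Sequence[str],
    quality_fns: Sequence[QualityFn] = (),
) -> float:
    """Variant 1: no quality_fns (length penalty only).
    Variant 2: mean of one or more quality rewards plus the length penalty."""
    if quality_fns:
        quality = sum(fn(summary, reference) for fn in quality_fns) / len(quality_fns)
    else:
        quality = 0.0
    return quality + length_penalty(len(summary))
```

Calling `total_reward(summary, reference)` with no quality functions corresponds to the length-only variant, while passing one or several functions in `quality_fns` corresponds to the single-reward or combined variant. Averaging the quality rewards with equal weight is just one simple choice, and a real run would likely tune the weights.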