TinyLoRA shows LoRA training works with 13 parameters + my own experiments to verify the claims

r/LocalLLaMA
AI Research

The TinyLoRA paper shows that we can alter model behavior with only a few parameters. I tried replicating it with my own TinyLoRA implementation for Qwen3.5, and it does work, which is crazy to think about. My results matched the paper's: for example, increasing the rank just made the optimization space too large to converge properly. What did improve things was giving the MLP and attention layers their own shared sets of 13 parameters. That is, all MLP layers share one set of 13 parameters and all attention layers share another, for a total of 26.
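
A minimal sketch of the sharing scheme described above, not the actual implementation. It only shows how modules could be assigned to the two groups and how the trainable count comes out to 26. The module names (`self_attn`, `mlp`, `q_proj`, and so on), the 2-layer model, and the helpers `group_of` and `build_groups` are hypothetical and chosen for illustration. How the 13 values are turned into a weight update is left out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

PARAMS_PER_GROUP = 13  # size of each shared trainable vector, as in the post


def group_of(module_name: str) -> Optional[str]:
    """Map a module name to its sharing group.

    The naming scheme is an assumption (Hugging Face style names).
    """
    if ".self_attn." in module_name:
        return "attention"
    if ".mlp." in module_name:
        return "mlp"
    return None  # anything else (e.g. lm_head) stays frozen in this sketch


def build_groups(module_names: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the module names that share each parameter vector."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in module_names:
        group = group_of(name)
        if group is not None:
            groups[group].append(name)
    return dict(groups)


# Hypothetical module names for a tiny 2-layer model.
BLOCKS = (
    ("self_attn", ("q_proj", "k_proj", "v_proj", "o_proj")),
    ("mlp", ("gate_proj", "up_proj", "down_proj")),
)
names = [
    f"model.layers.{i}.{block}.{proj}"
    for i in range(2)
    for block, projs in BLOCKS
    for proj in projs
]
names.append("lm_head")

groups = build_groups(names)

# One shared vector per group, no matter how many modules the group contains.
# Zero initialisation is an assumption of this sketch.
shared = {group: [0.0] * PARAMS_PER_GROUP for group in groups}

trainable = sum(len(vec) for vec in shared.values())
print(trainable)  # 26
```

The point of the sketch is that the trainable count depends only on the number of groups, not on the number of layers: adding more layers grows the lists in `groups` but leaves `trainable` at 26.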