TinyLoRA + nightly RL updates = simulated neuroplasticity? Thinking through the implications.

Meta's TinyLoRA paper shows 13 trained parameters matching full fine-tuning performance on GSM8K when trained with RL. The key finding that jumped out at me: RL is 100-1000x more parameter-efficient than SFT because the reward signal is cleaner and sparser.

This got me thinking about an application nobody seems to be discussing. Minsky's The Emotion Machine argues that human cognition works through multiple "Ways to Think" - different configurations the brain switches between based on the problem type. Anger, curiosity, and fear aren't emotions separate from thinking.