Reinforcement learning implementation in AI Toolkit

I've always wanted to fine-tune models to my own preferences to make them a bit more personalized. A LoRA can teach a model a certain character or style; this lets you steer model outputs directly, without any references at all, or even fine-tune an existing LoRA. In a way, this is what Midjourney does when it shows you two pictures to vote on and then builds a slightly customized version of its model for you. The PR is open here: The default parameters seem well tuned and give quick results within a few iterations.
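To make the "vote between two pictures" idea a bit more concrete, here is a rough sketch of how pairwise votes could be turned into a per-image reward signal. This is not code from the PR or from AI Toolkit; the function name `votes_to_rewards` and the image IDs are made up for illustration, and a real implementation may score preferences quite differently.

```python
from collections import defaultdict


def votes_to_rewards(votes):
    """Turn pairwise votes into a per-image reward in [-1, 1].

    `votes` is a list of (winner_id, loser_id) tuples. Hypothetical
    helper for illustration only, not part of AI Toolkit.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for winner, loser in votes:
        # The preferred image gets +1, the rejected one gets -1.
        totals[winner] += 1.0
        totals[loser] -= 1.0
        counts[winner] += 1
        counts[loser] += 1
    # Average over the comparisons each image appeared in.
    return {image_id: totals[image_id] / counts[image_id] for image_id in totals}


# Three votes over three generated images (IDs are placeholders).
votes = [("img_a", "img_b"), ("img_a", "img_c"), ("img_c", "img_b")]
rewards = votes_to_rewards(votes)
```

An image that wins every comparison ends up with a reward of 1.0 and one that loses every comparison with -1.0. A training loop could then, presumably, push the model toward the high-reward samples and away from the low-reward ones.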