SFT + DPO on open-source SLMs

Hey folks, this is for those who appreciate experimentation with open-source AI models. We fine-tuned open-source SLMs (3B and 7B parameters) with SFT + DPO and benchmarked them against commercial models like GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, and Google Document API, plus open-source alternatives like OlmOCR, Deepseek-OCR, GLMOCR, and Qwen3. The specialized models won: the 7B model scored 0.925 and the 3B model 0.911, higher than every model we compared against. For DPO, we used degenerate outputs as the rejected examples, which reduced the failure rate by up to 87.6%.
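We haven't shared training code or hyperparameters here, so treat this only as a rough illustration of what "degenerate outputs as rejected examples" means. It's a minimal sketch of the standard DPO loss for a single preference pair: the chosen response would be a correct output and the rejected one a degenerate output (for example, a repetition loop). Each input is the summed token log-probability of a full response under the trainable policy or the frozen SFT reference model. The `beta=0.1` default and the log-prob numbers in the usage lines are made-up assumptions, not our settings.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Arguments are summed sequence log-probs. "chosen" = a correct output,
    "rejected" = a degenerate output. beta=0.1 is an assumed default.
    """
    # How much more (or less) likely the policy makes each response than the reference does.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # The loss falls as the policy favours the chosen response over the degenerate one.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical log-probs. When the policy equals the reference, the margin is 0 and the loss is log(2).
    print(dpo_loss(-12.0, -30.0, -12.0, -30.0))
    # The policy has pushed the degenerate output down relative to the reference, so the loss drops.
    print(dpo_loss(-11.0, -45.0, -12.0, -30.0))
```

The point of the design is that the rejected side is penalized only relative to the reference model. The policy isn't pushed to change everything, only to move probability away from the degenerate continuations.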