Code Review Dataset: 200k+ Cases of Human-Written Code Reviews from Top OSS Projects
r/LocalLLaMA
I compiled 200k+ human-written code reviews from top OSS projects, including React, TensorFlow, VS Code, and more. I used this dataset to fine-tune a version of Qwen2.5-Coder-32B-Instruct specialized in code review. The fine-tuned model generated better code fixes and review comments, scoring 4x higher than the base model on BLEU-4, ROUGE-L, and SBERT. Feel free to integrate this dataset into your.
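For anyone unfamiliar with the metrics mentioned above, here is a rough sketch of how a ROUGE-L score between a generated review comment and a human-written one can be computed. This is not the evaluation script used for the numbers in this post. It assumes lowercase whitespace tokenization and a plain F1 (beta = 1), and the function names `lcs_length` and `rouge_l_f1` are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (assumption: whitespace tokens, beta = 1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical review comments, not taken from the dataset.
    model_comment = "use a const here"
    human_comment = "please use const here"
    print(round(rouge_l_f1(model_comment, human_comment), 2))
```

In practice you would probably average this over a held-out set of reviews and use an established implementation rather than a hand-rolled one, since tokenization and stemming choices shift the scores.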