What Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Prediction

arXiv CS.CL

arXiv:2605.14257v1 Announce Type: new

We describe two types of models for vocabulary difficulty prediction: a high-accuracy black-box model, which achieved the top shared task result in the open track, and an explainable model, which outperforms a fine-tuned encoder baseline. For the black-box model, we fine-tuned an LLM with a soft-target loss function suited to the rating task, achieving r > 0.91. The explainable model provides insights into what impacts the difficulty of each item while maintaining a strong correlation (r > 0.77).
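The abstract does not spell out the form of the soft-target loss, so the following is only a minimal sketch of one common way such a loss can be set up for a rating task. It assumes a hypothetical 1 to 5 difficulty scale, a Gaussian kernel that spreads each gold rating over the discrete bins, and a cross-entropy between that soft distribution and the model's softmax output. The function names, the bin layout, and the `sigma` parameter are illustrative assumptions, not details taken from the paper.

```python
import math

def soft_targets(rating, bins, sigma=0.5):
    """Spread a scalar rating over discrete bins with a Gaussian kernel.

    Assumed form: the paper's actual target construction is not given here.
    """
    weights = [math.exp(-((b - rating) ** 2) / (2 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(logits, rating, bins, sigma=0.5):
    """Cross-entropy between the soft target distribution and the model's softmax."""
    target = soft_targets(rating, bins, sigma)
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def expected_rating(logits, bins):
    """Decode a continuous prediction as the probability-weighted mean of the bins."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, bins))

# Hypothetical example: five rating bins and one item with gold rating 3.2.
bins = [1, 2, 3, 4, 5]
logits = [0.1, 0.4, 2.0, 0.3, -0.5]  # stand-in for the model's per-bin scores
loss = soft_target_loss(logits, rating=3.2, bins=bins)
pred = expected_rating(logits, bins)
```

Under these assumptions, a prediction that puts most of its mass near the gold rating is penalized less than one that is far off, and `expected_rating` yields a continuous score that could then be correlated with the gold ratings.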