AI RESEARCH
End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering
arXiv CS.CL
arXiv:2603.10570v1. Large language models (LLMs) combined with retrieval-augmented generation have enabled the deployment of domain-specific chatbots, but these systems remain prone to generating unsupported or incorrect answers. Reliable evaluation is therefore critical, yet manual review is costly, and existing frameworks often depend on curated test sets and static metrics, which limits their scalability. We propose an end-to-end automatic evaluator designed to substantially reduce human effort.