Is it possible to train LLMs to admit “I don’t know” through targeted training?

For example, you could build a custom dataset where the labeled response to unanswerable questions is simply “I don’t know”, then use it for RL: penalize the model for hallucinating, and reward it for either giving the correct answer or admitting it’s not sure. I’m not an AI pro, so take my example with a grain of salt. What I’m really wondering is whether the concept in the title is theoretically feasible.

submitted by /u/KonstancjaCarla
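
To make the reward idea above concrete, here is a rough Python sketch of what such a scoring function might look like. Everything in it is an assumption for illustration: the names (`reward`, `is_abstention`, `ABSTAIN_PHRASES`), the phrase list, the substring-based correctness check, and the specific reward values are hypothetical and not taken from any particular library or paper.

```python
from typing import Optional

# Hypothetical list of phrases treated as "admitting uncertainty".
ABSTAIN_PHRASES = ("i don't know", "i do not know", "i'm not sure", "i am not sure")


def normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    return " ".join(text.lower().replace("’", "'").split())


def is_abstention(response: str) -> bool:
    """True if the response contains one of the abstention phrases."""
    norm = normalize(response)
    return any(phrase in norm for phrase in ABSTAIN_PHRASES)


def reward(response: str, gold_answer: Optional[str]) -> float:
    """Score one response. gold_answer=None marks an unanswerable question.

    The reward values below are illustrative assumptions, not tuned numbers.
    """
    if is_abstention(response):
        # Full reward for abstaining on an unanswerable question,
        # a smaller one for abstaining when an answer existed.
        return 1.0 if gold_answer is None else 0.25
    if gold_answer is None:
        # Confident answer to an unanswerable question: treated as hallucination.
        return -1.0
    # Crude correctness check: the gold answer appears in the response.
    return 1.0 if normalize(gold_answer) in normalize(response) else -1.0
```

One design choice worth noting in this sketch: abstaining on an answerable question earns less than answering correctly. If both paid the same, a policy that always says “I don’t know” would score as well as one that actually answers, so the gap between those two values is what would discourage blanket refusals.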