How do you do OOD detection on a closed LLM API with no latent access?
r/artificial
•
Generative AI
AI Research
Classical OOD detection assumes you can see the model. Mahalanobis on features and energy on logits are typical, and both require cracking the model open. With closed LLM APIs you get text in, text out, and maybe top K logprobs per token if you are lucky. The methods that survive that constraint are sampling consistency like SelfCheckGPT, token level entropy on whatever logprobs the API exposes, proxy embeddings from your own encoder, or a separate verifier model on the output.