Prompt Stability Scoring for Text Annotation with Large Language Models

arXiv:2407.02039v3 Announce Type: replace Researchers are increasingly using language models (LMs) for text annotation. These approaches rely only on a prompt telling the model to return a given output according to a set of instructions. The reproducibility of LM outputs may nonetheless be vulnerable to small changes in prompt design, which calls into question the replicability of classification routines.