Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

arXiv:2604.19790v1 Announce Type: cross

Large language models (LLMs) are increasingly deployed under diverse numerical precision configurations, including standard floating-point formats (e.g., bfloat16 and float16) and quantized integer formats (e.g., int16 and int8), to meet efficiency and resource constraints. However, minor inconsistencies between LLMs of different precisions are difficult to detect and are often overlooked by existing evaluation methods.
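
As a rough illustration of how precision alone can change an output, the sketch below rounds two nearby logit values to float16 and to bfloat16 and compares the greedy (argmax) choice. It is not the paper's method: the helper names `to_float16`, `to_bfloat16`, and `argmax`, and the two logit values, are hypothetical and chosen only for illustration. bfloat16 is emulated here by rounding a float32 bit pattern to its top 16 bits, and the helper assumes finite inputs.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to IEEE 754 half precision (10 explicit mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 (7 explicit mantissa bits) for finite x.

    Rounds the float32 bit pattern to its upper 16 bits using
    round-to-nearest-even, then zeroes the lower 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep only the bfloat16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def argmax(values):
    """Index of the largest value; ties resolve to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical logits for two candidate tokens, differing by 0.01.
logits = [10.01, 10.02]

fp16_logits = [to_float16(v) for v in logits]
bf16_logits = [to_bfloat16(v) for v in logits]

print("float16 :", fp16_logits, "-> token", argmax(fp16_logits))
print("bfloat16:", bf16_logits, "-> token", argmax(bf16_logits))
```

In this toy case float16 keeps the two logits distinct, while bfloat16's coarser mantissa collapses them to the same value, so the tie-breaking rule rather than the model decides the token. Real disagreements would more typically arise from rounding error accumulated across many operations, which this sketch does not model.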