AI RESEARCH
How we catch silent NPU fallback on Snapdragon in CI [D]
r/MachineLearning
Posting because I've now seen this exact bug at multiple teams shipping ML to Snapdragon, and the pattern is worth writing up. ONNX Runtime's QNN execution provider (the one that targets Qualcomm's Hexagon NPU on Snapdragon SoCs) will silently route unsupported ops to the CPU. Your accuracy is fine, your eval latency on the dev board looks fine, but production latency mysteriously triples because the input distribution stresses fallback paths differently, and the runtime never raises anything louder than a startup-log line nobody reads.
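To make the failure mode concrete, here is a minimal sketch of the kind of CI gate that turns silent fallback into a loud failure. It assumes you run the model with ONNX Runtime profiling enabled and then inspect the resulting JSON profile. The event layout it relies on (a list of events where per-node kernel events have `cat == "Node"`, a name ending in `_kernel_time`, and `args` holding `provider` and `op_name`) matches the profiles I have looked at, but treat it as an assumption and check it against your ORT version. The helper names `cpu_fallback_ops` and `assert_no_cpu_fallback` are made up for this example.

```python
import json
from collections import Counter

CPU_EP = "CPUExecutionProvider"


def cpu_fallback_ops(events):
    """Count op types whose kernels executed on the CPU EP.

    `events` is the parsed profile: a list of dicts. Assumed layout
    (verify for your ORT version): kernel events have cat == "Node",
    a name ending in "_kernel_time", and args with "provider"/"op_name".
    """
    counts = Counter()
    for ev in events:
        if ev.get("cat") != "Node":
            continue  # skip session-level events (model load, init, ...)
        if not ev.get("name", "").endswith("_kernel_time"):
            continue  # skip non-kernel node events
        args = ev.get("args", {})
        if args.get("provider") == CPU_EP:
            counts[args.get("op_name", "<unknown>")] += 1
    return counts


def assert_no_cpu_fallback(profile_path, allowed=frozenset()):
    """Fail if any op outside `allowed` ran on the CPU EP."""
    with open(profile_path) as f:
        events = json.load(f)
    offenders = {
        op: n for op, n in cpu_fallback_ops(events).items() if op not in allowed
    }
    if offenders:
        raise AssertionError(f"ops fell back to CPU: {offenders}")
```

In a CI job this would sit after a run with `SessionOptions.enable_profiling = True` over representative inputs, using the path returned by `InferenceSession.end_profiling()`. The `allowed` set is there because some fallback is often deliberate; the point is that any new op landing on the CPU fails the build instead of showing up later as a latency regression.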