Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

arXiv:2604.15357v1 Announce Type: cross Precise estimation of model inference latency is crucial for time-critical mobile edge applications, enabling devices to calculate latency margins against deadlines and trade them for enhanced model performance or resource savings. However, the ubiquity of Dynamic Voltage and Frequency Scaling (DVFS) renders traditional static profiling invalid in real-world deployments, as inference latency fluctuates with varying processor (CPU and GPU) frequencies.