Reliable Microservice Tail Latency Prediction via Decoupled Dual-Stream Learning and Gradient Modulation

arXiv:2508.01635v2

Microservice architectures enable scalable cloud-native applications; however, the distributed nature of these systems makes strict Service Level Objectives hard to maintain. Accurately predicting window-level P95 tail latency remains difficult because of the complex interactions between software workload propagation and infrastructure resource limits. Existing predictive models struggle to capture these dynamics: they do not explicitly separate traffic metrics from resource metrics, which leads to misaligned feature representations.
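To make the prediction target concrete, the following is a minimal sketch of how a window-level P95 latency could be computed from raw request samples. It is an illustration only: the function name `window_p95`, the `(timestamp, latency)` sample format, the fixed non-overlapping windows, and the nearest-rank percentile rule are all assumptions, not details taken from the paper, which may define the target differently.

```python
from collections import defaultdict


def window_p95(samples, window_s=60):
    """Group (timestamp_s, latency_ms) samples into fixed windows and
    return {window_start: p95_latency} using the nearest-rank rule.

    Hypothetical helper for illustration; not the paper's definition.
    """
    buckets = defaultdict(list)
    for ts, latency in samples:
        # Align each sample to the start of its window.
        start = (ts // window_s) * window_s
        buckets[start].append(latency)

    result = {}
    for start in sorted(buckets):
        values = sorted(buckets[start])
        n = len(values)
        # Nearest rank: ceil(0.95 * n), computed with integers to
        # avoid floating-point rounding surprises.
        rank = (95 * n + 99) // 100
        result[start] = values[rank - 1]
    return result


if __name__ == "__main__":
    # 100 samples in one 100-second window with latencies 1..100 ms.
    demo = [(t, float(t + 1)) for t in range(100)]
    print(window_p95(demo, window_s=100))
```

A series of such per-window values is the kind of quantity a predictive model would then be trained to forecast; interpolating percentile estimators would give slightly different values on small windows.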