Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation

arXiv:2603.06593v1 Announce Type: cross

Retrieval-augmented code generation often conditions the decoder on large retrieved code snippets. This ties online inference cost to repository size and [...] On RepoBench and RepoEval, Hierarchical Embedding Fusion (HEF) with a 1.8B-parameter pipeline achieves exact-match accuracy comparable to snippet-based retrieval baselines while running at sub-second median latency on a single A100 GPU. Compared to the graph-based and iterative retrieval systems in our experimental setup, HEF reduces median end-to-end latency by a factor of 13 to 26. We also [...]