Query-Conditioned Test-Time Self-Training for Large Language Models

arXiv:2605.13369v1 Announce Type: cross

Large language models (LLMs) are typically deployed with fixed parameters, and their performance is often improved by allocating computation at inference time. While such test-time scaling can be effective, it cannot correct model misconceptions or adapt the model to the specific structure of an individual query. Test-time optimization addresses this limitation by enabling parameter updates during inference, but existing approaches either rely on external data or optimize generic self-supervised objectives that lack query-specific alignment.