Why are we actually sampling reasoning and output the same way?

r/LocalLLaMA
Generative AI

I've started to notice that my usual setup doesn't work as well in other languages as it did in English - the model sometimes made grammar mistakes and generated genuine garbage. Its reasoning stayed in English and I preferred to leave it that way, as this is the language most LLM's are obviously most 'confident' in. The answer to some of the problems of generating in less trained language was using lower temp. But then again, that influences reasoning, which is in English, and makes creative writing less 'creative'. Regenerating from the same context became deterministic.