Why are performance tests run with contexts of around 500 tokens and missing quantization information?
r/LocalLLaMA
I want to make sure I’m not missing something here. I see a lot of posts about performance on new hardware, and it feels like the tests are always run with a small context and leave out information about quantization. I’m under the impression that use cases for LLMs generally require substantially larger contexts. Mine range from 4-8k with embedding to 50k+ when working on my small code bases. I’m also aware of the impact that quants have on a model’s performance, both in the quality of what it returns and in its speed (including k-quants).
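
To put rough numbers on why context size and quantization both matter for a benchmark, here is a back-of-the-envelope sketch in Python. The model shape (layer count, KV heads, head size, parameter count) is hypothetical and only there for illustration, the KV cache is assumed to be stored at 16 bits per element, and the bit widths are nominal, since real quant formats (k-quants included) carry some overhead on top.

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def weight_bytes(n_params, bits_per_weight):
    """Approximate weight storage at a nominal bit width (ignores format overhead)."""
    return n_params * bits_per_weight / 8


# Hypothetical model shape, chosen only for illustration.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
N_PARAMS = 8e9

GIB = 2 ** 30

# Context sizes from the post: a typical benchmark vs. my actual use cases.
for ctx in (500, 8_000, 50_000):
    size = kv_cache_bytes(ctx, N_LAYERS, N_KV_HEADS, HEAD_DIM)
    print(f"context {ctx:>6} tokens: KV cache ~ {size / GIB:.2f} GiB")

# Nominal bit widths; actual quantized files will differ somewhat.
for bits in (16, 8, 4):
    size = weight_bytes(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~ {size / GIB:.2f} GiB")
```

Under these assumptions the KV cache grows linearly with context, so a 50k-token run needs 100 times the cache of a 500-token run, while the weight footprint depends entirely on the quant. A benchmark that reports neither the context size nor the quant leaves out two of the biggest factors.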