Is Qwen 3.5 0.8B the optimal choice for local RAG implementations in 2026?

r/LocalLLaMA
Generative AI AI Safety Open Source AI

Recent benchmarks, specifically the AA-Omniscience Hallucination Rate, suggest a counter-intuitive trend. While larger models in the Qwen 3.5 family (9B and 397B) show hallucination rates exceeding 80% in "all-knowing" tests, the Qwen 3.5 0.8B variant demonstrates a significantly lower rate of approximately 37%. For those using AnythingLLM, have you found that the 0.8B parameter scale provides better "faithfulness" to the retrieved context compared to larger models?

submitted by /u/koloved
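For anyone reading the 37% vs 80% figures: as far as I understand the metric (this is my assumption about the definition, so check the benchmark's own methodology), the hallucination rate is the share of wrong answers among all responses that were not fully correct, so it mostly rewards a model for abstaining when it doesn't know. A rough sketch of that arithmetic, where the function name and all counts are made up purely for illustration and are not benchmark data:

```python
def hallucination_rate(incorrect: int, partial: int, abstained: int) -> float:
    """Share of wrong answers among all not-fully-correct responses.

    Assumed definition: incorrect / (incorrect + partial + abstained).
    """
    not_correct = incorrect + partial + abstained
    if not_correct == 0:
        return 0.0  # nothing was missed, so nothing was hallucinated
    return incorrect / not_correct


# Hypothetical counts, NOT real benchmark results.
# A model that often says "I don't know" when unsure:
cautious = hallucination_rate(incorrect=37, partial=3, abstained=60)
# A model that almost always attempts an answer when unsure:
confident = hallucination_rate(incorrect=40, partial=2, abstained=8)

print(f"cautious model:  {cautious:.0%}")
print(f"confident model: {confident:.0%}")
```

Under this reading, a low rate says the model refuses more often when it lacks the knowledge. It does not directly measure how closely the model sticks to retrieved chunks in a RAG pipeline, so that would still need to be tested separately.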