Qwen3.5-4B|Gemma4-E2B/E4B uncensored models comparison
r/LocalLLaMA
AI Research
While doing PPL evals of uncensored GGUFs, I had the idea of splitting the cross-entropy difference into two sums, one positive and one negative (equivalently, splitting the PPL ratio into two ratios, one >1 and one <1). The inspiration came from looking at the area under the PPL ratio convergence plot (2nd graph) and thinking "what if I scattered the positive and negative area in 2D?". After all, a negative delta means the model predicted the text better than the base model. An uncensored model should score high when evaluated on a censored dataset (this correlates with improvement/uncensored knowledge, assuming a high-quality dataset).
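Here is a minimal sketch of that split, not the exact script behind the plots. It assumes you already have per-token negative log-likelihoods (in nats) for the base and the uncensored model on the same tokenized text. The function name `split_ce_delta`, the dictionary keys, and the choice to normalize both sums by the total token count are all illustrative assumptions.

```python
import math

def split_ce_delta(base_nll, tuned_nll):
    """Split per-token cross-entropy deltas into positive and negative parts.

    base_nll / tuned_nll: per-token negative log-likelihoods (nats) from the
    base and the uncensored model on the same tokens (assumed inputs).
    """
    if len(base_nll) != len(tuned_nll) or not base_nll:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(base_nll)
    # delta < 0: the uncensored model predicted this token better than the base
    deltas = [t - b for b, t in zip(base_nll, tuned_nll)]
    # Assumption: both sums are divided by the total token count, so that
    # pos + neg equals the ordinary mean cross-entropy difference.
    pos = sum(d for d in deltas if d > 0) / n
    neg = sum(d for d in deltas if d < 0) / n
    return {
        "pos": pos,                       # degradation part (>= 0)
        "neg": neg,                       # improvement part (<= 0)
        "ratio_gt1": math.exp(pos),       # PPL ratio factor >= 1
        "ratio_lt1": math.exp(neg),       # PPL ratio factor <= 1
        "ppl_ratio": math.exp(pos + neg)  # usual PPL(tuned) / PPL(base)
    }

if __name__ == "__main__":
    # Toy numbers for illustration only, not real eval results.
    result = split_ce_delta([2.0, 1.0, 3.0, 0.5], [1.5, 1.2, 2.0, 0.5])
    print(result)
```

With this normalization the two factors multiply back to the usual PPL ratio, so each model becomes one point in 2D (`pos` on one axis, `neg` on the other) without losing the original metric.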