RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
r/LocalLLaMA
So, I've had my H100s grind for you all, and have some interesting new results AND fresh models! So, what did I find? Well, because my blog articles are too damn long (I know some of you are not reading the whole thing), here is a TL;DR: I found that LLMs seem to think in a universal language. In the middle layers, the model's latent representations are more similar for the same content in Chinese and English than for different content in the same language. I tried a bunch of different stuff, but in the end, repeating blocks in the middle of the transformer stack works the best.
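
For anyone who wants the two ideas in code form, here is a rough sketch, not the actual RYS code. `repeated_layer_order` is a hypothetical helper that only computes which layer indices would run, and in what order, if a middle block is repeated; the layer count and block boundaries in the example are made up for illustration and are not the values used for Qwen3.5 27B. `content_beats_language` is an equally hypothetical check for the "universal language" observation, and it assumes you have already pooled one hidden-state vector per sentence from some middle layer. The vectors below are toy numbers, not measurements.

```python
import math


def repeated_layer_order(num_layers, start, end, repeats=1):
    """Layer indices in execution order when block [start, end) runs 1 + repeats times.

    Hypothetical helper for illustration; all indices are 0-based and `end` is exclusive.
    """
    if not (0 <= start < end <= num_layers):
        raise ValueError("need 0 <= start < end <= num_layers")
    if repeats < 0:
        raise ValueError("repeats must be >= 0")
    block = list(range(start, end))
    # Layers before the block, the block run (1 + repeats) times, then the rest.
    return list(range(start)) + block * (1 + repeats) + list(range(end, num_layers))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def content_beats_language(same_content_pair, same_language_pair):
    """True if the cross-language, same-content pair is closer than the
    same-language, different-content pair."""
    return cosine(*same_content_pair) > cosine(*same_language_pair)


# Toy 8-layer stack (assumed size), repeating layers 3 and 4 once.
order = repeated_layer_order(num_layers=8, start=3, end=5, repeats=1)
print(order)

# Toy 2-d "hidden states" (made up): same sentence in English and Chinese,
# plus an unrelated English sentence.
en_a, zh_a, en_b = [1.0, 0.1], [0.9, 0.2], [0.1, 1.0]
print(content_beats_language((en_a, zh_a), (en_a, en_b)))
```

With a real model you would apply the resulting order to the stack of decoder layers and pull the per-layer hidden states from a forward pass; how exactly to do that depends on the framework, so it is left out here.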