Training a number-aware embedding model + Text JEPA doesn't work too well + Text auto-encoders have a strange frequency bias [R][P]

Hi guys! I spent a year trying to predict company growth from the full text of their 10-K filings. It completely failed. But I've had a lot of fun playing with encoder transformers and making them good at numbers (bypassing the tokenizer and the prediction head whenever the input contains a number). I MLM-trained a modified ModernBERT for this and it works really well. The model is available on HF:

Then I turned this MLM-trained model into a nice sequence embedder. I experimented with JEPA, but it failed. The auto-encoder setup worked much better.
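To make the "bypassing the tokenizer" idea concrete, here is a minimal sketch of one way to do it, not necessarily what the model above does. It pulls numeric literals out of the text before tokenization, leaves a placeholder in their place, and turns each value into a small scale-robust feature vector that could be projected into the embedding space. The `[NUM]` placeholder, the regex, and the sign/log-magnitude features are all assumptions made up for illustration.

```python
import math
import re

NUM_TOKEN = "[NUM]"  # hypothetical placeholder token, not taken from the actual model

# Matches integers, thousands-separated integers, and decimals (e.g. 2023, 1,234, 12.5).
# Deliberately simple: it ignores scientific notation, and it would also
# match the "10" in "10-K".
NUM_RE = re.compile(r"(?<![\w.])-?\d+(?:,\d{3})*(?:\.\d+)?(?!\w)")


def split_numbers(text):
    """Replace each numeric literal with NUM_TOKEN and return the values separately."""
    values = []

    def _replace(match):
        values.append(float(match.group(0).replace(",", "")))
        return NUM_TOKEN

    return NUM_RE.sub(_replace, text), values


def number_features(x):
    """Assumed encoding: sign plus log-magnitude, so 1e3 and 1e9 stay on a similar scale."""
    sign = 0.0 if x == 0 else math.copysign(1.0, x)
    return [sign, math.log10(1.0 + abs(x))]


masked, values = split_numbers("Revenue grew 12.5% to 1,234 million in 2023.")
features = [number_features(v) for v in values]
print(masked)
print(values)
print(features)
```

In a real model the placeholder would go through the normal tokenizer, and the embedding at each placeholder position would be replaced by (or added to) a learned projection of these features. On the output side, a regression head would predict the value at masked number positions instead of a softmax over the vocabulary.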