500k context on 48GB VRAM!! - 21 tok/s (coding)

I found this model hiding in a corner of Hugging Face. It looks to be tuned specifically for math, but I thought I'd give it a try since I can't run the full 120B Nemotron Super, and it seems to hold up like a champ in agentic coding for some odd reason. I've been using it to code all my projects for a week now and it's amazing. I wouldn't have dreamed of having 500k tokens of context on my potato dual TITAN RTX setup. If you do happen to try it, drop a comment on your experience with it: where did it break, what use case did you use it for, etc.

submitted by /u/Express_Quail_1493