I put a transformer model on a stock Commodore 64
r/LocalLLaMA
Not a chatbot pretending. Not a lookup table with a trench coat. A proper decoder-only transformer. Attention, RMSNorm, feed-forward, residuals, the works. Two layers, four heads, about 25,000 parameters. All int8. Trained with quantization-aware