[P] Yet another garage model - Prisma: Interpretability-Inspired Architecture

Hey y'all! I think some of you might be interested in this creature. Don't roast me too hard, as I really want to collect your feedback and ideas about this crap prototype. At least it isn't based on the GPT/Llama/Mistral/Qwen architecture; I built it on some ideas I had while studying other models. The basic differences are: attention and output weight sharing (reduces parameters); an additional weight set in the FFN (increases parameters, yay!