Building an LLM From Scratch: I Trained Word Embeddings on Dostoevsky. Here’s What I Found.
Towards AI
Generative AI
NLP
In my previous article, I wrote about implementing character-level tokenization on a very small corpus, which helped me understand the earliest stages of NLP and the foundations of LLMs. This time I took the next step toward modern NLP and LLM systems: embeddings. I implemented them from scratch and trained them on my laptop, which took more than 30 hours. Let’s dive into the concepts, mathematics, and code I worked through while building an LLM from scratch and training embeddings on my corpus of nearly 1M words.