AI RESEARCH
Adaptive Engram Memory System for Indonesian Language Model: Generative AI Based on TOBA LM for Batak and Minang Language
arXiv CS.CL
arXiv:2603.10006v1. Announce type: new. This study presents TOBA-LM, a trilingual language model based on the GPT-2 architecture with 1.2B parameters, trained on a corpus of Indonesian, Batak, and Minangkabau text using syllabic-agglutinative tokenization. The architecture integrates an Engram Memory mechanism: an adaptive n-gram-based memory system with a 500,000 x 768 embedding table that captures morphological dependencies through bigram and trigram pathways. Empirical results demonstrate a...
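
To make the table dimensions concrete, here is a minimal Python sketch of how bigram and trigram contexts could be mapped to rows of a 500,000 x 768 table. Only the two table dimensions come from the abstract. The hashing scheme (CRC32 modulo the row count), the function names `ngram_bucket` and `engram_indices`, and the choice to share one table between both pathways are assumptions made for illustration; the abstract does not say how TOBA-LM indexes its memory.

```python
import zlib

TABLE_ROWS = 500_000  # rows in the embedding table (from the abstract)
EMBED_DIM = 768       # embedding width (from the abstract)


def ngram_bucket(token_ids, n, end, table_rows=TABLE_ROWS):
    """Map the n-gram ending at position `end` to a table row.

    Hypothetical scheme: CRC32 of the n-gram, modulo the row count.
    Returns None when there is not enough left context for an n-gram.
    """
    if end + 1 < n:
        return None
    gram = token_ids[end + 1 - n:end + 1]
    # Prefix with n so a bigram and a trigram never share a hash key.
    key = f"{n}:" + ",".join(str(t) for t in gram)
    return zlib.crc32(key.encode("utf-8")) % table_rows


def engram_indices(token_ids):
    """For each position, return (bigram_row, trigram_row)."""
    return [
        (ngram_bucket(token_ids, 2, i), ngram_bucket(token_ids, 3, i))
        for i in range(len(token_ids))
    ]


# Size of the table alone: 500,000 * 768 = 384,000,000 entries.
table_params = TABLE_ROWS * EMBED_DIM

# Example with made-up token IDs.
rows = engram_indices([5, 9, 2, 7])
```

In a full model, the rows selected here would be fetched from the embedding table and combined with the hidden state at each position. How TOBA-LM performs that combination, and what makes the memory adaptive, is not described in the portion of the abstract shown here.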