ONTO: A Token-Efficient Columnar Notation for LLM Input Optimization
arXiv cs.LG
arXiv:2604.17512v1 Announce Type: cross Serialization formats designed for document interchange impose structural overhead that becomes prohibitive when large language models consume operational data at scale. A modest dataset of 1,000 IoT sensor readings serialized as JSON requires approximately 80,000 tokens (roughly 80 tokens per reading), most of them spent on repeated field names, nested braces, and structural punctuation rather than semantic content.
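
The overhead described above can be illustrated with a minimal sketch. It does not reproduce ONTO's actual syntax, which this abstract does not specify. Instead it compares JSON against a generic header-plus-rows layout in which each field name appears once. The record fields, the `make_readings` helper, and the `|` delimiter are all hypothetical, and character count is used only as a rough stand-in for token count; a real measurement would need the target model's tokenizer.

```python
import json


def make_readings(n):
    # Hypothetical IoT sensor records; field names are illustrative only.
    return [
        {
            "device_id": f"sensor-{i % 50:03d}",
            "timestamp": 1700000000 + 60 * i,
            "temperature_c": round(20.0 + (i % 17) * 0.5, 1),
            "humidity_pct": 40 + (i % 30),
            "status": "ok",
        }
        for i in range(n)
    ]


def to_json(records):
    # Standard JSON: every record repeats every field name.
    return json.dumps(records)


def to_columnar(records):
    # Generic header-plus-rows layout (an assumption, not ONTO itself):
    # field names are written once, then one delimited row per record.
    fields = list(records[0].keys())
    lines = ["|".join(fields)]
    for record in records:
        lines.append("|".join(str(record[f]) for f in fields))
    return "\n".join(lines)


readings = make_readings(1000)
json_text = to_json(readings)
col_text = to_columnar(readings)

# Character counts as a crude proxy for token counts.
print("JSON characters:    ", len(json_text))
print("Columnar characters:", len(col_text))
print("Field-name repeats in JSON:    ", json_text.count("device_id"))
print("Field-name repeats in columnar:", col_text.count("device_id"))
```

In this sketch the JSON form repeats each field name 1,000 times, while the columnar form states it once in the header, which is the kind of structural redundancy the abstract attributes most of the token cost to.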