Prompting Underestimates LLM Capability for Time Series Classification

arXiv:2601.03464v2 Announce Type: replace Prompt-based evaluations suggest that large language models (LLMs) perform poorly on time series classification, raising doubts about whether they encode meaningful temporal structure. We show that this