Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition

arXiv:2605.10772v1 Announce Type: cross Large language-vision models (LLVMs), such as OpenAI's ChatGPT and GPT-4, have gained prominence as powerful tools for analyzing text and imagery. Merging these two data domains represents a significant paradigm shift with far-reaching implications for automatic target recognition (ATR). Recent transformer-based LLVM research has shown substantial improvements on geospatial perception tasks.