LLMs for Qualitative Data Analysis Fail on Security-specificComments in Human Experiments

ArXi:2604.10834v1 Announce Type: cross [Background:] Thematic analysis of free-text justifications in human experiments provides significant qualitative insights. Yet, it is costly because reliable annotations require multiple domain experts. Large language models (LLMs) seem ideal candidates to replace human annotators. [Problem:] Coding security-specific aspects (code identifiers mentioned, lines-of-code mentioned, security keywords mentioned) may require deeper contextual understanding than sentiment classification.