REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment

arXiv:2511.07458v2
Evaluating log summarization systems is challenging because high-quality reference summaries are scarce and existing metrics such as ROUGE and BLEU depend on surface-level lexical overlap. We