Stress Testing Factual Consistency Metrics for Long-Document Summarization

arXiv:2511.07689v2 Announce Type: replace-cross

Evaluating the factual consistency of abstractive text summarization remains a significant challenge, particularly for long documents, where conventional metrics struggle with input length limitations and long-range dependencies. In this work, we systematically evaluate the reliability of six widely used reference-free factuality metrics, originally proposed for short-form summarization, in the long-document setting.