AI RESEARCH

Learning Page Order in Shuffled WOO Releases

arXiv CS.LG

arXiv:2602.11040v2 Announce Type: replace We investigate document page ordering on 5,461 shuffled WOO documents (Dutch freedom of information releases) using page embeddings. These documents are heterogeneous collections such as emails, legal texts, and spreadsheets compiled into single PDFs, where semantic ordering signals are unreliable. We compare five methods, including pointer networks, seq2seq transformers, and specialized pairwise ranking models.
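
To make the pairwise ranking idea concrete, here is a minimal sketch of how a page order could be recovered from pairwise "comes before" scores over page embeddings. It is not the paper's implementation: `order_pages`, `toy_precedes`, and the toy embeddings are hypothetical names and data introduced only for illustration, and a trained pairwise model would take the place of `toy_precedes`.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def order_pages(
    embeddings: List[Vector],
    precedes: Callable[[Vector, Vector], float],
) -> List[int]:
    """Return page indices in predicted reading order.

    `precedes(a, b)` is an assumed scorer that estimates how likely
    page `a` is to come before page `b`. Each page is ranked by the
    sum of its scores against every other page.
    """
    n = len(embeddings)
    wins = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                wins[i] += precedes(embeddings[i], embeddings[j])
    # Pages that "beat" more pages are placed earlier.
    return sorted(range(n), key=lambda i: -wins[i])


def toy_precedes(a: Vector, b: Vector) -> float:
    # Stand-in for a trained model: in this toy data the first
    # coordinate happens to encode the true page position.
    return 1.0 if a[0] < b[0] else 0.0


# Four shuffled "page embeddings" (hypothetical toy data).
shuffled = [[2.0, 0.1], [0.0, 0.7], [3.0, 0.2], [1.0, 0.5]]
predicted = order_pages(shuffled, toy_precedes)
print(predicted)  # indices of the shuffled pages in predicted order
```

Summing pairwise scores is only one simple way to turn pairwise predictions into a full ordering. It needs a quadratic number of comparisons per document, which is one reason sequence models such as pointer networks are a natural alternative to compare against.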