bioLeak: Leakage-Aware Modeling and Diagnostics for Machine Learning in R

arXiv CS.LG

arXiv:2604.10965v1 Announce Type: cross

Data leakage remains a recurrent source of optimistic bias in biomedical machine learning studies. Standard row-wise cross-validation and globally estimated preprocessing steps are often inappropriate for data with repeated measurements, study-level heterogeneity, batch effects, or temporal dependencies. This paper describes bioLeak, an R package for constructing leakage-aware resampling workflows and for auditing fitted models for common leakage mechanisms.
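
The two leakage mechanisms named in the abstract, row-wise splitting of repeated measurements and preprocessing estimated on the full data set, can be illustrated with a minimal sketch. It is written in plain Python with the standard library only and does not use or reproduce the bioLeak API. The function names (`grouped_folds`, `standardize_within_fold`) and the toy data are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grouped_folds(groups, n_folds):
    """Assign whole groups (e.g. subjects) to folds so no group is split.

    Hypothetical helper for illustration; not part of bioLeak.
    """
    members = defaultdict(list)
    for row, group in enumerate(groups):
        members[group].append(row)
    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold.
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])
    return [sorted(fold) for fold in folds]


def standardize_within_fold(values, train_rows, test_rows):
    """Estimate mean and SD on training rows only, then apply to both sets."""
    train = [values[i] for i in train_rows]
    mu = mean(train)
    sd = pstdev(train) or 1.0  # guard against a constant training column

    def scale(rows):
        return [(values[i] - mu) / sd for i in rows]

    return scale(train_rows), scale(test_rows)


# Toy data (assumed): four subjects with two repeated measurements each.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
measurements = [1.0, 1.2, 3.0, 3.1, 5.0, 5.3, 7.0, 7.4]

folds = grouped_folds(subjects, n_folds=2)
for test_rows in folds:
    train_rows = [i for i in range(len(subjects)) if i not in test_rows]
    train_z, test_z = standardize_within_fold(measurements, train_rows, test_rows)
    # Subjects present on both sides of the split would indicate leakage.
    shared = {subjects[i] for i in train_rows} & {subjects[i] for i in test_rows}
    print(len(train_rows), len(test_rows), sorted(shared))
```

Splitting by subject keeps correlated rows on one side of each split, and fitting the scaler on training rows alone keeps test-set statistics out of the model inputs. Batch-aware or time-ordered splits follow the same pattern with a different grouping or ordering key.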