MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

arXiv:2603.16738v1 Announce Type: new Medical language models must be updated as evidence and terminology evolve, yet sequential updating can trigger catastrophic forgetting. Although biomedical NLP has many static benchmarks, no unified, task-diverse benchmark exists for evaluating continual learning under standardized protocols, robustness to task order and compute-aware reporting. We