When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

arXiv:2605.07756v1 Announce Type: cross Modern deep models are often pretrained on large-scale data with missing labels using composite objectives, where the relative weights of multiple loss terms act as hyperparameters. Tuning these weights with random search or Bayesian optimization is computationally expensive, as it requires many independent