From Words to Amino Acids: Does the Curse of Depth Persist?

arXiv:2602.21750v2

Protein language models (PLMs) have become widely adopted as general-purpose models, demonstrating strong performance in protein engineering and de novo design. Like large language models (LLMs), they are typically trained as deep transformers with next-token or masked-token prediction objectives on massive sequence corpora and are scaled by increasing model depth. Recent work on autoregressive LLMs has identified the Curse of Depth: many later layers contribute little to the final output predictions.