[P] I've trained my own OMR model (Optical Music Recognition)

Hi, I trained an optical music recognition model and wanted to share it here because I think the approach has room for improvement and I'd like feedback. Clarity-OMR takes sheet music PDFs and converts them to MusicXML files. The core is a DaViT-Base encoder paired with a custom Transformer decoder that emits tokens from a 487-token music vocabulary. The whole thing runs as a 4-stage pipeline: YOLO for staff detection → DaViT+RoPE decoder for recognition → grammar FSA for constrained beam search → MusicXML export.
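For anyone unfamiliar with the grammar FSA step, here is a minimal toy sketch of the general idea, not the actual Clarity-OMR code. The five-token vocabulary, the `FSA` table, and `constrained_beam_search` are all made up for illustration, and the per-step scores are fixed up front instead of coming from a decoder conditioned on the prefix. At each step a beam may only be extended with tokens the grammar allows from its current state, so a high-scoring but invalid token never gets picked.

```python
import math

# Hypothetical toy grammar: state -> {allowed token: next state}.
# The real model uses a 487-token vocabulary; this one is illustrative only.
FSA = {
    "START": {"clef": "BODY"},
    "BODY": {"note": "BODY", "rest": "BODY", "barline": "BAR"},
    "BAR": {"note": "BODY", "rest": "BODY", "<eos>": "END"},
}


def constrained_beam_search(step_logprobs, beam_width=2):
    """Return the best grammar-valid token sequence, or None if there is none.

    step_logprobs: list of dicts (token -> log-probability), one per step.
    Simplification: a real decoder would recompute these per beam prefix.
    """
    beams = [(0.0, [], "START")]  # (cumulative log-prob, tokens, FSA state)
    finished = []
    for logprobs in step_logprobs:
        candidates = []
        for score, tokens, state in beams:
            # Only iterate over tokens the grammar permits from this state.
            for token, next_state in FSA[state].items():
                if token not in logprobs:
                    continue
                cand = (score + logprobs[token], tokens + [token], next_state)
                if next_state == "END":
                    finished.append(cand)
                else:
                    candidates.append(cand)
        # Keep the highest-scoring unfinished hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if not beams:
            break
    if not finished:
        return None
    return max(finished, key=lambda c: c[0])[1]


# Made-up scores where the unconstrained top choice is often invalid
# (e.g. "note" before any clef, "<eos>" in the middle of a measure).
steps = [
    {"note": math.log(0.6), "clef": math.log(0.4)},
    {"<eos>": math.log(0.5), "note": math.log(0.4), "barline": math.log(0.1)},
    {"<eos>": math.log(0.6), "barline": math.log(0.3), "note": math.log(0.1)},
    {"<eos>": math.log(0.9), "note": math.log(0.1)},
]
result = constrained_beam_search(steps)
print(result)
```

Greedy decoding on these scores would start with `note` and then emit `<eos>`, which the toy grammar rejects. The constrained search instead returns a sequence that starts with a clef and only ends after a barline.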