AI RESEARCH
I trained a NER model on 33,000 Indian Supreme Court judgments (1950–2024) CASE_CITATION hits 97.76% F1, +17 points over the only prior baseline [P]
r/MachineLearning
•
TL;DR: Released en_legal_ner_ind_trf v0.1 - InLegalBERT fine-tuned on ~34,700 silver-annotated chunks from 33k Indian SC judgments. 13 labels. 78.67% overall F1. CASE_CITATION at 97.76% already exceeds OpenNyAI's PRECEDENT score by +17 points. Free, Apache-2.0. Why this exists OpenNyAI is the only prior Indian legal NER model with any community presence. It's unmaintained and degrades on pre-1990 OCR-era text - the first 40 years of India's constitutional jurisprudence. No replacement existed.