Building a Fast Multilingual OCR Model with Synthetic Data

The Problem: Data, Not Architecture
A Generic Synthetic Data Pipeline
Text: mOSCAR
Rendering: Modified SynthDoG
What the Data Looks Like
Dataset at a Glance
Extensibility
The Model: Nemotron OCR v2
Why the Model Is Fast

The architecture is based on the FOTS (Fast Oriented Text Spotting) design, which unifies detection and recognition into a single network with a shared convolutional backbone. The detection backbone (RegNetX-8GF) processes the input image once and produces feature maps that are r