AI RESEARCH
Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data
arXiv CS.LG
arXiv:2603.01444v2
Synthetic data generation is an important capability for privacy-preserving data sharing, system benchmarking and test data provisioning. For mixed-type data, existing synthesizers largely target dense, fixed-schema tables, but many modern data systems exchange sparse, semi-structured JSON with nested objects, variable-length arrays and optional keys. Applying tabular synthesizers to such data requires flattening records into wide, sparse tables, which turns nested structure and arrays into column-layout artifacts.
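To make the flattening problem concrete, the following is a minimal sketch of how nested JSON records might be forced into a single wide table. It is an illustration only, not the paper's method: the helper names (`flatten`, `to_wide_table`, `sparsity`), the path-style column naming, and the sample records are all hypothetical.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one level with path-like keys."""
    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        # Each array position becomes its own column, e.g. tags[0], tags[1].
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


def to_wide_table(records: list[dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Build a table whose columns are the union of all flattened paths."""
    rows = [flatten(record) for record in records]
    columns = sorted({column for row in rows for column in row})
    # Missing paths are filled with None (this sketch does not distinguish
    # an absent key from an explicit JSON null).
    table = [[row.get(column) for column in columns] for row in rows]
    return columns, table


def sparsity(table: list[list[Any]]) -> float:
    """Fraction of cells that are empty (None)."""
    cells = sum(len(row) for row in table)
    empty = sum(cell is None for row in table for cell in row)
    return empty / cells if cells else 0.0


# Hypothetical records with optional keys and variable-length arrays.
records = [
    {"id": 1, "user": {"name": "ada"}, "tags": ["a", "b", "c"]},
    {"id": 2, "user": {"name": "bob", "email": "bob@example.com"}},
    {"id": 3, "tags": ["x"]},
]

columns, table = to_wide_table(records)
print(columns)
print(f"empty cells: {sparsity(table):.0%}")
```

In this sketch the longest array in the dataset dictates how many `tags[i]` columns every record carries, and each optional key adds a column that is empty wherever the key is absent. Array length and key presence are thus encoded only indirectly, as patterns of empty cells, which is the kind of column-layout artifact described above.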