MermaidSeqBench: An Evaluation Benchmark for NL-to-Mermaid Sequence Diagram Generation

arXiv:2511.14967v2 Large language models (LLMs) have shown great promise in generating structured diagrams from natural language descriptions, particularly Mermaid sequence diagrams for software engineering. However, the lack of benchmarks for assessing LLM correctness on this task hinders the reliable deployment of these models in production environments. To address this shortcoming, we