SO-Bench: A Structural Output Evaluation of Multimodal LLMs
CVPR 2026
We use SO-Bench to conduct a comprehensive study of the visual structural output capabilities of multimodal LLMs (MLLMs). The benchmark covers four visual domains with over 6.5K JSON schemas and 1.8K curated image-schema pairs, and the study reveals persistent gaps in producing schema-compliant outputs.
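To make "schema-compliant output" concrete, the following is a minimal sketch of how a model's raw text response could be checked against a JSON schema. It is an illustration only, not SO-Bench's actual evaluation code or metric: the function names `violations` and `check_output`, the example schema, and the example responses are all hypothetical, and the checker handles only a small subset of JSON Schema keywords (`type`, `enum`, `required`, `properties`, `items`).

```python
import json

# Map JSON Schema type names to Python types (a small subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def violations(value, schema, path="$"):
    """Return a list of schema violations for `value` (empty if compliant)."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("number", "integer")
        if not isinstance(value, _TYPES[expected]) or bool_as_number:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: value not in enum")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(violations(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(violations(item, schema["items"], f"{path}[{i}]"))
    return errors


def check_output(raw_text, schema):
    """Parse a model's raw text and check it against the schema."""
    try:
        value = json.loads(raw_text)
    except json.JSONDecodeError:
        return ["$: output is not valid JSON"]
    return violations(value, schema)


# Hypothetical schema and responses, invented for this illustration.
schema = {
    "type": "object",
    "required": ["title", "items"],
    "properties": {
        "title": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "price"],
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                },
            },
        },
    },
}
good = '{"title": "Menu", "items": [{"name": "Tea", "price": 2.5}]}'
bad = '{"title": "Menu", "items": [{"name": "Tea", "price": "2.5"}]}'

print(check_output(good, schema))
print(check_output(bad, schema))
```

In this sketch the second response is well-formed JSON but still fails, because `price` is a string where the schema requires a number. That distinction between parseable output and schema-compliant output is the kind of gap the summary above refers to; a full evaluation would more likely rely on a complete JSON Schema validator than on this reduced subset.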