From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios

Abstract

Dense prediction tasks are of significant importance in computer vision; they aim to learn a pixel-wise annotated label for an input image. Despite advances in this field, existing methods primarily focus on idealized conditions, generalize poorly to real-world scenarios, and face a challenging scarcity of real-world data. We aim to expand dense prediction to a broader range of practical real-world scenarios while reducing the reliance on large-scale data under limited supervision. To systematically study this problem, we first introduce DenseWorld, a benchmark spanning a broad set of 25 dense prediction tasks that correspond to urgent real-world applications, featuring unified evaluation across tasks. Then, we propose DenseDiT, which maximally exploits generative models' visual priors to perform diverse real-world dense prediction tasks through a unified strategy. DenseDiT combines a parameter-reuse mechanism with two lightweight branches that adaptively integrate multi-scale context, adding less than 0.1% additional parameters. Evaluations on DenseWorld reveal significant performance drops in existing general and specialized baselines, highlighting their limited real-world generalization. In contrast, DenseDiT achieves superior results using less than 0.01% of the training data used by the baselines, underscoring its practical value for real-world deployment.

DenseWorld Benchmark

Overview of the DenseWorld benchmark. Upper left: the construction pipeline. Center left: examples of representative tasks across five real-world categories. Lower left: unified evaluation. Right: full taxonomy of 25 dense prediction tasks, each aligned with a practical application scenario.

DenseDiT Architecture

Overview of the DenseDiT architecture. DenseDiT is a generative-model-based dense prediction framework tailored for diverse real-world scenarios. It operates via a parameter-reuse mechanism that preserves the visual priors of the pretrained DiT backbone. To enhance task adaptability, DenseDiT introduces two lightweight branches, the prompt branch and the demonstration branch, which provide semantic and visual contextual cues under the control of a Distribution Alignment Indicator (DAI). This design ensures both data efficiency and strong generalization without modifying the core generative architecture.
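To make the "less than 0.1% additional parameters" claim concrete, the following is a minimal sketch of a parameter-budget check: reused backbone weights are counted as frozen, and the two lightweight branches are counted as the only added weights. It is bookkeeping only, not the DenseDiT implementation. The `Module` class, the function names, and all parameter counts are hypothetical and chosen purely for illustration; the real sizes of the backbone and branches are not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """A named group of parameters; `reused` marks pretrained backbone weights."""
    name: str
    num_params: int
    reused: bool


def added_param_fraction(modules: List[Module]) -> float:
    """Return added (non-reused) parameters as a fraction of reused ones."""
    reused = sum(m.num_params for m in modules if m.reused)
    added = sum(m.num_params for m in modules if not m.reused)
    if reused == 0:
        raise ValueError("no reused backbone parameters to compare against")
    return added / reused


def within_budget(modules: List[Module], budget: float = 0.001) -> bool:
    """Check the added-parameter fraction against a budget (0.001 == 0.1%)."""
    return added_param_fraction(modules) < budget


# Hypothetical sizes (assumptions, not figures from the paper).
modules = [
    Module("dit_backbone", 1_000_000_000, reused=True),
    Module("prompt_branch", 400_000, reused=False),
    Module("demonstration_branch", 500_000, reused=False),
]

fraction = added_param_fraction(modules)
print(f"added parameters: {fraction:.4%} of the backbone")
print("within 0.1% budget:", within_budget(modules))
```

With these illustrative counts, the two branches add 900,000 parameters against a one-billion-parameter backbone, which falls under the 0.1% budget. The same check could be run against real parameter counts read from a model.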

BibTeX

@misc{xia2025idealrealunifieddataefficient,
      title={From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios}, 
      author={Changliang Xia and Chengyou Jia and Zhuohang Dang and Minnan Luo},
      year={2025},
      eprint={2506.20279},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.20279}, 
    }
