Paper Detail

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang

Hugging Face score: 9.5

Published 2026-05-12 · First seen 2026-05-13

General AI

Abstract

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We further evaluate Lite3R on two representative backbones, VGGT and DA3-Large, over BlendedMVS and DTU64, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.
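The attention-cost claim in the abstract can be illustrated with a minimal sketch of generic kernelized linear attention. This is not the paper's Sparse Linear Attention, whose details are not given here. The feature map `elu(x) + 1` and the function names are assumptions chosen for illustration. The sketch shows why a linear branch avoids the quadratic token-mixing cost: the key-value summary is computed once and reused for every query, and the result matches the quadratic form of the same kernel.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed here for illustration).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dense_kernel_attention(Q, K, V):
    """Quadratic reference: every query visits every key (O(N^2) pairs)."""
    out = []
    for q in Q:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Linear in sequence length: build the key-value summary once, then reuse it."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d                       # z = sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = feature_map(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out
```

Both functions compute the same output up to floating-point rounding. The difference is that `linear_attention` touches each key once, while `dense_kernel_attention` touches each key once per query.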

BibTeX

@misc{zhang2026lite3r,
  title = {Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction},
  author = {Haoyu Zhang and Zeyu Zhang and Zedong Zhou and Yang Zhao and Hao Tang},
  year = {2026},
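  abstract = {Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We further evaluate Lite3R on two representative backbones, VGGT and DA3-Large, over BlendedMVS and DTU64, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.},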
  url = {https://huggingface.co/papers/2605.11354},
  keywords = {transformer-based 3D reconstruction, multi-view attention, dense attention, sparse linear attention, parameter-efficient fine-tuning, FP8-aware quantization-aware training, attention distillation, pretrained backbone, geometric priors, latent space, 3D consistency, depth estimation, pose estimation, code available, huggingface daily},
  eprint = {2605.11354},
  archiveprefix = {arXiv},
}
