Paper Detail

Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining

Junxuan Li, Rawal Khirodkar, Chengan He, Zhongshi Jiang, Giljoo Nam, Lingchen Yang, Jihyun Lee, Egor Zakharov, Zhaoen Su, Rinat Abdrashitov, Yuan Dong, Julieta Martinez, Kai Li, Qingyang Tan, Takaaki Shiratori, Matthew Hu, Peihong Guo, Xuhua Huang, Ariyan Zarei, Marco Pesavento, Yichen Xu, He Wen, Teng Deng, Wyatt Borsos, Anjali Thakrar, Jean-Charles Bazin, Carsten Stoll, Ginés Hidalgo, James Booth, Lucy Wang, Xiaowen Ma, Yu Rong, Sairanjith Thalanki, Chen Cao, Christian Häne, Abhishek Kar, Sofien Bouaziz, Jason Saragih, Yaser Sheikh, Shunsuke Saito

arXiv · Score 4.8

Published 2026-04-02 · First seen 2026-04-04

General AI

Abstract

High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalization across a wide range of identities, yet the resulting avatars are often of low quality due to inherent 3D ambiguities. To address this, we present Large-Scale Codec Avatars (LCA), a high-fidelity, full-body 3D avatar model that generalizes to world-scale populations in a feedforward manner, enabling efficient inference. Inspired by the success of large language models and vision foundation models, we present, for the first time, a pre/post-training paradigm for 3D avatar modeling at scale: we pretrain on 1M in-the-wild videos to learn broad priors over appearance and geometry, then post-train on high-quality curated data to enhance expressivity and fidelity. LCA generalizes across hair styles, clothing, and demographics while providing precise, fine-grained facial expressions and finger-level articulation control, with strong identity preservation. Notably, we observe emergent generalization to relightability and loose garment support to unconstrained inputs, and zero-shot robustness to stylized imagery, despite the absence of direct supervision.

BibTeX

@article{li2026large,
  title = {Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining},
  author = {Junxuan Li and Rawal Khirodkar and Chengan He and Zhongshi Jiang and Giljoo Nam and Lingchen Yang and Jihyun Lee and Egor Zakharov and Zhaoen Su and Rinat Abdrashitov and Yuan Dong and Julieta Martinez and Kai Li and Qingyang Tan and Takaaki Shiratori and Matthew Hu and Peihong Guo and Xuhua Huang and Ariyan Zarei and Marco Pesavento and Yichen Xu and He Wen and Teng Deng and Wyatt Borsos and Anjali Thakrar and Jean-Charles Bazin and Carsten Stoll and Ginés Hidalgo and James Booth and Lucy Wang and Xiaowen Ma and Yu Rong and Sairanjith Thalanki and Chen Cao and Christian Häne and Abhishek Kar and Sofien Bouaziz and Jason Saragih and Yaser Sheikh and Shunsuke Saito},
  year = {2026},
  abstract = {High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalization across},
  url = {https://arxiv.org/abs/2604.02320},
  keywords = {cs.CV, cs.GR},
  eprint = {2604.02320},
  archiveprefix = {arXiv},
}
