Paper Detail

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

NVIDIA, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Arushi Goel, Mike Ranzinger, Greg Heinrich, Guo Chen, Lukas Voegtle, Philipp Fischer, Timo Roman, Karan Sapra, Collin McCarthy, Shaokun Zhang, Fuxiao Liu, Hanrong Ye, Yi Dong, Mingjie Liu, Yifan Peng, Piotr Zelasko, Zhehuai Chen, Nithin Rao Koluguri, Nune Tadevosyan, Lilit Grigoryan, Ehsan Hosseini Asl, Pritam Biswas, Leili Tavabi, Yuanhang Su, Zhiding Yu, Peter Jin, Alexandre Milesi, Netanel Haber, Yao Xu, Sarah Amiraslani, Nabin Mulepati, Eric Tramel, Jaehun Jung, Ximing Lu, Brandon Cui, Jin Xu, Zhiqi Li, Shihao Wang, Yuanguo Kuang, Huck Yang, Boyi Li, Hongxu Yin, Song Han, Pavlo Molchanov, Adi Renduchintala, Charles Wang, David Mosallanezhad, Soumye Singhal, Luis Vega, Katherine Cheung, Sreyan Ghosh, Yian Zhang, Alexander Bukharin, Venkat Srinivasan, Johnny Greco, Andre Manoel, Maarten Van Segbroeck, Suseella Panguliri, Rohit Watve, Divyanshu Kakwani, Shubham Pachori, Jeffrey Glick, Radha Sri-Tharan, Aileen Zaman, Khanh Nguyen, Shi Chen, Jiaheng Fang, Qing Miao, Wenfei Zhou, Yu Wang, Zaid Pervaiz Bhat, Varun Praveen, Arihant Jain, Ramanathan Arunachalam, Tomasz Kornuta, Ashton Sharabiani, Amy Shen, Wei Huang, Yi-Fu Wu, Ali Roshan Ghias, Huiying Li, Brian Yu, Nima Tajbakhsh, Chen Cui, Wenwen Gao, Li Ding, Terry Kong, Manoj Kilaru, Anahita Bhiwandiwalla, Marek Wawrzos, Daniel Korzekwa, Pablo Ribalta, Grzegorz Chlebus, Besmira Nushi, Ewa Dobrowolska, Maciej Jakub Mikulski, Kunal Dhawan, Steve Huang, Jagadeesh Balam, Yongqiang Wang, Nikolay Karpov, Valentin Mendelev, George Zelenfroynd, Meline Mkrtchyan, Omri Almog, Bhavesh Pawar, Rameshwar Shivbhakta, Sudeep Sabnis, Ashrton Sharabiani, Negar Habibi, Geethapriya Venkataramani, Pamela Peng, Prerit Rodney, Serge Panev, Richard Mazzarese, Nicky Liu, Michael Fukuyama, Andrii Skliar, Roger Waleffe, Duncan Riach, Yunheng Zou, Jian Hu, Hao Zhang, Binfeng Xu, Yuhao Yang, Zuhair Ahmed, Carlo del Mundo, Chad Voegele, Zhiyu Cheng, Nave Assaf, Daniel Afrimi, Natan Bagrov, Ran Zilberstein, Ofri Masad, Eugene Khvedchenia, Borys Tymchenko, Tomer Asida, Parth Mannan, Victor Cui, Michael Evans, Katherine Luna, Jie Lou, Pinky Xu, Guyue Huang, Michael Boone, Pradeep Thalasta, Adeola Adesoba, Dina Yared, Christopher Parisien, Leon Derczynski, Shaona Ghosh, Wes Feely, Micah Schaffer, Barnaby Simkin, Tomasz Grzegorzek, Rishabh Garg, Aastha Jhunjhunwala, Sergei Kolchenko, Farzan Memarian, Haran Kumar, Shiv Kumar, Isabel Hulseman, Anjali Shah, Kari Briski, Padmavathy Subramanian, Joey Conway, Udi Karpas, Jane Polak Scowcroft, Annie Surla, Shilpa Ammireddy, Ellie Evans, Jesse Oliver, Tom Balough, Chia-Chih Chen, Sandip Bhaskar, Alejandra Rico, Bardiya Sadeghi, Seph Mard, Meredith Price, Laya Sleiman, Saori Kaji, Wesley Helmholz, Wendy Quan, Michael Lightstone, Jonathan Cohen, Jian Zhang, Oleksii Kuchaiev, Boris Ginsburg, Jan Kautz, Eileen Long, Mohammad Shoeybi, Mostofa Patwary, Oluwatobi Olabiyi, Andrew Tao, Bryan Catanzaro

huggingface Score 10.0

Published 2026-04-27 · First seen 2026-05-02

Research Track B · General AI

Abstract

We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in architecture, training data and recipes. In particular, Nemotron 3 delivers leading results in real-world document understanding, long audio-video comprehension, and agentic computer use. Built on the highly efficient Nemotron 3 Nano 30B-A3B backbone, Nemotron 3 Nano Omni further incorporates innovative multimodal token-reduction techniques to deliver substantially lower inference latency and higher throughput than other models of similar size. We are releasing model checkpoints in BF16, FP8, and FP4 formats, along with portions of the training data and codebase to facilitate further research and development.
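The three release formats trade numeric precision for storage. The sketch below gives a rough estimate of raw weight size per format. It assumes about 30 billion total parameters, a round figure read from the "30B" in the backbone name rather than an official count, and it ignores quantization scales, metadata, and any layers kept at higher precision, so real checkpoint files will be somewhat larger. The helper name `weight_gb` is hypothetical.

```python
"""Rough weight-storage estimate for BF16, FP8, and FP4 checkpoints.

Assumption: ~30e9 total parameters (a round figure read from the "30B" in the
backbone name, not an official count). Quantization scales, metadata, and any
layers kept at higher precision are ignored, so real files will be larger.
"""

BYTES_PER_PARAM = {
    "BF16": 2.0,  # 16 bits per weight
    "FP8": 1.0,   # 8 bits per weight
    "FP4": 0.5,   # 4 bits per weight, two weights packed per byte
}


def weight_gb(num_params: float, fmt: str) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    total_params = 30e9  # hypothetical round figure, not an official count
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weight_gb(total_params, fmt):.0f} GB")
```

Under these assumptions the script prints roughly 60 GB for BF16, 30 GB for FP8, and 15 GB for FP4. If the "A3B" suffix follows the usual mixture-of-experts naming convention (about 3B parameters active per token), per-token compute tracks the active count while storage still tracks the full parameter count, so this estimate describes what must be loaded, not per-token cost.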

Workflow Status

Review status: pending
Role: unreviewed
Read priority: soon
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
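The help text names four fields but not the file's schema. A minimal sketch of adding them, assuming each line of `papers.jsonl` is one JSON object per paper and that records are matched by a hypothetical `id` field holding the arXiv eprint (here `2604.24954`). The helper `enrich_record` and the value shapes are illustrative assumptions; only the four field names come from the page.

```python
import json
from pathlib import Path

# Only these four field names come from the page's help text; the "id" key
# used to match records and the value shapes are hypothetical assumptions.
NOTE_FIELDS = ("summary_sections", "why_relevant", "claim_impact", "next_action")


def enrich_record(path: Path, paper_id: str, notes: dict, id_field: str = "id") -> bool:
    """Merge note fields into the matching JSONL record; return True if found."""
    unknown = set(notes) - set(NOTE_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")

    found = False
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # drop blank lines rather than failing on them
        record = json.loads(line)
        if record.get(id_field) == paper_id:
            record.update(notes)  # add or overwrite only the note fields
            found = True
        out.append(json.dumps(record, ensure_ascii=False))

    # Rewrite the whole file, one JSON object per line.
    path.write_text("\n".join(out) + "\n", encoding="utf-8")
    return found
```

For example, `enrich_record(Path("papers.jsonl"), "2604.24954", {"next_action": "read section on token reduction"})` would attach a next action to this paper's record if the assumed `id` field matches. Rewriting the whole file keeps the sketch simple; a large collection would call for a streaming rewrite to a temporary file instead.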

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@misc{nvidia2026nemotron,
  title = {Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence},
  author = {NVIDIA and Amala Sanjay Deshmukh and Kateryna Chumachenko and Tuomas Rintamaki and Matthieu Le and Tyler Poon and Danial Mohseni Taheri and Ilia Karmanov and Guilin Liu and Jarno Seppanen and Arushi Goel and Mike Ranzinger and Greg Heinrich and Guo Chen and Lukas Voegtle and Philipp Fischer and Timo Roman and Karan Sapra and Collin McCarthy and Shaokun Zhang and Fuxiao Liu and Hanrong Ye and Yi Dong and Mingjie Liu and Yifan Peng and Piotr Zelasko and Zhehuai Chen and Nithin Rao Koluguri and Nune Tadevosyan and Lilit Grigoryan and Ehsan Hosseini Asl and Pritam Biswas and Leili Tavabi and Yuanhang Su and Zhiding Yu and Peter Jin and Alexandre Milesi and Netanel Haber and Yao Xu and Sarah Amiraslani and Nabin Mulepati and Eric Tramel and Jaehun Jung and Ximing Lu and Brandon Cui and Jin Xu and Zhiqi Li and Shihao Wang and Yuanguo Kuang and Huck Yang and Boyi Li and Hongxu Yin and Song Han and Pavlo Molchanov and Adi Renduchintala and Charles Wang and David Mosallanezhad and Soumye Singhal and Luis Vega and Katherine Cheung and Sreyan Ghosh and Yian Zhang and Alexander Bukharin and Venkat Srinivasan and Johnny Greco and Andre Manoel and Maarten Van Segbroeck and Suseella Panguliri and Rohit Watve and Divyanshu Kakwani and Shubham Pachori and Jeffrey Glick and Radha Sri-Tharan and Aileen Zaman and Khanh Nguyen and Shi Chen and Jiaheng Fang and Qing Miao and Wenfei Zhou and Yu Wang and Zaid Pervaiz Bhat and Varun Praveen and Arihant Jain and Ramanathan Arunachalam and Tomasz Kornuta and Ashton Sharabiani and Amy Shen and Wei Huang and Yi-Fu Wu and Ali Roshan Ghias and Huiying Li and Brian Yu and Nima Tajbakhsh and Chen Cui and Wenwen Gao and Li Ding and Terry Kong and Manoj Kilaru and Anahita Bhiwandiwalla and Marek Wawrzos and Daniel Korzekwa and Pablo Ribalta and Grzegorz Chlebus and Besmira Nushi and Ewa Dobrowolska and Maciej Jakub Mikulski and Kunal Dhawan and Steve Huang and Jagadeesh Balam and Yongqiang Wang and Nikolay Karpov and Valentin Mendelev 
and George Zelenfroynd and Meline Mkrtchyan and Omri Almog and Bhavesh Pawar and Rameshwar Shivbhakta and Sudeep Sabnis and Ashrton Sharabiani and Negar Habibi and Geethapriya Venkataramani and Pamela Peng and Prerit Rodney and Serge Panev and Richard Mazzarese and Nicky Liu and Michael Fukuyama and Andrii Skliar and Roger Waleffe and Duncan Riach and Yunheng Zou and Jian Hu and Hao Zhang and Binfeng Xu and Yuhao Yang and Zuhair Ahmed and Carlo del Mundo and Chad Voegele and Zhiyu Cheng and Nave Assaf and Daniel Afrimi and Natan Bagrov and Ran Zilberstein and Ofri Masad and Eugene Khvedchenia and Borys Tymchenko and Tomer Asida and Parth Mannan and Victor Cui and Michael Evans and Katherine Luna and Jie Lou and Pinky Xu and Guyue Huang and Michael Boone and Pradeep Thalasta and Adeola Adesoba and Dina Yared and Christopher Parisien and Leon Derczynski and Shaona Ghosh and Wes Feely and Micah Schaffer and Barnaby Simkin and Tomasz Grzegorzek and Rishabh Garg and Aastha Jhunjhunwala and Sergei Kolchenko and Farzan Memarian and Haran Kumar and Shiv Kumar and Isabel Hulseman and Anjali Shah and Kari Briski and Padmavathy Subramanian and Joey Conway and Udi Karpas and Jane Polak Scowcroft and Annie Surla and Shilpa Ammireddy and Ellie Evans and Jesse Oliver and Tom Balough and Chia-Chih Chen and Sandip Bhaskar and Alejandra Rico and Bardiya Sadeghi and Seph Mard and Meredith Price and Laya Sleiman and Saori Kaji and Wesley Helmholz and Wendy Quan and Michael Lightstone and Jonathan Cohen and Jian Zhang and Oleksii Kuchaiev and Boris Ginsburg and Jan Kautz and Eileen Long and Mohammad Shoeybi and Mostofa Patwary and Oluwatobi Olabiyi and Andrew Tao and Bryan Catanzaro},
  year = {2026},
  abstract = {We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in architecture, training data and recipes. In particular, Nemotron 3 delivers leading results in real-world document understanding, long audio-video comprehension, and agentic comput},
  url = {https://huggingface.co/papers/2604.24954},
  keywords = {multimodal, audio inputs, text inputs, image inputs, video inputs, document understanding, long audio-video comprehension, agentic computer use, token-reduction techniques, inference latency, throughput, model checkpoints, training data, codebase, huggingface daily},
  eprint = {2604.24954},
  archiveprefix = {arXiv},
}

Metadata

{}