Paper Detail

OpenThoughts-Agent: Data Recipes for Agentic Models

Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt

arxiv Score 9.2

Published 2026-06-23 · First seen 2026-06-24

General AI

Abstract

Agentic language models dramatically expand the applications of AI, yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline for training agentic models. We conduct more than 100 controlled ablation experiments to systematically investigate each stage of the pipeline, yielding insights on the importance of task sources and diversity. We then assemble a training set of 100K examples from our pipeline and fine-tune Qwen3-32B on this dataset, which yields an average accuracy of 44.8% across seven agentic benchmarks, a 3.9 percentage point improvement over the strongest existing open-data agentic model (Nemotron-Terminal-32B, 40.9%). Moreover, our training data exhibits strong scaling properties, outperforming alternative open datasets at every training set size in compute-controlled comparisons. We publicly release our training sets, data pipeline, experimental data, and models at openthoughts.ai to support future open research on agentic model training.

BibTeX

@article{raoof2026openthoughts,
  title = {OpenThoughts-Agent: Data Recipes for Agentic Models},
  author = {Negin Raoof and Richard Zhuang and Marianna Nezhurina and Etash Guha and Atula Tejaswi and Ryan Marten and Charlie F. Ruan and Tyler Griggs and Alexander Glenn Shaw and Hritik Bansal and E. Kelly Buchanan and Artem Gazizov and Reinhard Heckel and Chinmay Hegde and Sankalp Jajee and Daanish Khazi and Emmanouil Koukoumidis and Xiangyi Li and Hange Liu and Shlok Natarajan and Harsh Raj and Nicholas Roberts and Ethan Shen and Nishad Singhi and Michael Siu and Ashima Suvarna and Hanwen Xing and Patrick Yubeaton and Robert Zhang and Leon Liangyu Chen and Xiaokun Chen and Steven Dillmann and Saadia Gabriel and Xunyi Jiang and Anurag Kashyap and Boxuan Li and Yein Park and Minh Pham and Sujay Sanghavi and Lin Shi and Ke Sun and Yixin Wang and Zhiwei Xu and Erica Zhang and Siyan Zhao and Wanjia Zhao and Jenia Jitsev and Alex Dimakis and Benjamin Feuer and Ludwig Schmidt},
  year = {2026},
  abstract = {Agentic language models dramatically expand the applications of AI yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data curation pipeline for training agentic models. We conduct},
  url = {https://arxiv.org/abs/2606.24855},
  keywords = {cs.AI, agentic language models, data curation pipeline, training data, fine-tune, benchmarks, controlled ablation experiments, scaling properties, open-source, huggingface daily},
  eprint = {2606.24855},
  archiveprefix = {arXiv},
}
