Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh Duong, Florian Gienger, Rohan Godse, Denis Grachev, Ralf Gulde, Elisa Hagensieker, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, Likith Kumar, Damien LaRocque, Keerthana Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anh Tong, Dominik Tuscher, Marc Tuscher, Pavan Upputuri

huggingface Score 6.5

Published 2026-04-22 · First seen 2026-04-23

General AI

Abstract

Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive control to plan-and-act by generating candidate future trajectories in visual latent space, scoring them for expected success and efficiency, then committing only to the highest-scoring candidate. We evaluate Cortex 2.0 on a single-arm and dual-arm manipulation platform across four tasks of increasing complexity: pick and place, item and trash sorting, screw sorting, and shoebox unpacking. Cortex 2.0 consistently outperforms state-of-the-art Vision-Language-Action baselines, achieving the best results across all tasks. The system remains reliable in unstructured environments characterized by heavy clutter, frequent occlusions, and contact-rich manipulation, where reactive policies fail. These results demonstrate that world-model-based planning can operate reliably in complex industrial environments.
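The plan-and-act loop the abstract describes (generate candidate futures, score them for expected success and efficiency, commit to the best one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` fields, the linear `score` function, the `efficiency_weight` value, and the random `sample_candidates` generator are all hypothetical stand-ins chosen only to make the selection step concrete.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """Hypothetical stand-in for one imagined future trajectory."""
    latents: List[float]  # placeholder for a rollout in visual latent space
    success: float        # assumed predicted probability of task success
    steps: int            # assumed predicted number of steps to finish


def score(c: Candidate, efficiency_weight: float = 0.01) -> float:
    """Assumed scoring rule: reward success, penalize long trajectories."""
    return c.success - efficiency_weight * c.steps


def select_best(candidates: Sequence[Candidate],
                efficiency_weight: float = 0.01) -> Candidate:
    """Commit only to the highest-scoring candidate."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: score(c, efficiency_weight))


def sample_candidates(n: int, rng: random.Random) -> List[Candidate]:
    """Toy generator; a real system would roll out a learned world model."""
    return [
        Candidate(
            latents=[rng.random() for _ in range(4)],
            success=rng.random(),
            steps=rng.randint(1, 50),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    best = select_best(sample_candidates(8, rng))
    print(f"chosen candidate: success={best.success:.2f}, steps={best.steps}")
```

A reactive policy corresponds to the degenerate case of a single candidate with no scoring step; the sketch only shows how evaluating several imagined futures before acting changes the selection, under the assumptions stated above.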

BibTeX

@misc{aida2026cortex,
  title = {Cortex 2.0: Grounding World Models in Real-World Industrial Deployment},
  author = {Adriana Aida and Walida Amer and Katarina Bankovic and Dhruv Behl and Fabian Busch and Annie Bhalla and Minh Duong and Florian Gienger and Rohan Godse and Denis Grachev and Ralf Gulde and Elisa Hagensieker and Junpeng Hu and Shivam Joshi and Tobias Knoblauch and Likith Kumar and Damien LaRocque and Keerthana Lokesh and Omar Moured and Khiem Nguyen and Christian Preyss and Ranjith Sriganesan and Vikram Singh and Carsten Sponner and Anh Tong and Dominik Tuscher and Marc Tuscher and Pavan Upputuri},
  year = {2026},
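  abstract = {Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive control to plan-and-act by generating candidate future trajectories in visual latent space, scoring them for expected success and efficiency, then committing only to the highest-scoring candidate. We evaluate Cortex 2.0 on a single-arm and dual-arm manipulation platform across four tasks of increasing complexity: pick and place, item and trash sorting, screw sorting, and shoebox unpacking. Cortex 2.0 consistently outperforms state-of-the-art Vision-Language-Action baselines, achieving the best results across all tasks. The system remains reliable in unstructured environments characterized by heavy clutter, frequent occlusions, and contact-rich manipulation, where reactive policies fail. These results demonstrate that world-model-based planning can operate reliably in complex industrial environments.},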
  url = {https://huggingface.co/papers/2604.20246},
  keywords = {Vision-Language-Action models, plan-and-act control, visual latent space, trajectory generation, world-model-based planning, huggingface daily},
  eprint = {2604.20246},
  archiveprefix = {arXiv},
}