Paper Detail

GROW$^2$: Grounding Which and Where for Robot Tool Use

Yuhong Deng, Yuyao Liu, David Hsu

arxiv Score 14.8

Published 2026-06-29 · First seen 2026-06-30

General AI

Abstract

Can the robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of $\textit{open-world affordance grounding}$: select an open-category object to act as a tool and localize its specific region of action. To this end, we introduce GROW$^2$ (GROunding Which and Where), which leverages object parts as a natural abstraction to split the grounding process hierarchically into semantic and geometric levels, thus bypassing the need for data-heavy, end-to-end training. Semantically, GROW$^2$ harnesses the commonsense reasoning of Vision-Language Models (VLMs) to parse a natural-language task instruction, select a suitable object as the tool, and identify task-relevant parts on the tool and the target object. Geometrically, vision foundation models then ground the selected parts into precise 3D regions from a single RGB-D image. Experiments show that GROW$^2$ outperforms state-of-the-art baselines on established affordance prediction benchmarks. Further, it achieves zero-shot generalization over open-category objects and outperforms baselines in both simulated and real-world robot tool use experiments.
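
The abstract's geometric step turns a selected part into a 3D region from a single RGB-D image. As a rough illustration of what that lifting involves, the sketch below back-projects the pixels of a 2D part mask into camera-frame 3D points using a standard pinhole camera model. This is not the authors' implementation: the function name `backproject_mask`, the nested-list image layout, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for the example, and the abstract does not say how GROW$^2$ obtains or lifts its masks.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_mask(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift masked pixels of a depth image into 3D camera-frame points.

    Hypothetical helper for illustration only (not from the paper).
    depth: rows of metric depth values along the camera z-axis.
    mask:  rows of booleans marking the part region, same shape as depth.
    fx, fy, cx, cy: assumed pinhole intrinsics, in pixels.
    """
    points: List[Point3D] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, selected) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the part and pixels with missing depth.
            if not selected or not math.isfinite(z) or z <= 0:
                continue
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 example: one masked pixel has no depth reading and is dropped.
depth_image = [[2.0, 2.0], [0.0, 2.0]]
part_mask = [[False, True], [True, True]]
region = backproject_mask(depth_image, part_mask, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
print(region)  # [(1.0, 0.0, 2.0), (1.0, 1.0, 2.0)]
```

In practice the depth image and mask would usually be arrays and the loop would be vectorized; plain lists are used here only to keep the sketch dependency-free.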

BibTeX

@article{deng2026grow,
  title = {GROW$^2$: Grounding Which and Where for Robot Tool Use},
  author = {Yuhong Deng and Yuyao Liu and David Hsu},
  year = {2026},
  abstract = {Can the robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of $\textit{open-world affordance grounding}$: select an open-category object to act as a tool and localize its specific region of action. To this end, we introduce GROW$^2$ (GROunding Which and Where), which leverages object parts as a natural abstraction to split the grounding process hierarchic},
  url = {https://arxiv.org/abs/2606.30632},
  keywords = {cs.RO, cs.AI, cs.CV},
  eprint = {2606.30632},
  archiveprefix = {arXiv},
}
