Paper Detail

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas, Rico Sennrich

huggingface Score 12.5

Published 2026-06-04 · First seen 2026-06-05

Research Track A · General AI

Abstract

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
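The abstract names chrF, a character n-gram F-score, as the RL reward. The sketch below is a simplified, pure-Python illustration of how such a reward could be computed for one sampled translation. It is not the authors' implementation: the function names `char_ngrams` and `chrf_reward`, the whitespace stripping, and the defaults `max_n=6` and `beta=2.0` are assumptions made here for illustration, and established implementations such as sacreBLEU's differ in details.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after dropping all whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1], usable as a scalar reward for one sample.

    Precision and recall are averaged over the n-gram orders for which both
    strings have at least one n-gram, then combined into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

With `beta=2.0` recall is weighted more heavily than precision, so an output that omits part of the reference is penalized more than one that adds extra characters. Because the score depends only on surface overlap with the reference, it needs no learned reward model, which matches the abstract's description of the reward as lightweight.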

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.
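As a rough illustration of the enrichment step described above, the sketch below builds one JSON Lines record carrying the four named fields. Only the field names and the `papers.jsonl` filename come from this page; the `id` key, the value types (a list of strings for `summary_sections`, plain strings for the rest), and the helper name `build_note_record` are hypothetical, so check them against the tracker's actual schema before use.

```python
import json


def build_note_record(paper_id: str) -> str:
    """Return a single JSONL line with the structured-note fields.

    The "id" key and all value shapes are assumptions, not a documented schema.
    """
    record = {
        "id": paper_id,  # hypothetical key linking the notes to a paper
        "summary_sections": [
            "RL with a chrF reward for translating unseen languages",
            "Compared against in-context learning and supervised fine-tuning",
        ],
        "why_relevant": "TODO: fill in after reading",
        "claim_impact": "TODO: fill in after reading",
        "next_action": "TODO: fill in after reading",
    }
    # json.dumps emits no raw newlines, so the result is one valid JSONL line.
    return json.dumps(record, ensure_ascii=False)
```

The returned string could then be appended as one line of `papers.jsonl`; how the tracker matches a record to this page is not stated here.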

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{hu2026reinforcement,
  title = {Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation},
  author = {Hanxu Hu and Zdeněk Šnajdr and Pinzhen Chen and Jannis Vamvas and Rico Sennrich},
  year = {2026},
  abstract = {Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages.},
  url = {https://huggingface.co/papers/2606.06428},
  keywords = {large language models, reinforcement learning, in-context learning, supervised fine-tuning, chrF, linguistic context, zero-shot transfer, meta-skill, code available, huggingface daily},
  eprint = {2606.06428},
  archiveprefix = {arXiv},
}

Metadata

{}