Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen, Jannis Vamvas, Rico Sennrich

huggingface Score 12.5

Published 2026-06-04 · First seen 2026-06-05

Research Track A · General AI

Abstract

Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
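To make the reward signal concrete, the following is a minimal sketch of a chrF-style score (a character n-gram F-score) used as a scalar reward for one sampled translation. It is a simplification for illustration, not the paper's implementation and not the exact sacreBLEU chrF. The function names `char_ngrams` and `chrf_reward`, the defaults `max_n = 6` and `beta = 2.0`, and the whitespace stripping are all assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]: F-beta over averaged n-gram precision and recall.

    Hypothetical helper for illustration; defaults are assumptions.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

In an outcome-based RL loop, a score like this would presumably be computed once per sampled translation against the reference and passed to the policy update as the reward. How the authors normalize or shape the reward is not stated in the abstract.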

BibTeX

@misc{hu2026reinforcement,
  title = {Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation},
  author = {Hanxu Hu and Zdeněk Šnajdr and Pinzhen Chen and Jannis Vamvas and Rico Sennrich},
  year = {2026},
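  abstract = {Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.},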
  url = {https://huggingface.co/papers/2606.06428},
  keywords = {large language models, reinforcement learning, in-context learning, supervised fine-tuning, chrF, linguistic context, zero-shot transfer, meta-skill, code available, huggingface daily},
  eprint = {2606.06428},
  archiveprefix = {arXiv},
}
