Paper Detail

Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention

Luke McDermott, Robert W. Heath, Rahul Parhi

arxiv Score 25.9

Published 2026-06-24 · First seen 2026-06-25

Research Track A · General AI

Abstract

Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic cost of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that extending in-context learning to lifelong settings is a practical solution for continual learning in AI agents. In particular, we argue that parametric forms of attention are needed to understand a lifetime of context with transformers on a fixed hardware budget. These attention mechanisms learn the relationship between keys and their associated values at test time with parametric regression. Our generalization of parametric approaches (linear attention, state-space models, fast weight programmers, and test-time training layers) contrasts with nonparametric counterparts like softmax attention. Parametric approaches replace the ever-growing key-value cache with an online-trainable neural network, maintaining a constant memory footprint. We highlight how parametric attention currently falls short of lifelong learning due to limited memory capacity or costly online updates. To address these issues, we pose a set of open questions with novel insights to guide the field toward long-horizon agents.
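
The contrast the abstract draws can be illustrated with a minimal sketch. It is not the paper's formulation: `softmax_attention` and `FastWeightMemory` are hypothetical names, and the delta-rule update (one gradient step on a squared regression loss per key-value pair) is an assumed stand-in for "parametric regression" at test time. The softmax read needs every cached pair, while the parametric memory keeps a fixed-size weight matrix.

```python
import math


def softmax_attention(query, keys, values):
    """Nonparametric read: the output depends on every cached (key, value) pair."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * value[i] for w, value in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


class FastWeightMemory:
    """Parametric memory (hypothetical name): a fixed d_v x d_k matrix W trained online."""

    def __init__(self, d_k, d_v, lr=1.0):
        self.W = [[0.0] * d_k for _ in range(d_v)]
        self.lr = lr  # assumed step size; 1.0 gives exact recall for unit-norm keys

    def read(self, query):
        # Linear readout W @ query.
        return [sum(w * q for w, q in zip(row, query)) for row in self.W]

    def write(self, key, value):
        # One gradient step on 0.5 * ||W key - value||^2 (delta-rule update).
        prediction = self.read(key)
        for i, row in enumerate(self.W):
            error = value[i] - prediction[i]
            for j in range(len(row)):
                row[j] += self.lr * error * key[j]

    def state_size(self):
        # Number of stored parameters; independent of how many pairs were written.
        return sum(len(row) for row in self.W)


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]

memory = FastWeightMemory(d_k=2, d_v=2)
for key, value in zip(keys, values):
    memory.write(key, value)

# The key-value cache stores d_k + d_v numbers per token, so it grows with length.
cache_size = sum(len(k) + len(v) for k, v in zip(keys, values))
print(softmax_attention([0.0, 0.0], keys, values))
print(memory.read([1.0, 0.0]), memory.state_size(), cache_size)
```

In this toy setting the keys are orthonormal, so the matrix recalls both values exactly. Once more keys are written than the key dimension allows, associations start to interfere, which is one way to picture the limited memory capacity the abstract mentions.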

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{mcdermott2026lifelong,
  title = {Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention},
  author = {Luke McDermott and Robert W. Heath and Rahul Parhi},
  year = {2026},
  abstract = {Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that extending in-context learning to lifelong settings is a practical solution for continual learning in AI agents. In particular, we argue that \emph{parametric forms of attention} are n},
  url = {https://arxiv.org/abs/2606.25342},
  keywords = {cs.LG},
  eprint = {2606.25342},
  archiveprefix = {arXiv},
}
