Paper Detail

Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention

Luke McDermott, Robert W. Heath, Rahul Parhi

arxiv Score 25.9

Published 2026-06-24 · First seen 2026-06-25

Research Track A · General AI

Abstract

Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that extending in-context learning to lifelong settings is a practical solution for continual learning in AI agents. In particular, we argue that parametric forms of attention are needed to understand a lifetime of context with transformers on a fixed hardware budget. These attention mechanisms learn the relationship between keys and their associated values at test-time with parametric regression. Our generalization of parametric approaches (linear attention, state-space models, fast weight programmers, and test-time training layers) contrasts with nonparametric counterparts like softmax attention. They replace the ever-growing key-value cache with an online-trainable neural network, maintaining a constant memory footprint. We highlight how parametric attention mechanisms currently fall short of lifelong learning due to limited memory capacity or costly online updates. To address these issues, we pose a set of open questions with novel insights to guide the field toward long-horizon agents.
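
The contrast the abstract draws between a growing key-value cache and a fixed-size, online-trained memory can be illustrated with a toy sketch. This is not the authors' method: it assumes the simplest possible parametric memory, a single linear map `W` updated with one gradient step per token on the squared error between `W k` and `v` (a delta-rule update in the style of fast weight programmers). The class names, the `lr` parameter, and `memory_size` are hypothetical and exist only for this illustration.

```python
import math


class KVCacheAttention:
    """Nonparametric memory: stores every (key, value) pair it sees."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, k, v):
        self.keys.append(list(k))
        self.values.append(list(v))

    def read(self, q):
        # Softmax attention over all stored keys (max-subtracted for stability).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in self.keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) / total
                for j in range(dim)]

    def memory_size(self):
        # Number of stored floats; grows with every write.
        return sum(len(k) for k in self.keys) + sum(len(v) for v in self.values)


class LinearRegressionMemory:
    """Parametric memory (assumed form): fixed matrix W trained so that W k ~ v."""

    def __init__(self, d_k, d_v, lr=1.0):
        self.W = [[0.0] * d_k for _ in range(d_v)]
        self.lr = lr

    def read(self, q):
        return [sum(w * qj for w, qj in zip(row, q)) for row in self.W]

    def write(self, k, v):
        # One gradient step on 0.5 * ||v - W k||^2, i.e. W += lr * (v - W k) k^T.
        pred = self.read(k)
        for i, row in enumerate(self.W):
            err = v[i] - pred[i]
            for j in range(len(row)):
                row[j] += self.lr * err * k[j]

    def memory_size(self):
        # Number of stored floats; independent of how many tokens were written.
        return sum(len(row) for row in self.W)


cache = KVCacheAttention()
mem = LinearRegressionMemory(d_k=2, d_v=1)
for k, v in [([1.0, 0.0], [3.0]), ([0.0, 1.0], [-2.0])]:
    cache.write(k, v)
    mem.write(k, v)
```

With orthonormal keys and `lr=1.0`, the linear memory recalls both values exactly while holding only `d_k * d_v` numbers, whereas the cache holds every pair. Once more keys are written than `W` can separate, recall degrades, which is a small-scale version of the limited-capacity issue the abstract mentions.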

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

BibTeX

@article{mcdermott2026lifelong,
  title = {Lifelong In-Context Learning with Transformers Requires Parametric Forms of Attention},
  author = {Luke McDermott and Robert W. Heath and Rahul Parhi},
  year = {2026},
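  abstract = {Lifelong continual learning remains an obstacle on the path to human-like intelligence. Modern transformers show sparks of intelligence with in-context learning. The quadratic nature of attention, however, prohibits transformers from performing this process on arbitrarily long sequences. In this work, we argue that extending in-context learning to lifelong settings is a practical solution for continual learning in AI agents. In particular, we argue that \emph{parametric forms of attention} are needed to understand a lifetime of context with transformers on a fixed hardware budget. These attention mechanisms learn the relationship between keys and their associated values at test-time with parametric regression. Our generalization of parametric approaches (linear attention, state-space models, fast weight programmers, and test-time training layers) contrasts with nonparametric counterparts like softmax attention. They replace the ever-growing key-value cache with an online-trainable neural network, maintaining a constant memory footprint. We highlight how parametric attention mechanisms currently fall short of lifelong learning due to limited memory capacity or costly online updates. To address these issues, we pose a set of open questions with novel insights to guide the field toward long-horizon agents.},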
  url = {https://arxiv.org/abs/2606.25342},
  keywords = {cs.LG},
  eprint = {2606.25342},
  archiveprefix = {arXiv},
}
