Paper Detail

FOGO: Forgetting-aware Orthogonalization Optimizer

Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim

arxiv Score 27.5

Published 2026-06-09 · First seen 2026-06-10

Research Track A · General AI

Abstract

We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting, the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both regimes. FOGO spectrally orthogonalizes momentum updates to prevent dominant directions from monopolizing optimization, then stores representative past directions in a compact codebook memory built on random projection, where pairwise distances are provably preserved in low-dimensional space. At each step, conflicts between the current update and stored directions are resolved via lightweight orthogonal correction and lifted back through a proximal step, with minimal overhead and no data storage. Across class-imbalanced classification, continual visual learning under domain and class shifts, continual fine-tuning of LLaVA-7B, and GPT-2 pretraining, FOGO consistently improves convergence and knowledge retention, outperforming Adam and Muon.
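The three ingredients named in the abstract can be illustrated with a small standard-library sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so every function name, the cubic Newton-Schulz iteration, the Gaussian projection matrix, and the "project out the conflicting component" rule are assumptions chosen as common textbook instances of each idea. The codebook construction and the proximal lifting step are omitted entirely.

```python
import math
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(m, steps=30):
    """Approximate the orthogonal factor U V^T of m (hypothetical helper).

    Assumption: cubic Newton-Schulz, X <- 1.5 X - 0.5 X X^T X, which pushes
    every nonzero singular value toward 1 so no direction dominates.
    """
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]  # singular values now in (0, 1]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


def make_projection(dim, k, seed=0):
    """Gaussian random projection matrix (k x dim), scaled by 1/sqrt(k).

    Assumption: a Johnson-Lindenstrauss style map, under which pairwise
    distances are approximately preserved with high probability.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]


def project(p, v):
    """Map a dim-vector v into the k-dimensional projected space."""
    return [sum(a * b for a, b in zip(row, v)) for row in p]


def resolve_conflict(update, stored):
    """Remove the component of `update` that opposes `stored`.

    Assumption: a conflict means a negative inner product; the correction
    makes the update orthogonal to the stored direction and leaves
    non-conflicting updates unchanged.
    """
    dot = sum(u * s for u, s in zip(update, stored))
    if dot >= 0:
        return list(update)
    norm_sq = sum(s * s for s in stored)
    return [u - (dot / norm_sq) * s for u, s in zip(update, stored)]
```

For example, `resolve_conflict([1.0, -1.0], [0.0, 1.0])` returns `[1.0, 0.0]`: the component opposing the stored direction is removed and the rest of the update is kept. In a full optimizer these pieces would presumably operate on per-layer momentum matrices and on projected codebook entries, but how FOGO combines them is not stated in the abstract.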

BibTeX

@article{nguyen2026fogo,
  title = {FOGO: Forgetting-aware Orthogonalization Optimizer},
  author = {Toan Nguyen and Yang Liu and Trung Le and Celso de Melo and Flora D. Salim},
  year = {2026},
  abstract = {We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting, the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both},
  url = {https://arxiv.org/abs/2606.10406},
  keywords = {cs.LG, cs.AI},
  eprint = {2606.10406},
  archiveprefix = {arXiv},
}
