Paper Detail

FOGO: Forgetting-aware Orthogonalization Optimizer

Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim

arxiv Score 27.5

Published 2026-06-09 · First seen 2026-06-10

Research Track A · General AI

Abstract

We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting, the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both regimes. FOGO spectrally orthogonalizes momentum updates to prevent dominant directions from monopolizing optimization, then stores representative past directions in a compact codebook memory built on random projection, where pairwise distances are provably preserved in low-dimensional space. At each step, conflicts between the current update and stored directions are resolved via lightweight orthogonal correction and lifted back through a proximal step, with minimal overhead and no data storage. Across class-imbalanced classification, continual visual learning under domain and class shifts, continual fine-tuning of LLaVA-7B, and GPT-2 pretraining, FOGO consistently improves convergence and knowledge retention, outperforming Adam and Muon.
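
The pipeline the abstract describes (orthogonalize the momentum, compare it with stored directions in a randomly projected space, correct conflicts, lift the correction back) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the exact SVD used for orthogonalization, the "negative dot product" conflict test, and the minimum-norm lift standing in for the proximal step are all assumptions made for illustration, and the codebook is simply passed in as a list of already-projected directions.

```python
import numpy as np


def orthogonalize(momentum):
    """Set every singular value of a 2-D momentum matrix to 1 via an exact SVD."""
    u, _, vt = np.linalg.svd(momentum, full_matrices=False)
    return u @ vt


def make_projection(k, d, rng):
    """Gaussian random projection from d to k dimensions (1/sqrt(k) scaling)."""
    return rng.standard_normal((k, d)) / np.sqrt(k)


def resolve_conflicts(z, stored):
    """Remove from z its component along each stored direction it opposes.

    Assumption: a "conflict" means a negative dot product in projected space.
    """
    z = z.copy()
    for c in stored:
        dot = z @ c
        if dot < 0.0:
            z -= (dot / (c @ c)) * c
    return z


def corrected_update(momentum, stored, proj):
    """Orthogonalize, correct in the projected space, then lift the change back."""
    u0 = orthogonalize(momentum).ravel()
    z0 = proj @ u0
    z1 = resolve_conflicts(z0, stored)
    # Minimum-norm lift (assumed stand-in for the proximal step):
    # the smallest change to u0 whose projection equals z1.
    delta = proj.T @ np.linalg.solve(proj @ proj.T, z1 - z0)
    return (u0 + delta).reshape(momentum.shape)
```

The exact SVD is used here only for clarity; Muon, for comparison, approximates the same orthogonalization with a Newton-Schulz iteration. With several stored directions, the sequential correction above does not guarantee that every conflict is removed at once, which is one reason this should be read as a sketch of the idea rather than the paper's procedure.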

BibTeX

@article{nguyen2026fogo,
  title = {FOGO: Forgetting-aware Orthogonalization Optimizer},
  author = {Toan Nguyen and Yang Liu and Trung Le and Celso de Melo and Flora D. Salim},
  year = {2026},
  abstract = {We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting, the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both},
  url = {https://arxiv.org/abs/2606.10406},
  keywords = {cs.LG, cs.AI},
  eprint = {2606.10406},
  archiveprefix = {arXiv},
}