Paper Detail

MEME: Multi-entity & Evolving Memory Evaluation

Seokwon Jung, Alexander Rubinstein, Arnas Uselis, Sangdoo Yun, Seong Joon Oh

arXiv · Score 24.3

Published 2026-05-12 · First seen 2026-05-13

General AI

Abstract

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state). Evaluating six memory systems spanning three memory paradigms on 100 controlled episodes, we find that all systems collapse on dependency reasoning under the default configuration (Cascade: 3%, Absence: 1% in average accuracy) despite adequate static retrieval performance. Prompt optimization, deeper retrieval, reduced filler noise, and most stronger LLMs fail to close this gap. Only a file-based agent paired with Claude Opus 4.7 as its internal LLM partially closes the gap, but at ~70x the baseline cost, indicating closure currently depends on configurations that are not practical at scale. Code and data are available on the project page: https://seokwonjung-jay.github.io/meme-eval/.
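
The abstract names the kinds of state MEME scores: updates across evolving sessions, dependent facts (Cascade, Absence) and post-removal state (Deletion). It does not spell out their format. As a rough illustration only, here is a naive key-value memory in Python. The class `ToyMemory`, its methods and the example facts are hypothetical. They are not MEME's task format and not any of the six evaluated systems. The sketch shows one plausible reading of why dependency reasoning is harder than static retrieval: overwriting a base fact leaves facts derived from it silently stale.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyMemory:
    """Hypothetical key-value memory for illustration only (not MEME's format)."""

    facts: dict[str, str] = field(default_factory=dict)
    # Hypothetical dependency edges: derived fact -> base fact it was computed from.
    depends_on: dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str, depends_on: Optional[str] = None) -> None:
        # Naive overwrite: dependents of `key` are NOT refreshed automatically.
        self.facts[key] = value
        if depends_on is not None:
            self.depends_on[key] = depends_on

    def delete(self, key: str) -> None:
        # Post-removal state: the fact should be absent afterwards, not merely blanked.
        self.facts.pop(key, None)

    def stale_dependents(self, updated: str) -> list[str]:
        # Facts still stored that were derived from `updated` and may now be out of date.
        return sorted(
            key
            for key, base in self.depends_on.items()
            if base == updated and key in self.facts
        )


mem = ToyMemory()
mem.write("alice.city", "Berlin")
mem.write("alice.timezone", "CET", depends_on="alice.city")
mem.write("alice.city", "Tokyo")  # evolving update to the base fact
print(mem.stale_dependents("alice.city"))  # ['alice.timezone']: still says CET
mem.delete("alice.timezone")
print("alice.timezone" in mem.facts)  # False
```

The store is deliberately naive. A static lookup of `alice.city` succeeds trivially. Answering correctly about `alice.timezone` requires tracing the dependency edge at query time, because `write` never propagates the update.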

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Tags

No tags.

BibTeX

@article{jung2026meme,
  title = {{MEME}: Multi-entity \& Evolving Memory Evaluation},
  author = {Seokwon Jung and Alexander Rubinstein and Arnas Uselis and Sangdoo Yun and Seong Joon Oh},
  year = {2026},
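  abstract = {LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state). Evaluating six memory systems spanning three memory paradigms on 100 controlled episodes, we find that all systems collapse on dependency reasoning under the default configuration (Cascade: 3\%, Absence: 1\% in average accuracy) despite adequate static retrieval performance. Prompt optimization, deeper retrieval, reduced filler noise, and most stronger LLMs fail to close this gap. Only a file-based agent paired with Claude Opus 4.7 as its internal LLM partially closes the gap, but at $\sim$70$\times$ the baseline cost, indicating closure currently depends on configurations that are not practical at scale. Code and data are available on the project page: https://seokwonjung-jay.github.io/meme-eval/.},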
  url = {https://arxiv.org/abs/2605.12477},
  keywords = {cs.LG, cs.CL, LLM-based agents, persistent environments, memory systems, memory paradigms, dependency reasoning, Cascade, Absence, Deletion, code available, huggingface daily},
  eprint = {2605.12477},
  archiveprefix = {arXiv},
}