Paper Detail

AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun

Browse

Workflow Queues

huggingface Score 16.5

Published 2026-04-15 · First seen 2026-04-20

General AI

Open paper source

Abstract

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI acclerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from 49% to 61% on Trainium 1 and from 45% to 59% on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being 26times cheaper. The code is open-sourced at https://github.com/zhang677/AccelOpt.

Workflow Status

Review status: pending
Role: unreviewed
Read priority: now
Vote: Not set.
Saved: no
Collections: Not filed yet.
Next action: Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

BibTeX

@misc{zhang2026accelopt,
  title = {AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization},
  author = {Genghan Zhang and Shaowei Zhu and Anjiang Wei and Zhenyu Song and Allen Nie and Zhen Jia and Nandita Vijaykumar and Yida Wang and Kunle Olukotun},
  year = {2026},
  abstract = {We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI acclerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator },
  url = {https://huggingface.co/papers/2511.15915},
  keywords = {large language model, agentic system, kernel optimization, optimization memory, AWS Trainium accelerator, NKIBench, throughput improvement, cost-effectiveness, code available, huggingface daily},
  eprint = {2511.15915},
  archiveprefix = {arXiv},
}

Metadata

{}