Paper Detail

AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Genghan Zhang, Shaowei Zhu, Anjiang Wei, Zhenyu Song, Allen Nie, Zhen Jia, Nandita Vijaykumar, Yida Wang, Kunle Olukotun

huggingface Score 17.5

Published 2026-04-15 · First seen 2026-04-20

General AI

Abstract

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI acclerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from 49% to 61% on Trainium 1 and from 45% to 59% on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being 26times cheaper. The code is open-sourced at https://github.com/zhang677/AccelOpt.

Workflow Status

Review status
pending
Role
unreviewed
Read priority
now
Vote
Not set.
Saved
no
Collections
Not filed yet.
Next action
Not filled yet.

Reading Brief

No structured notes yet. Add `summary_sections`, `why_relevant`, `claim_impact`, or `next_action` in `papers.jsonl` to enrich this view.

Why It Surfaced

No ranking explanation is available yet.

Tags

No tags.

BibTeX

@misc{zhang2026accelopt,
  title = {AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization},
  author = {Genghan Zhang and Shaowei Zhu and Anjiang Wei and Zhenyu Song and Allen Nie and Zhen Jia and Nandita Vijaykumar and Yida Wang and Kunle Olukotun},
  year = {2026},
  abstract = {We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI acclerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator },
  url = {https://huggingface.co/papers/2511.15915},
  keywords = {large language model, agentic system, kernel optimization, optimization memory, AWS Trainium accelerator, NKIBench, throughput improvement, cost-effectiveness, code available, huggingface daily},
  eprint = {2511.15915},
  archiveprefix = {arXiv},
}

Metadata

{}