Paper Detail

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

Han Bao, Penghao Zhang, Yue Huang, Zhengqing Yuan, Yanchi Ru, Rui Su, Yujun Zhou, Xiangqi Wang, Kehan Guo, Nitesh V Chawla, Yanfang Ye, Xiangliang Zhang

arxiv Score 13.3

Published 2026-04-14 · First seen 2026-04-15

General AI

Abstract

Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-system benchmark (US-China) evaluating policy comprehension, comprising 21K cases across a broad spectrum of policy areas, capturing the diversity and complexity of real-world governance. Following Bloom's taxonomy, the benchmark assesses three core capabilities: (1) Memorization: factual recall of policy knowledge, (2) Understanding: conceptual and contextual reasoning, and (3) Application: problem-solving in real-life policy scenarios. Building on this benchmark, we further propose PolicyMoE, a domain-specialized Mixture-of-Experts (MoE) model with expert modules aligned to each cognitive level. The proposed models demonstrate stronger performance on application-oriented policy tasks than on memorization or conceptual understanding, and yield the highest accuracy on structured reasoning tasks. Our results reveal key limitations of current LLMs in policy understanding and suggest paths toward more reliable, policy-focused models.

BibTeX

@article{bao2026policyllm,
  title = {PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models},
  author = {Han Bao and Penghao Zhang and Yue Huang and Zhengqing Yuan and Yanchi Ru and Rui Su and Yujun Zhou and Xiangqi Wang and Kehan Guo and Nitesh V Chawla and Yanfang Ye and Xiangliang Zhang},
  year = {2026},
  abstract = {Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present \textbf{\textit{PolicyBench}}, the first large-scale cross-system benchmark (US-China) evaluating policy comprehension, comprising 21K cases across a broad spectrum of policy areas, capturing the diversity and complexity of real-world governance.},
  url = {https://arxiv.org/abs/2604.12995},
  keywords = {cs.CL, cs.CY},
  eprint = {2604.12995},
  archiveprefix = {arXiv},
}
