sail-sg/ActivePRM


Efficient Process Reward Model Training via Active Learning

The official implementation of the paper "Efficient Process Reward Model Training via Active Learning".

Paper | Hugging Face Collection

🔥 Updates

  • 16/04/2025: Our paper is now available on arXiv!
  • 14/04/2025: We released our code, models, and data. The paper will be available soon.
  • 14/04/2025: Among 7B PRMs, our model sail/ActPRM-X (based on Qwen/Qwen2.5-Math-PRM-7B) achieves new state-of-the-art performance on ProcessBench (76.0%) and PRMBench (66.7%).

🏴 Overview

TL;DR: We achieve state-of-the-art performance on ProcessBench (75.0%) and PRMBench (65.5%) with only 5% of the labeling cost of Qwen/Qwen2.5-Math-PRM-7B.
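The TL;DR attributes the labeling savings to active learning, and the repository's training script is named pool_based_active_learning.sh. As a rough, generic illustration of pool-based active learning only (this README does not describe the paper's actual selection criterion, so this is not it), the sketch below scores every unlabeled item with the current model's predicted probability of being correct and sends only the most uncertain items to the annotator. The entropy criterion and all function names here are hypothetical, chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def binary_entropy(p: float) -> float:
    """Entropy (nats) of a Bernoulli(p) prediction; maximal at p = 0.5."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp so log() never sees 0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_most_uncertain(probs: Sequence[float], budget: int) -> List[int]:
    """Hypothetical helper: indices of the `budget` items with the highest
    predictive entropy (an assumed criterion, not necessarily the paper's)."""
    order = sorted(range(len(probs)), key=lambda i: binary_entropy(probs[i]), reverse=True)
    return sorted(order[:budget])


def active_learning_round(
    pool: List[T], predict: Callable[[T], float], budget: int
) -> Tuple[List[T], List[T]]:
    """One pool-based round: score the whole unlabeled pool with the current
    model, pick the most uncertain items for labeling, keep the rest."""
    chosen = set(select_most_uncertain([predict(x) for x in pool], budget))
    to_label = [x for i, x in enumerate(pool) if i in chosen]
    remaining = [x for i, x in enumerate(pool) if i not in chosen]
    return to_label, remaining


if __name__ == "__main__":
    # Items predicted near 0.5 are the least certain, so they get labeled first.
    print(select_most_uncertain([0.95, 0.53, 0.10, 0.70, 0.42, 0.99], budget=2))  # [1, 4]
```

In a full loop, the newly labeled items would be used to update the model and the round repeated on the remaining pool, so the labeling budget is spent only where the model is unsure.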

📊 Results

[Figure: ProcessBench results]
[Figure: PRMBench results]

⚡️ Quickstart

Installation

git clone https://github.com/sail-sg/ActivePRM.git
cd ActivePRM
pip install -e . # tested in a conda environment with Python 3.11

Replication

  • Evaluate our sail/ActPRM-X and sail/ActPRM on ProcessBench by running:
    cd examples
    python py_scripts/test_actprm_on_processbench.py
  • Train a PRM with active learning:
    cd examples
    bash scripts/pool_based_active_learning.sh sail/ActPRMData
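For the ProcessBench evaluation step, the sketch below shows how step-level rewards are typically turned into a benchmark score, based on our understanding of ProcessBench's published protocol: each sample is labeled with the index of its earliest erroneous step (or -1 if all steps are correct), and the reported F1 is the harmonic mean of accuracy on erroneous samples and accuracy on correct samples. The fixed 0.5 threshold and the function names are assumptions for illustration; test_actprm_on_processbench.py may choose thresholds or aggregate differently.

```python
from typing import Sequence


def predict_first_error(step_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Index of the first step whose reward falls below `threshold`,
    or -1 if every step looks correct (the 'no error' label).
    The default threshold of 0.5 is an assumption."""
    for i, score in enumerate(step_scores):
        if score < threshold:
            return i
    return -1


def processbench_f1(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Harmonic mean of accuracy on erroneous samples (label >= 0)
    and accuracy on fully correct samples (label == -1)."""
    err = [p == y for p, y in zip(preds, labels) if y != -1]
    ok = [p == y for p, y in zip(preds, labels) if y == -1]
    acc_err = sum(err) / len(err) if err else 0.0
    acc_ok = sum(ok) / len(ok) if ok else 0.0
    if acc_err + acc_ok == 0:
        return 0.0
    return 2 * acc_err * acc_ok / (acc_err + acc_ok)


if __name__ == "__main__":
    # Toy per-step rewards for four solutions (made-up numbers, not real results).
    scores = [[0.9, 0.8, 0.3, 0.7], [0.95, 0.9, 0.85], [0.2, 0.9], [0.9, 0.6]]
    labels = [2, -1, 1, -1]
    preds = [predict_first_error(s) for s in scores]
    print(preds)  # [2, -1, 0, -1]
    print(round(processbench_f1(preds, labels), 3))  # 0.667
```

The harmonic mean penalizes a PRM that simply flags everything (or nothing) as an error, since one of the two accuracies would then collapse toward zero.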

Citation

If you find our repository or paper helpful, please cite:

@misc{duan2025actprm,
      title={Efficient Process Reward Model Training via Active Learning}, 
      author={Keyu Duan and Zichen Liu and Xin Mao and Tianyu Pang and Changyu Chen and Qiguang Chen and Michael Qizhe Shieh and Longxu Dou},
      year={2025},
      eprint={2504.10559},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.10559}, 
}
