The official implementation of the paper "Efficient Process Reward Model Training via Active Learning".
- 16/04/2025: Our paper is available on arXiv now!
- 14/04/2025: We release our code, models, and data. The paper will be available soon.
- 14/04/2025: Among 7B PRMs, our model `sail/ActPRM-X` (based on `Qwen/Qwen2.5-Math-PRM-7B`) achieved new SOTA performance on ProcessBench (76.0%) and PRMBench (66.7%).
TL;DR: We achieved SOTA performance on ProcessBench (75.0%) and PRMBench (65.5%) with merely 5% of the labeling cost compared with `Qwen/Qwen2.5-Math-PRM-7B`.
```bash
git clone https://github.com/sail-sg/ActivePRM.git
cd ActivePRM
pip install -e . # tested in conda env where python==3.11
```
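After installation, a quick environment check can catch common setup problems. The snippet below is illustrative and not part of the repo; it assumes PyTorch is installed as a dependency of ActivePRM.

```python
# Illustrative environment check (not part of the repo); assumes PyTorch is
# installed as an ActivePRM dependency.
import sys

import torch

print("Python:", sys.version.split()[0])             # the repo is tested with Python 3.11
print("CUDA available:", torch.cuda.is_available())  # a GPU is expected for 7B-scale PRMs
print("GPU count:", torch.cuda.device_count())
```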
- Evaluate our `sail/ActPRM-X` and `sail/ActPRM` on ProcessBench by running:

  ```bash
  cd examples
  python py_scripts/test_actprm_on_processbench.py
  ```
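For intuition about what the ProcessBench numbers above measure: each problem is annotated with the index of the earliest erroneous step (or -1 if the solution is fully correct), and the benchmark reports the harmonic mean (F1) of the accuracies on the erroneous and correct subsets. The sketch below is a simplified re-implementation of that metric for illustration only; the script above is the authoritative evaluation.

```python
# Simplified ProcessBench-style scoring for illustration; use
# py_scripts/test_actprm_on_processbench.py for the actual evaluation.
# Convention: a prediction/label is the index of the first erroneous step,
# or -1 if all steps are correct.

def processbench_f1(preds: list[int], labels: list[int]) -> float:
    err = [(p, l) for p, l in zip(preds, labels) if l != -1]  # problems containing an error
    cor = [(p, l) for p, l in zip(preds, labels) if l == -1]  # fully correct problems
    acc_err = sum(p == l for p, l in err) / max(len(err), 1)
    acc_cor = sum(p == l for p, l in cor) / max(len(cor), 1)
    if acc_err == 0 or acc_cor == 0:
        return 0.0
    return 2 * acc_err * acc_cor / (acc_err + acc_cor)  # harmonic mean (F1)

# Toy example: one of two erroneous problems located correctly (acc 0.5),
# the fully correct problem identified (acc 1.0) -> F1 ~= 0.67
print(processbench_f1(preds=[2, -1, 0], labels=[2, -1, 1]))
```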
- Train a PRM with active learning by running:

  ```bash
  cd examples
  bash scripts/pool_based_active_learning.sh sail/ActPRMData
  ```
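For intuition, the core idea behind pool-based active learning is to score a large pool of unlabeled samples with the current PRM (or an ensemble) and spend the labeling budget only on the most uncertain ones. The sketch below uses a generic closeness-to-0.5 uncertainty as a stand-in; the paper's actual selection criterion, budget, and training loop are what `scripts/pool_based_active_learning.sh` runs.

```python
# Generic pool-based uncertainty sampling, for illustration only; this is NOT
# the exact selection rule from the paper.
import numpy as np

def select_for_labeling(step_correct_probs: np.ndarray, budget: int) -> np.ndarray:
    """step_correct_probs: (n_samples,) predicted P(step is correct) from the current PRM.
    Returns indices of the `budget` samples the model is least certain about."""
    uncertainty = -np.abs(step_correct_probs - 0.5)  # peaks at p = 0.5
    return np.argsort(uncertainty)[-budget:]

# Toy usage: score 10 pooled samples with a (fake) PRM, send the 3 most
# uncertain ones out for labeling.
rng = np.random.default_rng(0)
probs = rng.random(10)
print(select_for_labeling(probs, budget=3))
```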
If you find our repo or paper helpful, please cite:
```bibtex
@misc{duan2025actprm,
      title={Efficient Process Reward Model Training via Active Learning},
      author={Keyu Duan and Zichen Liu and Xin Mao and Tianyu Pang and Changyu Chen and Qiguang Chen and Michael Qizhe Shieh and Longxu Dou},
      year={2025},
      eprint={2504.10559},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.10559},
}
```