✨ About Me

Hello, my name is Kongcheng Zhang (张孔枨). I am currently a second-year master's student in the College of Computer Science and Technology at Zhejiang University and a member of the VIPA Group, supervised by Prof. Mingli Song. In 2024, I received my B.Eng. degree in Computer Science from Zhejiang University and was admitted to pursue my M.S. degree at Zhejiang University without an entrance examination.

My research focuses on Large Language Models (LLMs), particularly on advancing their reasoning (e.g., math and instruction following) and agentic (e.g., coding and tool use) capabilities through Reinforcement Learning (RL). Please feel free to contact me if you are interested in my research :)

📖 Education

📝 Selected Publications

* denotes equal contribution.

  • Replay Failures as Successes: Sample-Efficient Reinforcement Learning for Instruction Following

    Instruction Following
    Kongcheng Zhang, Qi Yao, Shunyu Liu, Wenjian Zhang, Min Cen, Yang Zhou, Wenkai Fang, Yiru Zhao, Baisheng Lai, Mingli Song

    arXiv preprint arXiv:2512.23457

  • Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning

    Self Rewarding
    Kongcheng Zhang, Qi Yao, Shunyu Liu, Yingjie Wang, Baisheng Lai, Jieping Ye, Mingli Song, Dacheng Tao

    Advances in Neural Information Processing Systems (NeurIPS), 2025

  • Reasoning with Reinforced Functional Token Tuning

    Math Reasoning
    Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, Shunyu Liu

    International Conference on Learning Representations (ICLR), 2026

  • Odyssey: Empowering Minecraft Agents with Open-World Skills

    Agent Skill
    Shunyu Liu*, Yaoru Li*, Kongcheng Zhang*, Zhenyu Cui*, Wenkai Fang*, Yuxuan Zheng, Tongya Zheng, Mingli Song

    International Joint Conference on Artificial Intelligence (IJCAI), 2025

  • SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data

    Self Play
    Wenkai Fang, Shunyu Liu, Yang Zhou, Kongcheng Zhang, Tongya Zheng, Kaixuan Chen, Mingli Song, Dacheng Tao

    Advances in Neural Information Processing Systems (NeurIPS), 2025

  • MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models

    Safety
    Siyu Yan, Long Zeng, Xuecheng Wu, Chengcheng Han, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo

    Empirical Methods in Natural Language Processing (EMNLP), 2025

  • Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

    Rubrics
    Yang Zhou, Sunzhu Li, Shunyu Liu, Wenkai Fang, Kongcheng Zhang, Jiale Zhao, Jingwen Yang, Yihe Zhou, Jianwei Lv, Tongya Zheng, Hengtong Lu, Wei Chen, Yan Xie, Mingli Song

    arXiv preprint arXiv:2508.16949

💬 Academic Services

Reviewer: ICLR 2026, ICML 2026