RL for Reasoning
Senior Researcher, Tencent · Hunyuan Department
I am a Senior Researcher at Tencent Hunyuan, where I work on RL post-training and agentic code-reasoning systems for foundation models. I have been a core contributor to Hy3.0 preview, HY2.0, Hunyuan-A13B, Hunyuan-TurboS, Hunyuan-T1, and Hunyuan-Large.
Current Focus
Post-training recipes for math, STEM, code, and long-horizon reasoning models.
Training loops for web coding agents, tool use, critic feedback, and credit assignment.
Repository-level generation, automatic benchmark generation, visual-to-code, and code critique.
Evaluation for audio-visual understanding, artifacts, tool use, and model iteration loops.
Hunyuan
A compact view of the Hunyuan model releases and technical reports to which I have been a core contributor.
Hunyuan-Large: Open-source MoE model with long-context support.
Hunyuan-T1: Reasoning-focused model with code-reasoning optimization and code evaluation.
Hunyuan-TurboS and Hunyuan-A13B: Efficient reasoning releases, with adaptive CoT in TurboS and hybrid reasoning in the open-source MoE A13B.
HY2.0: Think and Instruct variants for reasoning and instruction-following.
Hy3.0 preview: Open-source preview model for reasoning, coding, and agentic workflows.
Research
I currently work on foundation models, with a focus on code and reasoning models, reinforcement learning (including agentic RL), math and STEM reasoning, code intelligence, multimodal systems, and evaluation.
Hy3.0 preview, HY2.0, Hunyuan-A13B, Hunyuan-TurboS, Hunyuan-T1, Hunyuan-Large, MAP-Neo, OpenCoder, D-CPT Law, E2-LLM, DDK.
Math/STEM reasoning optimization, RL for code and reasoning models, reinforcement learning on pre-training data, critic-guided RL, parallel thinking, and credit assignment.
Agentic coding evaluation, long-horizon repository generation, automatic benchmark generation, visual-to-code, code critique, artifact evaluation, and code LLM pretraining.
Math, code, tool-use, audio-visual, multimodal browsing, group identity evaluation, and survey resources.
Updates
Selected paper releases, conference updates, and open-source project milestones.
ICML 2026: SWE-Compass, NL2Repo-Bench, and From Diagrams to Code.
ACL 2026: CriticLean, RLPT, ReLook, and Rhombus.
Hy3.0 preview open-sourced; #1 on OpenRouter daily global API usage.
ICLR 2026: AutoCodeBench and OmniVideoBench.
Core contributor to the HY2.0 model family.
Core contributor to open-source Hunyuan-A13B, an efficient MoE model with hybrid reasoning and agent capabilities.
Hunyuan-TurboS technical report released.
ACL 2025: OpenCoder.
Core contributor to reasoning-focused Hunyuan-T1.
ICLR 2025: MTU-Bench.
Hunyuan-Large weights and technical report released.
ACL 2024: E2-LLM and ConceptMath.
Selected Work
Representative models, papers, benchmarks, and survey resources.
Open-source Hunyuan 3.0 preview model for reasoning, coding, and agentic workflows.
My role: Core contributor to post-training, RL recipes, and reasoning optimization across STEM and code.
Next-generation Hunyuan model family for reasoning and instruction-following scenarios.
My role: Core contributor to post-training and evaluation for reasoning and instruction-following models.
Open-source fine-grained MoE model with 80B total parameters, 13B active parameters, hybrid reasoning, long-context support, and strong agent capabilities.
My role: Core author and contributor, with a contribution level comparable to my work on Hunyuan-T1.
My role: Core contributor to reasoning model optimization and adaptive CoT evaluation.
My role: Core contributor to reasoning post-training, code-reasoning optimization, state-of-the-art LiveCodeBench results, and code evaluation.
Open-source MoE LLM with long-context support.
My role: Core contributor to model development and evaluation for the Hunyuan open-source LLM family.
* Equal contribution. My role: Built unified evaluation for agentic coding abilities and code-agent capability analysis.
* Equal contribution. My role: Designed agentic RL evaluation and multimodal critic loops for web coding.
* Equal contribution. My role: Built multimodal web-coding evaluation and analysis for stronger code and agentic capabilities.
Automated multimodal evaluation for interactive visual artifacts.
My role: First author; built visual-interactive code evaluation for artifact generation.
My role: First author; built holistic critique evaluation for code LLMs and critic-model analysis.
Survey and resource hub for reinforcement learning credit assignment in LLMs and agents.
* Equal contribution. My role: Co-first author.
My role: Core author; wrote most of the underlying codebase for domain-specific continual pre-training and scaling-law experiments.
Background
Research, AI infrastructure, and engineering roles across large-scale language model systems.
Hunyuan Department · Senior Researcher
Algorithm Engineer · AI Infrastructure & Research, with early work focused on AI infrastructure
Algorithm Engineer Intern · AI Infrastructure & Research Systems
Algorithm Engineer Intern