About Me
I am a final-year Ph.D. candidate in the College of Computer Science and Technology at Zhejiang University, under the supervision of Prof. Weiming Lu and Prof. Yueting Zhuang. I am also a research intern at Alibaba DAMO Academy, supervised by Xin Li and Lidong Bing. I previously received my master’s and bachelor’s degrees from Zhejiang University as well.
Research Interests
My research interests include large language models, multimodal models, and their applications in embodied intelligence. I have published over 10 papers at top international AI and natural language processing conferences.
- LLM Agent: LLM-powered autonomous agents, particularly their cognitive reasoning, task-planning, and self-evolution capabilities.
- Embodied Intelligence: Combining VLMs, robot control, and reinforcement learning to build embodied robots for daily tasks, e.g., obstacle avoidance, navigation, and manipulation.
- AI for Education: Deploying LLMs for fundamental education, such as mathematics and psychology, and improving their performance in education domains.
🔥 News
- 2025.01: 🎉 Our Multimodal Textbook ranked #2 on Hugging Face Trending, with over 7k downloads in just two weeks.
- 2024.11: 🎉 I am honored to receive the Distinguished Reviewer Award of CIKM 2024.
- 2024.10: 🎉 One paper is accepted by NIPS 2024, focusing on benchmarking LLMs’ task planning ability (TaskBench).
- 2024.10: 🎉 Two papers are accepted by EMNLP 2024, focusing on multimodal data synthesis (Multimodal Self-Instruct) and O1-style LLM reasoning.
- 2024.05: 🎉 Four papers are accepted by ACL 2024, covering LLM reflection (Self-Contrast), self-evolving agents (Agent-Pro), time perception (TimeToM), and PEFT.
- 2024.05: 🎉 One paper is accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing, focusing on math reasoning.
- 2023.10: 🎉 Data-Copilot has been selected as an Outstanding Paper at the LLM Agent Workshop @ ICLR 2024.
- 2023.10: 🎉 Two papers are accepted by EMNLP 2023, focusing on math reasoning and emotion analysis.
📝 Selected Publications
(# indicates corresponding author)
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing
- Interleaved image-text pretraining corpus from instructional videos
- All the images and text are extracted from online instructional videos (22,000 class hours), covering multiple fundamental subjects, e.g., mathematics, physics, and chemistry.
- Our textbook corpus provides more coherent context and richer knowledge for image-text alignment.
- More than 7,000 downloads within two weeks (ranked #2 on Hugging Face Trending)
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang
- LLM-powered autonomous data analysis agent
- Automated data querying, analysis, and visualization
- Enterprise-level scenario
- Over 1.4k stars on GitHub
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang
- Multimodal data engine
- Synthesizes large-scale abstract chart data
- Enhances the abstract image perception and reasoning ability of multimodal models
- Over 13k downloads on Hugging Face
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu
- Investigates the self-reflection ability of LLMs
- Challenges blind faith in LLM self-reflection
- Inference-time scaling for better reasoning ability
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu
- Self-evolving LLM agent
- Policy-level reflection and optimization
- Dynamic environment and game scenarios
A Closed-Loop Perception, Decision-Making and Reasoning Mechanism for Human-Like Navigation
Wenqi Zhang, Kai Zhao, Peng Li, Xiao Zhu, Yongliang Shen, Yanna Ma, Yingfeng Chen, Weiming Lu
- Autonomous navigation framework for robots
- Self-exploration for better navigation strategy
- Action-to-State inverse reasoning process
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing
- Open-source video-language model
- Over 20k downloads on Hugging Face
- TASLP 2406: Specialized Mathematical Solving by a Step-by-Step Expression Chain Generation, Wenqi Zhang, Yongliang Shen, Guiyang Hou, Kuangyi Wang, Weiming Lu.
- ACL 2024 Findings: TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models, Guiyang Hou, Wenqi Zhang#, Yongliang Shen, Linjuan Wu, Weiming Lu.
- EMNLP 2023: An Expression Tree Decoding Strategy for Mathematical Equation Generation, Wenqi Zhang, Yongliang Shen, Qingpeng Nong, Zeqi Tan, Yanna Ma, Weiming Lu.
- EMNLP 2022 Findings: Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem, Wenqi Zhang, Yongliang Shen, Yanna Ma, Xiaoxia Cheng, Zeqi Tan, Qingpeng Nong, Weiming Lu.
- IROS 2021 Oral: Learning to Navigate in a VUCA Environment: Hierarchical Multi-expert Approach, Wenqi Zhang, Kai Zhao, Peng Li, Xiao Zhu, Faping Ye, Weijie Jiang, Huiqiao Fu, Tao Wang.
- arXiv 2410: Entering Real Social World! Benchmarking the Theory of Mind and Socialization Capabilities of LLMs from a First-person Perspective, Guiyang Hou, Wenqi Zhang, Yongliang Shen, Zeqi Tan, Sihao Shen, Weiming Lu.
- NIPS 2024: TaskBench: Benchmarking Large Language Models for Task Automation, Yongliang Shen, Kaitao Song, Xu Tan, Wenqi Zhang, Kan Ren, Siyu Yuan, Weiming Lu, Dongsheng Li, Yueting Zhuang.
- EMNLP 2024 Main: Advancing Process Verification for Large Language Models via Tree-Based Preference Learning, Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu.
- ACL 2024 Main: Learning Global Controller in Latent Space for Parameter-Efficient Fine-Tuning, Zeqi Tan, Yongliang Shen, Xiaoxia Cheng, Chang Zong, Wenqi Zhang, Jian Shao, Weiming Lu, Yueting Zhuang.
- EMNLP 2023 Findings: Enhancing Emotion Recognition in Conversation via Multi-view Feature Alignment and Memorization, Guiyang Hou, Yongliang Shen, Wenqi Zhang, Wei Xue, Weiming Lu.
- ACL 2023: PromptNER: Prompt Locating and Typing for Named Entity Recognition, Yongliang Shen, Zeqi Tan, Shuhui Wu, Wenqi Zhang, Rongsheng Zhang, Yadong Xi, Weiming Lu, Yueting Zhuang.
- EMNLP 2022: Query-based Instance Discrimination Network for Relational Triple Extraction, Zeqi Tan, Yongliang Shen, Xuming Hu, Wenqi Zhang, Xiaoxia Cheng, Weiming Lu, Yueting Zhuang.
- IJCAI 2021: Deep Reinforcement Learning for Multi-contact Motion Planning of Hexapod Robots, Hui Fu, Kai-Fu Tang, Peng Li, Wenqi Zhang, Xinpeng Wang, Guizhou Deng, Tao Wang, Chunlin Chen.
🎖 Honors and Awards
- 2024.10 National Scholarship (Top 1%)
- 2024.10 Distinguished Reviewer Award @ 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)
- 2024.05 Outstanding Paper @ ICLR 2024 LLM Agent Workshop (Top 3%)
- 2021-2022, 2022-2023, 2023-2024 Excellent Postgraduate Student Scholarship of Zhejiang University
- 2013.09 First Prize of Zhejiang University Research and Innovation Scholarship
- 2012.11 First Prize of Zhejiang Province “Challenge Cup” College Student Business Plan Competition
- 2012.05 The Best Project of Zhejiang University “Challenge Cup” College Student Business Plan Competition
📖 Education
- 2021.09 - 2025.06, Ph.D., Zhejiang University, China, Advisors: Prof. Weiming Lu and Prof. Yueting Zhuang
- 2015.09 - 2018.06, M.S., Zhejiang University, China, Advisor: Prof. Kaichen Song
- 2009.09 - 2013.06, B.S., Zhejiang University, China, Advisor: Prof. Hong Zhou
💻 Experience
- 2024.05 - now, Research Intern, Alibaba DAMO Academy, Supervisors: Xin Li and Lidong Bing
  - Vision-language pretraining
  - Developing video-language models with colleagues
- 2020.02 - 2021.08, Algorithm Engineer, Advanced Institute of Information Technology, Peking University, Leaders: Peng Li and Tao Wang
  - RL-based robot motion control and navigation framework
💬 Invited Talks
- 2024.10, Multimodal Self-Instruct @AITime
- 2024.08, LLM Agent @MetaGPT Team (DeepWisdom)
- 2024.08, How to Apply an LLM to Data Science @觅炽科技
📂 Services
PC Member:
- ACL 2023-2024, EMNLP 2023-2024, NAACL 2023-2024, ICLR 2024, CIKM 2024, WWW 2024, ACM-MM 2024