About Me

I am a final-year Ph.D. candidate in the College of Computer Science and Technology at Zhejiang University, under the supervision of Prof. Weiming Lu and Prof. Yueting Zhuang. I am also a research intern at Alibaba DAMO Academy, supervised by Xin Li and Lidong Bing. Previously, I received both my master’s and bachelor’s degrees from Zhejiang University.

Research Interests

My research interests include large language models, multimodal models, and their applications in the field of embodied intelligence. I have published over 10 papers at top international AI and natural language processing conferences.

  • LLM Agent: LLM-powered autonomous agents, particularly their cognitive reasoning, task-planning, and self-evolution capabilities.

  • Embodied Intelligence: Combining VLMs, robot control, and reinforcement learning to build embodied robots for daily tasks, e.g., obstacle avoidance, navigation, and manipulation.

  • AI for Education: Deploying LLMs in fundamental education subjects, such as mathematics and psychology, and improving their performance in education domains.

🔥 News

  • 2025.01:  🎉 Our Multimodal Textbook ranked #2 on Huggingface Trending, with over 7k downloads in just two weeks.
  • 2024.11:  🎉 I am honored to receive the Distinguished Reviewer Award at CIKM 2024.
  • 2024.10:  🎉 One paper was accepted by NeurIPS 2024, focusing on benchmarking LLMs’ task planning ability (TaskBench).
  • 2024.10:  🎉 Two papers were accepted by EMNLP 2024, focusing on multimodal data synthesis (Multimodal Self-instruct) and o1-style LLM reasoning.
  • 2024.05:  🎉 Four papers were accepted by ACL 2024, covering LLM reflection (Self-contrast), self-evolving agents (Agent-Pro), time perception (Time-ToM), and PEFT.
  • 2024.05:  🎉 One paper was accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing, focusing on math reasoning.
  • 2023.10:  🎉 Data-Copilot received the Outstanding Paper award at the LLM Agent Workshop @ ICLR 2024.
  • 2023.10:  🎉 Two papers were accepted by EMNLP 2023, focusing on math reasoning and emotion analysis.

📝 Selected Publications

(# indicates corresponding author)

Daily Paper Top1 & Rank #2 @ Huggingface Trending

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing

arXiv Github Huggingface Pages 知乎

  • Interleaved image-text pretraining corpus from instructional videos
  • All the images and text are extracted from online instructional videos (22,000 class hours), covering multiple fundamental subjects, e.g., mathematics, physics, and chemistry.
  • Our textbook corpus provides a more coherent context and richer knowledge for image-text alignment.
  • More than 7000 downloads within two weeks (Rank #2 in Huggingface Trending)
Outstanding Paper@ICLR LLM Agent workshop

Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

arXiv Github Huggingface Spaces 知乎 机器之心

  • LLM-powered autonomous data analysis agent
  • Automated data querying, analysis, and visualization
  • Enterprise-level scenario
  • Over 1.4k stars on Github
EMNLP 2024 Oral

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

arXiv Github Project Huggingface 新智元

  • Multimodal data engine
  • Synthesizes abstract chart data at scale
  • Enhances the abstract image perception and reasoning ability of multimodal models
  • Over 13k downloads on Huggingface
ACL 2024

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu

arXiv MIT科技评论 PaperWeekly

  • Investigate LLM’s self-reflection ability
  • Break the blind faith in LLM’s self-reflection ability
  • Inference-time scaling for better reasoning ability
ACL 2024

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu

arXiv Github 量子位 将门创投

  • Self-evolving LLM agent
  • Policy-level reflection and optimization
  • Dynamic environment and game scenarios
IJCAI 2022

A Closed-Loop Perception, Decision-Making and Reasoning Mechanism for Human-Like Navigation
Wenqi Zhang, Kai Zhao, Peng Li, Xiao Zhu, Yongliang Shen, Yanna Ma, Yingfeng Chen, Weiming Lu

arXiv Github YouTube Bilibili

  • Autonomous navigation framework for robots
  • Self-exploration for better navigation strategy
  • Action-to-State inverse reasoning process
arXiv 2406

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

arXiv Github hf_space hf_checkpoint

  • Open-source video-language model
  • Over 20k downloads on Huggingface

🎖 Honors and Awards

  • 2024.10 National Scholarship (Top 1%)
  • 2024.10 Distinguished Reviewer Award @ 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)
  • 2024.05 Outstanding Paper @ ICLR 2024 LLM Agent Workshop (Top 3%)
  • 2021-2022, 2022-2023, 2023-2024 Excellent Postgraduate Student Scholarship of Zhejiang University
  • 2013.09 First Prize of Zhejiang University Research and Innovation Scholarship
  • 2012.11 First Prize of Zhejiang Province “Challenge Cup” College Student Business Plan Competition
  • 2012.05 The Best Project of Zhejiang University “Challenge Cup” College Student Business Plan Competition

📖 Education

  • 2021.09 - 2025.06, Ph.D., Zhejiang University, China, Advisors: Prof. Weiming Lu, Prof. Yueting Zhuang
  • 2015.09 - 2018.06, M.S., Zhejiang University, China, Advisor: Prof. Kaichen Song
  • 2009.09 - 2013.06, B.S., Zhejiang University, China, Advisor: Prof. Hong Zhou

💻 Experience

  • 2024.05 - present, Research Intern, Alibaba DAMO Academy, Supervisors: Xin Li, Lidong Bing
    • Vision-language Pretraining
    • Developing video-language models with colleagues
  • 2020.02 - 2021.08, Algorithm Engineer, Advanced Institute of Information Technology, Peking University, Leaders: Peng Li, Tao Wang
    • RL-based robot motion control and navigation framework

💬 Invited Talks

  • 2024.10, Multimodal Self-instruct @AITime [video]
  • 2024.08, LLM Agent @MetaGPT Team (DeepWisdom)
  • 2024.08, How to Apply an LLM to Data Science @觅炽科技

📂 Services

PC Member:

  • ACL 2023-2024, EMNLP 2023-2024, NAACL 2023-2024, ICLR 2024, CIKM 2024, WWW 2024, ACM-MM 2024