Academic Talks
Announcement: Academic Talk by Kun Yuan (袁坤), Peking University
Posted by: Zhang Yifang (张艺芳)  Posted on: 2024-12-16

Speaker: Kun Yuan (袁坤), Assistant Professor

Title: Subspace Optimization for Large Language Models with Convergence Guarantees

Abstract: Subspace optimization algorithms, with GaLore (Zhao et al., 2024) as a representative method, have gained popularity for pre-training or fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we unexpectedly discover that GaLore does not always converge to the optimal solution and substantiate this finding with an explicit counterexample. We then investigate the conditions under which GaLore can achieve convergence, demonstrating that it does so either in deterministic scenarios or when using a sufficiently large mini-batch size. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in stochastic settings, even with standard batch sizes. Our convergence analysis can be readily extended to other sparse subspace optimization algorithms. Finally, we conduct numerical experiments to validate our theoretical results and empirically explore the proposed mechanisms.

Speaker bio: Dr. Kun Yuan is an Assistant Professor at the Center for Machine Learning Research (CMLR) at Peking University. He completed his Ph.D. at UCLA in 2019 and was a staff algorithm engineer at Alibaba (US) Group between 2019 and 2022. His research focuses on the development of fast, scalable, reliable, and distributed algorithms with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things. He received the 2017 IEEE Signal Processing Society Young Author Best Paper Award and the 2017 ICCM Distinguished Paper Award.

Time: December 16, 2024; morning 8:00-11:30, afternoon 14:00-17:30

Format: Morning session online, Tencent Meeting ID: 779-461-142

Afternoon session in person: Room 609, Science Building (理学楼)

To obtain the meeting password, please contact taohy163@163.com