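Title: Communication-Efficient Pilot Estimation for Non-Randomly Distributed Data in Diverging Dimensions
Speaker: Xiaochao Xia (Chongqing University)
Time: Saturday, November 8, 2025, 11:20-12:00
Venue: Lecture Hall 3 (814), Mathematics Building
Audience: Faculty, graduate students, and undergraduate students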
Abstract: The communication-efficient surrogate likelihood (CSL) framework (Jordan et al., 2019) is notable for handling massive or distributed datasets. CSL methods use the first machine as the central one, optimizing with that machine's data, and assume a fixed dimension when establishing statistical properties. However, CSL may not suit non-randomly or heterogeneously distributed data, and the fixed-dimension assumption limits its applicability to diverging- or high-dimensional datasets. To address these issues, we propose a communication-efficient pilot (CEP) estimation strategy. It draws a pilot sample on each machine to form a pilot sample dataset, then uses a new pilot-sample-based surrogate loss to approximate the global loss; the minimizer of this surrogate loss is termed the CEP estimator. We rigorously investigate the theoretical properties of the CEP estimator, including its convergence rate, which reaches the global rate $\sqrt{\frac{p_n}{N}}$, and its asymptotic normality when the dimension $p_n$ diverges with the pilot sample size $r$ and $p_n < n$. Additionally, we extend CEP to high-dimensional cases ($p_n > n$) and propose a regularized version of CEP (CERP). We establish non-asymptotic error bounds for the CERP estimator with the Lasso penalty (CERP-Lasso) and provide convergence rates and asymptotic normality for the CERP estimator with the adaptive Lasso penalty (CERP-aLasso) under generalized linear models. Extensive experiments on synthetic and real datasets demonstrate the effectiveness of our approaches.
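Speaker bio: Xiaochao Xia is an associate professor at the College of Mathematics and Statistics, Chongqing University, whose research focuses on statistical methods, theory, and applications for high-dimensional and massive data. Xia has led and completed a Young Scientists Fund project of the National Natural Science Foundation of China, has published more than 20 papers, and serves as a member of the 11th Council of the Chinese Association for Applied Statistics.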