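Title: Feature Screening with Conditional Rank Utility for Big-Data Classification
Speaker: Xu Chen, Researcher (Pengcheng National Laboratory)
Organizer: Wang Jianjun
Time: Tuesday, December 10, 2024, 10:00-11:30 a.m.
Venue: Lecture Hall 4 (Room 912), Mathematics Building
Audience: Faculty, graduate students, and undergraduates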
Abstract: Feature screening is a commonly used strategy for eliminating irrelevant features in high-dimensional classification. For big datasets with both high dimensionality and a huge sample size, conventional screening methods become computationally costly or even infeasible. In this article, we introduce a novel screening utility, Conditional Rank Utility (CRU), and propose a distributed feature screening procedure for big-data classification. The proposed CRU effectively quantifies the significance of a numerical feature for the categorical response. Since CRU is constructed from the ratio of the mean conditional rank to the mean unconditional rank of a feature, it is robust against model misspecification and the presence of outliers. Structurally, CRU can be expressed as a simple function of a few component parameters, each of which can be distributively estimated from the data segments using a natural unbiased estimator. Under mild conditions, we show that the distributed estimator of CRU is fully efficient in terms of the probability convergence bound and the mean squared error rate; the corresponding distributed screening procedure enjoys the sure screening and ranking properties. The promising performance of CRU-based screening is supported by extensive numerical examples.
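The abstract does not state the CRU formula, so the sketch below is only an illustration of the general idea of a rank-ratio screening score, not the speaker's definition. It assumes a hypothetical score: for each class, take the mean rank of the feature within that class divided by the overall mean rank, then sum the class-weighted absolute deviations of that ratio from 1. The function names `average_ranks`, `rank_ratio_score`, and `screen` are invented for this example, and the distributed estimation step described in the abstract is not covered.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_ratio_score(x, y):
    """Hypothetical score (an assumption, not the CRU from the talk):
    class-weighted |ratio - 1|, where ratio is the mean rank of the
    feature within a class divided by the overall mean rank."""
    n = len(x)
    ranks = average_ranks(x)
    overall_mean = sum(ranks) / n  # always (n + 1) / 2
    by_class = defaultdict(list)
    for r, label in zip(ranks, y):
        by_class[label].append(r)
    score = 0.0
    for rs in by_class.values():
        ratio = (sum(rs) / len(rs)) / overall_mean
        score += (len(rs) / n) * abs(ratio - 1.0)
    return score


def screen(features, y, d):
    """Return the sorted indices of the d features with the largest score.
    `features` is a list of columns, one list of values per feature."""
    scores = [rank_ratio_score(col, y) for col in features]
    top = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)[:d]
    return sorted(top)
```

Because the score depends on the feature only through its ranks, a single extreme value cannot dominate it, which is the kind of robustness to outliers the abstract attributes to rank-based utilities. A feature whose ranks are spread evenly across classes gets a score near 0 and would be screened out first.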
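Speaker bio: Xu Chen is a researcher at Pengcheng National Laboratory, a Leading Scholar at Xi'an Jiaotong University, a selectee of a national major talent program, and a Distinguished Expert of the Shenzhen Peacock Plan. After receiving a PhD in statistics from the University of British Columbia, Canada, in 2012, Xu worked and taught at Pennsylvania State University in the United States (2013-2015) and at the University of Ottawa in Canada (2015-2023). Professor Xu has long worked on the foundational theory and methodology of big-data statistics, producing systematic and original contributions in feature screening and dimension reduction for big data, resampling theory and methods, and distributed statistical analysis. Xu has published more than 50 research papers in leading international journals and conferences in statistics and machine learning, and has led multiple national-level research projects in China and Canada. Xu currently serves as an associate editor of the statistics journals JASA and EJS, and has served as an editorial board member or guest editor for international journals including CJS, Neurocomputing, and Survey Sampling.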