A Geometric Framework for Feature Selection from High Dimensional Data

Prof. Lizhong Zheng
Massachusetts Institute of Technology (MIT)
Tuesday, 21 November, 2017
2:30 - 3:30 pm
Room 833, Ho Sin Hang Engineering Building, CUHK

In this talk, we present an overview of a new geometric framework for the general problem of selecting informative features from high-dimensional data. The key message is that such problems need to be studied in the space of distributions of the data, for which we develop a geometric structure that provides powerful insights. With this approach, we discuss a few different formulations of feature selection: as a universal inference problem with unknown statistical models, as generalized Rényi maximal correlation, or as a decomposition of common information into “orthogonal modes”. We show that these different formulations share the same solution structure, which is an SVD (Singular Value Decomposition) structure with strong geometric insights. This effort to connect different formulations helps to establish new operational meanings of information metrics in the context of feature selection and inference, and to define new semantic-aware information metrics. We demonstrate an algorithm designed based on this approach, which is flexible enough to handle any data type, particularly multi-modal data with different types, time-scales, and qualities. It offers provably optimal inference performance, as well as minimum sample complexity. We also show that the theoretical framework based on the geometric structure can be used to understand many existing feature selection algorithms, including PCA, CCA, compressed sensing, and logistic regression.
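
As a rough illustration of the SVD structure mentioned in the abstract, the sketch below computes the classical Hirschfeld-Gebelein-Rényi maximal correlation of two discrete variables from their joint distribution. It is not the algorithm presented in the talk. It relies only on the standard result that, for finite alphabets with strictly positive marginals, the maximal correlation equals the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)). The function name `maximal_correlation` and the example distribution are assumptions made for illustration.

```python
import numpy as np

def maximal_correlation(joint):
    """Maximal correlation of two discrete variables (illustrative sketch).

    `joint` is a 2-D array of nonnegative weights for P(x, y).
    Assumes every row and column has positive total mass.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize to a probability table
    px = joint.sum(axis=1)               # marginal of X
    py = joint.sum(axis=0)               # marginal of Y

    # B[x, y] = P(x, y) / sqrt(P(x) P(y))
    B = joint / np.sqrt(np.outer(px, py))
    U, s, Vt = np.linalg.svd(B)

    # The top singular value is 1, with singular vectors sqrt(px), sqrt(py).
    # The second singular triple gives the maximal correlation and the
    # maximally correlated zero-mean, unit-variance features f(X), g(Y).
    f = U[:, 1] / np.sqrt(px)
    g = Vt[1, :] / np.sqrt(py)
    return s[1], f, g

# Hypothetical example: a symmetric binary pair that disagrees with
# probability 0.1.
joint = np.array([[0.45, 0.05],
                  [0.05, 0.45]])
rho, f, g = maximal_correlation(joint)
print(round(rho, 6))
```

For this symmetric binary example the maximal correlation is 1 - 2(0.1) = 0.8, and the recovered features f and g are the centered, scaled versions of the two binary variables.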



Lizhong Zheng received the B.S. and M.S. degrees, in 1994 and 1997 respectively, from the Department of Electronic Engineering, Tsinghua University, China, and the Ph.D. degree, in 2002, from the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. Since 2002, he has been working at MIT, where he is currently a professor of Electrical Engineering. His research interests include information theory, statistical inference, communications, and network theory. He received the Eli Jury Award from UC Berkeley in 2002, the IEEE Information Theory Society Paper Award in 2003, the NSF CAREER Award in 2004, and the AFOSR Young Investigator Award in 2007. He served as an associate editor for IEEE Transactions on Information Theory, and as the general co-chair for the IEEE International Symposium on Information Theory in 2012. He is an IEEE Fellow.