News & Events


Speaker: Prof. Shaw-Hwa Lo (Department of Statistics, Columbia University)


Topic: The implications of the I-score with complex data
Date Time: Fri., Jan 17, 2020, 10:10 AM - 11:00 AM
Place: 4F-427, Assembly Building I

Although the quantity and complexity of data have grown, the literature still lacks a theoretical or practical measure of predictivity, that is, of how well a given set of variables can predict an outcome. One popular approach to variable selection (VS) is to identify variables correlated with the outcome, selected through tests of statistical significance such as the chi-squared test. We recently pointed out (in PNAS) that this approach suffers from a basic problem: significant variables are not necessarily predictive, and vice versa. As a result, targeting significant variables may miss the goal of VS, which is higher predictivity. Instead, we proposed a novel statistic, the I-score, to address this problem. The I-score has two highly desirable properties: first, it can search for all predictive variable sets, including those with interactions; second, for each predictive set found, it yields a lower bound on predictivity that indicates how well the set can predict. However, the explanations given above do not provide scientific answers to all major questions. We offer here what we hope is an explanation of how inference and prediction converge and diverge in everyday scientific data analysis. Specifically, we seek to answer three of the "outstanding questions" posed in the recent article by Bzdok and Ioannidis (2019) for the neuroscience community and the scientific community at large: 1) When do statistically significant variables usefully contribute to accurate predictions? 2) When are variables found to be predictive but not declared statistically significant? 3) When can variables serve both of these modeling goals?
This lecture aims to demonstrate the wide applicability of the I-score in several directions: classification theory, GWAS, deep learning, the social sciences, causal inference, and computer science. Finally, we will use the MNIST data to show that the I-score can not only improve accuracy but may sometimes also considerably reduce the number of variables involved, leading to better interpretability and reproducibility.
Last modification time:2020-01-17 AM 11:21

  • NCTU