:::

Latest News

:::

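Seminar. Speaker: Professor 羅小華 (Professor, Department of Statistics, Columbia University; Chair Professor, Institute of Statistics, National Chiao Tung University)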

Lecture / Academic Event Announcement
Posted by: Site administrator
Event date: 2019-01-04
 
 

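Title: Selecting Influential & Predictive Variables for a BIGDATA set

Speaker: Professor 羅小華 (Professor, Department of Statistics, Columbia University; Chair Professor, Institute of Statistics, National Chiao Tung University)

Time: Friday, January 4, 2019 (ROC year 108), 10:40-11:30 a.m.
(Tea reception, 10:20-10:40 a.m., Room 428, NCTU Institute of Statistics)

Venue: Room 427, Assembly Building I (綜合一館), NCTU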


Abstract

Identifying variables or factors that are influential for the response, and identifying those that are good for prediction, are two important aims. When data sets are not large and the number of variables involved is small to moderate, applying the methods (or their variations) developed during the last century has produced good results and thus served the sciences well. As data have grown unwieldy over the last 10 years, searching for important variables within a much larger collection, most of which are noisy variables carrying no useful information, has become urgent and challenging. To respond to this challenge, we consider an alternative "Partition Retention" (PR) approach for variable selection and prediction problems involving very complex and large data sets. This approach seeks to alter statistical practice and the predictive literature in the analysis of big data by changing the focus from common significance-based modeling to evaluating variables’ ability to predict.

This approach directly measures a variable set's ability to predict (termed “predictivity”) through the I-score, without relying on cross-validation (CV). Many important and challenging problems arising in BIGDATA require innovative ideas and methods, in areas across the natural, social, and engineering sciences. We argue that the I-score not only reflects the true amount of interaction among variables but can also be related to a lower bound on the correct prediction rate, and that it does not overfit. The value of the I-score measures the amount of “influence” of the variable set under consideration. We suggest shifting the research agenda toward searching for a new criterion to locate highly predictive variables using the partition retention (PR) method with the I-score. PR was effective in reducing prediction error from 30% to 8% on a long-studied breast cancer data set.
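
The abstract does not give a formula for the I-score, so the following is only an illustrative sketch. It assumes a form often quoted in the partition retention literature: the observations are partitioned into cells by the joint values of a set of discrete variables, and the score is the sum over cells of n_j^2 (ȳ_j - ȳ)^2, where n_j is the cell size, ȳ_j the cell mean of the response, and ȳ the overall mean. The normalization by n times the response variance, the function name `i_score`, and the toy data are assumptions made for this example, not details from the talk.

```python
from collections import defaultdict


def i_score(rows, y):
    """Assumed I-score: sum_j n_j^2 * (mean_j - mean)^2 / (n * var(y)).

    rows: one tuple of discrete variable values per observation.
    y:    one numeric response per observation.
    """
    n = len(y)
    y_bar = sum(y) / n
    var = sum((v - y_bar) ** 2 for v in y) / n
    if var == 0:
        return 0.0  # a constant response carries no signal

    # Partition the observations by the joint values of the chosen variables.
    cells = defaultdict(list)
    for key, value in zip(rows, y):
        cells[tuple(key)].append(value)

    total = 0.0
    for values in cells.values():
        n_j = len(values)
        mean_j = sum(values) / n_j
        total += n_j ** 2 * (mean_j - y_bar) ** 2
    return total / (n * var)


# Toy data (hypothetical): y depends on x1 and x2 only through their
# interaction (XOR); x3 is a pure noise variable.
data = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 10
y = [a ^ b for a, b, c in data]

score_x1 = i_score([(a,) for a, b, c in data], y)
score_x1_x2 = i_score([(a, b) for a, b, c in data], y)
score_all = i_score(data, y)

print(score_x1, score_x1_x2, score_all)
```

In this toy setting, x1 alone scores zero although it is influential jointly with x2, and adding the noise variable x3 to the pair lowers the score. That behaviour is consistent with the abstract's claim that the score reflects interactions among variables, under the assumed formula.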




Last modified: 2019-01-04 11:37 AM
