:::

Latest News

:::

Special Lecture. Speaker: Professor Shaw-Hwa Lo (Columbia University)

Lecture / Academic Event Announcement
Posted by: Site Administrator
Event date: 2017-07-07
 
Title: An approach for big data variable selection and classification

Speaker: Professor Shaw-Hwa Lo (Columbia University)

Time: Friday, July 7, 2017, 14:30-15:20
(Tea reception 14:10-14:30 in Room 428, NCTU Institute of Statistics)

Venue: Room 427, Assembly Building I (綜合一館), NCTU


Abstract

Current practice in prediction problems generally includes using a significance-based criterion to evaluate variables for a chosen model, and evaluating variables and models simultaneously for prediction using cross-validation or independent test data. Our recent work showed that significant variables are not necessarily predictive, and that strong predictors may not appear statistically significant at all. This leaves an important question: how can we find highly predictive variables, if not through a guideline of statistical significance? In response, we suggest a “Partition Retention (PR)” approach for handling general big data variable selection and classification (prediction) problems. PR alters standard statistical practice in big data analysis, switching from significance-based modeling to seeking variables with high predictivity, a novel parameter of interest. We introduce the I-score, a statistic that can select variable sets with very high prediction rates and is closely related to a very useful lower bound of the predictivity.
The PR approach would be useful in diverse scientific applications: for example, formulating predictions about diseases from high-dimensional data such as gene datasets; text prediction and financial market prediction in the social sciences; and the study of terrorism, civil war, and elections. We hope this opens up a new field of work focused on designing new statistics that measure predictivity.

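Organizer: Big Data Research Center, National Chiao Tung University
Co-organizers: Institute of Statistics, National Chiao Tung University
Institute of Statistics, National Tsing Hua University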

Last modified: 2017-07-07, 3:00 PM
