News & Events


Topic: Data Reduction for Big Data
Speaker: Prof. Lih-Yuan Deng (University of Memphis, USA)
Date/Time: Fri., Jan. 17, 2020, 11:10 AM - 12:00 PM
Place: 4F-427, Assembly Building I

When a dataset is too large to fit in computer memory, traditional statistical methods and many machine learning methods are often not applicable. Data-reduction techniques for big data include (1) dimension reduction and/or (2) data subsampling. A paper published in JASA in 2018 proposed information-based optimal subdata selection (IBOSS), which chooses data points with extreme values in one of many dimensions. We discuss its potential weaknesses: it (1) is highly sensitive to outliers, (2) assumes (unrealistically) that the "best" statistical model is known, and (3) is inefficient for ultra-high-dimensional data. We believe it is essential that the chosen subsamples be representative of the full data set, so that further analysis yields consistent results. In this talk, we discuss and evaluate several proposed subsampling procedures that build on the ideas of both dimension reduction and data subsampling. A discussion of IBOSS, and a comparison with it, will also be given.
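To make the idea of selecting "data points with extreme values" concrete, here is a minimal pure-Python sketch in the spirit of IBOSS for linear regression. It is not the authors' implementation. The function name `iboss_select`, the per-column quota r = k // (2p), and the rule that rows already chosen are skipped when processing later columns are our assumptions about how such a selection works. The usage lines also illustrate weakness (1) from the abstract: a single gross outlier is always pulled into the subsample.

```python
import random

def iboss_select(X, k):
    """Return sorted row indices of a size-k subsample chosen IBOSS-style.

    Hypothetical sketch (assumption, not the published code):
    X is a list of n rows, each a list of p floats. For each column j
    in turn, take the r rows with the smallest and the r rows with the
    largest values of column j among rows not yet chosen, r = k // (2p).
    """
    n, p = len(X), len(X[0])
    r = k // (2 * p)
    if r < 1:
        raise ValueError("k must be at least 2 * p")
    if k > n:
        raise ValueError("k cannot exceed the number of rows")
    chosen = set()
    for j in range(p):
        # Only rows not selected for an earlier column are eligible.
        remaining = [i for i in range(n) if i not in chosen]
        remaining.sort(key=lambda i: X[i][j])
        chosen.update(remaining[:r])    # r smallest values of column j
        chosen.update(remaining[-r:])   # r largest values of column j
    return sorted(chosen)

# Usage: 1000 standard-normal rows in 2 dimensions plus one gross outlier.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
X.append([50.0, 0.0])        # row 1000: extreme value in column 0
idx = iboss_select(X, 40)    # r = 10 rows per tail per column
print(len(idx), 1000 in idx) # prints "40 True"
```

Because selection depends only on each column's extremes, the outlier is guaranteed a place in the subsample no matter how unrepresentative it is of the full data, which is the sensitivity the talk points out.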
Last modified: 2020-01-17, 11:24 AM
