Mathematical Economics, 2014, No. 10 (17), pp. 5-16
Controlling the effect of multiple testing in Big Data
Available to everyone within the scope of permitted use
Uniwersytet Ekonomiczny we Wrocławiu
All rights reserved (Copyright)
Big Data
2014
article
Wrocław
DOI: 10.15611/me.2014.10.01
application/pdf
eng
Big Data poses a new challenge to statistical data analysis. The enormous growth of available data and their multidimensionality challenge the usefulness of classical methods of analysis. One of the most important stages in Big Data analysis is the verification of hypotheses and conclusions. As the number of hypotheses grows, each of which is tested at a given significance level, the risk of erroneously rejecting true null hypotheses increases. Big Data analysts often deal with sets consisting of thousands, or even hundreds of thousands, of inferences. FWER-controlling procedures, recommended by Tukey [1953], are effective only for small families of inferences. For the large families of inferences encountered in Big Data analyses it is better to control the FDR, that is, the expected value of the fraction of erroneous rejections out of all rejections. The paper presents marginal multiple testing procedures which allow for controlling the FDR, as well as an interesting alternative to them, namely the joint multiple testing procedure (MTP) based on resampling [...]
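The abstract's two central points can be illustrated with a short sketch: the risk of at least one false rejection grows with the number of tests, and controlling the FDR is an alternative. The abstract does not say which marginal procedures the paper covers, so the Benjamini-Hochberg step-up procedure is used here only as a well-known example of an FDR-controlling method, not as the author's method. The function names, the independence assumption in the FWER formula, and the sample p-values are all hypothetical.

```python
def fwer_if_independent(m, alpha=0.05):
    """Probability of at least one false rejection among m true nulls.

    Assumes (for illustration only) that the m tests are independent
    and each is run at the same significance level alpha.
    """
    return 1.0 - (1.0 - alpha) ** m


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Returns a list of booleans in the original order of p_values:
    True where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    p = [0.015, 0.012, 0.028, 0.9, 0.8]
    print(fwer_if_independent(100))
    print(benjamini_hochberg(p, q=0.05))
```

In this sketch the smallest p-value (0.012) exceeds its own threshold of 0.01 yet is still rejected, because a larger-ranked p-value meets its threshold. That is the step-up behaviour that distinguishes the procedure from a simple per-test cutoff.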
multiple testing
Denkowska, Sabina
FDR
Mathematical Economics, 2014, No. 10 (17)
Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu