Latest News and Comment from Education

Thursday, December 5, 2013

Presumed Averageness: The Mis-Application of Classical Hypothesis Testing in Education | Brookings Institution


Imagine yourself having had a heart attack. An ambulance arrives to transport you to a hospital emergency room. Your ambulance driver asks you to choose between two hospitals, Hospital A or Hospital B. At Hospital A, the mortality rate for heart attack patients is 75 percent. At Hospital B, the mortality rate is just 20 percent. But mortality rates are imperfect measures, based on a finite number of admissions. If neither rate were “statistically significantly” different from average, would you be indifferent about which hospital you were delivered to?

Don’t ask your social scientist friends to help you with your dilemma. When asked for expert advice, they apply the rules of classical hypothesis testing, which accept a difference as statistically significant only if it is large enough to have no more than a 5 percent chance of being a fluke. (For examples, see Schochet and Chiang (2010), Hill (2009), Baker et al. (2010).) In many areas of science, it makes sense to assume that a medical procedure does not work, or that a vaccine is ineffective, or that the existing theory is correct, until the evidence is very strong that the original presumption (the null hypothesis) is wrong. That is why the classical hypothesis test places the burden of proof so heavily on the alternative hypothesis, and preserves the null hypothesis until the evidence is overwhelmingly to the contrary. But that’s not the right standard to use in choosing between two hospitals.

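
To make the hospital dilemma concrete, here is a minimal sketch in Python. Every number in it is an assumption chosen for illustration, not data from the article: Hospital A is taken to have 3 deaths in 4 admissions (75 percent), Hospital B 1 death in 5 admissions (20 percent), and the average mortality rate is taken to be 45 percent. The function names are hypothetical. The first function runs an exact two-sided binomial test of each hospital against the assumed average; the second uses a uniform prior on each hospital's true rate (one possible way to compare the two directly, not a method attributed to the author) to estimate how likely it is that Hospital B is truly the safer choice.

```python
import random
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k deaths in n admissions at true rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(deaths: int, admissions: int, avg_rate: float) -> float:
    """Exact two-sided binomial test against the assumed average rate.

    Sums the probability of every outcome that is no more likely than
    the observed one under the null hypothesis (rate == avg_rate).
    """
    observed = binom_pmf(deaths, admissions, avg_rate)
    total = 0.0
    for k in range(admissions + 1):
        pmf = binom_pmf(k, admissions, avg_rate)
        if pmf <= observed * (1 + 1e-9):  # tolerance for float ties
            total += pmf
    return total


def prob_b_safer(deaths_a: int, n_a: int, deaths_b: int, n_b: int,
                 draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(true rate at B < true rate at A).

    Assumes a uniform prior on each rate, so each posterior is
    Beta(1 + deaths, 1 + survivors).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + deaths_a, 1 + n_a - deaths_a)
        rate_b = rng.betavariate(1 + deaths_b, 1 + n_b - deaths_b)
        wins += rate_b < rate_a
    return wins / draws


if __name__ == "__main__":
    AVG_RATE = 0.45  # assumed average mortality rate (illustrative)
    print("Hospital A p-value:", two_sided_p_value(3, 4, AVG_RATE))
    print("Hospital B p-value:", two_sided_p_value(1, 5, AVG_RATE))
    print("P(B safer than A):", prob_b_safer(3, 4, 1, 5))
```

Under these assumed numbers, both p-values come out well above 0.05 (about 0.33 and 0.39), so neither hospital would be declared statistically significantly different from average, yet the estimated probability that Hospital B has the lower true mortality rate is roughly 0.93. A patient who must pick one hospital now would hardly be indifferent.
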
In 1945, Herbert Simon published a classic article in the Journal of the American Statistical Association pointing out that the hypothesis testing framework is not suited to many common decisions.  He argued that in cases where decision-makers face an immediate choice between two options, where the cost of falsely rejecting the best option is not qualitatively