
Channelling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results*

Alwyn Young

London School of Economics

This draft: February 2016


I follow R.A. Fisher’s The Design of Experiments, using randomization statistical inference to test the null hypothesis of no treatment effect in a comprehensive sample of 2003 regressions in 53 experimental papers drawn from the journals of the American Economic Association. Randomization F/Wald tests of the significance of treatment coefficients find that 30 to 40 percent of equations with an individually significant coefficient cannot reject the null of no treatment effect. An omnibus randomization test of overall experimental significance that incorporates all of the regressions in each paper finds that only 25 to 50 percent of experimental papers, depending upon the significance level and test, are able to reject the null of no treatment effect anywhere. Bootstrap and simulation methods support and confirm these results.

*I am grateful to Alan Manning, Steve Pischke and Eric Verhoogen for helpful comments, to Ho Veng-Si for numerous conversations, and to the following scholars (and by extension their co-authors) who, displaying the highest standards of academic integrity and openness, generously answered questions about their randomization methods and data files: Lori Beaman, James Berry, Yan Chen, Maurice Doyon, Pascaline Dupas, Hanming Fang, Xavier Giné, Jessica Goldberg, Dean Karlan, Victor Lavy, Sherry Xin Li, Leigh L. Linden, George Loewenstein, Erzo F.P. Luttmer, Karen Macours, Jeremy Magruder, Michel André Maréchal, Susanne Neckerman, Nikos Nikiforakis, Rohini Pande, Michael Keith Price, Jonathan Robinson, Dan-Olof Rooth, Jeremy Tobacman, Christian Vossler, Roberto A. Weber, and Homa Zarghamee.

I: Introduction

In contemporary economics, randomized experiments are seen as solving the problem of endogeneity, allowing for the identification and estimation of causal effects. Randomization, however, has an additional strength: it allows for the construction of exact test statistics, i.e. test statistics whose distribution does not depend upon asymptotic theorems or distributional assumptions and is known in each and every sample. Randomized experiments rarely make use of such methods, relying instead upon conventional econometrics and its asymptotic theorems.

In this paper I apply randomization tests to randomized experiments, using them to construct counterparts to conventional F and Wald tests of significance within regressions and, more ambitiously, an exact omnibus test of overall significance that combines all of the regressions in a paper in a manner that is, practically speaking, infeasible in conventional econometrics. I find that randomization F/Wald tests at the equation level reduce the number of regression specifications with statistically significant treatment effects by 30 to 40 percent, while the omnibus test finds that, when all treatment outcome equations are combined, only 25 to 50 percent of papers can reject the null of no treatment effect. These results relate, purely, to statistical inference, as I do not modify published regressions in any way. I confirm them with bootstrap statistical inference, present empirical simulations of the bias of conventional methods, and show that the equation level power of randomization tests is virtually identical to that of conventional methods in idealized situations where conventional methods are also exact.
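The logic of an exact randomization test can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the procedure used in the paper: it uses a difference in means (the coefficient on a treatment dummy in a regression with no other covariates) rather than an F/Wald statistic, it assumes complete randomization with a fixed number of treated units, and the six outcomes and the function names are hypothetical. Under the sharp null of no treatment effect, outcomes are fixed, so the statistic can be recomputed for every assignment the experiment could have drawn.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    treated = set(treated)
    t = [y for i, y in enumerate(outcomes) if i in treated]
    c = [y for i, y in enumerate(outcomes) if i not in treated]
    return mean(t) - mean(c)


def randomization_p_value(outcomes, treated):
    """Exact two-sided p-value under the sharp null of no treatment effect.

    Assumption: treatment was assigned by complete randomization, so every
    subset of the same size as `treated` was equally likely to be drawn.
    """
    observed = abs(diff_in_means(outcomes, treated))
    n, m = len(outcomes), len(treated)
    as_extreme = 0
    total = 0
    for assignment in combinations(range(n), m):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(diff_in_means(outcomes, assignment)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical data: units 3, 4 and 5 were treated.
outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
treated = (3, 4, 5)
p = randomization_p_value(outcomes, treated)
print(p)  # 2 of the 20 possible assignments are as extreme: 0.1
```

Because the reference distribution is enumerated rather than approximated, the p-value does not rest on asymptotic theorems or distributional assumptions. With realistic sample sizes full enumeration is infeasible, and a large random sample of assignments would typically be drawn instead.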

Two factors lie behind the discrepancy between the results reported in journals and those produced in this paper. First, published papers fail to consider the multiplicity of tests implicit in the many treatment coefficients within regressions and the many regressions presented in each paper. About half of the regressions presented in experimental papers contain multiple treatment regressors, representing indicators for different treatment regimes or interactions of treatment with participant characteristics. When these regressions contain a .01 level significant coefficient, there are on average 5.8 treatment measures, of which only 1.7 are significant. I find treatment measures within regressions are generally mutually orthogonal, so the finding of a significant coefficient in a regression should be viewed as the outcome of multiple independent rolls of 20-sided or 100-sided dice. However, only 31 of 1036 regressions with multiple treatment measures report a conventional F- or Wald-test of the joint significance of all treatment variables within the regression.¹ When tests of joint significance are applied, far fewer regressions show significant effects. I find that additional significant results appear, as additional treatment regressors are added to equations within papers, at a rate comparable to that implied by random chance under the null of no treatment effect. Specification search, as measured by the numbers of treatment regressors, produces additional significant results at a rate that is consistent with spurious correlation.
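The dice analogy above can be made concrete with a short calculation. The sketch below assumes, as the text suggests, that the treatment measures in a regression are independent tests under the null; the function name is hypothetical, and the choice of six tests is only a rounding of the 5.8 average reported above.

```python
def prob_any_significant(k, alpha):
    """Chance that at least one of k independent tests rejects at level
    alpha when the null of no treatment effect is true for all of them."""
    return 1.0 - (1.0 - alpha) ** k


# alpha = .05 corresponds to a 20-sided die, alpha = .01 to a 100-sided die.
for alpha in (0.05, 0.01):
    for k in (1, 2, 6):
        print(alpha, k, round(prob_any_significant(k, alpha), 4))
```

With six independent treatment measures, the chance of at least one spurious rejection is roughly 26 percent at the .05 level and roughly 6 percent at the .01 level, which is why a single significant coefficient is weak evidence without a joint test.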

While treatment coefficients within regressions are largely orthogonal, treatment coefficients across regressions, particularly significant regressions, are highly correlated. The typical paper reports 10 regressions with a treatment coefficient that is significant at the .01 level, and 28 regressions with no treatment coefficient that is significant at this level.² I find that the randomized and bootstrapped distributions of the coefficients and p-values of significant regressions are highly correlated across equations, while the insignificant regressions are much more independent. Thus, the typical paper presents many independent tests that show no treatment effect and a small set of correlated tests that show a treatment effect. When combined, this information suggests that most experiments have no significant effects. I should note that this result is unchanged when I restrict attention only to regressions with dependent variables that produce a significant treatment coefficient in at least one regression. Thus, it is not a consequence of combining the results of regressions of variables that are never significantly correlated with treatment with those concerning variables that are consistently correlated with treatment. Dependent variables that are found to be significantly related to treatment in a subset of highly correlated specifications are not significantly related to treatment in many other, statistically independent, specifications.

The second factor explaining the lower significance levels found in this paper is the fact that published papers make heavy use of statistical techniques that rely upon asymptotic theorems that are largely invalidated and rendered systematically biased in favour of rejection by their regression design. Chief amongst these methods are the robust and clustered estimates of variance, which are designed to deal with unspecified heteroskedasticity and correlation across observations. The theorems that underlie these and other asymptotic methods depend upon maximal leverage in the regression going to zero, but in the typical regression design it is actually much closer to its upper limit of 1. High leverage allows for a greater spread in the bias of covariance estimates and an increase in their variance, producing an unaccounted-for thickening of the tails of test distributions, which leads to rejection rates greater than nominal size. The failure and potential bias of asymptotic methods is, perhaps, most immediately recognized by noting that no less than one fifth of the equation-level coefficient covariance matrices in my sample are singular, implying that their covariance estimate of some linear combination of coefficients is zero, i.e. a downward bias of 100 percent. I show that the conventional test statistics of my experimental papers, when corrected for the actual thickness of the tails of their distributions, produce significant results at rates that are close to those of randomization tests.

¹ These occur in two papers. In an additional 8 regressions in two other papers the authors make an attempt to test the joint significance of multiple treatment measures, but accidentally leave out some treatment measures. In another paper the authors test whether a linear combination of all treatment effects in 28 regressions equals zero, which is not a test of the null of no treatment effect, but is closer. F-tests of the equality of treatment effects across treatment regimes (excluding control) or in non-outcome regressions (e.g. tests of randomization balance) are more common.

² Naturally, I only include treatment outcome regressions in these calculations and exclude regressions related to randomization balance (participant characteristics) or attrition, which, by demonstrating the orthogonality of treatment with these measures, confirm the internal validity of the random experiment.

Conventional econometrics, in effect, cannot meet the demands placed on it by the regressions of published papers. Maximal leverage is high in the typical paper because the authors condition on a number of participant observables, either to improve the precision with which treatment effects are estimated or convince sceptical referees and readers that their results are robust. These efforts, however, undermine the asymptotic theorems the authors rely on, producing test statistics that are biased in favour of rejecting the null hypothesis of no treatment effect when it is true. Randomization inference, however, remains exact regardless of the regression specification. Moreover, randomization inference allows the construction of omnibus Wald tests that easily combine all of the equations and coefficient estimates in a paper. In finite samples such tests are a bridge too far for conventional econometrics, producing hopelessly singular covariance estimates and biased test statistics when they are attempted. Thus, randomization inference plays a key role in establishing the validity of both themes in this paper, the bias of conventional methods and the importance of aggregating the multiplicity of tests implicitly presented in papers.
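The role of maximal leverage can be illustrated with a deliberately simple case. The sketch below is an assumption-laden simplification, not the paper's calculation: it considers a regression on a constant and a single regressor, for which the leverage of observation i (the i-th diagonal element of the hat matrix) has the closed form 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The function name and the two example designs are hypothetical.

```python
def leverages(x):
    """Hat-matrix diagonal for a regression on a constant and one regressor x.

    Uses the closed form h_i = 1/n + (x_i - xbar)^2 / Sxx, which applies only
    to this two-parameter design (an assumption made for illustration).
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


n = 100

# A balanced treatment dummy: half the sample treated, half control.
balanced = [1.0] * (n // 2) + [0.0] * (n // 2)
print(max(leverages(balanced)))    # 2/n, which shrinks as n grows

# A covariate that is non-zero for a single participant.
single_case = [1.0] + [0.0] * (n - 1)
print(max(leverages(single_case)))  # at the upper limit of 1, whatever n is
```

In the balanced design maximal leverage goes to zero as the sample grows, which is the condition the asymptotic theorems need. A regressor that singles out one observation keeps maximal leverage at 1 regardless of sample size, and conditioning on many participant observables pushes real designs toward that case.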

The reader looking for a definitive breakdown of the results between the contribution of the multiplicity of tests and the contribution of the finite sample bias of asymptotic methods should be forewarned that a unique deconstruction of this sort simply does not exist. The reason for this is that the coverage bias, i.e. rejection probability greater than nominal size, of conventional tests increases with the dimensionality of the test.³ I find, both in actual results and in size simulations, that the gap between conventional and randomization/bootstrap tests is small at the coefficient level, larger at the equation level (combining coefficients) and enormous at the paper level (combining all equations and coefficients, in the few instances where this is possible using conventional techniques). If one first uses conventional methods to move from coefficients to equations to paper level tests (where it is possible to implement them conventionally) and then compares the paper level results with randomization tests, one concludes that the issue of multiplicity is of modest relevance and the gap between conventional and randomization inference (evaluated at the paper level) explains most of the results. If, however, one first compares conventional and randomization results at the coefficient level and then uses randomization inference to move from coefficients to equations to paper level tests, one concludes that the gap between randomization and conventional inference is small, and multiplicity (as captured in the rapidly declining significance of randomization tests at higher levels of aggregation) is all important. The evaluation of these differing paths is further complicated by the fact that power also compounds with the dimensionality of the test, and that tests with excess size typically have greater power, which, depending upon whether one wishes to give the benefit of the doubt to the null or the alternative, alters one's view of conventional and randomization tests.

Although I report results at all levels of aggregation, I handle these issues by focusing on presenting the path of results with maximum credibility. F/Wald tests of the overall significance of multiple coefficients within an equation are eminently familiar and easily verifiable, so I take as the first step the conventional comparison of individual coefficient versus equation level significance. The application of conventional F/Wald tests to equations with multiple treatment measures finds that 12 and 26 percent of equations (at the .01 and .05 level, respectively) that have at least one significant treatment coefficient are found to have, overall, no significant treatment effect. Allowing for single treatment coefficient equations whose significance is unchanged, these conventional tests reduce the number of equations with significant treatment effects by 8 to 17 percent at the .01 and .05 levels, respectively. Moving further, from the equation to the paper level, using conventional covariance estimates for systems of seemingly unrelated equations is largely infeasible, as the covariance matrices produced by this method are usually utterly singular. I am able to calculate such a conventional test for only 9 papers, and simulations show that the test statistics have extraordinarily biased coverage (i.e. a .30 rejection probability at the .01 level). Hence, it is not credible to advance to the paper level analysis using conventional methods.

³ A possible reason for this lies in the fact that coverage bias relative to nominal size for each individual coefficient is greater at smaller nominal probabilities, i.e. the ratio of tail probabilities is greater at more extreme outcomes. In the Wald tests below, after the transformation afforded by the inverse of the coefficient covariance matrix, the test statistic is interpreted as being the sum of independently distributed squared random variables. As the number of such variables increases, the critical value for rejection is increased. This requires, however, an accurate assessment of the probability each squared random variable can, by itself, attain increasingly extreme values. As the dimensionality of the test increases this assessment is proportionately increasingly wrong and the overall rejection probability rises.
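The claim that coverage bias grows with the dimensionality of the test can be explored with a small simulation. This is a stylized sketch and not one of the paper's simulations: it assumes that each component of a Wald-type statistic is an independent Student-t ratio with 5 degrees of freedom (standing in for a coefficient divided by a noisy standard error), while the sum of squares is judged against the usual chi-squared critical values at the .05 level. The degrees of freedom, replication count, seed and function names are all illustrative choices.

```python
import random

# Standard .05-level critical values of the chi-squared distribution,
# keyed by the number of restrictions being tested.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 5: 11.070, 10: 18.307}


def draw_t(rng, df):
    """One Student-t draw: a standard normal over sqrt(chi-squared / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) as Gamma(df/2, scale 2)
    return z / (chi2 / df) ** 0.5


def rejection_rate(k, df=5, reps=20000, seed=0):
    """Share of simulated k-dimensional statistics exceeding the nominal
    .05 chi-squared critical value when the null is true."""
    rng = random.Random(seed)
    crit = CHI2_CRIT_05[k]
    rejections = 0
    for _ in range(reps):
        stat = sum(draw_t(rng, df) ** 2 for _ in range(k))
        if stat > crit:
            rejections += 1
    return rejections / reps


rates = {k: rejection_rate(k) for k in sorted(CHI2_CRIT_05)}
for k, rate in rates.items():
    print(k, rate)
```

Under these assumptions every simulated rejection rate exceeds the nominal .05, and the excess is larger for the 10-dimensional test than for the single-coefficient test, in line with the pattern described in the text.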
