Cahier de recherche/Working Paper 14-25

Statistical Power of Within and Between-Subjects Designs in Economic Experiments

Charles Bellemare

Luc Bissonnette

Sabine Kröger

Novembre/November 2014

All authors: Université Laval, Department of Economics, Pavillon J.-A.-DeSève, Québec, QC, Canada G1V 0A6;


Bellemare: charles.bellemare@ecn.ulaval.ca

Bissonnette: luc.bissonnette@ecn.ulaval.ca

Kröger: sabine.kroger@ecn.ulaval.ca

Part of the paper was written at the Institute of Finance at the School of Business and Economics at Humboldt Universität zu Berlin and at the Department of Economics at Zurich University. We thank both institutions for their hospitality. We thank Nicolas Couët for his valuable research assistance. We are grateful to participants at the ASFEE conference in Montpellier (2012), ESA meeting in New York (2012), the IMEBE in Madrid (2013), and seminar participants at the Department of Economics at Zurich University (2013) and at Technische Universität Berlin (2013).


This paper discusses the choice of the number of participants for within-subjects (WS) designs and between-subjects (BS) designs based on simulations of statistical power allowing for different numbers of experimental periods. We illustrate the usefulness of the approach in the context of field experiments on gift exchange. Our results suggest that a BS design requires between 4 and 8 times more subjects than a WS design to reach an acceptable level of statistical power. Moreover, the predicted minimal sample sizes required to correctly detect a treatment effect with a probability of 80% greatly exceed the sizes currently used in the literature. Our results also suggest that adding experimental periods can substantially increase the statistical power of a WS design, but has very little effect on the statistical power of a BS design. Finally, we discuss issues relating to numerical computation and present the powerBBK package programmed for STATA. This package allows users to conduct their own analysis of power for the different designs (WS and BS), conditional on user-specified experimental parameters (true effect size, sample size, number of periods, noise levels for control and treatment, error distributions), statistical tests (parametric and nonparametric), and estimation methods (linear regression, binary choice models (probit and logit), censored regression models (tobit)).

Keywords: Within-subjects design, Between-subjects design, sample size, statistical power, experiments

JEL Classification: C8, C9, D03

1 Introduction

Researchers planning an experimental study have to decide on the number of subjects, treatments, and experimental periods to employ, and on whether to conduct a within or between-subjects design. All these decisions require a careful balancing between the chance of finding an existing effect and the precision with which this effect can be measured.1 For example, subjects taking part in a within-subjects (WS hereafter) design are exposed to several treatment conditions while subjects in a between-subjects (BS hereafter) design are exposed to only one. WS designs thus offer the possibility to test theories at the individual level and can boost statistical power, making it more likely to correctly reject a null hypothesis in favor of an alternative hypothesis. They can, however, also generate spurious treatment effects, notably order effects. BS designs, on the other hand, can attenuate order effects but may have lower statistical power, as we illustrate in this paper. Charness, Gneezy, and Kuhn (2012) summarize the tradeoff between both designs by saying: “Choosing a design means weighing concerns over obtaining potentially spurious effects against using less powerful tests.” (p. 2). In addition, the number of subjects and the number of periods (McKenzie, 2012) affect the statistical power of a study. As a result, understanding the statistical power of WS and BS designs in relation to sample size and periods is an essential step in the process of designing economic experiments.

1 The former influence is referred to in the literature as the power of a study, that is, the probability of rejecting the null hypothesis when it is in fact false, in other words of not committing a Type II error. The latter influence refers to the width of the confidence interval, i.e., how confident we can be of not committing a Type I error (rejecting the null hypothesis when it is in fact true).

More generally, recent work has raised awareness about the relationship between the power of statistical tests and optimal experimental designs (e.g., List, Sadoff, and Wagner (2011); Hao and Houser (forthcoming)). Yet statistical power remains largely undiscussed and unreported in published experimental economic research. Zhang and Ortmann (2013), for example, reviewed all articles published in Experimental Economics between 2010 and 2012 and failed to find a single study discussing optimal sample size in relation to statistical power.2 We conjecture that this can partly be explained by the incompatibility of existing power formulas, derived under very specific conditions, with experimental data. The formulas are not adapted to the diversity of experimental data (with WS and BS designs; discrete, continuous, and censored outcomes; multiple periods; non-normal errors), nor are they available for the variety of statistical tests (nonparametric and parametric) used in the literature. This incompatibility poses challenges to experimentalists interested in predicting power for the designs they consider. As a result, researchers may unknowingly conduct underpowered experiments, which waste scarce resources and potentially guide research in unwanted directions.3

2 The practice of not reporting power or discussing optimal sample sizes is not specific to experimental economics, and applies more widely to other fields such as education (Brewer and Owen, 1973), marketing (Sawyer and Ball, 1981), and various sub-fields in psychology (Mone, Mueller, and Mauland, 1996; Cohen, 1962; Chase and Chase, 1976; Sedlmeier and Gigerenzer, 1989; Rossi, 1990).

3 Long and Lang (1992) reviewed 276 articles (not necessarily experimental) published in top journals in economics and proposed a method to estimate the share of papers falsely failing to reject the null hypothesis. Their estimates suggest that all non-rejection results in their sample of articles are false, a consequence of low statistical power.

The main objective of this paper is to provide experimental economists with a simple unified framework to compute the ex-ante power of an experimental design (WS or BS) using simulation methods. Simulation methods are general enough to be used in conjunction with a variety of statistical tests (nonparametric and parametric), estimation methods (for linear and non-linear models), and sample sizes used in experimental economics. They can also easily handle settings with non-normal errors. Conversely, closed form expressions for statistical power computation are typically derived for simple statistical models and tests and tend to be valid under specific conditions (e.g., large sample sizes, normally distributed errors). For other conditions, power computation using closed form expressions may overestimate the level of power in finite samples (see, e.g., Feiveson, 2002). The simulation approach to power computation is simple and well known in applied statistics and can help researchers determine the number of subjects, the number of periods, and the design (WS or BS) required to reach an acceptable level of statistical power.

In this paper we focus on simulating the statistical power of a test for the null hypothesis of no treatment effect against a specific alternative.4 For our simulations, we consider a population of agents whose outcome variable is generated using a possibly non-linear panel data model which depends on a binary treatment variable, individual unobserved heterogeneity, and idiosyncratic shocks. From this population, researchers sample subjects and assign them to either treatment or control over several periods. In this setup a BS design assigns each subject to either the treatment or the control condition for all periods, while a WS design assigns each subject to both the treatment and the control condition for at least one period each. We look at both balanced and unbalanced WS designs: subjects in a balanced WS design are observed for the same number of periods under both treatment conditions, while subjects in an unbalanced design are observed for a different number of periods under each treatment condition. Additionally, we look at the relationship between the statistical power of both designs and the number of experimental periods. All other aspects of the model (treatment effect sizes and noise parameters) require calibration using data from existing economic experiments.

4 Precise interpretation of the null hypothesis will depend on the test used.
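The setup described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not taken from the paper: a linear model y_it = effect * d_it + mu_i + eps_it with normally distributed heterogeneity mu_i and shocks eps_it, a test based on per-subject means, and a normal-approximation critical value in place of exact small-sample quantiles. The function names, parameter names, and the numerical values in the usage lines are hypothetical and are not calibrated to any experiment.

```python
import math
import random
import statistics

# Two-sided 5% critical value from the normal approximation (an assumption;
# exact t quantiles would be slightly larger in small samples).
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)


def rejects_null(n_subjects, n_periods, design, effect, sd_mu, sd_eps, rng):
    """Simulate one experiment; return True if 'no treatment effect' is rejected."""
    if design == "WS":
        # Each subject is observed under both conditions, so the individual
        # effect mu_i cancels in the within-subject difference of means.
        n_control = n_periods // 2
        n_treated = n_periods - n_control
        diffs = []
        for _ in range(n_subjects):
            mu = rng.gauss(0.0, sd_mu)
            control = [mu + rng.gauss(0.0, sd_eps) for _ in range(n_control)]
            treated = [mu + effect + rng.gauss(0.0, sd_eps) for _ in range(n_treated)]
            diffs.append(statistics.fmean(treated) - statistics.fmean(control))
        se = statistics.stdev(diffs) / math.sqrt(n_subjects)
        return abs(statistics.fmean(diffs) / se) > Z_CRIT
    if design == "BS":
        # Subjects alternate between control (0) and treatment (1) and stay
        # in that condition for all periods, so mu_i does not cancel.
        means = {0: [], 1: []}
        for i in range(n_subjects):
            d = i % 2
            mu = rng.gauss(0.0, sd_mu)
            ys = [mu + effect * d + rng.gauss(0.0, sd_eps) for _ in range(n_periods)]
            means[d].append(statistics.fmean(ys))
        se = math.sqrt(
            statistics.variance(means[1]) / len(means[1])
            + statistics.variance(means[0]) / len(means[0])
        )
        gap = statistics.fmean(means[1]) - statistics.fmean(means[0])
        return abs(gap / se) > Z_CRIT
    raise ValueError("design must be 'WS' or 'BS'")


def simulate_power(n_subjects, n_periods, design, effect, sd_mu, sd_eps,
                   reps=500, seed=0):
    """Share of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null(n_subjects, n_periods, design, effect, sd_mu, sd_eps, rng)
        for _ in range(reps)
    )
    return rejections / reps


if __name__ == "__main__":
    for design in ("WS", "BS"):
        power = simulate_power(40, 2, design, effect=0.5, sd_mu=1.0, sd_eps=1.0)
        print(design, power)
```

Setting `effect=0.0` gives the simulated rejection rate under the null, which is a quick check that the test has roughly its nominal size before the power figures are interpreted.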

We illustrate the approach in the context of gift exchange experiments and calibrate our model using data from two existing field experiments. We find that the BS design requires approximately 4 times more subjects than the WS design to reach acceptable levels of power (80%) when the number of experimental periods is small (2 periods). Power of the WS design is found to increase substantially with the number of experimental periods. Power of the BS design is found to be less sensitive to an increase in experimental periods. As a result, the BS design requires approximately 12 times more subjects than a WS design when the number of experimental periods is larger (6 periods). We find that these results are relatively robust to the true treatment effect sizes. Increasing the noise level requires a larger sample size in both designs, but the ratios become smaller: the BS design then requires approximately 3 times more observations with a low number of periods and 6 times more when the number of experimental periods is larger.

Our analysis suggests that the number of subjects needed to reach an acceptable level of power in this research area can be large. For example, we find that the minimal sample sizes required to reach a power of 80% with a BS design range from 232 to 1054 subjects under our low noise scenario and from 458 to 2200 subjects under our high noise scenario. Corresponding sample sizes with a WS design range from 20 to 218 subjects under our low noise scenario and from 66 to 738 subjects under our high noise scenario.

Finally, we present the powerBBK package for STATA that we developed to simulate power with the needs of economists in mind. This package allows users to simulate the minimal sample size necessary to reach a user-specified level of statistical power, or to compute the statistical power of a particular design given information on sample size, variances, and minimal detectable effect size. The package can handle panel data and can be used for nonparametric (e.g., Wilcoxon Sign test or Mann-Whitney-U test) and parametric tests. It can also be used in the context of linear regression models with or without normal errors, binary response models (probit and logit), and censored regression models (tobit).
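The idea of simulating the minimal sample size for a target level of power can be illustrated with a simple grid search. The sketch below is written in Python and is not the powerBBK implementation or its syntax. It assumes a paired (within-subject) comparison with normally distributed differences, a normal-approximation critical value, and a power curve that rises with the number of subjects. All names, default values, and the grid of candidate sizes are hypothetical.

```python
import math
import random
import statistics

# Two-sided 5% critical value from the normal approximation (an assumption).
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)


def paired_power(n_subjects, effect, sd_diff, reps, rng):
    """Simulated power of a two-sided test on within-subject differences."""
    rejections = 0
    for _ in range(reps):
        diffs = [effect + rng.gauss(0.0, sd_diff) for _ in range(n_subjects)]
        se = statistics.stdev(diffs) / math.sqrt(n_subjects)
        if abs(statistics.fmean(diffs) / se) > Z_CRIT:
            rejections += 1
    return rejections / reps


def minimal_sample_size(effect, sd_diff, target=0.80, start=10, step=10,
                        n_max=2000, reps=400, seed=1):
    """Smallest grid value of n whose simulated power reaches the target.

    Returns None if no candidate up to n_max reaches the target.
    """
    rng = random.Random(seed)
    for n in range(start, n_max + 1, step):
        if paired_power(n, effect, sd_diff, reps, rng) >= target:
            return n
    return None


if __name__ == "__main__":
    print(minimal_sample_size(effect=0.5, sd_diff=1.0))
```

Because each power value is itself a Monte Carlo estimate, the returned size can move by one grid step across seeds. Raising `reps` or refining `step` near the answer reduces that noise at the cost of computing time.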

The paper is organized as follows. Section 2 presents a brief survey of the experimental parameters used in recent articles published in Experimental Economics, the top field journal for experimental work in economics, to illustrate typical sample sizes and design choices employed in this field. Section 3 discusses the simulation of statistical power and introduces the powerBBK package. Section 4 presents our application to gift exchange. Section 5 concludes.

2 Brief survey of experimental designs in Experimental Economics

In this section we present a brief analysis of sample sizes and design choices of all papers published in Experimental Economics in volumes 15 and 16 (2012 and 2013). We focus on three aspects affecting statistical power: the choice of experimental design (WS vs. BS), the average number of subjects per treatment, and the distribution of the subjects across treatments. In the two volumes we surveyed, a total of 71 papers were published. Our analysis focuses on papers with original data that provided sufficient information to determine the number of subjects in each treatment, leaving us with a sample of 58 papers (36 in 2012, 22 in 2013).

We first classify the experimental design in these studies as either WS or BS. Where elements of both designs are applied, we classify the paper as a mixed design. The first two columns of Table 2 present the frequency of each type of design in each year. We see from this table that the majority of the papers (41 out of 58) used a BS design.

The central part of Table 2 reports summary statistics on the number of treatments and of active subjects per treatment (e.g., excluding receivers who do not take decisions in a dictator game). The table provides mean, median, minimum, and maximum values. The following analysis is based on the median, as this measure is less sensitive to outliers. We find that the median number of treatments among papers using a BS design is 3, with a median number of subjects of 43.5. These values are respectively 2 and 50 for papers using a WS design, and 4 and 66 for papers using mixed designs.

Next, we illustrate how these studies divide their participants across the various treatments. We separate studies into two groups: those that allocate subjects equally across treatments and those that allocate them unequally. We allow for small differences in group size when deciding whether a study relied on equal allocation, using a simple rule.
