# «IZA DP No. 3753 Incorporating Cost in Power Analysis for Three-Level Cluster Randomized Designs Spyros Konstantopoulos October 2008 ...»

## DISCUSSION PAPER SERIES

IZA DP No. 3753

Incorporating Cost in Power Analysis for Three-Level Cluster Randomized Designs

Spyros Konstantopoulos

October 2008

Forschungsinstitut zur Zukunft der Arbeit

Institute for the Study of Labor

Spyros Konstantopoulos

Boston College and IZA


IZA

P.O. Box 7240

53072 Bonn

Germany

Phone: +49-228-3894-0, Fax: +49-228-3894-180, E-mail: iza@iza.org

Any opinions expressed here are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but the institute itself takes no institutional policy positions.

The Institute for the Study of Labor (IZA) in Bonn is a local and virtual international research center and a place of communication between science, politics and business. IZA is an independent nonprofit organization supported by Deutsche Post World Net. The center is associated with the University of Bonn and offers a stimulating research environment through its international network, workshops and conferences, data service, project support, research visits and doctoral program. IZA engages in (i) original and internationally competitive research in all fields of labor economics, (ii) development of policy concepts, and (iii) dissemination of research results and concepts to the interested public.

IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author.


## ABSTRACT

In experimental designs with nested structures, entire groups (such as schools) are often assigned to treatment conditions. Key aspects of the design in these cluster randomized experiments include knowledge of the intraclass correlation structure and the sample sizes necessary to achieve adequate power to detect the treatment effect. However, the units at each level of the hierarchy have a cost associated with them, and thus researchers need to decide on sample sizes given a certain budget when designing their studies. This paper provides methods for computing power within an optimal design framework (that incorporates costs of units in all three levels) for three-level cluster randomized balanced designs with two levels of nesting. The optimal sample sizes are a function of the variances at each level and the cost of each unit. Overall, larger effect sizes, smaller intraclass correlations at the second and third level, and lower cost of level-3 and level-2 units result in higher estimates of power.

JEL Classification: I20

Keywords: experimental design, statistical power, optimal sampling

**Corresponding author:**

Spyros Konstantopoulos, Lynch School of Education, Boston College, Campion Hall, Room 336D, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA. E-mail: konstans@bc.edu

Many populations of interest in education and the social sciences have multilevel structures. For example, in education students are nested within classrooms, and classrooms are nested within schools. Experiments that involve nested population structures may assign treatment conditions to entire groups. In education, large-scale randomized experiments frequently assign schools to treatment and control conditions, and these designs are often called cluster or group randomized designs (see Bloom, 2005; Donner & Klar, 2000; Murray, 1998).

A critical issue in designing experiments is to ensure that the design has sufficient power to detect the intervention effects that are expected if the researchers’ hypotheses are correct. There is an extensive literature on the computation of statistical power (e.g., Cohen, 1988; Lipsey, 1990; Murphy & Myors, 2004). Much of this literature, however, involves the computation of power in studies that use simple random samples, and thus clustering effects are not included in the power analysis. Software for computing statistical power in single-level designs has also become widely available recently (Borenstein, Rothstein, & Cohen, 2001).

Statistical theory for computing power in two-level designs has also been recently documented, and statistical software for two-level balanced designs is currently available (e.g., Hedges & Hedberg, 2007; Murray, 1998; Raudenbush & Liu, 2000, 2001; Raudenbush, Spybrook, Liu, & Congdon, 2006). However, power analysis in nested designs entails challenges. First, nested factors are usually taken to have random effects, and hence power computations usually involve the variance component structures (typically expressed via intraclass correlations) of these random effects. Second, there is not one sample size but a separate sample size at each level of the hierarchy, and these may affect power differently. For example, in educational studies that assign treatments to schools, the power of the test of the treatment effect depends not only on the number of students within a classroom or a school, but on the number of classrooms or schools as well. Methods for power computations of tests of treatment effects in multilevel designs have also been discussed in the health sciences (e.g., Donner, 1984; Hsieh, 1988; Murray, 1998; Murray, Van Horn, Hawkins, & Arthur, 2006). For example, Murray and colleagues (2006) provided ways for analyzing data with complicated nested structures and discussed post hoc power computations of tests of treatment effects within the ANCOVA framework.

In addition, a more recent study discussed methods for computing power in three-level balanced cluster randomized designs (Konstantopoulos, 2008). Many factors need to be taken into account when designing randomized experiments with a three-level structure. For instance, in three-level cluster randomized designs with two levels of clustering (second and third level) researchers need to take into account the clustering effects at both levels and consider trade-offs that involve sampling level-1, level-2, and level-3 units. In such designs maximizing the number of level-3 units in the sample has a larger impact on the power of the test of the treatment effect than maximizing the number of level-1 or level-2 units (see Konstantopoulos, 2008). Also, clustering effects, often expressed via intraclass correlations, are inversely related to power: the larger the intraclass correlations, the lower the power.

In addition, the issue of optimal sampling of units at different levels of the hierarchy to maximize power is critical in designing multilevel experiments. Since larger units such as schools affect power much more than smaller units such as classrooms or students, a researcher would be inclined to design large-scale experiments with numerous larger units and fewer smaller units. However, sampling additional larger units, such as schools, is more expensive than sampling additional smaller units, such as classrooms or students. The researcher then faces the challenge of designing a cost-effective study that will optimize the power of the test of the treatment effect given the budget. This requires incorporating cost-related issues when maximizing power in cluster randomized designs (see Raudenbush, 1997). The present study discusses optimal design considerations that incorporate costs of sample sizes at different levels of the hierarchy when designing three-level cluster randomized designs with two levels of nesting. Specifically, I follow Cochran (1977) and Raudenbush (1997) and define cost functions that involve the cost ratios among level-1, level-2, and level-3 units, and then I determine the optimal number of level-1, level-2 (and level-3) units to maximize power, given the costs. Following Raudenbush and Liu (2000), I define an optimal design, under specific assumptions, as a design that results in the highest estimate of power for the treatment effect.

The paper is structured as follows. First, I define the intraclass correlations in three-level models with two levels of nesting. Second, I present the statistical model and provide an example for computing power in a three-level cluster randomized design. Then, I introduce cost functions that involve level-1, level-2, and level-3 units to maximize power. Finally, I summarize the usefulness of the methods and draw conclusions.

Suppose that a researcher samples level-3 units at the first stage, samples level-2 units within level-3 units at the second stage, and then samples level-1 units within level-2 units at the third stage. This is a three-stage cluster sample, and the variance of the total population is the sum of the within-level-2 unit between-level-1 unit variance, $\sigma_e^2$; the within-level-3 unit between-level-2 unit variance, $\tau^2$; and the between-level-3 unit variance, $\omega^2$ (see Cochran, 1977; Lohr, 1999). That is, the total variance in the outcome is decomposed into three parts and is defined as $\sigma_T^2 = \sigma_e^2 + \tau^2 + \omega^2$. In such three-level designs two intraclass correlations are needed to describe the variance component structure. These are defined as the second level intraclass correlation:

$$\rho_2 = \frac{\tau^2}{\sigma_e^2 + \tau^2 + \omega^2} = \frac{\tau^2}{\sigma_T^2},$$

and the third level intraclass correlation:

$$\rho_3 = \frac{\omega^2}{\sigma_e^2 + \tau^2 + \omega^2} = \frac{\omega^2}{\sigma_T^2}.$$
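As a small illustration of the variance decomposition, the following Python sketch computes the two intraclass correlations from the three variance components. It assumes the usual definitions, in which the second level intraclass correlation is $\tau^2 / \sigma_T^2$ and the third level intraclass correlation is $\omega^2 / \sigma_T^2$. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def intraclass_correlations(sigma_e2, tau2, omega2):
    """Return (rho2, rho3) for a three-level design.

    sigma_e2: level-1 variance (between level-1 units within level-2 units)
    tau2:     level-2 variance (between level-2 units within level-3 units)
    omega2:   level-3 variance (between level-3 units)
    """
    if min(sigma_e2, tau2, omega2) < 0:
        raise ValueError("variance components must be non-negative")
    total = sigma_e2 + tau2 + omega2  # total variance, sigma_T^2
    if total == 0:
        raise ValueError("total variance must be positive")
    return tau2 / total, omega2 / total


# Hypothetical variance components (illustrative values only).
rho2, rho3 = intraclass_correlations(sigma_e2=0.75, tau2=0.10, omega2=0.15)
print(f"rho2 = {rho2:.3f}, rho3 = {rho3:.3f}")
```

Because the three components sum to the total variance, the share of variance at the first level is simply $1 - \rho_2 - \rho_3$.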

Consider a design where level-3 units are nested within treatment, and level-2 units are nested within level-3 units and treatment (Kirk, 1995), and both level-3 and level-2 units are random effects. A structural model for an outcome $Y_{ijkl}$, the $l$th level-1 unit in the $k$th level-2 unit in the $j$th level-3 unit in the $i$th treatment, can be described in ANOVA notation as

$$Y_{ijkl} = \mu + \alpha_i + \beta_{(i)j} + \gamma_{(ij)k} + \varepsilon_{(ijk)l},$$

where $\mu$ is the grand mean, $\alpha_i$ is the (fixed) effect of the $i$th treatment ($i = 1, 2$), and the last three terms represent level-3, level-2, and level-1 random effects respectively. Specifically, $\beta_{(i)j}$ is the random effect of level-3 unit $j$ ($j = 1, \ldots, m$) within treatment $i$, $\gamma_{(ij)k}$ is the random effect of level-2 unit $k$ ($k = 1, \ldots, p$) within level-3 unit $j$ within treatment $i$, and $\varepsilon_{(ijk)l}$ is the error term of level-1 unit $l$ ($l = 1, \ldots, n$) within level-2 unit $k$, within level-3 unit $j$, within treatment $i$. I assume that the level-1, level-2, and level-3 error terms are normally distributed with a mean of zero and residual variances $\sigma_e^2$, $\tau^2$, and $\omega^2$ respectively. For simplicity, I assume that there is one treatment and one control group and that the designs are balanced.

The objective is to examine the statistical significance of the treatment effect, which means to test the hypothesis

$$H_0: \alpha_1 = \alpha_2 \quad \text{or} \quad \alpha_1 - \alpha_2 = 0.$$

The researcher can test this hypothesis by carrying out the usual t-test. Following Konstantopoulos (2008), when the null hypothesis is false, the test statistic has a non-central t-distribution with $2m - 2$ degrees of freedom and non-centrality parameter $\lambda$ (assuming no covariates). The non-centrality parameter is defined as the expected value of the estimate of the treatment effect divided by the square root of the variance of the estimate of the treatment effect, namely

$$\lambda = \sqrt{\frac{mpn}{2}} \, \frac{\delta}{\sqrt{1 + (pn - 1)\rho_3 + (n - 1)\rho_2}},$$

where $m$ is the number of level-3 units in each condition (treatment or control group), $p$ is the number of level-2 units within each level-3 unit, $n$ is the number of level-1 units within each level-2 unit, $\rho_2$ and $\rho_3$ are the second and third level intraclass correlations, and $\delta = (\alpha_1 - \alpha_2) / \sigma_T$, where $\alpha_1$ and $\alpha_2$ are the treatment effect parameters from the ANOVA model (defined above) and $\sigma_T$ is the population standard deviation.
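The definition of the non-centrality parameter can be sketched in Python as follows. The sketch assumes a balanced design without covariates, in which the variance of the difference between the two condition means is $2[\omega^2/m + \tau^2/(mp) + \sigma_e^2/(mpn)]$. Dividing the mean difference by the square root of this variance and expressing the result through $\delta$ and the intraclass correlations gives $\lambda = \delta\sqrt{mpn/2} / \sqrt{1 + (pn - 1)\rho_3 + (n - 1)\rho_2}$. The function names and the design values are hypothetical.

```python
import math


def noncentrality(delta, m, p, n, rho2, rho3):
    """Non-centrality parameter from the standardized effect size and ICCs."""
    if min(m, p, n) < 1:
        raise ValueError("m, p and n must be at least 1")
    design_effect = 1 + (p * n - 1) * rho3 + (n - 1) * rho2
    return delta * math.sqrt(m * p * n / 2) / math.sqrt(design_effect)


def noncentrality_from_variances(mean_diff, m, p, n, sigma_e2, tau2, omega2):
    """Same quantity, computed directly as mean difference / standard error."""
    var_diff = 2 * (omega2 / m + tau2 / (m * p) + sigma_e2 / (m * p * n))
    return mean_diff / math.sqrt(var_diff)


# Hypothetical design: 10 schools per condition, 2 classrooms per school,
# 20 students per classroom, delta = 0.5, rho2 = 0.10, rho3 = 0.15.
lam = noncentrality(delta=0.5, m=10, p=2, n=20, rho2=0.10, rho3=0.15)
print(f"lambda = {lam:.3f}")
```

The two functions agree whenever the variance components sum to $\sigma_T^2$ and the mean difference equals $\delta \sigma_T$, which is a convenient check on the algebra. Note that $m$ enters only through the factor $\sqrt{mpn/2}$, whereas $p$ and $n$ also inflate the denominator, which is why adding level-3 units raises $\lambda$ faster than adding level-2 or level-1 units.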

The power of the two-tailed t-test at level $\alpha$ is

$$p = 1 - H[c(\alpha, 2m - 2), 2m - 2, \lambda] + H[-c(\alpha, 2m - 2), 2m - 2, \lambda],$$

where $c(\alpha, \nu)$ is the level $\alpha$ two-tailed critical value of the t-distribution with $\nu$ degrees of freedom [e.g., $c(0.05, 20) = 2.09$], and $H(x, \nu, \lambda)$ is the cumulative distribution function of the non-central t-distribution with $\nu$ degrees of freedom and non-centrality parameter $\lambda$. The test of the treatment effect and statistical power can also be computed using the F-statistic, which has a non-central F-distribution with 1 degree of freedom in the numerator and $2m - 2$ degrees of freedom in the denominator and non-centrality parameter $\lambda^2$.
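The power computation can be illustrated with a short, self-contained Python sketch. It assumes that power equals $1 - H(c, \nu, \lambda) + H(-c, \nu, \lambda)$ with $\nu = 2m - 2$, where $c$ is the two-tailed critical value, and that $\lambda = \delta\sqrt{mpn/2} / \sqrt{1 + (pn - 1)\rho_3 + (n - 1)\rho_2}$. To avoid external dependencies, the non-central t distribution function is evaluated by simple numerical integration rather than by a statistical library. This is an illustrative choice, not the procedure used in the paper, and all function names and design values are hypothetical.

```python
import math


def nct_cdf(x, df, nc, steps=4000):
    """P(T <= x) for T = (Z + nc) / S, with S = sqrt(chi2_df / df).

    Uses P(T <= x) = E[Phi(x * S - nc)], integrating over the density of S
    with the midpoint rule on (0, 1 + 8 / sqrt(df)).
    """
    log_const = (0.5 * df * math.log(df) + (1 - 0.5 * df) * math.log(2)
                 - math.lgamma(0.5 * df))
    h = (1 + 8 / math.sqrt(df)) / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        log_density = log_const + (df - 1) * math.log(s) - 0.5 * df * s * s
        normal_cdf = 0.5 * (1 + math.erf((x * s - nc) / math.sqrt(2)))
        total += normal_cdf * math.exp(log_density)
    return total * h


def critical_value(alpha, df):
    """Two-tailed level-alpha critical value of the central t, by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, 0.0) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_two_tailed(delta, m, p, n, rho2, rho3, alpha=0.05):
    """Power of the two-tailed t-test of the treatment effect (no covariates)."""
    df = 2 * m - 2
    lam = (delta * math.sqrt(m * p * n / 2)
           / math.sqrt(1 + (p * n - 1) * rho3 + (n - 1) * rho2))
    c = critical_value(alpha, df)
    return 1 - nct_cdf(c, df, lam) + nct_cdf(-c, df, lam)


# Hypothetical design: compare 10 versus 30 level-3 units per condition.
for m in (10, 30):
    print(m, round(power_two_tailed(0.5, m, 2, 20, 0.10, 0.15), 3))
```

In practice a statistical library's non-central t distribution would normally replace `nct_cdf` and `critical_value`. The hand-rolled versions are used here only so that the example runs with the standard library alone.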

3 covariate effects, $X_{ijkl}$ is a column vector of $r$ level-1 covariates, $Z_{ijk}$ is a column vector of $w$ level-2 covariates, and $W_{ij}$ is a column vector of $q$ level-3 covariates, and the last three terms represent residuals at the third, second, and first level respectively. The subscript $A$ indicates adjustment due to covariate effects; that is, the level-2 and level-3 random effects are adjusted by level-2 and level-3 covariates respectively, and the level-1 error term is adjusted by level-1 covariates. I assume that the covariates at each level are centered at their means to ensure that covariates explain variation in the outcome only at the level at which they are introduced. Note that although in practice covariates could slightly adjust the treatment effect, in principle, due to randomization, the treatment effect should be unadjusted. I assume that the adjusted error terms at the first, second, and third level are normally distributed with a mean of zero and residual variances $\sigma_{Re}^2$, $\tau_R^2$, and $\omega_R^2$, respectively.

The objective in this case is to examine the statistical significance of the treatment effect adjusted by covariates, which means to test the hypothesis

$$H_0: \alpha_{A1} = \alpha_{A2} \quad \text{or} \quad \alpha_{A1} - \alpha_{A2} = 0.$$

Note that in this case $\delta$ and the intraclass correlations are adjusted. Specifically, the numerator of $\delta$ remains unchanged (because of orthogonality between the treatment and the covariates), whilst the denominator changes (because the total variance is now residual variance). The intraclass correlations are also adjusted. The second level intraclass correlation is now defined as

$$\rho_{A2} = \frac{\tau_R^2}{\sigma_{Re}^2 + \tau_R^2 + \omega_R^2},$$

and the third level intraclass correlation is now defined as

$$\rho_{A3} = \frac{\omega_R^2}{\sigma_{Re}^2 + \tau_R^2 + \omega_R^2},$$

where subscript $A$ indicates adjustment and subscript $R$ indicates residual variance. When the null hypothesis is false, the test statistic has the non-central t-distribution with $2m - q - 2$ degrees of freedom (where $q$ is the number of level-3 covariates) and non-centrality parameter $\lambda_A$. Following Konstantopoulos (2008), the non-centrality parameter is now defined as