Detecting and Statistically Correcting
Sample Selection Bias
Gary Cuddeback
Elizabeth Wilson
John G. Orme
Terri Combs-Orme
Gary Cuddeback, MSW, MPH, is Research Associate, Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, 101 Connor Drive, Suite 302, Chapel Hill, NC 27514.
Elizabeth Wilson, MSW, is a Doctoral Student, John G. Orme, PhD, is Professor, and Terri Combs-Orme, PhD, is Associate Professor, University of Tennessee, College of Social Work, Children’s Mental Health Services Research Center, Henson Hall, Knoxville, TN 37996-3332.
Address correspondence to: Gary S. Cuddeback, Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, 101 Connor Drive, Suite 302, Chapel Hill, NC 27514 or John G. Orme, University of Tennessee, College of Social Work, Children’s Mental Health Services Research Center, 128 Henson Hall, Knoxville, TN 37996-3332 (E-mail: Gary S. Cuddeback at email@example.com.unc.edu or John G. Orme at firstname.lastname@example.org).
The authors would like to express their appreciation to Shenyang Guo for his suggestions on an earlier version of this paper and to Niki Le Prohn at Casey Family Programs for her support.
This article is the result of a collaborative effort of the Casey Family Programs Foster Family Project at The University of Tennessee and Casey Family Programs, an operating foundation delivering foster care, adoption, and other permanency planning services in 14 states. You may review more information about Casey Family Programs at http://www.casey.research.org/research and more information about Casey Family Programs Foster Family Project at The University of Tennessee at http://www.
Journal of Social Service Research, Vol. 30(3) 2004 http://www.haworthpress.com/web/JSSR © 2004 by The Haworth Press, Inc. All rights reserved.
Digital Object Identifier: 10.1300/J079v30n03_02
ABSTRACT. Researchers seldom realize 100% participation for any research study. If participants and non-participants are systematically different, substantive results may be biased in unknown ways, and external or internal validity may be compromised. Typically social work researchers use bivariate tests to detect selection bias (e.g., χ² to compare the race of participants and non-participants). Occasionally multiple regression methods are used (e.g., logistic regression with participation/non-participation as the dependent variable). Neither of these methods can be used to correct substantive results for selection bias. Sample selection models are a well-developed class of econometric models that can be used to detect and correct for selection bias, but these are rarely used in social work research. Sample selection models can help further social work research by providing researchers with methods of detecting and correcting sample selection bias. [Article copies available for a fee from The Haworth Document Delivery Service: 1-800-HAWORTH. E-mail address: email@example.com Website: http://www.HaworthPress.com © 2004 by The Haworth Press, Inc. All rights reserved.]
KEYWORDS. Sample selection bias, statistical methods, social work research
Whether conducting a survey of life experiences and attitudes or examining relationships among predictor and response variables, social work researchers generally intend to make inferences beyond a study’s participants to a population. It is the collection of a probability sample where each element has a known non-zero probability of selection that permits the use of statistics to make inferences about a population (Kish, 1965).
The selection of a probability sample is only the first step, however. Researchers must then ensure the participation of those selected. Researchers seldom realize 100% participation, usually for one of two reasons. First, selected individuals can, and frequently do, refuse to participate. This is problematic if collectively the individuals who do not participate are systematically different from those who do, and consequently the final sample may be biased. This is known as “sample selection bias.” Second, selected individuals may agree to participate but then be “lost” over time due to transience, incarceration, death, or other reasons. The final sample might be biased if the individuals who are lost differ in some systematic way from the participants who remain. This is known as “attrition bias.” Unlike sample selection bias, attrition bias mainly occurs in longitudinal studies. The remainder of this paper concentrates on sample selection bias; however, some readings on attrition bias are presented later.
Selection bias occurs because non-participation is rarely random (e.g., distributed equally across subgroups); instead, non-participation often is correlated with variables that also are related to the dependent variable of interest or that preclude using the sample to describe the target population (Goodfellow, Kiernan, Ahern, & Smyer, 1988). York (1998) defines selection bias as “any characteristic of a sample that is believed to make it different from the study population in some important way” (p. 239). Finally, Winship and Mare (1992) report that selection bias can occur when observations in social research are selected such that they are not independent of the outcome variables in the study, possibly leading to biased inferences about social processes.
Beginning in the late 1970s and early 1980s (Greene, 1981; Heckman, 1976, 1978, 1979), methods for detecting and statistically correcting selection bias were developed in economics and related areas. In the decades since, an extensive literature has evolved in the area of sample selection bias (Berk, 1983; Lee & Marsh, 2000; Miller & Wright, 1995; Stolzenberg & Relles, 1997; Vella & Verbeek, 1999; Winship & Mare, 1992). These methods are known as “sample selection” models.
Selection bias potentially threatens both internal and external validity (Berk, 1983; Miller & Wright, 1995). Selection bias is a threat to internal validity in that independent variables are correlated with a disturbance term (i.e., error) and analyses based on biased samples can lead to inaccurate estimates of the relationships between variables (e.g., regression coefficients). Thus, effects may be attributed to exogenous variables that actually are due to selection factors (Cook & Campbell, 1979). For example, consider the relationship between family income and approval to provide family foster care in a sample of foster family applicants. If data for income are missing systematically for applicants with higher incomes, the effect of income on approval (as quantified by a regression coefficient, for example) might be underestimated. Thus, the internal validity of the study might be compromised.
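The way a selected sample can distort a regression coefficient can be illustrated with a small simulation. The sketch below is not drawn from the foster family data; it uses entirely hypothetical, randomly generated values, and it assumes the simplest case in which selection depends directly on the outcome (the situation Winship and Mare, 1992, describe). The function names, the cutoff, and the true slope of 1.0 are illustrative assumptions.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, true_slope=1.0, cutoff=0.5, seed=0):
    """Compare the slope in the full sample with the slope in a selected sample.

    Hypothetical data: y = true_slope * x + noise. Cases whose outcome is at
    or above `cutoff` are dropped, so selection is not independent of y.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
    kept = [(x, y) for x, y in zip(xs, ys) if y < cutoff]
    full = ols_slope(xs, ys)
    selected = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return full, selected


full_slope, selected_slope = simulate()
print(f"slope, full sample:     {full_slope:.2f}")
print(f"slope, selected sample: {selected_slope:.2f}")
```

In this setup the slope estimated from the selected cases is noticeably smaller than the slope in the full sample, even though the same relationship generated every case. Neither a bivariate comparison nor a larger selected sample would remove this attenuation.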
Selection bias also potentially threatens external validity because a final, biased sample might not be generalizable to the intended population. Using another example, consider the results of a study that evaluates a high school dropout prevention program based on an analysis of a random sample of students who completed the program. The final sample used in the analysis might underrepresent the high-risk students and overrepresent the students who are at low or medium risk if the students most at risk drop out of school prior to completing (or even starting) the intervention. Any inferences drawn from the sample (e.g., a conclusion that the program is successful for all students irrespective of their level of risk) might not be generalizable to the students most in danger of dropping out of school (i.e., those who need the intervention the most).
These examples underscore the importance of attending to participation rates and to differences between participants and non-participants. By establishing that no differences exist between participants and non-participants or, more importantly, by detecting such differences and correcting substantive results for them, sample selection models are useful and important tools for social work researchers.
Moreover, selection models should be used whenever sufficient data for non-participants are available; failing to do so risks reporting substantive results that are biased by selection. However, failing to use these methods when appropriate is different from failing to use them when data for non-participants are unavailable, which is common. In this latter case, sample selection models are of no use. This is an important distinction.
The importance of selection bias is known to social work researchers. However, with a few exceptions (Ards, Chung, & Myers, 1998; Brooks & Barth, 1999; Courtney, Piliavin, & Wright, 1997; Greenwell & Bengston, 1997; Grogan-Kaylor, 2001; McDonald, Moran, & Garfinkel, 1990; Vartanian, 1999), the available methods for detecting and statistically correcting selection bias have not been used in social work research, and only a limited number of the many available sample selection models have been used. Therefore, the purpose of this paper is to: (1) introduce sample selection models to social work researchers; (2) provide an overview of sample selection models; (3) illustrate the use of a sample selection model and compare the results with methods typically used in social work research; (4) note computer software for estimating sample selection models; and (5) direct readers to additional literature in this area.
in Moffitt (1991), Reynolds and Temple (1995), Shadish, Cook, and Campbell (2001), Stolzenberg and Relles (1997), Winship and Mare (1992), and Winship and Morgan (1999). Additional selected readings of potential use to social work researchers are suggested below.
One of the earliest sample selection methods is known as the “Heckman” two-step estimator (Heckman, 1976, 1978, 1979). However, there is some evidence that corrections using this method can sometimes worsen rather than improve estimates, even under ordinary circumstances (Stolzenberg & Relles, 1997; Winship & Mare, 1992).
See Stolzenberg and Relles (1997) for a discussion of making reasonable judgments of when Heckman’s (1976, 1979) two-step estimator is likely to improve or worsen regression coefficient estimates. Nevertheless, since Heckman (1976, 1979), numerous models for detecting and statistically correcting sample selection bias have been developed.
Current sample selection models typically involve the simultaneous estimation of two multiple regression models. One model (i.e., the substantive model) is used to examine the substantive question of interest (e.g., Is the probability of approval to foster different for African-American and European-American families?). In most respects this model is no different from any other multiple regression model, and continuous, binary, multi-categorical, or other types of dependent variables can be modeled using methods familiar to social work researchers (e.g., a linear “OLS” model for a continuous dependent variable, a binary probit or logit model for a binary dependent variable, a multinomial probit or logit model for a multi-categorical dependent variable) (Orme & Buehler, 2001). The other regression model (i.e., the selection model) is used to detect selection bias and to statistically correct the substantive model for selection bias. Binary probit regression typically is used for this purpose because the outcome modeled usually is binary (e.g., participation or not), but binary logit regression and other models can be used (Greene, 1995, 2000).
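The two-equation structure described above can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the paper; it shows the classic case of a continuous substantive outcome with a probit selection equation, which is the form in which these models are most often presented. For a binary substantive outcome, such as approval to foster, the second equation would itself be a probit model.

```latex
\begin{align}
z_i^{*} &= \mathbf{w}_i'\boldsymbol{\gamma} + u_i,
  \qquad z_i = \begin{cases} 1 & \text{if } z_i^{*} > 0 \\ 0 & \text{otherwise} \end{cases}
  && \text{(selection model)} \\
y_i &= \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i,
  \qquad y_i \text{ observed only if } z_i = 1
  && \text{(substantive model)} \\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
  &\sim N\!\left( \mathbf{0},
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix} \right) \\
E[\,y_i \mid z_i = 1\,] &= \mathbf{x}_i'\boldsymbol{\beta}
  + \rho\sigma\,\lambda(\mathbf{w}_i'\boldsymbol{\gamma}),
  \qquad \lambda(a) = \frac{\phi(a)}{\Phi(a)}
\end{align}
```

Here $z_i$ indicates participation, $\mathbf{w}_i$ and $\mathbf{x}_i$ are the predictors in the selection and substantive models, and $\phi$ and $\Phi$ are the standard normal density and distribution functions. The last line shows why ignoring selection can bias the substantive model: unless the correlation $\rho$ between the two error terms is zero, the expected outcome among participants contains an extra term that depends on the selection predictors. A test of $\rho = 0$ is therefore one way of detecting selection bias, and estimating both equations together is what allows the substantive coefficients to be corrected.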
Illustration of a Sample Selection Model
The illustration we will use is based on a study of foster family applicants. In this study of a population of foster family applicants who completed pre-service training, 230 applicants were selected but only 161 participated (70%) (Orme, Buehler, McSurdy, Rhodes, Cox, & Patterson, in press; Orme, Buehler, McSurdy, Rhodes, Cox, & Cuddeback, in press). The substantive question of interest used for illustration here is whether African-American applicants are less likely than European-American applicants to be approved to foster (i.e., the substantive model). First, it is important to consider whether there are systematic differences between participants and non-participants on selected variables (i.e., the selection model), in this case race, marital status, and education level. For clarity, it is important to distinguish between the binary dependent variable in the substantive model (i.e., approval or disapproval to foster) and the dependent variable in the selection model (i.e., participation or non-participation), which is also binary. Descriptions of both the selection model and the substantive model follow.
Race, highest education, and marital status were determined for participants and non-participants. Race was coded 0 for “European-American” and 1 for “African-American/other”; marital status was coded 0 for “not married” and 1 for “married”; and education was coded 0 for “less than high school,” 1 for “high school/GED,” 2 for “some college, no degree,” 3 for “associate/two-year degree,” 4 for “bachelor’s degree,” and 5 for “advanced degree.” Participation was determined for the sample of 230 applicants and coded 0 for “no” and 1 for “yes.” Approval to foster was determined only for the 161 participants and coded as 0 for “no” and 1 for “yes.” Family-level education and race variables were created because approval and placement decisions are made at the family level, and so models were tested at the family level. Women’s education was used for family-level education except in the four cases of unmarried men, and men’s education was used in these cases. Family-level race for each unmarried applicant was the race of the individual (European-American = 0, African-American/other = 1). For same-race married couples, family-level race was the race shared by spouses, and for the four mixed-race married couples, family-level race was coded as African-American/other.
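The family-level coding rules just described can be sketched in code. This is only an illustration of the rules as stated in the text, not the procedure the authors actually used; the class and function names are hypothetical, and the handling of a married couple that includes no woman is an assumption, since the text does not address that case.

```python
from dataclasses import dataclass
from typing import Optional

# Education codes as given in the text.
EDUCATION_LABELS = {
    0: "less than high school",
    1: "high school/GED",
    2: "some college, no degree",
    3: "associate/two-year degree",
    4: "bachelor's degree",
    5: "advanced degree",
}


@dataclass
class Applicant:
    sex: str        # "F" or "M" (hypothetical field)
    race: int       # 0 = European-American, 1 = African-American/other
    education: int  # 0-5, see EDUCATION_LABELS


def family_race(applicant: Applicant, spouse: Optional[Applicant] = None) -> int:
    """Family-level race: the individual's race if unmarried, the shared race
    for same-race couples, and 1 (African-American/other) for mixed-race couples."""
    if spouse is None:
        return applicant.race
    # With 0/1 coding, the maximum covers both couple cases.
    return max(applicant.race, spouse.race)


def family_education(applicant: Applicant, spouse: Optional[Applicant] = None) -> int:
    """Family-level education: the woman's education, except for unmarried men,
    whose own education is used."""
    members = [applicant] if spouse is None else [applicant, spouse]
    for member in members:
        if member.sex == "F":
            return member.education
    # Unmarried man (or, as an assumption, any family with no woman).
    return applicant.education


def married(spouse: Optional[Applicant]) -> int:
    """Marital status: 0 = not married, 1 = married."""
    return 0 if spouse is None else 1
```

For example, a married couple in which the wife holds a bachelor's degree (4) and the husband an advanced degree (5) would receive a family-level education code of 4, and a couple with race codes 0 and 1 would receive a family-level race code of 1.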
Table 1 shows descriptive statistics for race, marital status, and education for both participants and non-participants.