EPA Observational Economy Series
Volume 1: Composite Sampling

United States Environmental Protection Agency
Office of Policy, Planning and Evaluation
EPA 230-R-95-005
August 1995
Contents

Foreword
Acknowledgments
1. Introduction
2. What is Composite Sampling?
   2.1. Method
   2.2. Limitations of Composite Sampling
3. Applications
   3.1. Soil Sampling
        3.1.1. PCB Contamination
        3.1.2. PAH Contamination
   3.2. Ground Water Monitoring
   3.3. Indoor Air Monitoring
   3.4. Biomonitoring
        3.4.1. Bioaccumulation in Human Adipose Tissue
        3.4.2. Assessing Contamination in Fish
        3.4.3. Assessing Contaminants in Mollusks
        3.4.4. Measuring Average Fat Content in Bulk Milk
4. Summary
References

Foreword

The high costs of laboratory analytical procedures frequently strain environmental and public health budgets. Whether soil, water, or biological tissue is being analyzed, the cost of testing for chemical and pathogenic contaminants can be prohibitive.
Composite sampling can substantially reduce analytical costs, since compositing several samples into one and analyzing the composited sample reduces the number of required analyses. By appropriate selection of the composite sample size and retesting of select individual samples, composite sampling can yield the same information that would otherwise require many more analyses.
Recent research has overcome many of the limitations of composite sampling, opening more widespread potential for using it to reduce the costs of environmental and public health assessments while maintaining, and often increasing, the precision of sample-based inference.
The EPA Observational Economy Series is a result of the research conducted under a cooperative agreement between the U.S. Environmental Protection Agency and the Pennsylvania State University Center for Statistical Ecology and Environmental Statistics, Professor G.P. Patil, Director.
The EPA grant CR-821531010, entitled “Research and Outreach on Observational Economy, Environmental Sampling and Statistical Decision Making in Statistical Ecology and Environmental Statistics” consists of ten separate projects in progress at the Penn State Center: 1) Composite Sampling and Designs; 2) Ranked Set Sampling and Designs; 3) Environmental Site Characterization and Evaluation; 4) Encounter Sampling; 5) Spatio-temporal Data Analysis; 6) Biodiversity Analysis and Monitoring; 7) Adaptive Sampling Designs; 8) Statistics in Environmental Policy and Regulation for Compliance and Enforcement; 9) Statistical Ecology and Ecological Risk Assessment; and 10) Environmental Statistics Knowledge Transfer, Outreach and Training.
The series is published by the Statistical Analysis and Computing Branch of the Environmental Statistics and Information Division in the EPA Office of Policy, Planning and Evaluation. This volume in the series is largely based on the work of M. T. Boswell, S. D. Gore, G. D. Johnson, G. P. Patil, and C.
Taillie at the Penn State Center in cooperation with John Fritzvold, Herbert Lacayo, Robert O’Brien, Brenda Odom, Barry Nussbaum, and John Warren as project officers at U.S. EPA. Questions or comments on this publication should be directed to Dr. N. Phillip Ross, Director, Environmental Statistics and Information Division (Mail Code 2163), United States Environmental Protection Agency, 401 M Street SW, Washington, DC 20460; Ph. (202) 260-2680.
1. Introduction
While decision making in general involves opinion based on prior experience, scientifically based decision making requires careful collection, measurement, and interpretation of data from physical observations. Examples of such decisions are: "Has a hazardous waste site been sufficiently cleaned?" or "Are pollutants accumulating in certain foods as well as in human or wildlife tissues?"
Scientifically based decision making should minimize the risk of being wrong. Since decisions require information, which is in turn extracted from data, this risk decreases as the data become more representative of the population being studied.
In order for a data set to properly represent a population, it must cover the ranges of space and time within which the population lies, as well as have sufficient resolution within these ranges. It soon becomes obvious that collection and review of representative data can be prohibitively expensive if a large sample size (number of measurements, recordings or counts) is required, especially when analytical costs are very high such as with monitoring environmental and biological media for chemical or pathogenic contaminants.
Conventional statistical techniques allow for the reduction of either cost or uncertainty; however, reducing one of these factors comes at the expense of increasing the other. Composite sampling, in contrast, can hold cost or uncertainty at a specified level while decreasing the other component.
Compositing simply refers to physically mixing individual samples to form a composite sample, as visualized in Figure 1. Just one analysis is performed on the composite, which is used to represent each of the original individual samples.
Compositing is common practice for simply increasing the representativeness of a measurement, such as when measuring the fat content of a particular entree that is composited across several restaurants included in a national survey (Burros, 1994). For this reason, compositing can always reduce costs for estimating a total or an average value. However, analysis of composite samples can be cleverly extended to classify the original individual sample units that comprised a composite. For example, one may need to identify
the presence or absence of a pathogen like HIV in blood samples, or one may need to identify all soil cores whose contaminant concentration exceeds an action level at a hazardous waste site.

Figure 1: Forming composite samples from individual samples
When analytical costs dominate over sampling costs, the savings potential is obviously high; however, the immediate question is "How do we compensate for information that is lost due to compositing?" More specifically, if we are testing whether a substance is present, or exists at a concentration above some threshold, we do not want to dilute individual "contaminated" samples with clean samples to the point that the analysis does not detect any contamination. Furthermore, if our measurements are of a variable such as a chemical concentration, we may need to know the actual values of those individual samples with the highest concentrations. For example, "hot spots" need to be identified at hazardous waste sites.
Through judicious choice of a strategy for retesting some of the original individual samples based on composite sample measurements, many limitations of composite sampling can be overcome. Furthermore, other innovative applications of composite sampling are emerging, such as combination with ranked set sampling, another approach to observational economy that is discussed in Volume 2 of this series.
2. What is Composite Sampling?
2.1. Method

First, let's clarify that a "sample" in this document refers to a physical object to be measured, whether an individual or a composite, and not a collection of observations in the statistical sense. Individual sample units are what is obtained in the field, such as soil cores or fish fillets, or obtained from subjects, such as blood samples. Meanwhile, a composite sample may be a physical mix of individual sample units or a batch of unblended individual sample units that are tested as a group. Most compositing for environmental assessment and monitoring consists of physically mixing individual units to make a composite sample that is as homogeneous as possible.
With classical sampling, no distinction is made between the process of sampling (i.e., selection or inclusion) and that of observation or measurement.
We assume, with classical sampling, that any unit selected for inclusion in a statistical sample is measured and hence its value becomes known. In composite sampling, however, there is a clear distinction between the sampling and measurement stages. Compositing takes place between these two stages, and therefore achieves two otherwise conflicting goals. While a large number of samples can be selected to satisfy sample size requirements, the number of analytical measurements is kept affordable.
If a variable of concern is a measurement that is continuous in nature such as a chemical concentration, the mean (arithmetic average) of composite samples provides an unbiased estimate of the true but unknown “population” mean. Also, if measurement error is known, the population variance based on the scale of the individual samples can be estimated by a simple weighting of the measured composite sample variance.
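The unbiasedness claim above can be sketched numerically. The following is a minimal illustration with hypothetical concentration values, assuming perfect mixing, equal-size composites, and no measurement error; it is not code from this report:

```python
import random

def composite_means(values, k):
    """Group individual sample values into composites of size k and return
    the measured value of each composite: the mean of its constituents,
    assuming perfect mixing and no measurement error."""
    return [sum(values[i:i + k]) / k for i in range(0, len(values), k)]

random.seed(1)
# 400 hypothetical individual sample concentrations (right-skewed,
# as chemical concentrations often are)
individuals = [random.lognormvariate(0, 1) for _ in range(400)]

composites = composite_means(individuals, k=8)   # 50 analyses instead of 400

true_mean = sum(individuals) / len(individuals)
est_mean = sum(composites) / len(composites)     # matches the true mean
```

With equal-size composites and no measurement error, the average of the composite measurements reproduces the mean of the individual samples exactly, while requiring one-eighth the analyses.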
With selective retesting of individual sample units, based on initial composite sample results, we can classify all of the individual sample units according to the presence or absence of a trait, or exceedance (vs. compliance) of a numerical standard. We can subsequently estimate the prevalence of
a trait or proportion of non-compliance. Basically, if a composite measurement does not reveal a trait in question or is in compliance, then all individual samples comprising that composite are classified as “negative”. When a composite tests positive, then retesting is performed on the individual samples or subsamples (aliquots) in order to locate the source of “contamination”.
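The classify-then-retest logic just described (clear every member of a negative composite; exhaustively retest every member of a positive composite) can be sketched as follows. This is a simplified illustration assuming an error-free test, not an EPA-prescribed protocol:

```python
def classify_with_retesting(samples, tests_positive, k):
    """Classify every individual sample using composites of size k.
    A negative composite classifies all its members as negative; a
    positive composite triggers exhaustive retesting of each member.
    Returns the classifications and the total number of analyses."""
    classifications = []
    analyses = 0
    for i in range(0, len(samples), k):
        group = samples[i:i + k]
        analyses += 1                                # analyze the composite
        if any(tests_positive(s) for s in group):    # composite is positive
            for s in group:
                analyses += 1                        # retest each member
                classifications.append(tests_positive(s))
        else:
            classifications.extend([False] * len(group))
    return classifications, analyses

# 20 hypothetical samples, one contaminated, composites of size 5:
samples = [0] * 20
samples[7] = 1
labels, n = classify_with_retesting(samples, lambda s: s == 1, k=5)
# 4 composite analyses + 5 retests = 9 analyses instead of 20
```

Every sample is classified correctly, yet only the one positive composite incurs retesting costs.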
Retesting, as visualized in a general sense in Figure 2, may simply be exhaustive retesting of all individuals comprising a composite, or it may entail more specialized protocols. Generally, as the retesting protocol becomes more sophisticated, the expected number of analyses decreases. Therefore, one must weigh any increased logistical costs against the expected decrease in analytical cost when evaluating the overall cost of a compositing/retesting protocol.
Due to recent research (Patil, Gore and Sinha, 1994), the individual samples with the highest value, along with those individual samples comprising an upper percentile, can be identified with minimal retesting. This ability is extremely important when “hot spots” need to be identified such as with soil monitoring at a hazardous waste site.
Whether we are dealing with data from binary (presence/absence) measurements or data from measurements on a continuum, composite sampling can result in classifying each individual sample without having to separately analyze each one. While composite sampling may not be feasible when the prevalence of contamination is high, the analytical costs can be drastically reduced as the number of contaminated samples decreases.
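The dependence on prevalence can be made concrete with the standard expected-cost calculation for exhaustive retesting: if each individual sample is independently contaminated with probability p, a composite of size k requires one analysis, plus k more whenever at least one member is contaminated. The figures below are a textbook-style illustration, not taken from this report:

```python
def expected_analyses_per_sample(p, k):
    """Expected number of analyses per individual sample under
    compositing with exhaustive retesting, assuming each sample is
    independently contaminated with probability p and the test is
    error-free.  Testing every sample individually costs 1 per sample."""
    prob_positive_composite = 1 - (1 - p) ** k
    return 1 / k + prob_positive_composite

# Low prevalence: large savings over one analysis per sample.
low = expected_analyses_per_sample(p=0.01, k=10)    # about 0.20
# High prevalence: compositing costs more than individual testing.
high = expected_analyses_per_sample(p=0.5, k=10)    # about 1.10
```

At one-percent prevalence, compositing in tens cuts the expected analytical workload by roughly 80 percent; at fifty-percent prevalence, nearly every composite tests positive and the composite analyses become pure overhead.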
2.2. Limitations of Composite Sampling Both physical and logistical constraints exist that may restrict the application of composite sampling. The limitations which more commonly arise are discussed here along with some simple recommendations for how compositing still may help.
If the integrity of the individual sample values changes because of compositing, then composite sampling may not be the desired approach. For example, volatile chemicals can evaporate upon mixing of samples (Cline and Severin, 1989) or interaction can occur among sample constituents. In the first case, compositing of individual sample extracts may be a reasonable alternative to mixing individual samples as they are collected.
Another limitation is imposed by potential dilution, where an individual sample with a high value is combined with low values, resulting in a composite sample that falsely tests negative. When classifying samples according to exceedance or compliance with some standard value, c, the problem of dilution is overcome by comparing the composite sample result to the standard divided by the composite sample size, k, that is, to c/k. Furthermore, when an analytical detection limit, d, is known, the maximum composite sample size is established according to the inequality k ≤ c/d. One may lower this upper bound on the composite sample size to reduce effects of measurement error. As can be seen here, when reporting limits (Rajagopal, 1990) or action levels (Williams, 1990) of some hazardous chemical concentrations are legally required to be near the detection limit, the possibility of composite sampling may be eliminated.
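As a small sketch of the c/k comparison and the detection-limit bound (hypothetical concentrations and limits; perfect mixing and an error-free measurement assumed):

```python
def max_composite_size(c, d):
    """Largest composite size k satisfying k <= c/d, so that a single
    sample at the standard c remains detectable after k-fold dilution."""
    return int(c // d)

def composite_needs_retesting(concentrations, c):
    """Compare the composite measurement to c/k.  If any individual
    sample exceeds the standard c, the composite mean must exceed c/k
    (concentrations are nonnegative), so no exceedance is missed."""
    k = len(concentrations)
    composite_value = sum(concentrations) / k   # perfect mixing assumed
    return composite_value > c / k

# Detection limit d = 20 and standard c = 100 allow composites of at most 5:
k_max = max_composite_size(c=100, d=20)

# One hot sample (120 > c) diluted among three clean ones is still flagged,
# because 31.5 > 100/4 = 25:
flagged = composite_needs_retesting([120, 1, 2, 3], c=100)
```

Note that the c/k comparison is conservative: a composite of several moderately elevated samples can also be flagged and retested even though no single member exceeds c, which costs extra analyses but never misses an exceedance.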
Sample homogeneity is another consideration. A homogeneous sample is one where the variable of interest, such as a chemical concentration, is evenly distributed throughout the sample. In contrast, a heterogeneous sample can have substantially different values for the variable of interest, depending on what part of the sample is actually analyzed. If the whole sample unit is analyzed, then heterogeneity is not a problem; however, most laboratory analyses are performed on a small subsample of the original sample unit. For example, one gram of soil may be taken from a one kilogram soil core for actual extraction and analysis. If a subsample is to represent a larger sample unit, then the larger unit must be fairly homogeneous with respect to the variable of interest.
Therefore, an individual sample unit should be homogenized as much as possible prior to obtaining an aliquot for inclusion in a composite. Furthermore, formation of a composite must include homogenization if the composite is going to be represented by measurement on a smaller subsample.