CONFIDENTIAL
Report to CRC for National Plant Biosecurity
Case Study 1
Comparison of Statistical Sampling
Models for the Detection of Pests in
Stored Grain
Project - CRC 30086
David Elmouttie1,2 and Grant Hamilton1,2
1 Discipline of Biogeosciences, Queensland University of Technology, GPO Box 2434, Brisbane, Queensland, Australia, 4001.
2 Cooperative Research Centre for National Plant Biosecurity, LPO Box 5012, Bruce, ACT 2617, Australia.
6th June 2011
Case study 1 – Comparing statistical sampling models – CRC 30086
Executive Summary
Developing robust sampling strategies for cereal grains, based on a coherent statistical framework, is becoming increasingly important. These methods are developed in an attempt to ensure that internationally and domestically traded grain commodities are free from unwanted pests and pathogens. Developing sampling strategies to target insect pests and pathogens in grains is inherently difficult, however, because these organisms can display varied spatial distributions in relation to conditions within storages.
Statistical sampling models are generally developed on an underlying statistical distribution, selected on the basis of how closely it approximates the known distribution of the target pests. The Binomial, Negative Binomial and Poisson distributions have formed the basis of many sampling models, partly because of their simplicity and partly because of their underlying characteristics. In grain sampling and the broader ecological literature, both the Negative Binomial and Poisson distributions have often been selected to represent situations where targets are highly aggregated or rare. However, the behaviour of these models under the broad range of insect distributions common in grain bulks is relatively unknown.
In this case study we compare four sampling models: the compound model proposed by Elmouttie et al. (2010), a model based on the Negative Binomial proposed by Hagstrum et al. (1985) and Green and Young (1993), a Poisson model proposed by Green and Young (1993) and a Binomial approach proposed by Hunter and Griffiths (1978). Using data collected from farm silos in Australia and the USA, we demonstrate that the compound model proposed by Elmouttie et al. (2010) performs better than the alternative models in meeting a specified level of detection. We also illustrate the conditions under which each model fails to achieve this desired level of detection.
Background - Sampling
An efficient sampling programme is an essential component of any quality assurance programme, whether it be for manufacturing, exporting or the agricultural sector.
Sampling programmes form the basis for the acceptance or rejection of goods to be sold or traded, are used to determine the quality of sampled goods and can lead to the initiation of treatment or management regimes. It is therefore important that a sampling programme is adequate with respect to achieving its desired goal, such as pest detection.
The fundamental component of any sampling programme is determining how intensively a commodity should be sampled to detect defects or unwanted items. Determining sampling intensity, however, is inherently difficult. The intensity at which a commodity must be sampled will vary in relation to a range of factors, including the commodity itself, the ease with which it can be sampled, the density of the target and the type of target being sampled. No single or standard sampling protocol or programme can be applied consistently across a range of systems.
Sampling programmes have historically been developed based upon an underlying statistical distribution. A statistical distribution is used to approximate the distribution of the sampling target within the commodity being sampled. From this, sampling intensity can be determined based on factors such as target density and the approximated distribution.
A number of statistical distributions are used to develop sampling programmes, namely the Poisson, Binomial and Negative Binomial. A distribution is selected according to how well it approximates the type of system being sampled.
Sampling - grains
Robust sampling strategies are becoming increasingly important to ensure the detection of pests and pathogens in stored commodities such as cereal grain. There is a growing emphasis on developing statistically robust sampling programmes that maximise detection of target species due to biosecurity and food shortage concerns globally. In a pest management context, sampling typically has two main objectives: either to estimate the presence or quantity of a target species within an area, or to determine the spatial distribution that the species displays within the sampled area (Pillar 1998). Whatever the objective, however, it is important to have a basic understanding of the target species' biology so that sampling plans can be tailored specifically to the particular management situation.
As with sampling programmes for other commodities, sampling programmes for biological systems have been developed based on an underlying statistical distribution which aims to best approximate the ‘known’ distribution of the target species. A number of sampling programmes based on a range of statistical distributions have been adopted and modified for use in stored grain sampling (Hunter and Griffiths 1978, Love et al. 1983, Hagstrum et al. 1985). The statistical models chosen in each scenario, and the modifications made, attempt to capture the variability within biological systems which stems from factors such as environment and species’ behaviours. However, as it is difficult to find a single distribution which adequately describes the natural variation that exists, distributions may be chosen and assumptions made for convenience, often without scientific support.
As with the broader field of sampling, most common statistical sampling programmes for biological systems have been based on the Binomial, Negative Binomial and Poisson distributions (Hunter and Griffiths 1978, Hagstrum et al. 1985, Green and Young 1993, Southwood and Henderson 2000). The use of each of these distributions to develop sampling programmes, and the basis for selecting them, has however typically been derived from quality assurance and modified for ecological requirements rather than being primarily developed for a biological system (Stephens 2001).
Developing robust sampling programmes for biological systems (such as stored grains) that capture the range of variability in the system in question is inherently more complex than capturing the variability in a manufacturing process. Target species can be difficult to detect for numerous reasons. For example, species distributions can vary over space and time, influencing detection rates. Further, the area being sampled may not be conducive to sampling. It is therefore necessary that the sampling programmes established and the statistical framework developed approximate the actual system as closely as possible.
The issue of species aggregation and clustering behaviour (i.e. spatially heterogeneous distributions) has been of particular concern to biologists when developing sampling programmes. Sampling plans have been developed that attempt to account for species clustering behaviour (Hagstrum et al. 1985, Green and Young 1993, Elmouttie et al. 2010). In such instances, statistical models have most commonly been based on either the Negative Binomial or Poisson probability functions. Although neither function explicitly considers heterogeneity, they provide a good approximation of rarity, in the case of the Poisson, and of clustering, in the case of the Negative Binomial (Green and Young 1993, Subramanyam and Hagstrum 1996).
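The difference between the two functions can be illustrated by the probability that a single sample contains no insects. The sketch below assumes the standard zero terms of each distribution, exp(-m) for the Poisson and (1 + m/k)^(-k) for the Negative Binomial, where m is the mean count per sample and k is the dispersion parameter. The function names and example values are illustrative assumptions and are not taken from the study.

```python
import math


def p_zero_poisson(m):
    """Probability of an empty sample when counts are Poisson with mean m."""
    return math.exp(-m)


def p_zero_negbin(m, k):
    """Probability of an empty sample when counts are Negative Binomial
    with mean m and dispersion parameter k (small k = strong clustering)."""
    return (1.0 + m / k) ** (-k)


# Illustrative values only: the same mean density, increasing k.
mean_per_sample = 0.5
print("Poisson:", round(p_zero_poisson(mean_per_sample), 4))
for k in (0.1, 1.0, 10.0, 1000.0):
    print("NegBin k =", k, ":", round(p_zero_negbin(mean_per_sample, k), 4))
```

At a fixed mean, a smaller k gives a higher chance of drawing an empty sample, which is why clustered populations generally require more samples for the same detection probability. As k grows large the Negative Binomial zero term approaches the Poisson one.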
Stored grain insects are an area that has seen significant research in sampling ecology (Hagstrum et al. 1995, Lippert and Hagstrum 1987, Elmouttie et al. 2010). The presence of grain beetles within storages is problematic, as they lead to restrictions in biosecurity and trade, commodity losses and spoilage (Rees 2004, Hagstrum and Subramanyam 2006). Thus a significant emphasis has been placed on developing techniques to detect insect pests early within grain storages (Hagstrum et al. 1985, Lippert and Hagstrum 1987, Hagstrum et al. 1988, Elmouttie et al. 2010).
Although grain storages appear to be homogeneous, the insects within them often display varied distributions. The distribution of insects within a storage will vary from species to species, between storage types, and in relation to external and internal climatic conditions (Cuperus et al. 1990, Athanassiou et al. 2003, Nansen et al. 2009). It is therefore logical that sampling programmes account for species clustering behaviour and do not treat the grain mass as a homogeneous entity to be sampled. Not all sampling programmes have been developed to account for species distribution, however.
In this case study we aim to compare a range of sampling models. We test each model over a range of insect densities to determine where it performs well and where it performs poorly. In order to make an extensive comparison, we use data collected from Australian on-farm storages by Dr. David Elmouttie, and data collected from an intensive sampling programme of on-farm storages in the USA supplied by Dr. Paul Flinn (USDA). This ensures that a wide range of possible storage conditions, environments and management practices are covered in the comparison. Given that insect pests occur at a range of densities in a range of environmental conditions, this comparison has been made to determine which models are most appropriate to use under a variety of environmental conditions.
Methodology
Data Collection
Australia
A grain silo holding approximately 70 tonnes of wheat that had been harvested and held in storage for four months was sampled. No insecticide treatments had been administered during the storage period. All grain sampling from the silo was conducted on a single day.
Twenty-five 800 gram samples were taken from random locations within the silo using a Graintec® stainless steel grain spear. Each sample was individually bagged and sieved using a Graintec® 2 mm stainless steel grain insect sieve for a standard 10 seconds. For each sample the numbers of Sitophilus oryzae (Rice Weevil), Rhyzopertha dominica (Lesser Grain Borer) and Cryptolestes spp. (Flat Grain Beetle) were recorded. This procedure was replicated three times, so that three data sets for each species were available for simulation.
USA
Data used for model comparisons were collected from four independent vertical silos in Kansas, USA, each approximately 4.75 m in diameter and containing on average 30 tonnes of wheat. The silos were sampled monthly for seven consecutive months commencing in July 2003, as described in Flinn et al. (2004). Silos were separated in relation to three distinct treatments, Control (no treatment), Aeration (aerated silos). Each treatment was administered over two independent silos. Sampling was conducted using a pneumatic grain sampler (Pro2e-AVac®, Cargill Minneapolis, MN). Twenty-one 3 kg samples were drawn from each silo in each sampling period, from three height strata (0-0.8 m, 0.8-1.6 m, 1.6-2.4 m). All samples were processed using an Insectomat® motorized inclined sieve (Samplex LTD, Willow Park, UK) to separate insects from grain. The number of live adult insects was counted in each sample immediately after extraction. Three major grain insect pest species were the target of the study: Rhyzopertha dominica, Cryptolestes spp. and Tribolium spp.
Model Parameter estimation and simulations
In general, the aim of a grain sampling protocol is to detect insects at a given power, that is, to ensure with a given probability that insects are detected when they are in fact present.
In this section, four sampling models were compared: i) a Compound model proposed by Elmouttie et al. (2010; equation 1); ii) a Poisson model proposed for sampling by Green and Young (1993; equation 2); iii) a Negative Binomial model proposed by Green and Young (1993) and Hagstrum et al. (1985) (equation 3); and iv) a Binomial approach proposed by Hunter and Griffiths (1978; equation 4). Data from each silo were used to calculate parameter estimates for each model at each sampling period for each species.
These data were then used to populate each model to determine the estimated sampling intensity, n. In the following models n represents the number of samples required for a given power (1 – β), β represents the probability of a Type II error (the probability of failing to detect a species after n samples are drawn), and m represents the mean density of the target.
In the Compound model, p is the proportion of the silo infested, w represents the weight of the sample drawn and λ represents the mean density of insects in the infested portion of the silo.
In the Negative Binomial model, k represents a dispersion parameter. The dispersion parameter was estimated using the method presented by Southwood and Henderson (2000).
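As an illustration of how a required sample number follows from these parameters, the sketch below solves P(no insect found in n samples) = β for n under three of the models. The closed forms are assumptions based on each distribution's standard zero term and may differ in detail from the equations used in this study. The moment estimate of k is one common estimator and is not necessarily the method applied here. All function names are hypothetical.

```python
import math


def n_compound(beta, p, lam, w):
    """Samples needed when a proportion p of the lot is infested and counts
    in the infested part are Poisson with rate lam per unit weight (assumed form)."""
    q = 1.0 - p * (1.0 - math.exp(-lam * w))  # P(one sample of weight w is empty)
    if not 0.0 < q < 1.0:
        raise ValueError("need 0 < p <= 1 and lam * w > 0")
    return math.ceil(math.log(beta) / math.log(q))


def n_poisson(beta, m):
    """Samples needed when counts per sample are Poisson with mean m (assumed form)."""
    return math.ceil(-math.log(beta) / m)


def n_negbin(beta, m, k):
    """Samples needed when counts per sample are Negative Binomial with
    mean m and dispersion k (assumed form)."""
    return math.ceil(-math.log(beta) / (k * math.log(1.0 + m / k)))


def k_moments(counts):
    """Moment estimate k = mean^2 / (variance - mean); an assumed estimator.
    Returns None when the counts show no overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if var <= mean:
        return None
    return mean ** 2 / (var - mean)


# Illustrative values only (not study data): 95% detection, so beta = 0.05.
beta = 0.05
print(n_poisson(beta, m=0.5))
print(n_negbin(beta, m=0.5, k=0.1))
print(n_compound(beta, p=0.2, lam=2.0, w=0.8))
```

In this sketch m is the mean count per sample, while λ is a density per unit weight that is scaled by the sample weight w. The units of m, λ and w must therefore be kept consistent when comparing the models.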
In the Binomial approach, ψ represents the number of insects in a sample, P(ψ) is the probability of drawing an infested sample, θ is the total fraction of the bulk grain that is sampled and ν represents the total number of insects in a lot.
Model Comparisons
Parameter estimates were generated for each model for each treatment, species and sampling period for both the Australian and USA data sets. These estimates were then used to populate each model to determine the required sampling intensity n for a 95% probability of detection. The data from which the estimates were generated were then randomly sampled with replacement in a Monte Carlo simulation of 10 000 iterations, at the intensity n given by each model. A detection was recorded when n samples taken from the data set resulted in at least one insect being found. The number of detections from the 10 000 simulations was recorded and the percentage success rate determined for each trial simulation.
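The resampling procedure described above can be sketched as follows. This is a minimal illustration, assuming the observed data are a list of insect counts per sample; the function name, the fixed random seed and the example counts are hypothetical and are not taken from the study.

```python
import random


def detection_rate(counts, n, iterations=10_000, seed=0):
    """Percentage of iterations in which n samples, drawn with replacement
    from the observed counts, contain at least one insect."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    detections = 0
    for _ in range(iterations):
        draw = rng.choices(counts, k=n)  # resample n observations with replacement
        if any(c > 0 for c in draw):
            detections += 1
    return 100.0 * detections / iterations


# Illustrative data only: 25 samples, one of which contained insects.
observed = [0] * 24 + [3]
for n in (6, 30, 74):
    print(n, detection_rate(observed, n))
```

A model meets the target when the success rate at its recommended n is at least 95%. A rate below that indicates the model underestimated the sampling intensity for that data set.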