United States Environmental Protection Agency
Policy, Planning, and Evaluation (2163)
EPA-230-R-95-005
August 1995
EPA Observational Economy Series
Volume 1: Composite Sampling
Contents
Foreword
Acknowledgments
1. Introduction
2. What is Composite Sampling?
2.1. Method
2.2. Limitations of Composite Sampling
3. Applications
3.1. Soil Sampling
3.1.1. PCB Contamination
3.1.2. PAH Contamination
3.2. Ground Water Monitoring
3.3. Indoor Air Monitoring
3.4. Biomonitoring
3.4.1. Bioaccumulation in Human Adipose Tissue
3.4.2. Assessing Contamination in Fish
3.4.3. Assessing Contaminants in Mollusks
3.4.4. Measuring Average Fat Content in Bulk Milk
4. Summary
References
Foreword
The high costs of laboratory analytical procedures frequently strain environ-
mental and public health budgets. Whether soil, water or biological tissue is
being analyzed, the cost of testing for chemical and pathogenic contaminants
can be quite prohibitive.
Composite sampling can substantially reduce analytical costs because the
number of required analyses is reduced by compositing several samples into
one and analyzing the composited sample. By appropriate selection of the
composite sample size and retesting of select individual samples, composite
sampling may reveal the same information as would otherwise require many
more analyses.
Many of the limitations of composite sampling have been overcome by
recent research. This widens the potential for using composite sampling
to reduce the costs of environmental and public health assessments while
maintaining, and often increasing, the precision of sample-based inference.
Acknowledgments
The EPA Observational Economy Series is a result of the research conducted
under a cooperative agreement between the U.S. Environmental Protection
Agency and the Pennsylvania State University Center for Statistical Ecology
and Environmental Statistics, Professor G.P. Patil, Director.
The EPA grant CR-821531010, entitled “Research and Outreach on Observational
Economy, Environmental Sampling and Statistical Decision Making in Statistical
Ecology and Environmental Statistics,” consists of ten separate projects in
progress at the Penn State Center: 1) Composite Sampling
and Designs; 2) Ranked Set Sampling and Designs; 3) Environmental Site
Characterization and Evaluation; 4) Encounter Sampling; 5) Spatio-temporal
Data Analysis; 6) Biodiversity Analysis and Monitoring; 7) Adaptive Sam-
pling Designs; 8) Statistics in Environmental Policy and Regulation for Com-
pliance and Enforcement; 9) Statistical Ecology and Ecological Risk Assess-
ment; and 10) Environmental Statistics Knowledge Transfer, Outreach and
Training.
The series is published by the Statistical Analysis and Computing Branch
of the Environmental Statistics and Information Division in the EPA Office
of Policy, Planning and Evaluation. This volume in the series is largely based
on the work of M. T. Boswell, S. D. Gore, G. D. Johnson, G. P. Patil, and C.
Taillie at the Penn State Center in cooperation with John Fritzvold, Herbert
Lacayo, Robert O’Brien, Brenda Odom, Barry Nussbaum, and John Warren
as project officers at U.S. EPA. Questions or comments on this publication
should be directed to Dr. N. Phillip Ross, Director, Environmental Statistics
and Information Division (Mail Code 2163), United States Environmental
Protection Agency, 401 M Street SW, Washington, DC 20460; Ph. (202)
260-2680.
1. Introduction
While decision making in general involves opinion based on prior experience,
scientifically based decision making requires careful collection, measurement
and interpretation of data from physical observations.
Examples of such decisions are “Has a hazardous waste site been sufficiently
cleaned?” and “Are pollutants accumulating in certain foods as well as in
human or wildlife tissues?”
Scientifically based decision making should minimize the risk of being
wrong. Since decisions require information, which is in turn extracted from
data, this risk decreases as the data become more representative of the pop-
ulation being studied.
In order for a data set to properly represent a population, it must cover
the ranges of space and time within which the population lies, and it must
have sufficient resolution within these ranges. Consequently, collection and
review of representative data can be prohibitively expensive if a large sample
size (number of measurements, recordings or counts) is required. This is
especially true when analytical costs are very high, such as when monitoring
environmental and biological media for chemical or pathogenic contaminants.
Conventional statistical techniques allow for the reduction of either cost or
uncertainty. However, reducing one of these factors comes at the expense of
increasing the other. Composite sampling offers a way to hold either cost or
uncertainty at a specified level while decreasing the other.
Compositing simply refers to physically mixing individual samples to form
a composite sample, as visualized in Figure 1. Just one analysis is performed
on the composite, which is used to represent each of the original individual
samples.
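The sketch below makes this idea concrete by computing what a composite measurement would be for groups of k individual samples. It relies on idealized assumptions that are ours, not the report's: every aliquot has the same mass and mixing is perfect, so a composite's concentration is simply the arithmetic mean of its members. The function name `form_composites` is hypothetical.

```python
from statistics import mean


def form_composites(values, k):
    """Group consecutive individual values into composites of size k.

    Assumes equal-sized aliquots and perfect mixing, so each composite's
    concentration equals the arithmetic mean of its members. If len(values)
    is not a multiple of k, the leftover values form a smaller last composite.
    """
    if k < 1:
        raise ValueError("composite size k must be at least 1")
    groups = [values[i:i + k] for i in range(0, len(values), k)]
    return [mean(group) for group in groups]


# Six hypothetical concentrations (ppm) need only two analyses when k = 3.
individual = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0]
print(form_composites(individual, 3))  # [4.0, 4.0]
```

Grouping consecutive values stands in for grouping samples in the order they were collected. Any other grouping rule, such as spatial proximity, could be substituted.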
Compositing is common practice simply for increasing the representativeness
of a measurement. One example is measuring the fat content of a particular
entree that is composited across several restaurants included in a national
survey (Burros, 1994). Because one analysis of a composite stands in for
several individual analyses, compositing can always reduce costs for
estimating a total or an average value. However, analysis of composite
samples can be cleverly extended to classify the original individual sample
units that make up a composite.
For example, one may need to identify
the presence or absence of a pathogen like HIV in blood samples, or one may
need to identify all soil cores whose contaminant concentration exceeds an
action level at a hazardous waste site.
When analytical costs dominate over sampling costs, the savings potential
is obviously high. However, the immediate question is “How do we compensate
for information that is lost due to compositing?” More specifically, suppose
we are testing whether a substance is present, or present at a concentration
above some threshold. In that case we do not want to dilute individual
“contaminated” samples with clean samples to the point that the analysis does
not detect any contamination. Furthermore, if our measurements are of a
variable such as a chemical concentration, we may need to know the actual
values of those individual samples with the highest concentrations. For
example, “hot spots” need to be identified at hazardous waste sites.
Through judicious choice of a strategy for retesting some of the original
individual samples based on composite sample measurements, many limitations
of composite sampling can be overcome. Furthermore, other innovative
applications of composite sampling are emerging. One example is combining it
with ranked set sampling, another approach to achieving observational economy
that is discussed in Volume 2 of this series.
Figure 1: Forming composite samples from individual (field) samples
2. What is Composite Sampling?
2.1. Method
First, let’s clarify that a “sample” in this document refers to a physical
object to be measured, whether an individual or a composite, and not a
collection of observations in the statistical sense. Individual sample units
are what is obtained in the field, such as soil cores or fish fillets, or
from subjects, such as blood samples. A composite sample, in turn, may be a
physical mix of individual sample units or a batch of unblended individual
sample units that are tested as a group. Most compositing for environmental
assessment and monitoring consists of physically mixing individual units to
make a composite sample that is as homogeneous as possible.
With classical sampling, no distinction is made between the process of
sampling (i.e., selection or inclusion) and that of observation or measurement.
We assume, with classical sampling, that any unit selected for inclusion in a
statistical sample is measured and hence its value becomes known. In com-
posite sampling, however, there is a clear distinction between the sampling
and measurement stages. Compositing takes place between these two stages,
and therefore achieves two otherwise conflicting goals. While a large number
of samples can be selected to satisfy sample size requirements, the number
of analytical measurements is kept affordable.
If a variable of concern is a measurement that is continuous in nature
such as a chemical concentration, the mean (arithmetic average) of composite
samples provides an unbiased estimate of the true but unknown “population”
mean. Also, if measurement error is known, the population variance based
on the scale of the individual samples can be estimated by a simple weighting
of the measured composite sample variance.
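The sketch below shows one way to make this weighting concrete. It rests on an assumption of ours: each composite is the mean of k randomly selected individual samples plus independent measurement error with known variance. Under that model, the variance of a composite result is σ²/k plus the measurement-error variance. Solving for σ² gives k times (composite variance minus measurement-error variance). The function name is hypothetical.

```python
from statistics import mean, variance


def estimate_from_composites(composites, k, meas_var=0.0):
    """Estimate the population mean and the individual-scale variance.

    Assumed model: each composite result is the average of k randomly
    selected individuals plus independent measurement error with known
    variance meas_var, so Var(composite) = sigma^2 / k + meas_var.
    """
    if len(composites) < 2:
        raise ValueError("need at least two composites to estimate a variance")
    mu_hat = mean(composites)                # unbiased for the population mean
    comp_var = variance(composites)          # sample variance of composite results
    sigma2_hat = max(k * (comp_var - meas_var), 0.0)  # truncate negative estimates
    return mu_hat, sigma2_hat


# Two hypothetical composites of size 4 with measurement-error variance 0.5.
print(estimate_from_composites([4.0, 6.0], k=4, meas_var=0.5))  # (5.0, 6.0)
```

Two composites suffice to run the arithmetic but give a very unstable variance estimate. In practice, many more composites would be needed.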
With selective retesting of individual sample units, based on initial com-
posite sample results, we can classify all of the individual sample units ac-
cording to the presence or absence of a trait, or exceedance (vs. compliance)
of a numerical standard. We can subsequently estimate the prevalence of
a trait or proportion of non-compliance. Basically, if a composite measurement
does not reveal the trait in question or is in compliance, then all individual
samples comprising that composite are classified as “negative”. When a
composite tests positive, retesting is performed on the individual samples
or subsamples (aliquots) in order to locate the source of “contamination”.

Figure 2: Composite sampling with retesting (subsamples (aliquots) of individual samples are used to form a composite; if the composite tests negative, all individual samples are classified as negative; otherwise, select individuals are retested)
Retesting, as visualized in a general sense in Figure 2, may simply be
exhaustive retesting of all individuals comprising a composite or may entail
more specialized protocols. Generally, as the retesting protocol becomes more
sophisticated, the expected number of analyses decreases. Therefore, one
must consider any increased logistical costs along with the expected decrease
in analytical cost when evaluating the overall cost of a compositing/retesting
protocol.
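The simplest protocol, exhaustive retesting, can be sketched as follows for a presence/absence trait. The sketch assumes an error-free assay in which a composite tests positive exactly when at least one member is positive, so there is no dilution. It counts the analyses used, which lets the cost of the protocol be compared with testing every sample individually. The function name is illustrative.

```python
def classify_with_retesting(samples, k):
    """Classify presence/absence samples via composites of size k, retesting
    every member of a positive composite.

    samples: list of booleans (True = trait present). Assumes an error-free
    assay in which a composite is positive exactly when any member is.
    Returns (classifications, number_of_analyses).
    """
    labels = []
    analyses = 0
    for start in range(0, len(samples), k):
        group = samples[start:start + k]
        analyses += 1                          # one analysis on the composite
        if not any(group):
            labels.extend([False] * len(group))  # composite negative: all negative
        else:
            analyses += len(group)             # exhaustive retest of each member
            labels.extend(group)               # retests reveal each true status
    return labels, analyses


samples = [False] * 8 + [True] + [False] * 3   # 12 samples, one positive
labels, n = classify_with_retesting(samples, 4)
print(n)  # 7 analyses instead of 12
```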
As a result of recent research (Patil, Gore and Sinha, 1994), the individual
samples with the highest values, along with the individual samples that make
up an upper percentile, can be identified with minimal retesting. This ability
is extremely important when “hot spots” need to be identified, such as with
soil monitoring at a hazardous waste site.
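The sketch below illustrates why the maximum can often be found with little retesting. It is a simple bound-based procedure and not necessarily the exact method of Patil, Gore and Sinha (1994). It assumes nonnegative values, equal-sized aliquots and no measurement error. Under those assumptions, no individual in a composite of size k can exceed k times the composite result. Composites are therefore retested in decreasing order of that bound until the bound falls to or below the largest value already found. The function name is hypothetical.

```python
from statistics import mean


def find_maximum(groups):
    """Locate the largest individual value with limited retesting.

    groups: lists of nonnegative individual values; each composite result is
    taken to be the mean of its group (equal aliquots, no measurement error).
    No member can exceed k * (composite result), so composites are retested
    in decreasing order of that bound, stopping once the bound cannot beat
    the best value found so far.
    Returns (max_value, number_of_individual_retests).
    """
    bounded = sorted(((len(g) * mean(g), g) for g in groups),
                     key=lambda item: item[0], reverse=True)
    best = float("-inf")
    retests = 0
    for bound, group in bounded:
        if bound <= best:
            break                      # remaining composites cannot hold a larger value
        retests += len(group)          # exhaustively retest this composite
        best = max(best, max(group))
    return best, retests


groups = [[1, 2, 1, 0], [9, 0, 0, 1], [3, 3, 3, 3], [0, 1, 0, 1]]
print(find_maximum(groups))  # (9, 8)
```

In this made-up example, two composites of size 4 are retested, for eight individual measurements. This has the same structure as the “two composites, eight measurements” result reported for the PCB study in Section 3.1.1.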
Whether we are dealing with data from binary (presence/absence) meas-
urements or data from measurements on a continuum, composite sampling
can result in classifying each individual sample without having to separately
analyze each one. While composite sampling may not be feasible when the
prevalence of contamination is high, the analytical costs can be drastically
reduced as the number of contaminated samples decreases.
2.2. Limitations of Composite Sampling
Both physical and logistical constraints exist that may restrict the applica-
tion of composite sampling. The limitations which more commonly arise are
discussed here along with some simple recommendations for how compositing
still may help.
Physical:
If the integrity of the individual sample values changes because of com-
positing, then composite sampling may not be the desired approach. For
example, volatile chemicals can evaporate upon mixing of samples (Cline
and Severin, 1989)
or interaction can occur among sample constituents. In
the first case, compositing of individual sample extracts may be a reasonable
alternative to mixing individual samples as they are collected.
Another limitation is imposed by potential dilution, where an individual
sample with a high value is combined with low values resulting in a compos-
ite sample that falsely tests negative.
When classifying samples according to exceedance or compliance with some
standard value, c, the problem of dilution is overcome by comparing the
composite sample result to c divided by the composite sample size, k, (c/k).
Assuming equal-sized aliquots, if any one of k nonnegative individual values
exceeds c, the average of the k values must exceed c/k. Furthermore, when an
analytical detection limit, d, is known, the maximum composite sample size is
established according to the inequality k < c/d, which keeps c/k above the
detection limit. One may lower this upper bound on the composite sample size
to reduce effects of measurement error. As
can be seen here, when reporting limits (Rajagopal, 1990) or action levels
(Williams, 1990) of some hazardous chemical concentrations are legally re-
quired to be near the detection limit, the possibility of composite sampling
may be eliminated.
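The two rules above can be written down directly. This sketch assumes equal-sized aliquots, nonnegative concentrations and negligible measurement error. `max_composite_size` returns the largest integer k satisfying k < c/d, and `flag_for_retesting` screens a composite result against c/k. Both names are hypothetical.

```python
import math


def max_composite_size(c, d):
    """Largest integer k with k < c/d (0 means compositing is not possible)."""
    if c <= 0 or d <= 0:
        raise ValueError("standard c and detection limit d must be positive")
    return math.ceil(c / d) - 1        # largest integer strictly below c/d


def flag_for_retesting(composite_value, c, k):
    """Flag a composite whose result exceeds c/k.

    If any one of the k members exceeds c, the composite mean must exceed
    c/k, so comparing against c/k (rather than c) avoids dilution-induced
    false negatives.
    """
    return composite_value > c / k


# Hypothetical example: standard c = 10 ppm, detection limit d = 1 ppm.
k_max = max_composite_size(10, 1)                   # 9
members = [0.5, 0.5, 0.5, 10.5]                     # one exceedance, k = 4
composite = sum(members) / len(members)             # 3.0, well below c
print(k_max, flag_for_retesting(composite, 10, 4))  # 9 True
```

Here the composite result of 3.0 ppm would pass a naive comparison with c = 10 ppm. It is correctly flagged because it exceeds c/k = 2.5 ppm.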
Sample homogeneity is another consideration. A homogeneous sample is
one where the variable of interest, such as a chemical concentration, is evenly
distributed throughout the sample. In contrast, a heterogeneous sample can
have substantially different values for the variable of interest, depending on
what part of the sample is actually analyzed. If the whole sample unit is
analyzed, then heterogeneity is not a problem; however, most laboratory
analyses are performed on a small subsample of the original sample unit. For
example, one gram of soil may be taken from a one kilogram soil core for
actual extraction and analysis. If a subsample is to represent a larger sample
unit, then the larger unit must be fairly homogeneous with respect to the
variable of interest.
Therefore, an individual sample unit should be homogenized as much as
possible prior to obtaining an aliquot for inclusion in a composite. Further-
more, formation of a composite must include homogenization if the composite
is going to be represented by measurement on a smaller subsample.
Often, measurements on multiple attributes are desired. However, if
retesting is performed in order to classify individual samples, it is unclear
how to optimize the retesting relative to the different attributes (Schaeffer et
al., 1982). For example, should chemicals be tested independently, or does
there exist dependence in the multivariate information that can be used to
improve cost efficiency? Classifying for multiple attributes remains an open
problem in composite sampling.
Logistical:
When retesting of certain individual samples may be required based on
composite sample results, then subsamples (aliquots) of the original individ-
ual samples must be preserved and stored until all testing is done. This may
lead to extra expense that must be considered in the overall cost comparison
between compositing and other strategies. For most environmental and pub-
lic health studies, the analytical savings from compositing will far outweigh
the extra cost of sample preservation and storage.
Another consideration is that events outside the control of the scientists may
dictate the feasibility of composite sampling. For example, people whose
wells are being tested may demand that their wells be treated as equitably
as the wells of their neighbors. Measuring some well samples individually and
some well samples solely as part of a composite may give an appearance of
inequitability and result in a political decree to measure each well individually
(Rajagopal, 1990).
Circumstances that presently disqualify composite sampling from being
applied may change with advances in technology. Long turn-around times for
laboratory results and large labor costs may currently eliminate optimal
retesting designs from consideration. However, retesting designs in the
future may be automated and guided by an expert system (Rajagopal, 1990).
Also, advances in statistical methodology may further extend the utility of
composite sampling.
For other reviews of composite sampling, see Rohde (1976, 1979), Elder
(1977), Elder, Thompson, and Myers (1980), Boswell and Patil (1987) and
Garner, Stapanian, and Williams (1988). For an overview, see Patil, Gore,
and Taillie (1994).
3. Applications
Composite sampling has its roots in what is known as group testing. An early
application of group testing was to estimate the prevalence of plant virus
transmission by insects (Watson, 1936). In this application, insect vectors
were allowed to feed upon host plants, thus allowing the disease transmission
rate to be estimated from the number of plants that subsequently became
diseased.
Apparently, the next important application of group testing occurred dur-
ing World War II when U.S. servicemen were tested for syphilis by detecting
the presence or absence of a specific antigen of the syphilis-causing bacterium
in samples of their blood (Dorfman, 1943). Initial analyses were done on com-
posite samples formed from aliquots of blood drawn from the subjects. A
composite sample testing negative indicated that all individuals contributing
to the composite were negative, while a composite testing positive prompted
exhaustive retesting of the original aliquots comprising that composite. If
blood aliquots of k individuals are composited, the number of required tests
to classify these k individuals will either be 1 or k + 1. For a given prevalence
of the trait, the expected number of tests can be calculated for a composite
of size k. This application has gone on to become a classic example of how
statistical cleverness can assist researchers in attaining what we call obser-
vational economy (Rao, 1989).
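The expected-number calculation can be sketched as follows. The sketch makes the usual assumptions for this two-stage scheme: individuals are independent, each is positive with probability p, and the assay makes no errors. A composite of size k is then positive with probability 1 − (1 − p)^k. The expected number of tests per composite is therefore 1 + k[1 − (1 − p)^k]. Dividing by k gives the expected tests per individual, which can then be minimized over k. The function names are illustrative.

```python
def expected_tests_per_individual(p, k):
    """Expected tests per individual under two-stage (Dorfman) group testing."""
    if k == 1:
        return 1.0                          # individual testing: one test each
    prob_positive = 1.0 - (1.0 - p) ** k    # chance the composite tests positive
    return (1.0 + k * prob_positive) / k    # 1 composite test + k retests if positive


def best_composite_size(p, k_max=50):
    """Composite size in 1..k_max minimizing expected tests per individual."""
    return min(range(1, k_max + 1),
               key=lambda k: expected_tests_per_individual(p, k))


for p in (0.01, 0.1):
    k = best_composite_size(p)
    print(p, k, round(expected_tests_per_individual(p, k), 3))
# 0.01 11 0.196
# 0.1 4 0.594
```

At a prevalence of 1%, composites of 11 need about a fifth of the tests required to test everyone individually. At a high prevalence, such as 50%, the best choice returned is k = 1, meaning no compositing.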
In light of recent developments, composite sampling is increasingly be-
coming an acceptable practice for sampling soils, biota, and bulk materials
when the goal is estimation of some population value under restrictions of a
desired standard error and/or limits on the cost of sample measurement.
An informal survey of various professionals yielded several favorable
applications of composite sampling. They include:
• Establishing and verifying attainment of remedial cleanup standards in soils using sample compositing and bootstrapping techniques
• Use of compositing to obtain adequate support in geostatistical sampling
• Optimal compositing strategies for screening material for deleterious agents
• A soil sample design utilizing techniques of compositing, binary search, and confidence limits on proportions
• Composite sampling for analyzing foliage and other biological materials
While many diverse applications exist for composite sampling, some ex-
amples that are particularly relevant to environmental and public health
studies are detailed in the remainder of this chapter.
3.1. Soil Sampling
3.1.1. Characterization of Soil PCB Contamination at Gas Pipeline Compressor Stations
As part of a recent settlement between the Pennsylvania Department of
Environmental Resources and the Texas Eastern Pipeline Company, PCB-
contaminated soils had to be characterized and remediated at 19 sites. Be-
cause waste sources included indiscriminate dumping, disposal in trash pits,
air emissions and even application as weed killer along fence lines, the result-
ing spatial distribution of contaminated soil was very heterogeneous, with
hot spot locations unknown. Reliable characterization of these sites therefore
required a very large number of soil samples: around 12,000. With each sample
analyzed for total PCBs, the cost for site characterization alone was around
$33 million. To appreciate the magnitude of the problem, note that this
discussion pertains only to the Pennsylvania settlement. The problem extends
along the whole pipeline from the Gulf Coast to New England.
Results of a retrospective study (Gore, Patil, and Taillie, 1992; Patil, Gore
and Sinha, 1994), using the actual site characterization data, revealed that
composite sampling methods potentially could have substantially reduced
the analytical costs.
Three aspects of the data were evaluated: (i) estimation of the mean
and variance of total PCB concentration as well as total PCB mass, (ii)
classification of each individual (uncomposited) sample as above or below
a specified critical level, and (iii) quantification of those individual samples
with the highest PCB levels.
Results showed that unbiased estimates of the mean and variance could
be obtained with one fourth the number of analyses (90 instead of 360). A
small loss of precision resulting from compositing seemed quite acceptable in
light of the large analytical cost reduction. Compositing can actually increase
precision if composites are purposely formed to increase heterogeneity within
composites. In this case, however, composites were formed from spatially
proximate field samples in order to minimize heterogeneity within composites.
This approach was taken because it provides for the most efficient retesting
for classifying individual samples, which, as with most hazardous waste sites,
was the primary objective.
A site was acceptably clean if 90% of the measured samples were below
10 parts per million (ppm) with no values exceeding 25 ppm. With
characterization data from the worst of the nineteen sites, compositing could
have reduced the analytical cost of classifying individual samples according
to the 10 ppm criterion by 9%, relative to exhaustive testing. Starting from
this near-worst-case scenario, the cost savings increase as we move to cleaner
sites and should be dramatic when analyzing post-remediation verification
data. For example, another site along the pipeline is cleaner, although still
contaminated. There, all individual samples could have been classified
according to the 10 ppm criterion at 50% less than the analytical cost
associated with exhaustive testing (see Gore, Boswell, Patil, and Taillie,
1992).
Finally, if the goal were simply to know which individual sample has the
highest concentration, we could have discovered this by exhaustively retesting
just two composite samples. In other words, with only eight measurements
in addition to the 90 composite measurements, we could have identified the
“hottest” spot. Furthermore, 12 additional measurements could have revealed
the locations with the four highest concentrations (see Patil, Gore and Sinha,
1994).
Keep in mind that the percentages cited here result from a retrospective
study where expected composite values were estimated by arithmetically av-
eraging individual values. Since this approach assumes no measurement error
(but some is expected due to incomplete homogenization of samples), these
percentages are best interpreted as potential savings.
3.1.2. Characterization of Soil PAH Contamination at a Superfund Site
In another study involving remediation of contaminated soil (Messner et al.,
1990), the investigators wanted to determine which half-acre plots at a
Superfund site should be remediated. The contaminant was total polyaromatic
hydrocarbons (PAHs). The cleanup objective was to remediate any plots that
posed greater than a 10⁻⁴ risk, based on direct ingestion as the most likely
route of exposure.
These investigators concluded that the most cost-effective sampling design
was to take two composite samples from each half-acre plot, with each of the
two composites consisting of ten individual samples. Even when considering
the influence of small “hot spots,”
the proposed composite sampling design
assured a high probability of making the correct decision. The estimated cost
per analysis for this study was $800. Relative to analyzing each of the ten
individual samples in a composite separately, the savings due to compositing
are therefore substantial.
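For a rough sense of scale, the arithmetic below compares the analytical cost per half-acre plot for the chosen design with a hypothetical alternative. In that alternative, each of the 20 individual samples is analyzed separately. This comparison design is our assumption, since the study's actual alternative designs are not described here.

```python
COST_PER_ANALYSIS = 800      # dollars per analysis, as estimated in the study
COMPOSITES_PER_PLOT = 2
SAMPLES_PER_COMPOSITE = 10


def analytical_cost_per_plot(composite=True):
    """Analytical cost for one half-acre plot under each design."""
    if composite:
        analyses = COMPOSITES_PER_PLOT                          # 2 analyses
    else:
        analyses = COMPOSITES_PER_PLOT * SAMPLES_PER_COMPOSITE  # 20 analyses (hypothetical)
    return analyses * COST_PER_ANALYSIS


savings = analytical_cost_per_plot(False) - analytical_cost_per_plot(True)
print(savings)  # 14400
```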
3.2. Ground Water Monitoring
As the distribution of a constituent in a given medium becomes more ho-
mogeneous, measurement error decreases, making composite sampling more
feasible. For this reason, composite sampling has great economic potential
for analyzing dissolved solutes, whether the solvent is water or some other
liquid. In fact, a study of composite sampling of wastewater (Schaeffer,
Kerster and Janardan, 1980) showed that variability of analytical results due
to compositing was an insignificant source of total variability.
Rajagopal and Williams (1989) critically evaluated the economy of com-
positing ground water samples when screening a large monitoring network in
order to identify contaminated wells. With a binary retesting scheme,
compositing reduced analytical effort, and hence cost, when no more than about
12.5% of the wells were contaminated. Of course, the savings increased as the
number of contaminated wells in the network decreased.
When more than one out of eight wells were contaminated, the number of
analyses increased over the amount required by initial exhaustive testing,
with the worst case scenario resulting in 50% additional analyses. If,
however, curtailed retesting was performed instead of straight binary
retesting, the absolute maximum exceedance of analyses would be 31% over that
required by initial exhaustive testing. This number of additional analyses
becomes even smaller as the distribution of contaminated wells becomes
contagious (or clumped). A rate of 31% additional analyses is therefore the
absolute worst case.
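The sketch below counts analyses for one simple form of binary retesting. A composite is tested, and if it is positive it is split into halves and each half is tested, recursing down to single wells. The optional curtailed variant skips testing the second half whenever the first half is negative, because the parent composite's positive result must then come from the second half. This is one plausible reading of “binary” and “curtailed” retesting, assuming an error-free assay without dilution. It is not necessarily Rajagopal and Williams' exact protocol, and its counts need not reproduce the percentages above. The function name is hypothetical.

```python
def binary_retest(wells, curtail=False, known_positive=False):
    """Count analyses needed to classify wells by recursive halving.

    wells: list of booleans (True = contaminated). Assumes an error-free
    assay: a composite is positive exactly when any member is positive.
    known_positive=True means the group is already known to be positive,
    so it is not tested again (used by the curtailed variant).
    """
    analyses = 0
    if not known_positive:
        analyses += 1                  # analyze this composite
        if not any(wells):
            return analyses            # composite negative: all wells clean
    if len(wells) == 1:
        return analyses                # a single well is now classified
    mid = len(wells) // 2
    left, right = wells[:mid], wells[mid:]
    analyses += binary_retest(left, curtail)
    # With an error-free assay, the left result equals any(left); if the left
    # half is negative, the right half must be positive and need not be tested.
    right_known = curtail and not any(left)
    analyses += binary_retest(right, curtail, known_positive=right_known)
    return analyses


wells = [False] * 5 + [True] + [False] * 2     # 8 wells, one contaminated
print(binary_retest(wells), binary_retest(wells, curtail=True))  # 7 5
```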
As seen here, if the proportion of contaminated wells is expected to be
generally low (e.g., less than 12%), compositing can be economically attractive.
3.3. Indoor Air Monitoring for Allergens
Quantification of specific allergens in dust from human dwellings provides
important information for determining allergen exposure. The fact that in-
door allergens are not equally distributed in the dust of human dwellings
makes it difficult to estimate allergen exposure with a high degree of
certainty. A composite sample may provide a more reliable estimate of indoor
allergen exposure and minimize error associated with unequal distribution
of allergens on discrete objects. Composite samples of household dust may
provide useful information while minimizing the sample collection effort and
analytical test costs.
In a recent study (Lintner et al., 1992), dust samples from three specific
objects and composite samples from the same three objects were collected
from the living rooms and bedrooms of 15 homes by a single technician.
Discrete and composite samples were collected from floor, furniture
(upholstery/bed) and window coverings in both the living room and a bedroom of
each home. Discrete samples were collected by vacuuming the specific objects
for 10 minutes. Composite samples were collected in a defined sequence by
vacuuming the three objects for 5 minutes each. In this way, the composites
were formed at the time of sample collection by allowing the vacuum cleaner
to do the physical mixing of the dust from several objects.
Results of this study seem to indicate that the actual measurement of a
composite sample will be approximately the average of the values that would
be obtained from separate measurements on discrete samples. However, if
an object has a significantly higher allergen content than other objects, the
composite sample measurement tends to be higher than the average of the
discrete sample measurements.
In order to effectively use composite sam-
pling, only items which are likely sources of allergen should be used to form
a composite sample.
3.4. Biomonitoring
3.4.1. Measuring Bioaccumulation in Human Adipose Tissue
The National Human Adipose Tissue Survey (NHATS) is an annual survey
to collect and analyze a sample of adipose tissue specimens from autopsied
cadavers and surgical patients (Orban, Lordo and Schemberger, 1990). The
primary objectives of NHATS include:
• To identify chemicals that are present in the adipose tissue of individuals in the U.S. population,
• To estimate the average concentrations, with confidence intervals, of selected chemicals in adipose tissue of individuals in the U.S. population and in various demographic subpopulations, and
• To determine if geographic region, age, race and sex affect the average concentrations of selected chemicals detected in the U.S. population
Every year approximately 800-1200 adipose tissue specimens are collected
using a multistage sampling plan.
First, the 48 conterminous states are
stratified into four geographic areas (strata). Next, a sample
of metropolitan statistical areas (MSAs) is selected from every stratum with
probabilities proportional to MSA populations. Finally, several cooperators
(hospital pathologists or medical examiners) are chosen from every selected
MSA and asked to supply a specified quota of tissue specimens. The quota
specifies the number of specimens needed in each of the following categories:
• Age groups: 0-14 years, 15-44 years, and 45+ years;
• Race: Caucasian and non-Caucasian; and
• Sex: Male and female.
The sampling plans are designed to give unbiased and efficient estimates
of the average concentrations of selected chemicals in the entire population
and in various subpopulations defined by the demographic variables described
above. Concentrations are characterized by the average or median chemical
concentrations, while prevalence is the proportion of individuals with
chemical concentrations exceeding specified criterion levels.
Instead of analyzing 800-1200 individual specimens, only about 50 composite
samples are analyzed. This not only reduces analytical cost, but also
provides enough tissue mass to use high-resolution gas chromatography/mass
spectrometry, which allows a wider list of target chemicals to be tested.
3.4.2. Assessing Contamination in Fish
When monitoring human tissue to assess the bioaccumulation of contaminants,
compositing was necessary in order to obtain a sufficient mass of material for
analysis. With other organisms, this is not typically a limitation because the
whole organism can be sacrificed. Nevertheless, as researchers have shown
(Paasivirta and Paukku, 1989), compositing is still preferable because it is
much more cost-effective.
When concerned with the concentrations of a host of organochlorine compounds
in herring off Finland’s East Gulf, researchers recognized how
expensive such monitoring could become. They therefore evaluated the ef-
fectiveness of composite sampling and concluded that costs could be reduced
by about 54% using optimized composite sampling instead of analyzing indi-
vidual fish. They also showed that average chemical concentrations could be
estimated from composite samples with the same accuracy as a larger num-
ber of individual samples, and that optimum composite sample sizes could
be easily calculated if laboratory variance can be predicted.
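The study's own optimization is not reproduced here. A common way to frame the calculation is sketched below under a simple model that we assume for illustration. The mean of n composites of size k has variance (σ²/k + σ_L²)/n, where σ² is the between-fish variance and σ_L² the laboratory variance. The cost is n(c_A + k·c_I), where c_A is the cost per analysis and c_I the cost per individual fish. Holding the variance of the mean fixed, cost is proportional to (σ²/k + σ_L²)(c_A + k·c_I), which is minimized at k = √(σ²·c_A / (σ_L²·c_I)). All names and parameter values are hypothetical.

```python
import math


def optimal_composite_size(sigma2, lab_var, cost_analysis, cost_individual):
    """Continuous cost-minimizing composite size k under the assumed model."""
    if sigma2 < 0 or cost_analysis < 0 or lab_var <= 0 or cost_individual <= 0:
        raise ValueError("need sigma2, cost_analysis >= 0 and lab_var, cost_individual > 0")
    return math.sqrt(sigma2 * cost_analysis / (lab_var * cost_individual))


def composites_needed(sigma2, lab_var, k, target_var):
    """Composites of size k needed so the mean's variance is at most target_var."""
    return math.ceil((sigma2 / k + lab_var) / target_var)


# Hypothetical inputs: between-fish variance 4, laboratory variance 1, and
# an analysis costing 64 times as much as handling one fish.
k_star = optimal_composite_size(4.0, 1.0, 64.0, 1.0)
print(k_star)                                 # 16.0
print(composites_needed(4.0, 1.0, 16, 0.25))  # 5
```

In practice, the continuous optimum would be rounded to a nearby integer, and the cost compared at neighboring values of k.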
3.4.3. Assessing Contaminants in Mollusks
As part of the National Oceanic and Atmospheric Administration’s “Mus-
sel Watch” program,
177 coastal sites were sampled from 1986 to 1988
(NOAA, 1989).
While mussels were collected along the West Coast and north-
ern East Coast, oysters were taken along the southern East Coast, the Gulf
Coast and three sites in Hawaii.
Using the soft tissue of these mollusks, composite samples were made by
homogenizing either 30 mussels or 20 oysters. Six composites were then used
for chemical analysis, three for organics and three for trace elements.
Compositing served two purposes here: to provide sufficient media (tissue) for
analysis and to increase the information in each measurement. The statistics
of interest were means and variances. Retesting of individual mollusks or
groups thereof was therefore not necessary, and the desired information was
obtained with minimal analyses.
3.4.4. Measuring Average Fat Content in Bulk Milk
The economic value of composite sampling appears to be well known in the dairy industry, where milk must be analyzed routinely. For example, the fat content of milk is determined on composite samples formed from samples of all deliveries made during a specified period of time.
Because composite samples are known to provide an unbiased estimate of the population mean, dairy scientists are mainly concerned with the precision of a composite sample estimator compared with that of an individual sample estimator. Williams and Peterson (1978) developed a framework for assessing the precision of sampling schemes by estimating the different sources of variation associated with the sampling process. They identified four components: variance due to real differences between collections from a supplier within a compositing period (biological variance), variance among samples taken from the same collection (sample variance), variance among measurements on the same sample (testing variance), and the variance associated with forming a composite sample (compositing variance).
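The four components can be combined to compare the two estimators. The additive model below is one plausible reading, assumed for illustration rather than taken from Williams and Peterson: averaging individual tests on samples from m collections divides all three per-sample components by m, whereas a single test on a composite pooled in equal shares from the same m collections averages out the biological and sample components but carries the full compositing and testing variance. The class, functions, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    biological: float   # between collections within a compositing period
    sample: float       # between samples taken from the same collection
    testing: float      # between repeat measurements on the same sample
    compositing: float  # introduced by forming the composite


def var_individual_mean(vc, m):
    """Variance of the average of m individual tests, one sample per collection."""
    return (vc.biological + vc.sample + vc.testing) / m


def var_composite(vc, m):
    """Variance of one test on a composite pooled equally from m collections (assumed model)."""
    return (vc.biological + vc.sample) / m + vc.compositing + vc.testing


vc = VarianceComponents(biological=10.0, sample=1.0, testing=0.5, compositing=1.0)
print(var_individual_mean(vc, 20), var_composite(vc, 20))
```

With these made-up components, one composite test has variance 2.05 against 0.575 for the mean of 20 individual tests: less precise, but obtained with one analysis instead of twenty.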
Based on a study of sixty-one herd milk supplies at three different creamery locations, Connolly and O’Connor (1981) found that the biological components of variability were about 10 times as large as the sampling or compositing components, indicating that the true biological variability is not masked by the composite sampling process.
4. Summary
Compared with exhaustively testing all individual sample units, testing composite samples has the potential to greatly increase observational economy in environmental and public health monitoring.
When the objective is to estimate the population mean or total, compositing will always reduce analytical cost; however, a sufficient number of composite samples must still be obtained to estimate the variance.
When the objective is to classify each individual sample, with subsequent estimation of the prevalence of a binary trait or of the proportion of noncompliant measurements, testing composite samples with selective retesting becomes cost-effective when the prevalence or proportion is low. Examples where composite sampling can be very cost-effective for classification include (i) estimating the prevalence of a rare disease and (ii) verifying whether a hazardous waste site has been sufficiently remediated.
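The classification case is the group-testing setting of Dorfman (1943). Under his two-stage scheme, each composite of k units is tested once and, if positive, all k units are retested. Assuming independent units, a prevalence p, and an error-free test with no dilution effect (assumptions that real environmental applications may violate), the expected number of analyses per unit is 1/k + 1 - (1 - p)^k. The sketch below, with hypothetical function names, shows why the scheme pays off only at low prevalence.

```python
def dorfman_tests_per_unit(p, k):
    """Expected analyses per unit for Dorfman two-stage testing.

    One test per composite of size k, plus k individual retests whenever the
    composite is positive. Assumes independent units with prevalence p and an
    error-free test with no dilution effect.
    """
    if k == 1:
        return 1.0  # testing units individually: no retests needed
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_group_size(p, k_max=100):
    """Composite size in 1..k_max with the fewest expected analyses per unit."""
    return min(range(1, k_max + 1), key=lambda k: dorfman_tests_per_unit(p, k))


for p in (0.01, 0.05, 0.2, 0.35):
    k = best_group_size(p)
    print(f"p={p}: k={k}, analyses per unit={dorfman_tests_per_unit(p, k):.3f}")
```

For p = 0.01 the best composite size is 11, at roughly 0.20 analyses per unit, while for p = 0.35 no composite size beats testing every unit individually.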
References
BOSWELL, M. T., AND PATIL, G. P. (1987). A perspective of composite sampling. Commun. Statist.-Theory Meth., 16, 3069-3093.
BURROS, M. (1994). A study faults Mexican restaurants. The New York Times, July 19, 1994, p. A16.
CLINE, S. M., AND SEVERIN, B. F. (1989). Volatile organic losses from a composite water sampler. Water Res., 23(4), 407-412.
CONNOLLY, J., AND O’CONNOR, F. (1981). Comparison of random and composite sampling methods for the estimation of fat content of bulk milk supplies. Ir. J. Agr. Res., 20, 35-51.
DORFMAN, R. (1943). The detection of defective members of large populations. Ann. Math. Stat., 14, 436-440.
EDLAND, S. D., AND VAN BELLE, G. (1994). Decreased sampling costs and improved accuracy with composite sampling. In Environmental Statistics, Assessment and Forecasting, C. R. Cothern and N. P. Ross, eds. Lewis Publishers, Boca Raton. pp. 29-55.
ELDER, R. S. (1977). Properties of composite sampling procedures. Ph.D. Dissertation. Virginia Polytechnic Institute and State University, Blacksburg, VA.
ELDER, R. S., THOMPSON, W. O., AND MYERS, R. H. (1980). Properties of composite sampling procedures. Technometrics, 22(2), 179-186.
GARNER, F. C., STAPANIAN, M. A., AND WILLIAMS, L. R. (1988). Composite sampling for environmental monitoring. In Principles of Environmental Sampling, L. H. Keith, ed. American Chemical Society. pp. 363-374.
GORE, S. D., BOSWELL, M. T., PATIL, G. P., AND TAILLIE, C. (1992). Studies on the applications of composite sample techniques in hazardous waste site characterization and evaluation: I. Onsite surface soil sampling for PCB at the Uniontown Site. Technical Report Number 92-0101, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.
GORE, S. D., AND PATIL, G. P. (1994). Identifying extremely large values using composite sample data. With Discussions by J. Warren, H. D. Kahn, and K. Campbell. Environmental and Ecological Statistics, 1(3), 227-245.
GORE, S. D., PATIL, G. P., AND TAILLIE, C. (1992). Studies on the applications of composite sample techniques in hazardous waste site characterization and evaluation: II. Onsite surface soil sampling for PCB at the Armagh Site. Technical Report Number 92-0305, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.
LINTNER, T. J., MAKI, C. L., BRAME, K. A., AND BOSWELL, M. T. (1992). Sampling dust from human dwellings to estimate the prevalence of indoor allergens. Technical Report Number 92-0805, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.
MACK, G. A., AND ROBINSON, P. E. (1985). Use of composited samples to increase the precision and probability of detection of toxic chemicals. In Environmental Applications of Chemometrics, J. J. Breen and P. E. Robinson, eds. American Chemical Society, Washington, DC. pp. 174-183.
MESSNER, M. J., CLAYTON, C. A., MICHAEL, D. I., NEPTUNE, M. D., AND BRANTLY, E. P. (1990). Retrospective design solutions for a remedial investigation. Supplement to Quantitative Decision Making in Superfund: A Data Quality Objectives Case Study. Hazardous Materials Control, Volume 3, Number 3.
NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION. (1989). A Summary of Data on Tissue Contamination from the First Three Years (1986-1988) of the Mussel Watch Project. NOAA Technical Memorandum, NOS OMA 49.
ORBAN, J. E., LORDO, R., AND SCHWEMBERGER, J. (1990). Statistical methods for analyzing composite sample data applied to EPA’s human monitoring program. MS.
PAASIVIRTA, J., AND PAUKKU, R. (1989). Use of composited samples to optimize the monitoring of environmental toxins. Chemosphere, 19, 1551-1562.
PATIL, G. P., GORE, S. D., AND SINHA, A. K. (1994). Environmental chemistry, statistical modeling, and observational economy. In Environmental Statistics, Assessment and Forecasting, C. R. Cothern and N. P. Ross, eds. Lewis Publishers, Boca Raton. pp. 57-97.
PATIL, G. P., GORE, S. D., AND TAILLIE, C. (1994). Design and analysis with composite samples: A novel method to accomplish observational economy in environmental studies. Technical Report Number 94-0410, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.
RAJAGOPAL, R. (1990). Personal communication.
RAJAGOPAL, R., AND WILLIAMS, L. R. (1989). Economics of sample compositing as a screening tool in ground water quality monitoring. Ground Water Monitoring Review, 9(1), 186-192.
RAO, C. R. (1989). Statistics and Truth, Putting Chance to Work. International Co-operative Publishing House, Fairland, MD. pp. 118-119.
ROHDE, C. A. (1976). Composite sampling. Biometrics, 32, 273-282.
ROHDE, C. A. (1979). Batch, bulk and composite sampling. In Sampling Biological Populations, R. M. Cormack, G. P. Patil, and D. S. Robson, eds. International Co-operative Publishing House, Fairland, MD. pp. 365-377.
SCHAEFFER, D., KERSTER, H. W., AND JANARDAN, K. G. (1982). Monitoring toxics by group testing. Environ. Mgmt., 6(6), 467-469.
SCHAEFFER, D. J., KERSTER, H. W., AND JANARDAN, K. G. (1980). Grab versus composite sampling: A primer for the manager and engineer. Environ. Mgmt., 4(6), 469-481.
WATSON, M. A. (1936). Factors affecting the amount of infection obtained by aphis transmission of the virus Hy. III. Philos. Trans. Roy. Soc. London, Ser. B., 226, 457-489.
WILLIAMS, C. J., AND PETERSON, R. G. (1978). Variation in estimates of milk fat, protein and lactose content associated with various bulk milk sampling programs. J. Dairy Science, 61, 1093.