Archives of Iranian Medicine, Volume 16, Number 5, May 2013, p. 298
Sample Size Calculation for Epidemiologic Studies
ferent if we chose to compare the effects of the treatments in a superiority trial – i.e., a trial that determines which drug is superior – versus comparing them in a noninferiority trial – i.e., a trial whose aim is to show that the new drug is not inferior to the standard treatment by more than a certain amount. It would also make a big difference whether we decide to compare the mean cholesterol reduction or the proportion of people whose cholesterol reaches the target of … mg/dL. ●
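The choice between a continuous endpoint (mean reduction) and a binary endpoint (proportion reaching target) can be sketched with the standard normal-approximation formulas. The cholesterol figures below (a 10 mg/dL mean reduction with SD 30 mg/dL, versus raising the proportion at target from 50% to 60%) are hypothetical values chosen only to illustrate the contrast, not numbers from the text.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(sd, d, alpha=0.05, power=0.80):
    """Per-group n for comparing two means, two-sided test,
    equal variances (normal-approximation formula)."""
    z = NormalDist().inv_cdf
    return ceil(2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2)

def n_per_group_props(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for comparing two proportions (normal approximation)."""
    z = NormalDist().inv_cdf
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * var / (p1 - p2) ** 2)

# Hypothetical illustration of the two framings of the same trial:
n_means = n_per_group_means(sd=30, d=10)   # mean reduction endpoint
n_props = n_per_group_props(0.50, 0.60)    # proportion-at-target endpoint
```

With these particular inputs, the binary endpoint demands roughly two to three times as many participants per group, showing why the choice of primary objective must precede the sample size calculation.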
Unfortunately, determination of the main study objective is not
always straightforward, particularly in observational studies, such
as case-control and cohort studies. Consider the example of a
cohort study. During follow-up in a cohort study, there will be
a large number of possible outcomes, including overall mortal-
ity, ischemic heart disease mortality, and esophageal cancer inci-
dence. If the main outcome is a common event, such as overall
mortality or mortality from ischemic heart disease, the sample
size doesn’t have to be very large, whereas if the main outcome is
esophageal cancer incidence, a relatively uncommon cancer, then
sample size has to be quite large. Likewise, in a typical cohort
study, we collect information on a number of exposures, with each
exposure having its own distribution. Although protocols often require that we determine the sample size, it may not always be easy to decide in advance what the main outcome and the main exposure are. The usefulness of a cohort extends well beyond one outcome and one exposure.
4-2- Selecting the design of the study
Design of the study may have a major impact on the sample
size. For example, it makes a large difference when we com-
pare the effects of treatments X and Y on serum cholesterol in
a parallel design trial versus a cross-over trial. Cross-over trials
often need much smaller sample sizes, as each person receives
both treatments. Also, each person serves as his or her own control, eliminating between-person variance, which again results in a smaller required sample size.
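The parallel-versus-crossover contrast can be made concrete with the usual approximate formulas: in an AB/BA crossover, each subject contributes a within-person difference whose variance shrinks with the within-person correlation. The SD of 30, detectable difference of 10, and correlation of 0.7 below are hypothetical values for illustration only.

```python
from math import ceil
from statistics import NormalDist

def parallel_total(sd, d, alpha=0.05, power=0.80):
    """Total subjects for a two-arm parallel trial comparing means."""
    z = NormalDist().inv_cdf
    per_group = 2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2
    return 2 * ceil(per_group)

def crossover_total(sd, d, rho, alpha=0.05, power=0.80):
    """Total subjects for an AB/BA crossover: each subject yields a
    within-person difference with variance 2 * sd**2 * (1 - rho)."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return ceil(2 * sd**2 * (1 - rho) * k / d**2)

# Hypothetical: SD 30, detectable difference 10, within-person rho 0.7
n_parallel = parallel_total(30, 10)
n_crossover = crossover_total(30, 10, rho=0.7)
```

Even with zero within-person correlation the crossover needs half as many subjects, because every person contributes to both arms; a high correlation shrinks the requirement much further.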
4-3- Deciding on the proportion of participants distributed into each
study arm
In the formulas discussed in Sections 3-2 and 3-3, we assumed
equal sample size in each of the two treatment groups. However,
we may decide otherwise. For example, in comparing treatments
X (an old treatment) and Y (the new treatment), we may decide to
randomize more people into receiving Y. This is because X is a
well-known and long-used treatment, but Y is a new one and we
want more information on it, particularly about its side effects.
Given equal variances in the two study arms, an uneven distribution of participants across the arms requires a larger total sample size to obtain the same power.
Example 9: We want our study participants to be distributed
with a ratio of 1 to 3 into X and Y treatment groups, respectively.
If so, a total sample size of 1333 (333 in X and 1000 in Y) will
give us the same power as randomizing 1000 people equally to
each study group (500 X and 500 Y). Here, we need a 33% increase in sample size. ●
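The numbers in Example 9 follow from matching the variance of the difference in means: with a 1:k allocation, 1/n1 + 1/(k·n1) must equal 2/n per-arm of the equal design, which gives the well-known inflation factor (1 + k)²/(4k). A minimal sketch:

```python
def unequal_allocation(n_equal_per_arm, k):
    """Arm sizes for a 1:k allocation with the same power as an
    equal-allocation trial of n_equal_per_arm per arm (equal variances).
    Derived from 1/n1 + 1/(k*n1) = 2/n_equal_per_arm."""
    n1 = n_equal_per_arm * (1 + k) / (2 * k)
    return round(n1), round(k * n1)

n_x, n_y = unequal_allocation(500, 3)   # Example 9: 333 in X, 1000 in Y
inflation = (1 + 3) ** 2 / (4 * 3)      # total-n inflation factor, about 1.33
```

This reproduces the 1333-versus-1000 comparison in the text: the 1:3 split inflates the required total by (1 + 3)²/(4 × 3) ≈ 1.33, i.e., the 33% increase.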
4-4- Deciding between Bayesian versus frequentist methods
All of the formulas and much of the discussion in this paper are based on the frequentist view of probability. This is because, at least thus far, most of the statistics practiced today is based on frequentist methods. For example, the P-value, power, type I error, and most other familiar statistical concepts are rooted in the frequentist view.
However, Bayesian methods are gaining popularity. If Bayesian
analysis is considered, sample size calculations will be totally dif-
ferent. Sample size and power calculations for studies designed to
be analyzed using Bayesian methods heavily depend on the prior
distributions. Without going into any details, prior distributions
may come from various sources, including our beliefs. For example, if a political leader believes that his opinion is definitely correct, no matter how much data you show him, he will stand by his
prior opinion. If so, even a huge sample size showing the contrary
would do no good! This one, of course, was an extreme example!
Using prior distributions could be very helpful in some cases.
4-5- Deciding the numbers to put in the formulas
Consider the study presented in Section 3-2. Assume the literature suggests that the standard deviation of blood pressure in each group is 20 mmHg. If we choose a power of 0.90 and a type I error level
of 0.01, and we want to detect a d = 2 mmHg, the required sam-
ple size would be nearly 6000. However, if we choose a power
of 0.80 and a type I error level of 0.05, and we want to detect a
d = 3 mmHg, the required sample size would be approximately
1400. Therefore, with some minimal changes in requirements, all
perfectly reasonable and within the ranges used by clinicians and
statisticians, we can find the required sample size to be as low as
1400 or as high as 6000.
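Both figures can be reproduced from the standard two-sample formula for comparing means, taking the 20 mmHg as the per-group standard deviation:

```python
from math import ceil
from statistics import NormalDist

def total_n(sd, d, alpha, power):
    """Total n for comparing two means, equal allocation, two-sided test:
    per group, n = 2 * sd^2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z = NormalDist().inv_cdf
    per_group = 2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2
    return 2 * ceil(per_group)

n_strict = total_n(sd=20, d=2, alpha=0.01, power=0.90)   # nearly 6000
n_relaxed = total_n(sd=20, d=3, alpha=0.05, power=0.80)  # nearly 1400
```

A fourfold difference in required sample size, from inputs that a reviewer would accept either way, is exactly the sensitivity the text is warning about.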
5- The impact of assumptions
Several assumptions underlie the calculations in Section 3. Departures from these assumptions may render sample size calculations incorrect. Below, we provide a few examples of these assumptions and show their impact on sample size calculations. To keep things simple, in all examples we assume that the sample size calculated under the assumption is 1000.
5-1- Independence of study samples
The formulas and methods discussed so far assume that individuals in the sample are independent. If they are not, then the sample size must be larger to account for the lack of independence (sometimes referred to as clustering).
Example 10: Consider the extreme example in which identical twins always respond identically to a drug; i.e., the correlation between the responses of identical twins is 1.00. If so, when a researcher recruits 500 pairs of identical twins, although the sample size is 1000, it provides only as much information as 500 independent people; once we know the response from one twin, having the second one adds no further information. Here, we say the effective sample size is 500. ●
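A common way to quantify this is the design-effect formula, n_eff = n / (1 + (m − 1) × ICC), where m is the cluster size and ICC the intracluster correlation. The twin case falls out directly; the village figures use a hypothetical ICC of 0.05 purely for illustration.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Effective sample size under the design-effect formula:
    n_eff = n / (1 + (m - 1) * icc)."""
    return n_total / (1 + (cluster_size - 1) * icc)

# Example 10: 500 twin pairs (clusters of 2) with perfect correlation
n_eff_twins = effective_sample_size(1000, 2, 1.0)        # 500.0
# A cluster survey of 20 clusters of 50, hypothetical ICC of 0.05
n_eff_villages = effective_sample_size(1000, 50, 0.05)   # about 290
```

As the formula shows, the effective sample size always lies between the number of independent clusters and the total number of participants, moving toward the former as the intracluster correlation grows.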
Example 11: Assume that to reduce costs of enrolling study
participants, rather than selecting 1000 people randomly from an
entire population, we randomly select 20 villages from the popu-
lation and then randomly select 50 individuals from each village
(two-stage cluster sampling). Since the responses obtained from
each village can be correlated, the effective sample size may be
less than 1000. If so, the effective sample size will fall somewhere between the number of independent units (here, number of villages
= 20) and the total number of study participants (here, 1000). In
other words, our sample selection is not quite as good as recruiting
1000 independent people, but it is not as poor as selecting only 20 people. Without going into the details of the formula, suffice it to say that the effective sample size depends on the total number of people, the number of units, and the intracluster correlation,