Archives of Iranian Medicine, Volume 16, Number 5, May 2013, p. 298
Sample Size Calculation for Epidemiologic Studies
ferent if we chose to compare the effects of the treatments in a superiority trial – i.e., a trial that determines which drug is superior – versus comparing them in a noninferiority trial – i.e., a trial whose aim is to show that the new drug is not inferior to the standard treatment by more than a certain amount. It would also make a big difference whether we decide to compare the mean cholesterol reduction or the proportion of people whose cholesterol reaches the target of … mg/dL. ●
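The choice between a continuous endpoint (mean reduction) and a binary endpoint (proportion reaching target) can be sketched with the standard normal-approximation formulas. The cholesterol figures below (a 10 mg/dL mean reduction with SD 30 mg/dL, versus raising the proportion at target from 50% to 60%) are hypothetical values chosen only to illustrate the contrast, not numbers from the text.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(sd, d, alpha=0.05, power=0.80):
    """Per-group n for comparing two means, two-sided test,
    equal variances (normal-approximation formula)."""
    z = NormalDist().inv_cdf
    return ceil(2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2)

def n_per_group_props(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for comparing two proportions (normal approximation)."""
    z = NormalDist().inv_cdf
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * var / (p1 - p2) ** 2)

# Hypothetical illustration of the two framings of the same trial:
n_means = n_per_group_means(sd=30, d=10)   # mean reduction endpoint
n_props = n_per_group_props(0.50, 0.60)    # proportion-at-target endpoint
```

With these particular inputs, the binary endpoint demands roughly two to three times as many participants per group, showing why the choice of primary objective must precede the sample size calculation.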
Unfortunately, determination of the main study objective is not
always straightforward, particularly in observational studies, such
as case-control and cohort studies. Consider the example of a
cohort study. During follow-up in a cohort study, there will be
a large number of possible outcomes, including overall mortal-
ity, ischemic heart disease mortality, and esophageal cancer inci-
dence. If the main outcome is a common event, such as overall
mortality or mortality from ischemic heart disease, the sample
size doesn’t have to be very large, whereas if the main outcome is
esophageal cancer incidence, a relatively uncommon cancer, then
sample size has to be quite large. Likewise, in a typical cohort
study, we collect information on a number of exposures, with each
exposure having its own distribution. Although protocols often require that we determine the sample size, it may not always be easy to decide in advance what the main outcome and the main exposure are. The usefulness of a cohort extends well beyond one outcome and one exposure.
4-2- Selecting the design of the study
Design of the study may have a major impact on the sample
size. For example, it makes a large difference when we com-
pare the effects of treatments X and Y on serum cholesterol in
a parallel design trial versus a cross-over trial. Cross-over trials
often need much smaller sample sizes, as each person receives
both treatments. Also, each person serves as his or her own control, eliminating between-person variance, which again results in a smaller required sample size.
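The parallel-versus-crossover contrast can be made concrete with the usual approximate formulas: in an AB/BA crossover, each subject contributes a within-person difference whose variance shrinks with the within-person correlation. The SD of 30, detectable difference of 10, and correlation of 0.7 below are hypothetical values for illustration only.

```python
from math import ceil
from statistics import NormalDist

def parallel_total(sd, d, alpha=0.05, power=0.80):
    """Total subjects for a two-arm parallel trial comparing means."""
    z = NormalDist().inv_cdf
    per_group = 2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2
    return 2 * ceil(per_group)

def crossover_total(sd, d, rho, alpha=0.05, power=0.80):
    """Total subjects for an AB/BA crossover: each subject yields a
    within-person difference with variance 2 * sd**2 * (1 - rho)."""
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return ceil(2 * sd**2 * (1 - rho) * k / d**2)

# Hypothetical: SD 30, detectable difference 10, within-person rho 0.7
n_parallel = parallel_total(30, 10)
n_crossover = crossover_total(30, 10, rho=0.7)
```

Even with zero within-person correlation the crossover needs half as many subjects, because every person contributes to both arms; a high correlation shrinks the requirement much further.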
4-3- Deciding on the proportion of participants distributed into each
study arm
In the formulas discussed in Sections 3-2 and 3-3, we assumed
equal sample size in each of the two treatment groups. However,
we may decide otherwise. For example, in comparing treatments
X (an old treatment) and Y (the new treatment), we may decide to
randomize more people into receiving Y. This is because X is a
well-known and long-used treatment, but Y is a new one and we
want more information on it, particularly about its side effects.
Given equal variances in the two study arms, an uneven distribution of participants across the arms requires a larger total sample size to obtain the same power.
Example 9: We want our study participants to be distributed
with a ratio of 1 to 3 into X and Y treatment groups, respectively.
If so, a total sample size of 1333 (333 in X and 1000 in Y) will
give us the same power as randomizing 1000 people equally to
each study group (500 X and 500 Y). Here, we need a 33% increase in sample size. ●
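The numbers in Example 9 follow from matching the variance of the difference in means: with a 1:k allocation, 1/n1 + 1/(k·n1) must equal 2/n per-arm of the equal design, which gives the well-known inflation factor (1 + k)²/(4k). A minimal sketch:

```python
def unequal_allocation(n_equal_per_arm, k):
    """Arm sizes for a 1:k allocation with the same power as an
    equal-allocation trial of n_equal_per_arm per arm (equal variances).
    Derived from 1/n1 + 1/(k*n1) = 2/n_equal_per_arm."""
    n1 = n_equal_per_arm * (1 + k) / (2 * k)
    return round(n1), round(k * n1)

n_x, n_y = unequal_allocation(500, 3)   # Example 9: 333 in X, 1000 in Y
inflation = (1 + 3) ** 2 / (4 * 3)      # total-n inflation factor, about 1.33
```

This reproduces the 1333-versus-1000 comparison in the text: the 1:3 split inflates the required total by (1 + 3)²/(4 × 3) ≈ 1.33, i.e., the 33% increase.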
4-4- Deciding between Bayesian versus frequentist methods
All of the formulas and much of the discussion in this paper are based on the frequentist view of probability. This is because, at least thus far, most of the statistics practiced today is based on frequentist methods. For example, the P-value, power, type I error, and most other familiar statistical concepts are rooted in the frequentist view.
However, Bayesian methods are gaining popularity. If Bayesian
analysis is considered, sample size calculations will be totally dif-
ferent. Sample size and power calculations for studies designed to
be analyzed using Bayesian methods heavily depend on the prior
distributions. Without going into any details, prior distributions
may come from various sources, including our beliefs. For example, if a political leader believes that his opinion is definitely correct, no matter how much data you show him, he will stand by his
prior opinion. If so, even a huge sample size showing the contrary
would do no good! This one, of course, was an extreme example!
Using prior distributions could be very helpful in some cases.
4-5- Deciding the numbers to put in the formulas
Consider the study presented in Section 3-2. Assume the literature suggests that the standard deviation of blood pressure in each group is 20 mmHg. If we choose a power of 0.90 and a type I error level
of 0.01, and we want to detect a d = 2 mmHg, the required sam-
ple size would be nearly 6000. However, if we choose a power
of 0.80 and a type I error level of 0.05, and we want to detect a
d = 3 mmHg, the required sample size would be approximately
1400. Therefore, with some minimal changes in requirements, all
perfectly reasonable and within the ranges used by clinicians and
statisticians, we can find the required sample size to be as low as
1400 or as high as 6000.
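Both figures can be reproduced from the standard two-sample formula for comparing means, taking the 20 mmHg as the per-group standard deviation:

```python
from math import ceil
from statistics import NormalDist

def total_n(sd, d, alpha, power):
    """Total n for comparing two means, equal allocation, two-sided test:
    per group, n = 2 * sd^2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z = NormalDist().inv_cdf
    per_group = 2 * sd**2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2
    return 2 * ceil(per_group)

n_strict = total_n(sd=20, d=2, alpha=0.01, power=0.90)   # nearly 6000
n_relaxed = total_n(sd=20, d=3, alpha=0.05, power=0.80)  # nearly 1400
```

A fourfold difference in required sample size, from inputs that a reviewer would accept either way, is exactly the sensitivity the text is warning about.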
5- The impact of assumptions
Several assumptions underlie the calculations in Section 3. Departures from these assumptions may render sample size calculations incorrect. Below, we provide a few examples of these assumptions and show their impact on sample size calculations. To keep things simple, in all examples we assume that the sample size calculated under the assumption is 1000.
5-1- Independence of study samples
The formulas and methods discussed so far assume that individuals in the sample are independent. If they are not, then the sample size must be larger to account for the lack of independence (sometimes referred to as clustering).
Example 10: Consider the extreme example in which identical twins always respond identically to a drug; i.e., the correlation between the responses of identical twins is 1.00. If so, when a researcher recruits 500 pairs of identical twins, although the sample size is 1000, it provides only as much information as 500 independent people; once we know the response from one twin, having the second one adds no further information. Here, we say the effective sample size is 500. ●
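A common way to quantify this is the design-effect formula, n_eff = n / (1 + (m − 1) × ICC), where m is the cluster size and ICC the intracluster correlation. The twin case falls out directly; the village figures use a hypothetical ICC of 0.05 purely for illustration.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Effective sample size under the design-effect formula:
    n_eff = n / (1 + (m - 1) * icc)."""
    return n_total / (1 + (cluster_size - 1) * icc)

# Example 10: 500 twin pairs (clusters of 2) with perfect correlation
n_eff_twins = effective_sample_size(1000, 2, 1.0)        # 500.0
# A cluster survey of 20 clusters of 50, hypothetical ICC of 0.05
n_eff_villages = effective_sample_size(1000, 50, 0.05)   # about 290
```

As the formula shows, the effective sample size always lies between the number of independent clusters and the total number of participants, moving toward the former as the intracluster correlation grows.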
Example 11: Assume that to reduce costs of enrolling study
participants, rather than selecting 1000 people randomly from an
entire population, we randomly select 20 villages from the popu-
lation and then randomly select 50 individuals from each village
(two-stage cluster sampling). Since the responses obtained from
each village can be correlated, the effective sample size may be
less than 1000. If so, the effective sample size will fall somewhere between the number of independent units (here, number of villages
= 20) and the total number of study participants (here, 1000). In
other words, our sample selection is not quite as good as recruiting
1000 independent people, but it is not as poor as selecting only 20 people. Without going into the details of the formula, suffice it to say that the effective sample size depends on the total number of people, the number of units, and the intracluster correlation,