Central Limit Theorem

For practical purposes, the main idea of the central limit theorem (CLT) is that the average of a sample of observations drawn from a population, whatever the shape of its distribution, is approximately normally distributed if certain conditions are met. In theoretical statistics there are several versions of the central limit theorem, depending on how these conditions are specified. The conditions concern the assumptions made about the distribution of the parent population (the population from which the sample is drawn) and the sampling procedure itself.

One of the simplest versions of the theorem says that if $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from an infinite population with mean $\mu$ and finite standard deviation $\sigma$, then the standardised sample mean $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ converges in distribution to a standard normal distribution as $n$ increases. Equivalently, for large $n$ (a common rule of thumb is $n \geq 30$), the sample mean $\bar{X}$ is approximately normally distributed with mean equal to the population mean $\mu$ and standard deviation equal to the population standard deviation divided by the square root of the sample size, $\sigma/\sqrt{n}$. In applications of the central limit theorem to practical problems in statistical inference, however, statisticians are more interested in how closely the distribution of the sample mean follows a normal distribution for finite sample sizes than in the limiting distribution itself. Sufficiently close agreement with a normal distribution allows statisticians to use normal theory for making inferences about population parameters (such as the mean $\mu$) using the sample mean, irrespective of the actual form of the parent population.
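
The statement above can be checked numerically. The following is a minimal simulation sketch, not part of the original lesson: the exponential parent population (rate 1, so $\mu = 1$ and $\sigma = 1$), the sample size of 30, the number of replications, and the helper name `simulate_sample_means` are all assumptions chosen for illustration. It compares the mean and standard deviation of many simulated sample means with $\mu$ and $\sigma/\sqrt{n}$, and reports the share of standardised means falling inside $\pm 1.96$, which would be about 0.95 for an exactly standard normal variable.

```python
import math
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` independent draws."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

# Assumed parent population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma, n = 1.0, 1.0, 30
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n, reps=20000)

observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
predicted_sd = sigma / math.sqrt(n)

# Share of standardised means inside (-1.96, 1.96); a standard normal gives about 0.95.
z = [(m - mu) / predicted_sd for m in means]
coverage = sum(abs(v) < 1.96 for v in z) / len(z)

print(f"mean of sample means:  {observed_mean:.3f} (population mean {mu})")
print(f"sd of sample means:    {observed_sd:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
print(f"share with |Z| < 1.96: {coverage:.3f}")
```

Changing the `draw` function or `n` shows how the agreement with the normal prediction varies with the parent population and the sample size.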

It is well known that, whatever the parent population is, the standardised variable $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$, where $\bar{X}$ is the sample mean and $\mu$ and $\sigma$ are the population mean and standard deviation, has a distribution with mean 0 and standard deviation 1 under random sampling. Moreover, if the parent population is normal, then $Z$ is distributed exactly as a standard normal variable for any positive integer $n$. The central limit theorem states the remarkable result that, even when the parent population is non-normal, the standardised variable is approximately normal if the sample size is large enough (the usual guideline is $n \geq 30$). It is generally not possible to state exact conditions under which the approximation given by the central limit theorem works, or what sample sizes are needed before the approximation becomes good enough. As a general guideline, statisticians have used the prescription that if the parent distribution is symmetric and relatively short-tailed, then the sample mean reaches approximate normality for smaller samples than if the parent population is skewed or long-tailed.
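
One way to illustrate this guideline is to track the skewness of the sampling distribution of the mean, which is 0 for a normal distribution. The sketch below is an illustration under assumptions of my own: the two parent populations (uniform on (0, 1) and exponential with rate 1), the sample sizes 5 and 30, and the helper names are not taken from the lesson. For a parent with finite third moment, the skewness of the sample mean equals the parent skewness divided by $\sqrt{n}$, so the symmetric parent should give values near 0 at both sample sizes, while the skewed parent should still show visible skewness at $n = 30$.

```python
import random
import statistics

def skewness(values):
    """Mean cubed deviation divided by the cubed (population) standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / s ** 3

def skewness_of_sample_mean(draw, n, reps=20000, seed=1):
    """Estimate the skewness of the sampling distribution of the mean for size n."""
    rng = random.Random(seed)
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return skewness(means)

# Assumed parent populations, chosen only for illustration.
parents = {
    "uniform": lambda rng: rng.random(),              # symmetric, short-tailed
    "exponential": lambda rng: rng.expovariate(1.0),  # skewed, long right tail
}

results = {}
for name, draw in parents.items():
    for n in (5, 30):
        results[(name, n)] = skewness_of_sample_mean(draw, n)
        print(f"{name:12s} n={n:2d}  skewness of sample mean = {results[(name, n)]:+.3f}")
```

Skewness is only one summary of non-normality; tail behaviour could be examined in the same way by replacing `skewness` with another statistic.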

In this lesson, we will study the behaviour of the mean of samples of different sizes drawn from a variety of parent populations. Examining the sampling distributions of sample means computed from samples of different sizes, drawn from a variety of distributions, allows us to gain some insight into the behaviour of the sample mean under those specific conditions. It also lets us examine the validity of the guidelines mentioned above for using the central limit theorem in practice.

Under certain conditions, in large samples, the sampling distribution of the sample mean can be approximated by a normal distribution. The sample size needed for the approximation to be adequate depends strongly on the shape of the parent distribution. Symmetry (or the lack of it) is particularly important. For a symmetric parent distribution, even one whose shape is very different from that of a normal distribution, an adequate approximation can be obtained with small samples (e.g., 10 or 12 for the uniform distribution). For symmetric, short-tailed parent distributions, the sample mean reaches approximate normality for smaller samples than if the parent population is skewed and long-tailed. In some extreme cases (e.g., a binomial distribution with success probability close to 0 or 1), sample sizes far exceeding the typical guideline of 30 are needed for an adequate approximation. For some distributions without first and second moments (e.g., the Cauchy distribution), the central limit theorem does not hold.
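
The last point can be made concrete with a short simulation. The sketch below is an assumption-laden illustration rather than part of the lesson: it uses the standard Cauchy distribution as an example of a parent with no mean or variance, a standard normal parent for comparison, and hypothetical helper names. It measures the interquartile range (IQR) of simulated sample means at two sample sizes. For the normal parent the IQR shrinks roughly in proportion to $1/\sqrt{n}$; for the Cauchy parent the mean of $n$ observations has the same distribution as a single observation, so the IQR stays near 2 however large $n$ is.

```python
import math
import random
import statistics

def iqr_of_sample_means(draw, n, reps=2000, seed=2):
    """Interquartile range of `reps` sample means, each based on `n` draws."""
    rng = random.Random(seed)
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def draw_cauchy(rng):
    # Inverse-CDF method for a standard Cauchy variate (no mean or variance exists).
    return math.tan(math.pi * (rng.random() - 0.5))

def draw_normal(rng):
    # Standard normal variate, used here as the well-behaved comparison case.
    return rng.gauss(0.0, 1.0)

iqr = {}
for name, draw in (("normal", draw_normal), ("cauchy", draw_cauchy)):
    for n in (10, 400):
        iqr[(name, n)] = iqr_of_sample_means(draw, n)
        print(f"{name:7s} n={n:3d}  IQR of sample means = {iqr[(name, n)]:.3f}")
```

The IQR is used instead of the standard deviation because the standard deviation of Cauchy sample means is dominated by a few extreme values and does not settle down across runs.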