One of the key considerations when designing an experiment is the sample size. In general, decreasing the sample size will decrease the precision of the results. But what effect does decreasing the sample size have on the distribution of sample means?

## Introduction

As the sample size decreases, the spread of the distribution of sample means increases: the sample means are less tightly concentrated around the population mean. Increasing the sample size has the opposite effect.

## Theoretical Results

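As the sample size increases, the distribution of sample means approaches a normal distribution. This is due to the Central Limit Theorem, which states that, for a population with finite variance, the distribution of the sample mean is approximately normal when the sample size is large, regardless of the shape of the population distribution. Conversely, with a small sample size, the distribution of sample means can retain the skew or heavy tails of the population. Let’s take a look at how this works.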

### The Central Limit Theorem

The Central Limit Theorem is one of the most important results in statistics. It tells us that, under certain conditions, the distribution of the sum (or average) of a large number of independent random variables will be approximately normally distributed, regardless of the underlying distribution of the individual random variables.

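This result is distinct from the law of large numbers, which says only that the sample mean converges to the population mean as the sample size grows; the Central Limit Theorem also describes the shape of the distribution around that mean. It is the foundation for many powerful statistical methods, including the analysis of means, hypothesis testing, and inference for proportions.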

To understand the Central Limit Theorem, it helps to first consider what happens to the sample mean as the sample size grows. If we draw independent samples of size n from a population with mean μ and variance σ², then the distribution of the sample means has a mean equal to the population mean and a variance equal to the population variance divided by the sample size. This holds for any population shape. If the population itself is Normally distributed, the sample means are exactly Normally distributed for every sample size.
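The mean and variance of the sample mean follow from a short derivation. The sketch below assumes independent draws X_1, ..., X_n from a population with mean μ and finite variance σ²; the symbols are introduced here for illustration. Independence is only needed for the variance step.

```latex
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
\mathbb{E}[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{1}{n}\, n\mu = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}
\end{aligned}
```

Because n sits in the denominator of the variance, a smaller sample size gives a larger variance for the sample mean.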

As the sample size grows, however, something remarkable happens: even if we are sampling from a population with a non-normal distribution, the distribution of the sample means begins to look more and more like a Normal distribution. This phenomenon is illustrated in the figure below:

![alt text](https://i.imgur.com/ CentralLimitTheorem.png)

As you can see in this figure, as the sample size (n) grows, the distribution of the sample means approaches a Normal distribution with a mean equal to the population mean (μ) and a variance equal to the population variance (σ²) divided by n. In other words, regardless of whether or not the underlying population is Normally distributed, as n gets larger and larger, our sampling distributions will look increasingly like Normal distributions, provided the population variance is finite.
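One way to check this behavior numerically is a small simulation. The sketch below is illustrative only: the Exponential(1) population (mean 1, variance 1, strongly right-skewed), the function names `sample_means` and `skewness`, and the chosen sample sizes are all assumptions made for this example. It uses skewness as a rough indicator of how far the distribution of sample means is from a symmetric, normal-like shape.

```python
import random
import statistics


def sample_means(n, num_samples=5000, seed=0):
    """Draw num_samples samples of size n from an Exponential(1)
    population and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Sample skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


if __name__ == "__main__":
    for n in (1, 5, 30, 100):
        means = sample_means(n)
        print(
            f"n={n:>3}  mean={statistics.fmean(means):.3f}  "
            f"variance={statistics.pvariance(means):.4f}  "
            f"skewness={skewness(means):.2f}"
        )
```

For this population the variance of the sample means should be close to 1/n and the skewness should shrink toward 0 as n grows, so small samples give sample means that are both more spread out and less normal in shape.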

### The Standard Error of the Mean

If you take a random sample of size 100 from a population, the mean of that sample will not be exactly equal to the mean of the population. If you take another random sample of size 100 from the same population, the second sample may have a different mean. In fact, if you took many such samples and calculated the mean of each, they would all be slightly different. The distribution of these means is called the sampling distribution of the mean. It can be shown that this sampling distribution has certain properties. One important property is that its standard deviation, known as the standard error of the mean, equals the population standard deviation divided by the square root of the sample size:

standard error = standard deviation / square root(sample size)
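As a quick worked example of the formula, the sketch below evaluates it for a few sample sizes. The helper name `standard_error` and the population standard deviation of 10 are assumptions chosen for illustration.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


if __name__ == "__main__":
    sigma = 10.0  # assumed population standard deviation
    for n in (4, 25, 100, 400):
        print(f"n={n:>3}  standard error={standard_error(sigma, n):.2f}")
```

With σ = 10 this gives 5.00, 2.00, 1.00, and 0.50. Because of the square root, cutting the sample size to a quarter doubles the standard error rather than quadrupling it.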

This formula is very important because it tells us how much variation we can expect in the means of our samples if we keep everything else constant. For example, suppose we want to know how much difference there is between the average heights of men and women in our population. We could take a random sample of men and women and calculate their respective means. Even if men and women had equal average heights in our population, it’s unlikely that the two sample means would be exactly equal.

If we took many large samples (each consisting of a random selection of men and women) and calculated the means for each sample, we would expect most of those means to fall fairly close to each other. In other words, there would be little variation in the means from one sample to another. On the other hand, if we took a sample consisting of just 10 men and 10 women, it’s quite possible that the mean height for men in our sample would be very different from the mean height for women (even though, in this scenario, men and women have equal average heights in the population). This is because the mean of a small sample varies more from sample to sample than the mean of a large sample.

The standard error formula tells us how much variation to expect in the means of our samples; the smaller the standard error, the less variation we should expect.
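The height example can be sketched as a simulation. Everything specific here is a hypothetical choice for illustration: both groups are drawn from the same normal population (mean 170, standard deviation 10), the function name `mean_differences` is made up, and the "large" sample uses 250 people per group.

```python
import random
import statistics


def mean_differences(n, mu=170.0, sigma=10.0, num_repeats=1000, seed=1):
    """Repeatedly draw two groups of size n from the SAME population and
    return the difference between the two group means for each repeat."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(num_repeats):
        group_a = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        group_b = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        diffs.append(group_a - group_b)
    return diffs


if __name__ == "__main__":
    for n in (10, 250):
        diffs = mean_differences(n)
        print(
            f"n={n:>3} per group  "
            f"spread of difference={statistics.pstdev(diffs):.2f}  "
            f"largest gap={max(abs(d) for d in diffs):.2f}"
        )
```

Under these assumptions the difference between two independent group means has standard deviation σ·√(2/n), roughly 4.47 for n = 10 and 0.89 for n = 250, so small groups routinely show sizable gaps even though the population means are identical.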

## Simulation Results

A smaller sample size will cause the distribution of sample means to be more dispersed. This can be seen in the simulation results below. As the sample size decreases, the spread of the distribution increases.

### Varying the Sample Size

As the sample size decreases, the distribution of the sample means becomes more spread out. This is because there is more variability in the estimates when there are fewer data points. In general, as the sample size increases, the variability of the estimates decreases.
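The relationship between sample size and spread can be checked with a minimal simulation. This is a sketch under stated assumptions: a normal population with μ = 100 and σ = 10, a hypothetical helper `spread_of_sample_means`, and arbitrarily chosen sample sizes. It compares the observed standard deviation of the sample means with the value predicted by σ/√n.

```python
import math
import random
import statistics


def spread_of_sample_means(n, mu=100.0, sigma=10.0, num_samples=4000, seed=2):
    """Simulate num_samples samples of size n and return the standard
    deviation of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    sigma = 10.0
    for n in (5, 20, 80):
        observed = spread_of_sample_means(n, sigma=sigma)
        predicted = sigma / math.sqrt(n)
        print(f"n={n:>2}  observed={observed:.2f}  predicted={predicted:.2f}")
```

The predicted values are about 4.47, 2.24, and 1.12, and the observed spreads should land close to them: each fourfold reduction in sample size roughly doubles the spread of the sample means.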

### Varying the Population Standard Deviation

The population standard deviation (σ) is a measure of how spread out the values are in a population. The larger the standard deviation, the more the values are spread out. For example, suppose we have a population with μ = 100 and σ = 10. If the population is roughly normal, about 68% of the values fall within one standard deviation of the mean, that is, between 90 and 110.

Now, let’s say we decrease the standard deviation to σ = 5. The distribution of values would be tighter, with about 68% of the values falling between 95 and 105.

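Because the standard error is σ divided by the square root of the sample size, a smaller population standard deviation also makes the distribution of sample means tighter (less spread out) for any given sample size. In this example, halving σ from 10 to 5 halves the standard error.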
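The effect of the population standard deviation can be illustrated the same way. The sketch below holds the sample size fixed and varies σ. The sample size of 25, the normal population, and the function name `sample_mean_spread` are assumptions made for this example.

```python
import math
import random
import statistics


def sample_mean_spread(sigma, n=25, mu=100.0, num_samples=4000, seed=3):
    """Return the standard deviation of simulated sample means for a
    normal population with the given sigma and a fixed sample size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    n = 25
    for sigma in (10.0, 5.0):
        observed = sample_mean_spread(sigma, n=n)
        predicted = sigma / math.sqrt(n)
        print(f"sigma={sigma:>4}  observed={observed:.2f}  predicted={predicted:.2f}")
```

With n = 25 the predicted standard errors are 2.00 for σ = 10 and 1.00 for σ = 5, so halving the population standard deviation halves the spread of the sample means.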

## Conclusion

A smaller sample size will produce a wider distribution of sample means. This is because the mean of a small number of data points varies more from sample to sample. When the sample size is increased, the distribution of sample means becomes narrower, with a standard error equal to the population standard deviation divided by the square root of the sample size, and closer to normal in shape.