Central Limit Theorem (04)
In this article, let's understand what the Central Limit Theorem is.
Central Limit Theorem
The Central Limit Theorem (CLT) is a statistical result that describes how sample averages behave. It helps us estimate the average, or mean, of a population even when we don't have data for every single member of that population.
Let's say you are a teacher and you want to know the average height of all the students in your school. You can't measure the height of every single student, so you decide to take a sample of 30 students and measure their heights.
Now, if you only measured the heights of one group of 30 students, you might not get the exact average height of the entire school population. But if you take many samples of 30 students, and calculate the average height of each sample, the Central Limit Theorem says that the distribution of those sample averages will be approximately normal.
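So, by calculating the average of all those sample averages, you can get a pretty good estimate of the true average height of all the students in the school. The Central Limit Theorem tells us that as the sample size gets larger, the distribution of the sample mean gets closer to a normal distribution centred on the population mean, with a spread that shrinks in proportion to the square root of the sample size. This holds even if the population itself is not normally distributed, as long as its variance is finite. The related fact that the sample mean itself converges to the population mean is a separate result, the law of large numbers.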
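To make the school example concrete, here is a minimal sketch in Python. The population is entirely hypothetical: 2,000 made-up heights drawn from a skewed gamma distribution, chosen only so that the population is clearly not normal. The names population, sample_size, n_samples and height_sample_means are illustrative and not part of any real dataset.

```python
import numpy as np

# Hypothetical population: 2,000 student heights in cm, drawn from a skewed
# (gamma) distribution so that the population is clearly not normal.
rng = np.random.default_rng(seed=0)
population = 140 + rng.gamma(shape=2.0, scale=8.0, size=2000)

sample_size = 30    # students measured per sample
n_samples = 5000    # number of repeated samples

# Draw many samples of 30 students (no student repeated within a sample)
# and record the mean height of each sample.
height_sample_means = np.array([
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_samples)
])

print(f"Population mean:           {population.mean():.2f}")
print(f"Mean of sample means:      {height_sample_means.mean():.2f}")
print(f"Std of sample means:       {height_sample_means.std():.2f}")
print(f"Population std / sqrt(30): {population.std() / np.sqrt(sample_size):.2f}")
```

Under these assumptions, the first two printed values should be close to each other, and so should the last two. A histogram of height_sample_means should look roughly bell-shaped even though the population itself is skewed.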
Example
Here's a simple example of the CLT in action. Suppose we want to simulate rolling a fair six-sided die a few times, recording the average value, and repeating that experiment many times. We can do this in Python using the numpy library:
import numpy as np
# Define the number of dice to roll and the number of times to repeat the experiment
n_dice = 5
n_trials = 1000
# Roll the dice and record the sample mean for each trial
sample_means = []
for i in range(n_trials):
    rolls = np.random.randint(1, 7, size=n_dice)  # upper bound 7 is exclusive
    sample_mean = np.mean(rolls)
    sample_means.append(sample_mean)
In this code, we are rolling n_dice dice and repeating the experiment n_trials times. For each trial, we record the average value of the rolls and store it in the sample_means list.
We can then visualize the distribution of sample means using a histogram:
import matplotlib.pyplot as plt
# Plot the histogram of sample means
plt.hist(sample_means, bins=20)
plt.xlabel('Sample Mean')
plt.ylabel('Frequency')
plt.show()
The resulting histogram should show a bell-shaped curve that is centred around 3.5, which is the expected value of a single die roll. This is an example of the CLT in action: even though the distribution of a single die roll is discrete and uniform, the distribution of sample means becomes approximately normal as the sample size increases.
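One way to see the effect of sample size is to repeat the experiment with different numbers of dice and measure how far the sample means are from a normal shape. The sketch below is only an illustration: the helper names simulate_sample_means and excess_kurtosis are made up for this post, and excess kurtosis (which is 0 for a normal distribution) is just one convenient yardstick, not the only one.

```python
import numpy as np

def simulate_sample_means(n_dice, n_trials, rng):
    """Return the mean of n_dice fair die rolls for each of n_trials trials."""
    rolls = rng.integers(1, 7, size=(n_trials, n_dice))  # upper bound is exclusive
    return rolls.mean(axis=1)

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; equals 0 for a normal distribution."""
    z = (values - values.mean()) / values.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(seed=42)
n_trials = 10_000

for n_dice in (1, 2, 5, 30):
    means = simulate_sample_means(n_dice, n_trials, rng)
    print(f"n_dice={n_dice:>2}: mean={means.mean():.2f}, "
          f"std={means.std():.2f}, excess kurtosis={excess_kurtosis(means):.2f}")
```

With a single die the excess kurtosis is clearly negative, because the distribution is flat. As n_dice grows it should move towards 0, while the standard deviation of the sample means shrinks.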
We can also calculate the mean and standard deviation of the sample means:
# Calculate the mean and standard deviation of the sample means
mean_of_sample_means = np.mean(sample_means)
std_of_sample_means = np.std(sample_means)
print(f"Mean of sample means: {mean_of_sample_means:.2f}")
print(f"Standard deviation of sample means: {std_of_sample_means:.2f}")
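The mean should be close to 3.5, the expected value of a single die roll. The standard deviation of the sample means, however, should not match the standard deviation of a single die roll (about 1.71). It should be close to that value divided by the square root of the sample size, which here is 1.71 / √5 ≈ 0.76. As we increase the sample size, we would expect the distribution of sample means to become increasingly normal, its mean to stay close to 3.5, and its standard deviation to shrink in proportion to 1/√n.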
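To check the simulated spread against theory, the standard deviation of the mean of n independent rolls is the standard deviation of a single roll divided by √n. Here is a minimal sketch of that comparison. The variable names are illustrative, and the number of trials is raised to 100,000 (an arbitrary choice) only to make the simulated value more stable.

```python
import numpy as np

# Exact mean and standard deviation of one fair six-sided die roll
faces = np.arange(1, 7)
die_mean = faces.mean()   # 3.5
die_std = faces.std()     # sqrt(35/12), about 1.71

# Theoretical standard deviation of the mean of n_dice rolls
n_dice = 5
expected_std_of_means = die_std / np.sqrt(n_dice)   # about 0.76

# Simulated standard deviation of the sample means
rng = np.random.default_rng(seed=0)
simulated_means = rng.integers(1, 7, size=(100_000, n_dice)).mean(axis=1)

print(f"Std of a single roll:        {die_std:.2f}")
print(f"Expected std of sample means: {expected_std_of_means:.2f}")
print(f"Simulated std of sample means: {simulated_means.std():.2f}")
print(f"Simulated mean of sample means: {simulated_means.mean():.2f}")
```

The simulated standard deviation should land close to the expected value of about 0.76, not the single-roll value of about 1.71.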
End Statement
Overall, the Central Limit Theorem is a powerful tool that allows us to make inferences about a population from samples, without measuring every member. It is an important concept to understand in statistics and has many practical applications in fields such as finance, economics, and engineering.
That's the end of the article, readers!
I'll be explaining more in my upcoming blogs!
"Statistics may be dull, but they have the power to shape public policy and human lives." - Hans Rosling
Do subscribe and keep supporting! 😊