Intro To Basic Statistics (01)

A World Full of Numbers

INTRODUCTION

Statistics is the discipline concerned with the collection, organization, analysis, interpretation, and presentation of data. In other words, it is a set of rules and concepts for understanding what data is telling us.

Let me give an illustration to show what statistics means:

Imagine that you and your parents are organizing a party and are unsure what kind of music to play. To get a decent idea of what most people prefer, you could ask each guest personally what their favourite genre of music is, but that would take a lot of time, so you opt to poll your pals instead.

You call your friends and ask them to rate various genres of music on a scale from 1 to 5. After compiling all the survey data, you can use statistics to examine the findings and determine the most popular genre. One way to do this is to calculate the mean, or average, score for each genre. For instance, if 10 people gave rock music an average rating of 4.5 and only three people gave country music an average rating of 2.0, you could infer that your friends prefer rock music to country music.
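Here is a minimal sketch of that averaging step. It assumes the survey results are stored as a dictionary that maps each genre to the list of 1 to 5 ratings it received. The ratings are invented for illustration and are not real survey data.

```python
import statistics

# Hypothetical survey results: each genre maps to the 1-5 ratings friends gave it.
ratings = {
    "rock": [5, 4, 5, 4, 5, 4, 5, 4, 5, 4],
    "country": [2, 1, 3],
    "pop": [3, 4, 3, 4, 2],
}

# Average (mean) rating for each genre.
mean_ratings = {genre: statistics.mean(scores) for genre, scores in ratings.items()}

# The genre with the highest average rating.
favourite = max(mean_ratings, key=mean_ratings.get)

# Print genres from highest to lowest average, with how many people rated each.
for genre, avg in sorted(mean_ratings.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: {avg:.2f} average from {len(ratings[genre])} ratings")
print("Most popular genre:", favourite)
```

The number of ratings is printed next to each average on purpose. An average built from three answers deserves less trust than one built from ten.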

Statistics also lets you discover how confident you can be in your judgments. For instance, if you polled only a small group of friends, your answers might not be very reliable. If you polled a larger group, you can be more confident that the responses reflect what most people like.
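A small simulation sketch shows why a larger poll is more trustworthy. It assumes an invented population of 1,000 people's ratings (1 to 5) for a single genre. It uses Python's `random` module to run 2,000 polls of 5 friends and 2,000 polls of 50 friends, then compares how widely the poll averages spread. All numbers are simulated, not real survey data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical population: 1,000 people's ratings of one genre (1-5).
population = [random.randint(1, 5) for _ in range(1000)]

def poll_averages(sample_size, repeats=2000):
    """Return the average rating from `repeats` independent polls of `sample_size` people."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]

small_polls = poll_averages(5)
large_polls = poll_averages(50)

# Spread of the poll averages: a smaller spread means the polls agree more closely.
small_spread = statistics.stdev(small_polls)
large_spread = statistics.stdev(large_polls)
print(f"Spread of averages, polls of 5:  {small_spread:.3f}")
print(f"Spread of averages, polls of 50: {large_spread:.3f}")
```

The averages from the polls of 50 cluster much more tightly than the averages from the polls of 5. That tighter clustering is what makes the larger poll's result more reliable.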

Types of Statistics

There are two main kinds of statistics:

  1. Descriptive statistics

  2. Inferential statistics

Let’s understand what descriptive statistics and inferential statistics each mean.

Descriptive Statistics

Descriptive statistics refers to summarizing and organizing data in the form of numbers and graphs.

In layman's terms, descriptive statistics is mainly about summarizing and visualizing data.

A few examples of descriptive statistics tools are bar plots, histograms, pie charts, probability distribution functions, cumulative distribution functions, and the normal distribution.

Measures of central tendency (mean, median, and mode) and measures of spread (such as variance and standard deviation) all come under descriptive statistics.

Don't worry, we'll get to know each of these terms as we go.

Before that, let me give you an illustration of how we use descriptive statistics:

Let's assume we want to compare Lionel Messi's and Cristiano Ronaldo's goal-scoring performances in a season. We can use descriptive statistics to summarize their performances:

  1. Count: We can count the number of goals each player scored in the season. If Messi scored 30 goals and Ronaldo scored 25 goals, then Messi scored more goals than Ronaldo.
# Count how many values are in the list
my_list = [10, 20, 30, 40, 50, 10, 30, 20, 40, 10]

count = len(my_list)

print("Count of elements in the list is:", count)
# Output: Count of elements in the list is: 10
  2. Mean: We can calculate the average number of goals per match for each player. If Messi played 50 matches and scored 30 goals, his average would be 30 / 50 = 0.6 goals per match. If Ronaldo played 40 matches and scored 25 goals, his average would be 25 / 40 = 0.625 goals per match. In this case, Ronaldo has a slightly higher average.
numbers = [10, 20, 30, 40, 50]

# Mean = sum of the values divided by how many values there are
mean = sum(numbers) / len(numbers)

print("Mean of the numbers is:", mean)
# Output: Mean of the numbers is: 30.0
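  3. Median: We can find the middle value of each player's goals per match. To do this, we list the number of goals a player scored in each match in ascending order and take the value in the middle. For Messi's 50 matches, that is the average of the 25th and 26th values. For Ronaldo's 40 matches, it is the average of the 20th and 21st values.
import statistics

numbers = [10, 20, 30, 40, 50]

# statistics.median sorts the data and returns the middle value
# (or the average of the two middle values when the count is even)
median = statistics.median(numbers)

print("Median of the numbers is:", median)
# Output: Median of the numbers is: 30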
  4. Mode: We can find the most common number of goals a player scored in a single match. For example, if Messi scored exactly one goal in more of his matches than any other number, his mode would be 1. If Ronaldo failed to score in more matches than any other outcome, his mode would be 0.
from collections import Counter

numbers = [10, 20, 30, 40, 50, 10, 30, 20, 40, 10]

# Count how often each value appears
count = Counter(numbers)

# The mode is the value with the highest count
mode = max(count, key=count.get)

print("Mode of the numbers is:", mode)
# Output: Mode of the numbers is: 10
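The median and the mode only make sense when they are computed on per-match numbers, not on a season total. Here is a minimal sketch that computes all four summaries with Python's `statistics` module. It uses hypothetical per-match goal lists: short, made-up seasons, not real Messi or Ronaldo data.

```python
import statistics

# Hypothetical goals per match (invented for illustration, not real data).
goals = {
    "Messi": [1, 0, 2, 1, 0, 1, 3, 1, 0, 1],
    "Ronaldo": [0, 2, 1, 0, 0, 1, 2, 0, 1, 0, 3, 0],
}

for player, per_match in goals.items():
    matches = len(per_match)               # count of matches played
    total = sum(per_match)                 # total goals in the season
    mean = statistics.mean(per_match)      # average goals per match
    median = statistics.median(per_match)  # middle value of the sorted per-match goals
    mode = statistics.mode(per_match)      # most frequent goals-per-match value
    print(f"{player}: {matches} matches, {total} goals, "
          f"mean={mean:.2f}, median={median}, mode={mode}")
```

In this made-up data, both players score 10 goals in total. Their means, medians, and modes still differ, because the goals are spread across a different number of matches.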

So, using descriptive statistics, we can summarize and compare Messi's and Ronaldo's performances in a season. However, these statistics are just one way to describe their performances, and they may not give a complete picture of their skills or their contributions to the team.

Inferential Statistics

Inferential statistics means taking a sample from a population and using it to draw conclusions about the whole population.

By performing inferential statistics, we can come up with conclusions, inferences, and decisions about that specific population. Examples include confidence intervals and hypothesis tests such as the Z-test, T-test, and Chi-square test.

We will go through each of these terms later. Before that, let's understand what inferential statistics means through an illustration.

Let's say we have a group of 100 football players, including famous players like Messi, Ronaldo, and Mbappe. We want to know how fast these players can run. Instead of measuring the speed of all 100 players, which would be time-consuming, we decide to measure the speed of just 10 of them.

To make a good guess about how fast all 100 players can run from the speed of just 10 of them, we use inferential statistics. This involves using probability to work out how well the speeds of the 10 players we measured represent the speeds of all 100 players.

Let's say the average speed of the 10 players we measured is 20 miles per hour. Using inferential statistics, we can calculate, from this sample, a range that we are pretty sure contains the average speed of all 100 players. We might say something like, “We are 95% confident that the average speed of all 100 players falls between 18 and 22 miles per hour.”
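Here is a minimal sketch of how such an interval can be computed. It makes several assumptions:

- The 10 measured speeds below are invented so that they average 20 mph.
- Speeds are roughly normally distributed.
- For a sample of 10, a 95% interval uses a t critical value of about 2.262 (9 degrees of freedom). That value is hard-coded because Python's standard library has no t-distribution.
- For simplicity, the sketch ignores the finite population correction, which would slightly narrow the interval when sampling 10 out of only 100 players.

```python
import math
import statistics

# Hypothetical speeds (mph) for the 10 sampled players (invented numbers).
speeds = [19.5, 21.0, 20.2, 18.8, 20.9, 19.7, 20.4, 21.3, 19.1, 19.1]

n = len(speeds)
sample_mean = statistics.mean(speeds)
sample_sd = statistics.stdev(speeds)       # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)  # typical variation of the sample mean

T_CRITICAL_95_DF9 = 2.262  # two-sided 95% t value for n - 1 = 9 degrees of freedom
margin = T_CRITICAL_95_DF9 * standard_error

lower, upper = sample_mean - margin, sample_mean + margin
print(f"Sample mean: {sample_mean:.2f} mph")
print(f"95% confidence interval for the average speed: {lower:.2f} to {upper:.2f} mph")
```

With these made-up speeds, the interval comes out narrower than the 18 to 22 mph in the example. The width depends on how much the sampled speeds vary.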

This is a bit like predicting the outcome of a football game based on a team's performance in its past games. Just as you can make a pretty good guess about which team will win based on how they have played in the past, we can make a pretty good guess about the speed of all 100 football players from the speed of just 10 of them using inferential statistics.

Before moving forward, let's understand a few more basic terms used in statistics: range, variance, and standard deviation.

Let's take cricket as an illustration.

Range: MS Dhoni is a well-known Indian cricketer, famous for his excellent captaincy and finishing abilities in limited-overs cricket. In One Day International (ODI) cricket, he scored a total of 10,773 runs in his career. Let's say we want to know how much MS Dhoni's scores varied across his one-day career.

To do this, we calculate the range of his scores by subtracting his lowest score from his highest score. According to his one-day career stats, his highest score is 183* (the asterisk means not out) and his lowest score is 0. So, the range of his scores in one-day cricket is 183 - 0 = 183.

This means that the difference between his highest and lowest scores in one-day cricket is 183 runs. It's important to note that the range doesn't tell us how consistently he scored runs or how often he scored high or low. It simply tells us how far apart his highest and lowest scores are.

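numbers = [10, 20, 30, 40, 50]

# Range = highest value minus lowest value
# (named data_range to avoid shadowing Python's built-in range())
data_range = max(numbers) - min(numbers)

print("Range of the numbers is:", data_range)
# Output: Range of the numbers is: 40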
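The point that range says nothing about consistency is easy to check. This quick sketch uses two invented sets of innings scores, not real player data. Both sets have the same range, but their standard deviations (explained further below) are very different.

```python
import statistics

# Two hypothetical batters' scores over 8 innings (invented numbers).
steady = [0, 48, 50, 52, 49, 51, 50, 100]
erratic = [0, 0, 100, 0, 100, 0, 100, 100]

for name, scores in [("steady", steady), ("erratic", erratic)]:
    data_range = max(scores) - min(scores)  # highest minus lowest score
    spread = statistics.stdev(scores)       # typical distance from the average
    print(f"{name}: range = {data_range}, standard deviation = {spread:.1f}")
```

Both batters have a range of 100 and an average of 50. The "steady" batter's scores mostly sit close to 50, while the "erratic" batter swings between 0 and 100.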

Variance: It is a measure of how spread out a set of data is. Roughly speaking, it is the average of the squared distances of each data point from the mean. For example, let's say Virat Kohli averages 50 runs per match. If we look at his scores in 10 matches and calculate the variance, we get a number that tells us how much his scores vary from match to match.

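import statistics

numbers = [10, 20, 30, 40, 50]

# statistics.variance computes the sample variance (divides by n - 1);
# statistics.pvariance divides by n and treats the data as a whole population
variance = statistics.variance(numbers)

print("Variance of the numbers is:", variance)
# Output: Variance of the numbers is: 250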
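To show what `statistics.variance` does under the hood, here is a hand-rolled version. It uses hypothetical scores for Virat Kohli over 10 matches. The numbers are invented so that they average 50 and are not his real scores. The sketch computes both the sample variance (divide by n - 1) and the population variance (divide by n), and prints the standard library results alongside for comparison.

```python
import statistics

# Hypothetical runs in 10 matches (invented, averaging 50).
scores = [45, 60, 30, 80, 50, 20, 70, 55, 40, 50]

mean = sum(scores) / len(scores)

# Squared distance of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

sample_variance = sum(squared_deviations) / (len(scores) - 1)  # divide by n - 1
population_variance = sum(squared_deviations) / len(scores)    # divide by n

print("Mean:", mean)
print("Sample variance:", sample_variance)
print("statistics.variance:", statistics.variance(scores))
print("Population variance:", population_variance)
print("statistics.pvariance:", statistics.pvariance(scores))
```

Squaring the deviations stops scores below the mean from cancelling out scores above it. It is also why variance is measured in "runs squared" rather than runs.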

Standard Deviation: It is simply the square root of the variance. It tells us how spread out the data is around the mean. In simple terms, it gives us an idea of how much a player's scores differ from their average score.

Let's take the example of Sachin Tendulkar, one of the greatest cricketers of all time. Sachin has an incredible batting average of 53.78 in Test cricket. This means he scored, on average, 53.78 runs for every time he was dismissed. However, Sachin's scores varied from match to match: in some matches he scored very high, while in others he scored very low. By calculating the standard deviation of his scores, we can get an idea of how much his scores varied.

import statistics

numbers = [10, 20, 30, 40, 50]

# Sample standard deviation: the square root of statistics.variance
stdev = statistics.stdev(numbers)

print("Standard deviation of the numbers is:", stdev)
# Output: Standard deviation of the numbers is: 15.811388300841896
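This short sketch confirms that the standard deviation is the square root of the variance. It uses invented innings scores, not Sachin Tendulkar's real record. Taking `math.sqrt` of `statistics.variance` gives the same value as `statistics.stdev`, since both use the sample (n - 1) formula.

```python
import math
import statistics

# Hypothetical innings scores (invented for illustration).
innings = [12, 148, 5, 73, 0, 241, 35, 56, 9, 101]

average = statistics.mean(innings)
variance = statistics.variance(innings)  # sample variance, in "runs squared"
sd = statistics.stdev(innings)           # sample standard deviation, in runs

print(f"Average: {average:.2f}")
print(f"Variance: {variance:.2f}")
print(f"Standard deviation: {sd:.2f}")
print(f"Square root of variance: {math.sqrt(variance):.2f}")
```

Because the standard deviation is back in runs, it is easier to read than the variance. A large value relative to the average signals a batter whose scores swing widely from innings to innings.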

That's the end of my very first article, readers!

I will be explaining more in my upcoming blogs!

"Statistics are the grammar of science." - Karl Pearson

Do subscribe and keep supporting! 😊