Introduction to Descriptive Statistics
The branch of statistics concerned with quantitatively describing, visually presenting, or summarizing data is called descriptive statistics. In descriptive statistics, you analyze data using tabular, graphical, and numerical methods.
With descriptive statistics techniques, you can draw conclusions only about the data being analyzed or summarized. You cannot generalize beyond that data because descriptive statistics does not treat the data as a sample drawn from a larger population. In other words, descriptive statistics is not inferential statistics.
How to Perform Descriptive Statistics?
Descriptive statistics can be performed using three kinds of methods: numerical, tabular, and graphical.
Depending on whether you analyze one or more variables simultaneously, the analysis is called Univariate, Bivariate, or Multivariate.
- Univariate analysis – Analysis of only one variable at a time.
- Bivariate and multivariate analysis – The simultaneous analysis of two variables is called bivariate analysis; the simultaneous analysis of more than two variables is called multivariate analysis.
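To make the distinction concrete, here is a minimal sketch using only the Python standard library. The `ages` and `incomes` values are made up for illustration, and the Pearson correlation is just one possible bivariate summary, chosen here as an assumption rather than something this blog prescribes.

```python
import statistics

# Hypothetical data: one (age, income) pair per person.
ages = [25, 32, 41, 48, 54]
incomes = [30, 42, 55, 61, 72]  # in thousands

# Univariate analysis: summarize one variable on its own.
mean_income = statistics.mean(incomes)

# Bivariate analysis: summarize how two variables move together.
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return co_deviation / (spread_x * spread_y)

r = pearson(ages, incomes)

print("Mean income (univariate):", mean_income)
print("Age vs. income correlation (bivariate):", round(r, 3))
```

The first summary looks at income alone, while the second needs both variables at once, which is what makes it bivariate.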
Numerical methods, as the name suggests, summarize data using numbers. The most common numerical summaries are:
- Measures of Central Tendency – Mean, Median, and Mode
- Measures of Dispersion – Standard Deviation, Variance, Range, and Inter-Quartile Range
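The measures listed above can be computed with Python's built-in `statistics` module. The sketch below uses a small, made-up `incomes` list. It uses the population versions of variance and standard deviation (`pvariance`, `pstdev`) on the assumption that, in a purely descriptive setting, the data is not treated as a sample. Quartile conventions also differ between tools, so the inter-quartile range shown here reflects the default method of `statistics.quantiles`.

```python
import statistics

# Hypothetical data: incomes (in thousands) of nine people.
incomes = [12, 15, 15, 18, 22, 25, 31, 40, 56]

# Measures of central tendency
mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)

# Measures of dispersion
variance = statistics.pvariance(incomes)   # population variance
std_dev = statistics.pstdev(incomes)       # population standard deviation
value_range = max(incomes) - min(incomes)
q1, q2, q3 = statistics.quantiles(incomes, n=4)  # quartile cut points
iqr = q3 - q1

print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Variance:", round(variance, 2), "Std. dev.:", round(std_dev, 2))
print("Range:", value_range, "IQR:", iqr)
```

If the data were instead treated as a sample from a larger population, `statistics.variance` and `statistics.stdev` would be the usual choice.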
Tabular methods summarize the data in table form. The data is aggregated at the category level to show the count (frequency) of observations in each category.
Tabular methods are mostly used for categorical variables. A continuous variable (say, income) can also be summarized using tabular methods by converting its continuous values into categories through binning (bucketing).
Tabular reports can be made more informative by showing proportions alongside the absolute counts.
| Income Buckets | # Obs. | Proportion of Obs. |
|---|---|---|
| 5K – 10K | 45000 | 0.225 |
| 10K – 25K | 40000 | 0.200 |
| 25K – 50K | 28000 | 0.140 |
| 50K – 100K | 16000 | 0.080 |
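A frequency table like this can be built by binning a continuous variable and counting the observations in each bucket. The sketch below is illustrative only: the `incomes` values are made up, and treating each bucket as including its lower bound and excluding its upper bound is an assumption, since the table does not say which bucket a boundary value belongs to.

```python
from collections import Counter

# Hypothetical data: incomes in thousands.
incomes = [6, 8, 12, 18, 24, 30, 45, 60, 75, 9]

# Bucket edges in thousands. Assumption: lower bound inclusive, upper bound exclusive.
buckets = [(5, 10), (10, 25), (25, 50), (50, 100)]

def bucket_label(value):
    """Return the label of the bucket that contains value."""
    for low, high in buckets:
        if low <= value < high:
            return f"{low}K - {high}K"
    return "Other"

# Count observations per bucket, then add proportions next to the counts.
counts = Counter(bucket_label(value) for value in incomes)
total = len(incomes)

table = []
for low, high in buckets:
    label = f"{low}K - {high}K"
    table.append((label, counts[label], counts[label] / total))

for label, count, proportion in table:
    print(f"{label:<12} {count:>6} {proportion:>8.3f}")
```

Each row holds the bucket label, the absolute count, and the proportion of all observations that fall in that bucket.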
Graphical methods represent the data visually. As the saying goes, “Visuals speak louder than words.” Visuals are an important tool for data scientists to present their analysis of the data. A variety of charts and graphs can be used to represent data. Some of the important ones are:
- Bar Plot
- Box Plot
- Pie Chart
- Line Plot
- Scatter Plot
We have discussed the plots in detail in the blog on Tabular & Graphical Methods.
It is important to understand the different types of variables (namely Categorical and Numerical) before we start applying the tabular, graphical, or numerical methods. As such, the focus of our next blog will be on Types of Variables.