Introduction to Descriptive Statistics

The branch of statistics concerned with quantitatively describing, visually presenting, or summarizing data is called descriptive statistics. In descriptive statistics, you analyze data using tabular, graphical, and numerical methods.

With descriptive statistics, you can draw conclusions only about the data being analyzed or summarized. You cannot generalize beyond that data, because descriptive statistics does not treat the data as a sample drawn from a larger population. In other words, descriptive statistics is not inferential statistics.


How to Perform Descriptive Statistics?

There are three methods to perform descriptive statistics. They are Numerical, Tabular, and Graphical methods.

Depending on whether you analyze one, two, or more than two variables at a time, the analysis is called univariate, bivariate, or multivariate.

  • Univariate analysis – Analysis of only one variable at a time.
  • Bivariate and multivariate analysis – Simultaneous analysis of two variables (bivariate) or of more than two variables (multivariate).

Numerical Methods

As the name suggests, numerical methods summarize data using numbers. The most common numerical summaries are:

  • Measures of Central Tendency – Mean, Median, and Mode
  • Measures of Dispersion – Standard Deviation, Variance, Range, and Interquartile Range
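The measures listed above can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical dataset. Because descriptive statistics does not treat the data as a sample, it uses the population formulas (`pvariance`, `pstdev`, dividing by n); if you instead treat the data as a sample, `variance` and `stdev` divide by n - 1. Quartile conventions also differ between tools: `statistics.quantiles` defaults to its "exclusive" method, so other software may report a slightly different IQR for the same data.

```python
import statistics

# Hypothetical data: 10 observations (e.g., monthly income in K).
data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 12]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)   # average of the two middle values when n is even
mode = statistics.mode(data)       # most frequent value

# Measures of dispersion (population formulas: divide by n)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
data_range = max(data) - min(data)

# Interquartile range: Q3 - Q1, using the default "exclusive" quartile method
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}, range={data_range}, IQR={iqr}")
```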

Tabular Methods

Tabular methods summarize data in table form. The data is aggregated at the category level to show the count (frequency) of observations in each category.

Tabular methods are mostly used for categorical variables. A continuous variable (say, income) can also be summarized this way by first converting its values into categories using binning (bucketing).
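Binning followed by a frequency count can be sketched in a few lines of Python. The bucket edges below mirror the income buckets in the table that follows; the income values are hypothetical. One assumption to flag: the post does not say which bucket a boundary value belongs to, so this sketch treats each bucket's upper edge as inclusive (5K falls in "<= 5K", 10K in "5K – 10K").

```python
import bisect
from collections import Counter

# Upper edges of the income buckets (in K), matching the post's table.
EDGES = [5, 10, 25, 50, 100]
LABELS = ["<= 5K", "5K – 10K", "10K – 25K", "25K – 50K", "50K – 100K", "> 100K"]

def income_bucket(income_k: float) -> str:
    """Map an income (in K) to its bucket label.

    bisect_left returns the index of the first edge >= income_k, so a value
    equal to an edge lands in the lower bucket (assumed upper-inclusive).
    """
    return LABELS[bisect.bisect_left(EDGES, income_k)]

# Hypothetical incomes (in K).
incomes = [2.5, 7, 12, 4, 60, 30, 120, 9, 3, 18]

# Frequency table: count of observations per bucket, in bucket order.
counts = Counter(income_bucket(x) for x in incomes)
for label in LABELS:
    print(f"{label:<12}{counts[label]:>3}")
```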

Tabular reports become more informative when they show proportions alongside the absolute counts.

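| Income Bucket | No. of Obs. | Proportion of Obs. |
|---------------|-------------|--------------------|
| <= 5K         | 65,000      | 0.325              |
| 5K – 10K      | 45,000      | 0.225              |
| 10K – 25K     | 40,000      | 0.200              |
| 25K – 50K     | 28,000      | 0.140              |
| 50K – 100K    | 16,000      | 0.080              |
| > 100K        | 6,000       | 0.030              |
| Total         | 200,000     | 1.000              |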
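The proportion column is each bucket's count divided by the total number of observations. As a quick check, this short sketch recomputes it from the counts in the table above.

```python
# Observation counts per income bucket, taken from the table above.
counts = {
    "<= 5K": 65000,
    "5K – 10K": 45000,
    "10K – 25K": 40000,
    "25K – 50K": 28000,
    "50K – 100K": 16000,
    "> 100K": 6000,
}

total = sum(counts.values())

# Proportion of observations in each bucket = bucket count / total count.
proportions = {bucket: n / total for bucket, n in counts.items()}

for bucket, n in counts.items():
    print(f"{bucket:<12}{n:>8}{proportions[bucket]:>8.3f}")
print(f"{'Total':<12}{total:>8}{sum(proportions.values()):>8.3f}")
```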

Graphical Methods

Graphical methods represent data visually. As the saying goes, visuals speak louder than words, and they are an important tool data scientists use to present their analysis. A variety of charts and graphs can represent data; some of the most important are:

  • Bar Plot
  • Histogram
  • Box Plot
  • Pie Chart
  • Line Plot
  • Scatter Plot
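As an illustration of what a histogram shows, the sketch below groups hypothetical exam scores into bins of width 10 and draws one row of `#` characters per bin. It deliberately uses plain Python so it runs without a plotting library; in practice you would typically draw the chart with a plotting library such as matplotlib.

```python
from collections import Counter

# Hypothetical exam scores.
scores = [55, 62, 68, 71, 74, 75, 78, 81, 83, 85, 88, 92, 95]

BIN_WIDTH = 10

# Map each score to the lower edge of its bin: 55 -> 50, 62 -> 60, ...
bins = Counter((s // BIN_WIDTH) * BIN_WIDTH for s in scores)

# Walk every bin from the lowest to the highest so empty bins still show up
# (a Counter returns 0 for missing keys).
for lower in range(min(bins), max(bins) + BIN_WIDTH, BIN_WIDTH):
    upper = lower + BIN_WIDTH - 1
    print(f"{lower:>3}-{upper:<3} | {'#' * bins[lower]}")
```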

We have discussed the plots in detail in the blog on Tabular & Graphical Methods.

Next Blog

It is important to understand the different types of variables (namely Categorical and Numerical) before we start applying the tabular, graphical, or numerical methods. As such, the focus of our next blog will be on Types of Variables.
