Data Science

Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is a concept that unifies statistics, data analysis, machine learning, and their related methods in order to understand and analyze actual phenomena with data.

Both definitions above, taken from Wikipedia, reflect the importance of numerical skills and statistics in Data Science. Anyone who wants to build a career in data science should develop the four skills shown in the figure below:

[Figure: introduction to statistics]

Interpersonal Skills – Data science is an interdisciplinary field. A data scientist is expected to coordinate and work with team members across departments and functions, and must also be a good storyteller who can explain the analysis to others.

Data Management Skills – A data scientist’s job is to extract knowledge and insights from structured and unstructured data. This requires a good understanding of programming and data management tools such as SQL, R, and Python.

Business Domain Knowledge – Data analysis is done to solve business and social problems, so it is important for a data scientist to develop a sound understanding of the domain.

Statistics and Numerical Skills – A data scientist’s job is to interpret analytical output, draw inferences and insights from it, and build mathematical models. All of these require knowledge of mathematics and statistics.

There is a saying:

“If you don’t know the business, data can teach you.

If you don’t understand the numbers, data won’t help you.”

It is fair to say that statistics and numerical skills are essential for a good Data Scientist.

Statistics

Statistics is the discipline that concerns the collection, organization, display, analysis, interpretation, and presentation of data. (Wikipedia)

Performing a statistical analysis involves:

  • Collection of Data
  • Organization of Data (Data Preparation)
  • Displaying Data (Charts, Graphs and Tabular representation)
  • Data Analysis
  • Interpretation of Analysis Output
  • Representation of Data and Inferences
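
The steps above can be sketched end to end on a tiny dataset. This is only an illustration using Python's standard library: the exam scores, the 10-point bands, and all variable names are made up for the example and do not come from the blog series.

```python
from collections import Counter
from statistics import mean, median, stdev

# 1. Collection: hypothetical exam scores; None marks a missing response.
raw_scores = [72, 85, None, 90, 66, 85, 78, None, 95, 85]

# 2. Organization (data preparation): drop missing values and sort.
scores = sorted(s for s in raw_scores if s is not None)

# 3. Display: a text frequency chart by 10-point band (60-69, 70-79, ...).
bands = Counter((s // 10) * 10 for s in scores)
for band in sorted(bands):
    print(f"{band}-{band + 9}: {'#' * bands[band]}")

# 4. Analysis: summary measures of the cleaned data.
summary = {
    "n": len(scores),
    "mean": mean(scores),
    "median": median(scores),
    "stdev": stdev(scores),
}

# 5./6. Interpretation and presentation: report what the numbers suggest.
position = "above" if summary["mean"] > summary["median"] else "at or below"
print(f"n={summary['n']}, mean={summary['mean']:.1f}, median={summary['median']}")
print(f"The mean is {position} the median.")
```

In a real project each step is far larger (for example, collection may involve surveys or database queries), but the order of the steps stays the same.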

Statistics Blog Series

We aim to provide students and aspiring data scientists with a good resource on statistics. We are confident that it will help you learn many complex concepts of statistics with ease. Moreover, we have provided Python and R code where necessary.

Sr. No.  Statistics blog series

1. Descriptive Statistics
2. Types of Variables
3. Measures of Central Tendency – Mean, Median, Mode (with Practice Exercises)
4. Measures of Dispersion
5. Measures of Relationship
6. Overview of Tabular and Graphical Methods

Steps in Descriptive Analysis of Data

  • Metadata Understanding
  • Analysis of Single Categorical Variable using Frequency Distribution, Bar Plot, and Pie Chart with Python / R code
  • Analysis of Single Continuous Variable using Histograms and Density Plots with Python / R code
  • Analysis of Two Continuous Variables using Scatter Plot and Correlation with Python / R code
  • Analysis of Two Categorical Variables using Crosstab and Stacked Bar Chart with Python / R code

7. Probability
8. Distributions
9. Sampling and Sampling Distribution
10. Central Limit Theorem
11. Normal Distribution
12. Hypothesis Testing
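
To give a flavour of the descriptive-analysis steps listed in the outline, here is a minimal sketch of three of them: a frequency distribution, a crosstab, and a correlation. It uses only Python's standard library. The survey records, the column meanings, and the `pearson` helper are hypothetical and written for this illustration; the posts in the series may use different data and libraries.

```python
from collections import Counter
from math import sqrt

# Hypothetical survey records: (gender, plan, age, monthly_spend).
records = [
    ("F", "basic", 23, 40.0),
    ("M", "premium", 35, 82.0),
    ("F", "premium", 41, 95.0),
    ("M", "basic", 29, 51.0),
    ("F", "basic", 52, 70.0),
    ("M", "premium", 47, 99.0),
]

# Single categorical variable: frequency distribution of the plan.
plan_counts = Counter(plan for _, plan, _, _ in records)

# Two categorical variables: crosstab as counts of (gender, plan) pairs.
crosstab = Counter((gender, plan) for gender, plan, _, _ in records)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two continuous variables: correlation between age and monthly spend.
ages = [rec[2] for rec in records]
spend = [rec[3] for rec in records]
corr = pearson(ages, spend)

print(dict(plan_counts))
print(dict(crosstab))
print(f"correlation(age, spend) = {corr:.2f}")
```

The counts correspond to what a bar plot or stacked bar chart would draw, and the correlation coefficient summarizes what a scatter plot of age against spend would show.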

Thank you

Happy Learning !!!

We would like to express our sincere thanks and appreciation to BALAKUMARAN G for his contributions to this blog series on Statistics for Data Science.

Bala has completed an MBA in Business Analytics. He loves blogging on Data Science, Machine Learning, and Digital Marketing. He did his MBA internship at K2 Analytics.
