
Dear Blog Reader – “Welcome to our Linear Regression in Machine Learning blog series”.

In this blog series, we provide a detailed, step-by-step guide to building a Linear Regression model in both R and Python. Using examples and datasets, we explain the linear regression concepts along with the R/Python code and its output.

## Linear Regression Basics

Example 1: Fill in the missing value in the data table below.

| Input  | 10 | 20 | 30 | 40 | 50 |
|--------|----|----|----|----|----|
| Output | 6  | 12 | 18 | 24 | ?  |

Ans: The value in the missing cell should be 30. We can quickly see a proportional relationship between Output and Input: Output = 0.6 * Input.

Example 2: Fill in the missing value for the Input-Output data given below.

| Input  | 10 | 20 | 30 | 40 | 50 |
|--------|----|----|----|----|----|
| Output | 8  | 14 | 20 | 26 | ?  |

Ans: The value in the missing cell should be 32. Output is proportional to Input plus a fixed constant: Output = 0.6 * Input + 2.
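The relationship in Example 2 is easy to verify in a few lines of plain Python (no libraries needed); the slope and intercept below are simply read off from the answer above:

```python
# Verify Output = 0.6 * Input + 2 against the observed data from Example 2
inputs = [10, 20, 30, 40]
outputs = [8, 14, 20, 26]

slope = 0.6      # change in Output per unit change in Input
intercept = 2    # fixed constant added to every Output

# round() guards against tiny floating-point noise
predicted = [round(slope * x + intercept, 2) for x in inputs]
missing = round(slope * 50 + intercept, 2)

print(predicted)  # matches the observed outputs
print(missing)    # the missing cell: 32.0
```

The same check fails for Example 3's data, which is why that example needs a fitted line rather than a relationship we can spot by eye.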

Example 3: Fill in the missing value for the Input-Output data given below.

| Input  | 10 | 20 | 30   | 40   | 50 |
|--------|----|----|------|------|----|
| Output | 8  | 15 | 18.5 | 26.5 | ?  |

Ans: 31.75

To predict the missing value, we have to plot the data and fit the Line of Best Fit. I hope you remember your school days working with graph paper and plotting the Line of Best Fit. With a few data points and only two columns, graph paper will do. However, when there are many rows and columns, we need tools like Python and R.
The linear equation in this case is Output = 0.59 * Input + 2.25. Using this equation, we get 31.75 as the value for the missing cell.
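For the curious, the Line of Best Fit can be computed by hand with the least-squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. A minimal sketch in plain Python, using the four known points from Example 3:

```python
# Least-squares line of best fit for the four known points in Example 3
x = [10, 20, 30, 40]
y = [8, 15, 18.5, 26.5]

x_mean = sum(x) / len(x)   # 25.0
y_mean = sum(y) / len(y)   # 17.0

# Numerator: sum of cross-deviations; denominator: sum of squared x-deviations
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))  # 295.0
sxx = sum((xi - x_mean) ** 2 for xi in x)                         # 500.0

slope = sxy / sxx                    # 0.59
intercept = y_mean - slope * x_mean  # 2.25

print(slope, intercept)
print(round(slope * 50 + intercept, 2))  # 31.75, the missing value
```

The R and Python code below uses library routines (`lm` and `statsmodels`) to do exactly this computation.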

### Sample R code for Linear Regression

```r
Input = c(10, 20, 30, 40)
Output = c(8, 15, 18.5, 26.5)
linear_model = lm(Output ~ Input)
linear_model$coefficients
```
```
(Intercept)       Input
       2.25        0.59
```

### Sample Python code for Linear Regression

```python
import pandas as pd
import statsmodels.formula.api as smf

Input = [10, 20, 30, 40]
Output = [8, 15, 18.5, 26.5]

df = pd.DataFrame({"Input": Input, "Output": Output})
linear_model = smf.ols(formula="Output ~ Input", data=df).fit()
linear_model.params
```
```
Intercept    2.25
Input        0.59
dtype: float64
```

## Linear Regression Table of Contents

The above was a simple example to introduce linear regression. There are many assumptions and concepts to learn in linear regression, and we will cover all of them with R and Python code in this blog series.

| Sr. No | Linear Regression Blog Series | Python & R |
|--------|-------------------------------|------------|
| 1 | **Introduction to Linear Regression**: Regression is a statistical process for estimating the relationship between a dependent variable (usually denoted by y) and one or more independent variables (usually denoted by x). | Link |
| 2 | **Simple Linear Regression**: Linear Regression with only one independent variable is Simple Linear Regression. | Link |
| 3 | **R-Squared Concept Explained**: R-Squared is a measure of how well the Linear Regression model fits the data. | Link |
| 4 | **Multiple Linear Regression**: Linear Regression with more than one independent variable is Multiple Linear Regression. | Link |
| 5 | **Adjusted R-Squared Concept Explained**: Adjusted R-Squared is a modified form of R-Squared that has been adjusted for the number of predictor variables in the model. | Link |
| 6 | **Multi-Collinearity and Variance Inflation Factor**, with Python and R code: Multicollinearity is a phenomenon in which two or more independent variables are highly intercorrelated, meaning an independent variable can be linearly predicted from one or more of the other independent variables. | Link |
| 7 | **Importance of Variable Transformation in Model Development**: Transformation refers to the replacement of a variable by some function of it. In machine learning, we apply variable transformation to improve the fit of the regression model on the data and improve model performance. | Link |
| 8 | **No-Intercept Linear Regression Model and RMSE Measure**: A regression model with intercept = 0, i.e., one whose regression line passes through the origin, is a No-Intercept Regression Model. RMSE stands for Root Mean Squared Error; it is one of the model performance measures. | Link |
| 9 | **Assumptions of Linear Regression**: Linearity, Homoscedasticity, Normal Errors, No Autocorrelation of Residuals, No Perfect Multi-Collinearity, Exogeneity, and Adequate Sample Size. | Link |

## Data file

The data file used in the blog series is inc_exp_data.csv. You can download it from our section.

Happy Learning! If you liked this blog series, kindly drop in your comments and feedback, and remember to share it with your friends and colleagues.

Thank you.
Team K2 Analytics
