
Dear Blog Reader – “Welcome to our Linear Regression in Machine Learning blog series”.

In this blog series, we provide a detailed, step-by-step guide to building a Linear Regression model in both R and Python. Using examples and datasets, we explain the linear regression concepts along with the R/Python code and its output.

## Linear Regression Basics

Example 1: Fill in the missing value in the data table below.

| Input  | 10 | 20 | 30 | 40 | 50 |
|--------|----|----|----|----|----|
| Output | 6  | 12 | 18 | 24 | ?  |

Ans: The value in the missing cell should be 30. We can quickly see a proportional relationship between Output and Input: Output = 0.6 * Input.

Example 2: Fill in the missing value for the Input-Output data given below.

| Input  | 10 | 20 | 30 | 40 | 50 |
|--------|----|----|----|----|----|
| Output | 8  | 14 | 20 | 26 | ?  |

Ans: The value in the missing cell should be 32. Output is proportional to Input plus a fixed constant: Output = 0.6 * Input + 2.
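The relationship in Example 2 is easy to verify in a few lines of plain Python (no libraries needed); the slope and intercept below are simply read off from the answer above:

```python
# Verify Output = 0.6 * Input + 2 against the observed data from Example 2
inputs = [10, 20, 30, 40]
outputs = [8, 14, 20, 26]

slope = 0.6      # change in Output per unit change in Input
intercept = 2    # fixed constant added to every Output

# round() guards against tiny floating-point noise
predicted = [round(slope * x + intercept, 2) for x in inputs]
missing = round(slope * 50 + intercept, 2)

print(predicted)  # matches the observed outputs
print(missing)    # the missing cell: 32.0
```

The same check fails for Example 3's data, which is why that example needs a fitted line rather than a relationship we can spot by eye.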

Example 3: Fill in the missing value for the Input-Output data given below.

| Input  | 10 | 20 | 30   | 40   | 50 |
|--------|----|----|------|------|----|
| Output | 8  | 15 | 18.5 | 26.5 | ?  |

Ans: 31.75

To predict the missing value, we have to plot the data and fit the Line of Best Fit. I hope you remember your school days working with graph paper and plotting the Line of Best Fit. With a few data points and only two columns, graph paper will do. However, when there are many rows and columns, we need tools like Python and R.
The linear equation in this case is Output = 0.59 * Input + 2.25. Using this equation, we get 31.75 as the value for the missing cell.
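For the curious, the Line of Best Fit can be computed by hand with the least-squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. A minimal sketch in plain Python, using the four known points from Example 3:

```python
# Least-squares line of best fit for the four known points in Example 3
x = [10, 20, 30, 40]
y = [8, 15, 18.5, 26.5]

x_mean = sum(x) / len(x)   # 25.0
y_mean = sum(y) / len(y)   # 17.0

# Numerator: sum of cross-deviations; denominator: sum of squared x-deviations
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))  # 295.0
sxx = sum((xi - x_mean) ** 2 for xi in x)                         # 500.0

slope = sxy / sxx                    # 0.59
intercept = y_mean - slope * x_mean  # 2.25

print(slope, intercept)
print(round(slope * 50 + intercept, 2))  # 31.75, the missing value
```

The R and Python code below uses library routines (`lm` and `statsmodels`) to do exactly this computation.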

### Sample R code for Linear Regression

```r
Input = c(10, 20, 30, 40)
Output = c(8, 15, 18.5, 26.5)
linear_model = lm(Output ~ Input)
linear_model$coefficients
```
```
(Intercept)       Input
       2.25        0.59
```

### Sample Python code for Linear Regression

```python
import pandas as pd
import statsmodels.formula.api as smf

Input = [10, 20, 30, 40]
Output = [8, 15, 18.5, 26.5]

df = pd.DataFrame({"Input": Input, "Output": Output})
linear_model = smf.ols(formula="Output ~ Input", data=df).fit()
linear_model.params
```
```
Intercept    2.25
Input        0.59
dtype: float64
```

## Linear Regression Table of Contents

The above was a simple example to introduce linear regression. There are many assumptions and concepts to learn in linear regression, and we will cover all of them with R and Python code in this blog series.

| Sr. No | Linear Regression Blog Series | Python & R |
|--------|-------------------------------|------------|
| 1 | **Introduction to Linear Regression**: Regression is a statistical process for estimating the relationship between a dependent variable (usually denoted by y) and one or more independent variables (usually denoted by x). | Link |
| 2 | **Simple Linear Regression**: Linear Regression with only one independent variable is Simple Linear Regression. | Link |
| 3 | **R-Squared Concept Explained**: R-Squared is a measure of how well the Linear Regression model fits the data. | Link |
| 4 | **Multiple Linear Regression**: Linear Regression with more than one independent variable is Multiple Linear Regression. | Link |
| 5 | **Adjusted R-Squared Concept Explained**: Adjusted R-Squared is a modified form of R-Squared that has been adjusted for the number of predictor variables in the model. | Link |
| 6 | **Multi-Collinearity and Variance Inflation Factor**, with Python and R code: Multicollinearity is a phenomenon in which two or more independent variables are highly intercorrelated, meaning an independent variable can be linearly predicted from one or more of the other independent variables. | Link |
| 7 | **Importance of Variable Transformation in Model Development**: Transformation refers to the replacement of a variable by some function of it. In machine learning, we apply variable transformation to improve the fit of the regression model on the data and improve model performance. | Link |
| 8 | **No-Intercept Linear Regression Model and RMSE Measure**: A regression model with intercept = 0, i.e., one whose regression line passes through the origin, is a No-Intercept Regression Model. RMSE stands for Root Mean Squared Error; it is one of the model performance measures. | Link |
| 9 | **Assumptions of Linear Regression**: Linearity, Homoscedasticity, Normal Errors, No Autocorrelation of Residuals, No Perfect Multi-Collinearity, Exogeneity, and Adequate Sample Size. | Link |

## Data file

The data file used in the blog series is inc_exp_data.csv. You can download it from our section.

Happy Learning! If you liked this blog series, kindly drop in your comments and feedback, and remember to share it with your friends and colleagues.

Thank you.
Team K2 Analytics
