Dear reader, welcome to our Linear Regression in Machine Learning blog series.

This blog series is a detailed, step-by-step guide to building a linear regression model, with both R and Python syntax. Using examples and datasets, we explain the linear regression concepts and the output of the R/Python code.

## Linear Regression Basics

**Example 1:** Fill in the missing value in the data table below.

| Input | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|
| Output | 6 | 12 | 18 | 24 | ? |

**Ans:** The value in the missing cell should be 30. We can quickly see that Output is directly proportional to Input: **Output = 0.6 * Input**.

**Example 2:** Fill in the missing value for the Input-Output data given below.

| Input | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|
| Output | 8 | 14 | 20 | 26 | ? |

**Ans:** The value in the missing cell should be 32. Output is a fixed multiple of Input plus a constant: **Output = 0.6 * Input + 2**.
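Both answers so far can be checked by recovering the straight line from any two data points and confirming that the remaining points lie on it. The snippet below is a minimal sketch in plain Python; the helper names `line_through` and `predict` are illustrative and not part of any library.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the line at x."""
    return slope * x + intercept


inputs = [10, 20, 30, 40]
examples = {
    "Example 1": [6, 12, 18, 24],
    "Example 2": [8, 14, 20, 26],
}

for name, outputs in examples.items():
    # Recover the line from the first two points only.
    slope, intercept = line_through((inputs[0], outputs[0]), (inputs[1], outputs[1]))
    # Confirm that every known point lies on that line (within rounding error).
    exact_fit = all(
        abs(predict(slope, intercept, x) - y) < 1e-9
        for x, y in zip(inputs, outputs)
    )
    print(name, round(slope, 2), round(intercept, 2), exact_fit,
          round(predict(slope, intercept, 50), 2))
```

Because every point falls exactly on the line in these two examples, two points are enough to pin down the relationship, and evaluating the line at Input = 50 reproduces the answers 30 and 32.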

**Example 3:** Fill in the missing value for the Input-Output data given below.

| Input | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|
| Output | 8 | 15 | 18.5 | 26.5 | ? |

**Ans: 31.75**

This time the points do not fall exactly on a straight line, so to predict the missing value we have to plot the data and fit the line of best fit. You may remember doing this at school with graph paper. With only a few data points and two columns, graph paper works; with many rows and columns, we need tools such as Python and R.

The linear equation, in this case, is **Output = 0.59 * Input + 2.25**. Using the equation with Input = 50, we get 0.59 * 50 + 2.25 = 31.75 as the value for the missing cell.
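For readers curious where the slope and intercept come from, here is a minimal sketch of the standard closed-form least-squares formulas for a single input variable, written in plain Python. The function name `fit_line` is illustrative; in practice a library routine does this work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


inputs = [10, 20, 30, 40]
outputs = [8, 15, 18.5, 26.5]

slope, intercept = fit_line(inputs, outputs)
missing_value = slope * 50 + intercept

print(round(slope, 2), round(intercept, 2), round(missing_value, 2))
```

For the four known points of Example 3, the sum of cross-deviations is 295 and the sum of squared deviations of Input is 500, which gives the slope 295 / 500 = 0.59. The mean of Output is 17 and the mean of Input is 25, so the intercept is 17 - 0.59 * 25 = 2.25.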

### Sample R code for Linear Regression

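    # Known data points from Example 3
    Input <- c(10, 20, 30, 40)
    Output <- c(8, 15, 18.5, 26.5)

    # Fit Output as a linear function of Input and show the coefficients
    linear_model <- lm(Output ~ Input)
    linear_model$coefficients

Output:

    (Intercept)       Input
           2.25        0.59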

### Sample Python code for Linear Regression

    import pandas as pd
    import statsmodels.formula.api as sma

    # Known data points from Example 3
    Input = [10, 20, 30, 40]
    Output = [8, 15, 18.5, 26.5]

    # Build a two-column data frame, then fit Output as a linear function of Input
    df = pd.DataFrame([Input, Output], index=["Input", "Output"]).T
    linear_model = sma.ols(formula="Output ~ Input", data=df).fit()
    linear_model.params

Output:

    Intercept    2.25
    Input        0.59
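As a quick sanity check on the coefficients reported above, the following plain-Python sketch (no statsmodels required) computes the fitted values, the residuals, and two fit measures covered later in this series, R-squared and RMSE. The variable names are illustrative, and RMSE here divides by the number of observations, which is one common convention.

```python
import math

# Coefficients reported by the R and Python models above.
intercept, slope = 2.25, 0.59

inputs = [10, 20, 30, 40]
outputs = [8, 15, 18.5, 26.5]

# Fitted values on the line and the residuals (actual minus fitted).
fitted = [intercept + slope * x for x in inputs]
residuals = [y - f for y, f in zip(outputs, fitted)]

mean_y = sum(outputs) / len(outputs)
ss_res = sum(r ** 2 for r in residuals)            # unexplained variation
ss_tot = sum((y - mean_y) ** 2 for y in outputs)   # total variation

r_squared = 1 - ss_res / ss_tot
rmse = math.sqrt(ss_res / len(outputs))

print([round(f, 2) for f in fitted])
print(round(r_squared, 4), round(rmse, 4))
```

The residual sum of squares is 3.45 against a total sum of squares of 177.5, so the line explains roughly 98% of the variation in Output for these four points.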

## Linear Regression Blog Series: Table of Contents

The examples above are a simple introduction to linear regression. There are many more assumptions and concepts to learn, and we cover all of them with R and Python code in this blog series.

| Sr. No | Linear Regression Blog Series | Python & R |
|---|---|---|
| 1 | **Introduction to Linear Regression.** Regression is a statistical process for estimating the relationship between a dependent variable (usually denoted by y) and one or more independent variables (usually denoted by x). | Link |
| 2 | **Simple Linear Regression.** Linear regression with only one independent variable is simple linear regression. | Link |
| 3 | **R-Squared Concept Explained.** R-squared is a measure of how well the linear regression model fits the data. | Link |
| 4 | **Multiple Linear Regression.** Linear regression with more than one independent variable is multiple linear regression. | Link |
| 5 | **Adjusted R-Squared Concept Explained.** Adjusted R-squared is a modified form of R-squared that has been adjusted for the number of predictor variables in the model. | Link |
| 6 | **Multi-Collinearity and Variance Inflation Factor with Python and R code.** Multicollinearity occurs when two or more independent variables are highly intercorrelated, meaning that an independent variable can be linearly predicted from one or more of the other independent variables. | Link |
| 7 | **Importance of Variable Transformation in Model Development.** Transformation refers to the replacement of a variable by some function of that variable. In machine learning, we apply variable transformation to improve the fit of the regression model on the data and improve model performance. | Link |
| 8 | **No Intercept Linear Regression Model and RMSE Measure.** A regression model with intercept = 0, i.e., one whose regression line passes through the origin, is a no-intercept regression model. RMSE stands for Root Mean Squared Error; it is one of the model performance measures. | Link |
| 9 | **Assumptions of Linear Regression.** Linearity, homoscedasticity, normally distributed errors, no autocorrelation of residuals, no perfect multicollinearity, exogeneity, and sample size. | Link |

## Data file

The data file used in the blog series is **inc_exp_data.csv**. You can download it from our Resources section.

Happy learning! If you liked this blog series, kindly leave a comment with your feedback, and remember to share it with your friends and colleagues.

Thank you.

Team K2 Analytics
