A “No Intercept Linear Regression Model” is a linear regression model with intercept = 0, i.e., the regression line (or plane) passes through the origin. The aim of this blog is to help you compare regression models with and without an intercept.
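In equation form, the two models compared in this blog are shown below. Here y = Mthly_HH_Expense, x_1 = Mthly_HH_Income, x_2 = No_of_Fly_Members and x_3 = Ln_Emi_or_Rent_Amt, which are the variables used in the code that follows:

```latex
\begin{aligned}
\text{Intercept model:} \quad & \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \\
\text{No Intercept model:} \quad & \hat{y} = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \qquad (\beta_0 = 0)
\end{aligned}
```

With \beta_0 fixed at 0, setting x_1 = x_2 = x_3 = 0 gives \hat{y} = 0. The fitted surface is therefore forced through the origin.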

 

RMSE of Model With Intercept

The Root Mean Squared Error (RMSE) of our Intercept Model is 5872. The code is given below; run it and try it out for yourself.

### Python code to import the File ### (File download link)
import numpy as np
import pandas as pd
import statsmodels.formula.api as sma

inc_exp = pd.read_csv("Inc_Exp_Data.csv")

# Log transformation step: log(x + 1) keeps rows with zero EMI/rent defined
inc_exp['Ln_Emi_or_Rent_Amt'] = np.log(
       inc_exp['Emi_or_Rent_Amt'] + 1)
 
m_linear_mod_2 = sma.ols(
       formula = "Mthly_HH_Expense ~ Mthly_HH_Income+\
       No_of_Fly_Members + Ln_Emi_or_Rent_Amt  ",
       data = inc_exp).fit()

print("Adj. R-Squared ", 
       m_linear_mod_2.rsquared_adj.round(4))
 
Adj. R-Squared 0.7436

 

# Predict the expected monthly expense with the Intercept Model
expected = m_linear_mod_2.predict(inc_exp)
    
result = pd.DataFrame()
result["observed"] = inc_exp["Mthly_HH_Expense"] 
result["expected"] = expected.astype(int)
 
## Residual
result["residual"] = result["observed"] - result["expected"]


## Root Mean Squared Error = sqrt( sum((y - yhat)^2) / n )
from sklearn import metrics 
print("Root Mean Squared Error:", 
      np.sqrt(metrics.mean_squared_error( result["observed"], 
      result["expected"] ) ).round(2) ) 
Root Mean Squared Error: 5872.30
### R code to import the File ### (File download link)

inc_exp <- read.csv("Inc_Exp_Data.csv")
# Log transformation step (same as in Python)
inc_exp$Ln_Emi_or_Rent_Amt <- log(inc_exp$Emi_or_Rent_Amt + 1)
m_linear_mod_2 <- lm( Mthly_HH_Expense ~ Mthly_HH_Income 
                    + No_of_Fly_Members + Ln_Emi_or_Rent_Amt, 
                    data = inc_exp )



cat("Adj. R-Squared ", summary(m_linear_mod_2)$adj.r.squared)

Adj. R-Squared 0.7435739
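The RMSE formula in the Python comment above can be checked without sklearn. Below is a minimal sketch that uses a hypothetical `rmse` helper and made-up toy values, not the blog's data. Applied to the blog's `result["observed"]` and `result["expected"]` columns, it should agree with `np.sqrt(metrics.mean_squared_error(...))`, because `mean_squared_error` is the mean of the squared residuals.

```python
import numpy as np

def rmse(observed, expected):
    """Root Mean Squared Error: sqrt( sum((y - yhat)^2) / n )."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    residual = observed - expected              # y - yhat
    return np.sqrt(np.sum(residual ** 2) / residual.size)

# Hypothetical toy values (not from Inc_Exp_Data.csv)
y = [10.0, 20.0, 30.0]
y_hat = [12.0, 18.0, 33.0]
print("RMSE:", round(rmse(y, y_hat), 4))
```

Squaring the residuals before averaging keeps positive and negative errors from cancelling out. Taking the square root brings the result back to the unit of the dependent variable.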
 

View Predicted Values of Intercept Model

result.head(6) ### Python syntax 

head(result)    ### R syntax

 

Intercept Model Estimated Values for RMSE Calculation

Note: One of the expected (predicted) values is negative. Predictions from a linear regression are not bounded; they can take any value from minus infinity to plus infinity. However, Monthly Household Expense cannot be negative. To address this, we may try building a NO INTERCEPT LINEAR REGRESSION MODEL. Forcing the intercept to 0 does not by itself guarantee non-negative predictions, because the slope coefficients can still be negative.
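Why can a fitted line predict a negative value when every observed value is positive? The small sketch below fits a line with an intercept to made-up data (hypothetical, not the blog's dataset). It uses plain NumPy least squares instead of statsmodels. Because y rises steeply with x, the best-fitting line has a negative intercept, so its prediction at the smallest x drops below 0.

```python
import numpy as np

# Hypothetical toy data (not from Inc_Exp_Data.csv): all y values are positive
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 7.0, 12.0, 20.0])

# Design matrix with a column of ones -> model WITH an intercept
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

predicted = b0 + b1 * x
print("intercept:", round(b0, 2), "slope:", round(b1, 2))
print("predictions:", predicted.round(2))
print("number of negative predictions:", int((predicted < 0).sum()))
```

Nothing in ordinary least squares constrains predictions to the range of the observed data. A negative prediction is therefore a property of the straight-line fit, not a coding error.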

 

No Intercept Linear Regression Model

A “No Intercept” regression model is one whose intercept is fixed at 0. It is generally advised not to force the intercept to be 0. Use a No Intercept model only when you are sure that Y = 0 when all X = 0.
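In a statsmodels (patsy) or R formula, `- 1` drops the intercept. In matrix terms, this simply means the design matrix has no column of ones. The minimal NumPy sketch below uses made-up data (hypothetical, not the blog's dataset). It shows that the no-intercept fit passes through the origin. With a single predictor, its slope reduces to sum(x*y) / sum(x^2).

```python
import numpy as np

# Hypothetical toy data (not from Inc_Exp_Data.csv)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 7.0, 12.0, 20.0])

# No column of ones in the design matrix -> intercept forced to 0
X_no_int = x.reshape(-1, 1)
(b1,), *_ = np.linalg.lstsq(X_no_int, y, rcond=None)

print("slope from lstsq:       ", round(b1, 4))
print("slope = sum(xy)/sum(x^2):", round(np.sum(x * y) / np.sum(x ** 2), 4))
print("prediction at x = 0:    ", b1 * 0.0)   # the line passes through the origin
print("predictions:", (b1 * x).round(2))
```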

The RMSE of the No Intercept Model is 6437, which is higher than the 5872 of the Intercept Model. The code is given below:

# In the formula, "- 1" removes the intercept
# (patsy formula syntax used by statsmodels; R uses the same notation)
no_intercept_model = sma.ols(
    formula = "Mthly_HH_Expense ~  Mthly_HH_Income +\
           No_of_Fly_Members + Ln_Emi_or_Rent_Amt - 1",
    data = inc_exp).fit()
print("Adj. R-Squared of No Intercept Model:", 
      no_intercept_model.rsquared_adj.round(4))
Adj. R-Squared of No Intercept Model: 0.9114

 

expected = no_intercept_model.predict(inc_exp)

# nim -> No Intercept Model
nim_result = pd.DataFrame()
nim_result["observed"] = inc_exp["Mthly_HH_Expense"] 
nim_result["expected"] = expected.astype(int)
nim_result["residual"] = (nim_result["observed"]
                          - nim_result["expected"])

from sklearn import metrics

print("RMSE of No Intercept Model:", 
      np.sqrt(metrics.mean_squared_error(
          nim_result["observed"], nim_result["expected"]
      ) ).round(2) )
RMSE of No Intercept Model: 6437.53
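The higher RMSE of the No Intercept Model is expected when RMSE is measured on the training data. The Intercept Model contains the No Intercept Model as a special case (β0 = 0). Least squares can therefore never give the Intercept Model a larger in-sample residual sum of squares. The sketch below checks this on made-up data (hypothetical, not the blog's dataset), using a hypothetical `fit_rmse` helper built on NumPy least squares. The guarantee applies only in-sample, not on new data.

```python
import numpy as np

def fit_rmse(X, y):
    """Fit OLS by least squares and return the in-sample RMSE."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    return np.sqrt(np.mean(residual ** 2))

# Hypothetical toy data (not from Inc_Exp_Data.csv)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 7.0, 12.0, 20.0])

rmse_with = fit_rmse(np.column_stack([np.ones_like(x), x]), y)   # intercept model
rmse_without = fit_rmse(x.reshape(-1, 1), y)                     # no intercept model
print("in-sample RMSE with intercept:   ", round(rmse_with, 4))
print("in-sample RMSE without intercept:", round(rmse_without, 4))
```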

View Predicted Values of No Intercept Model

nim_result.head(6) ### Python syntax 

head(nim_result)    ### R syntax

 

Estimated Values for calculation of RMSE of the No Intercept Model

Note: None of the expected (predicted) values are negative.

Comparison of Intercept & No Intercept Model

Model Selection

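The comparison of the two models shows that the RMSE of the Intercept Model (5872.30) is lower than that of the No Intercept Model (6437.53), even though the No Intercept Model has the higher Adj. R-Squared (0.9114 vs. 0.7436). In model selection, we should give preference to RMSE over Adj. R-Squared. Why?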

1. Adj. R-Squared is a relative measure of fit, whereas RMSE is an absolute measure of fit.

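2. For a No Intercept Model, (Adj.) R-Squared is computed with the total sum of squares measured around 0 instead of around the mean of the dependent variable. In effect, it assumes the mean of the dependent variable is 0, which is not true here: the mean Monthly Expense is 18818. This inflates R-Squared, so it cannot be compared directly with the R-Squared of the Intercept Model.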

3. R-Squared is a proportion and is unitless, whereas RMSE is expressed in the same unit as the dependent variable.
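Point 2 can be seen numerically. For models without a constant, statsmodels' `rsquared` (and therefore `rsquared_adj`) uses the uncentered total sum of squares, sum(y^2), and R's `summary.lm` behaves the same way. The sketch below computes both versions for a no-intercept fit on made-up data (hypothetical, not the blog's dataset) using plain NumPy.

```python
import numpy as np

# Hypothetical toy data (not from Inc_Exp_Data.csv)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 7.0, 12.0, 20.0])

# No-intercept fit
X = x.reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = np.sum((y - X @ beta) ** 2)              # residual sum of squares

centered_tss = np.sum((y - y.mean()) ** 2)     # deviations from the mean of y
uncentered_tss = np.sum(y ** 2)                # deviations from 0

r2_uncentered = 1 - ssr / uncentered_tss
r2_centered = 1 - ssr / centered_tss
print("R-squared, baseline 0 (uncentered):    ", round(r2_uncentered, 4))
print("R-squared, baseline mean(y) (centered):", round(r2_centered, 4))
```

The residuals are identical in both lines. Only the baseline changes, and measuring total variation around 0 instead of around the mean makes the same fit look better.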

Ensemble of Intercept & No Intercept Model

The issue with the Intercept Model is that it can produce negative predictions for our dependent variable. One option is an ensemble of the No Intercept Model and the Intercept Model: if the Intercept Model's predicted value is zero or negative, we use the No Intercept Model's predicted value instead.

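# Bring the No Intercept Model predictions into the same DataFrame
result["nim_expected"] = nim_result["expected"]

result["ens_expected"] = result.apply(
    lambda row: row["expected"] if row["expected"] > 0 
        else row["nim_expected"], axis=1)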

print("RMSE of Ensemble Model:", 
      np.sqrt(metrics.mean_squared_error(
          result["observed"], result["ens_expected"]
      ) ).round(2) )
RMSE of Ensemble Model: 5844.79

By taking the ensemble of both models, we have reduced the in-sample RMSE from 5872.30 to 5844.79.
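The row-wise `apply` works, but the same rule can be written as a single vectorized `np.where` call, which is typically faster on large DataFrames. The sketch below uses a small hypothetical DataFrame with the same column names as the blog's `result`. The numbers are made up and do not come from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy predictions (not from Inc_Exp_Data.csv)
result = pd.DataFrame({
    "observed":     [12000, 3000, 25000],
    "expected":     [11500, -800, 26000],   # Intercept Model
    "nim_expected": [13000, 2500, 24000],   # No Intercept Model
})

# Same rule as the apply() version: keep the Intercept Model's prediction
# when it is > 0, otherwise fall back to the No Intercept Model
result["ens_expected"] = np.where(result["expected"] > 0,
                                  result["expected"],
                                  result["nim_expected"])
print(result)
```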

Next Blog

In the next blog, we will learn about the Linear Regression assumptions. I hope you have enjoyed the Linear Regression series so far. Kindly leave your suggestions/comments in the comment section, and let me know if this model can be further improved.
