No Intercept Linear Regression Model and RMSE

A “No Intercept Linear Regression Model” is a linear regression model with intercept = 0, i.e., the regression equation passes through the origin. The aim of this blog is to help you compare regression models with and without intercept.

RMSE of Model With Intercept

The Root Mean Squared Error (RMSE) of our Intercept Model is 5872. The code is given below. Run it and try it out for yourself.

### Python code to import the File ### (File download link)

import pandas as pd

inc_exp = pd.read_csv("Inc_Exp_Data.csv")

# Log Transformation step

import numpy as np

inc_exp['Ln_Emi_or_Rent_Amt'] = np.log(
       inc_exp['Emi_or_Rent_Amt'] + 1)

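# Fit the model with intercept.
# Note: "sma" is assumed to be statsmodels.formula.api,
# which provides ols(formula=..., data=...)
import statsmodels.formula.api as sma

m_linear_mod_2 = sma.ols(
       formula = "Mthly_HH_Expense ~ Mthly_HH_Income + \
       No_of_Fly_Members + Ln_Emi_or_Rent_Amt",
       data = inc_exp).fit()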

print("Adj. R-Squared ", 
       m_linear_mod_2.rsquared_adj.round(4))

Adj. R-Squared 0.7436

# Predict the expected monthly expense using the fitted intercept model
expected = m_linear_mod_2.predict(inc_exp)
    
result = pd.DataFrame()
result["observed"] = inc_exp["Mthly_HH_Expense"] 
result["expected"] = expected.astype(int)

## Residual
result["residual"] = result["observed"] - result["expected"]


## Root Mean Squared Error = sqrt( sum( (y - yhat)^2 ) / n )
from sklearn import metrics 
print("Root Mean Squared Error:", 
      np.sqrt(metrics.mean_squared_error( result["observed"], 
      result["expected"] ) ).round(2) )

Root Mean Squared Error: 5872.30
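As a cross-check on the sklearn call above, the RMSE formula can also be computed by hand with NumPy. The snippet below is a minimal sketch on made-up numbers. The arrays `observed` and `predicted` are hypothetical and are not taken from Inc_Exp_Data.csv.

```python
import numpy as np

# Hypothetical observed and predicted values (not the blog's dataset)
observed = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.0, 5.0, 8.0, 11.0])

# Residual = observed - predicted
residual = observed - predicted

# RMSE = sqrt( sum(residual^2) / n )
rmse = float(np.sqrt(np.mean(residual ** 2)))
print("Root Mean Squared Error:", round(rmse, 4))
```

Because RMSE is in the same unit as the dependent variable, the value can be read directly as a typical size of the prediction error.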

### R code to import the File ### (File download link)


inc_exp <- read.csv("Inc_Exp_Data.csv")

inc_exp$Ln_Emi_or_Rent_Amt = log(inc_exp$Emi_or_Rent_Amt + 1)

# Use the log-transformed variable, same as in the Python model
m_linear_mod_2 <- lm( Mthly_HH_Expense ~ Mthly_HH_Income 
                    + No_of_Fly_Members + Ln_Emi_or_Rent_Amt, 
                    data = inc_exp )



cat("Adj. R-Squared ", summary(m_linear_mod_2)$adj.r.squared)

Adj. R-Squared 0.7435739

View Predicted Values of Intercept Model

result.head(6) ### Python syntax 

head(result)    ### R syntax

Note: One of the expected values is negative. The estimated values in Linear Regression are not bounded and can take any value from minus infinity to plus infinity. However, Monthly Household Expense cannot be negative. To overcome this, we may choose to build a NO INTERCEPT LINEAR REGRESSION MODEL.

No Intercept Linear Regression Model

A “No Intercept” regression model is a model without an intercept, i.e., intercept = 0. It is typically advised not to force the intercept to be 0. You should use a No Intercept model only when you are sure that Y = 0 when all X = 0.
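To see what forcing the intercept to 0 does, here is a small sketch on synthetic data. It assumes a made-up relationship y = 5 + 2x (not the blog's dataset) and fits both models with NumPy least squares instead of statsmodels. The helper `rmse` is a hypothetical convenience function.

```python
import numpy as np

# Synthetic data (hypothetical): the true line is y = 5 + 2x, so Y != 0 when X = 0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 5.0 + 2.0 * x

# Model with intercept: the design matrix has a column of ones
X_int = np.column_stack([np.ones_like(x), x])
beta_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)

# No intercept model: drop the column of ones,
# which forces the fitted line through the origin
X_noint = x.reshape(-1, 1)
beta_noint, *_ = np.linalg.lstsq(X_noint, y, rcond=None)

def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

rmse_int = rmse(y, X_int @ beta_int)
rmse_noint = rmse(y, X_noint @ beta_noint)

print("With intercept   : coefficients", beta_int.round(3), "RMSE", round(rmse_int, 3))
print("Without intercept: coefficients", beta_noint.round(3), "RMSE", round(rmse_noint, 3))
```

In this sketch the true intercept is not 0, so the no intercept fit has to distort the slope to compensate and its RMSE comes out higher.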

The RMSE of the No Intercept Model is 6437, which is higher than that of the Intercept Model (5872). The code is given below:

# The minus one (-1) in the formula is the formula syntax
# for a no intercept model

no_intercept_model = sma.ols(
    formula = "Mthly_HH_Expense ~  Mthly_HH_Income +\
           No_of_Fly_Members + Ln_Emi_or_Rent_Amt - 1",
    data = inc_exp).fit()

print("Adj. R-Squared of No Intercept Model:", 
      no_intercept_model.rsquared_adj.round(4))

Adj. R-Squared of No Intercept Model: 0.9114

expected = no_intercept_model.predict(inc_exp)

# nim -> No Intercept Model
nim_result = pd.DataFrame()
nim_result["observed"] = inc_exp["Mthly_HH_Expense"] 
nim_result["expected"] = expected.astype(int)
# Parentheses let the expression continue on the next line
nim_result["residual"] = (nim_result["observed"]
                          - nim_result["expected"])

from sklearn import metrics

print("RMSE of No Intercept Model:", 
      np.sqrt(metrics.mean_squared_error(
          nim_result["observed"], nim_result["expected"]
      ) ).round(2) )

RMSE of No Intercept Model: 6437.53

View Predicted Values of No Intercept Model

nim_result.head(6) ### Python syntax 

head(nim_result)    ### R syntax

Note: None of the expected (predicted) values is negative.

Comparison of Intercept & No Intercept Model

The comparison of the two models shows – “RMSE of Intercept Model is lower than that of No Intercept Model”. In model selection, we should give preference to RMSE over Adj. R Squared. Why?

1. Adj. R-Squared is a relative measure of fit, whereas RMSE is an absolute measure of fit.

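2. Adj. R-Squared for a No Intercept Model is computed with the total sum of squares taken about 0 instead of about the mean, i.e., as if the mean of the dependent variable were 0, which is not true. The mean of the dependent variable, i.e., Monthly Expense, is 18818. This is why the Adj. R-Squared of the No Intercept Model (0.9114) looks higher than that of the Intercept Model (0.7436) even though its RMSE is higher.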

3. R-Squared is in proportion terms and is unitless, whereas the unit of RMSE is the same as the unit of the dependent variable.
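Point 2 above can be illustrated with a short sketch. It uses made-up data (the arrays `x` and `y` are hypothetical) in which y sits far away from 0, fits a no intercept line, and then computes R-Squared twice: once with the total sum of squares taken about 0 (the convention used for models without an intercept) and once about the mean of y.

```python
import numpy as np

# Hypothetical data: y is far from zero and barely depends on x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([100.0, 98.0, 103.0, 101.0, 99.0])

# No intercept least squares slope: b = sum(x*y) / sum(x^2)
b = float(np.sum(x * y) / np.sum(x ** 2))
ss_res = float(np.sum((y - b * x) ** 2))

# Total sum of squares about 0 (uncentered) and about the mean (centered)
tss_uncentered = float(np.sum(y ** 2))
tss_centered = float(np.sum((y - y.mean()) ** 2))

r2_uncentered = 1 - ss_res / tss_uncentered
r2_centered = 1 - ss_res / tss_centered

print("R-Squared about 0       :", round(r2_uncentered, 4))
print("R-Squared about the mean:", round(r2_centered, 4))
```

The uncentered version looks high simply because y is large relative to 0, while the centered version is negative, meaning this line fits worse than predicting the mean of y for every row. The two R-Squared values are therefore not comparable, whereas RMSE is computed the same way for both models.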

Ensemble of Intercept & No Intercept Model

The issue with the Intercept Model is that it gives a negative predicted value of our dependent variable for some observations. We may instead use an ensemble of the No Intercept Model and the Intercept Model. If the predicted value of the Intercept Model is negative, then we consider the predicted value of the No Intercept Model.

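# Bring the No Intercept Model predictions into the result DataFrame
result["nim_expected"] = nim_result["expected"]

result["ens_expected"]= result.apply(
    lambda row: row["expected"] if row["expected"]>0 
        else row["nim_expected"], axis=1)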

print("RMSE of Ensemble Model:", 
      np.sqrt(metrics.mean_squared_error(
          result["observed"], result["ens_expected"]
      ) ).round(2) )

RMSE of Ensemble Model: 5844.79

By taking the ensemble of both models, we have reduced the RMSE.
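The same fallback rule can also be written without a row-wise apply. The snippet below is a minimal sketch using `np.where` on made-up numbers. The arrays `observed`, `intercept_pred` and `no_intercept_pred` are hypothetical stand-ins for the observed values and the predictions of the two models.

```python
import numpy as np

# Hypothetical values (not the blog's dataset)
observed = np.array([1200.0, 800.0, 150.0])
intercept_pred = np.array([1100.0, 900.0, -50.0])      # one negative prediction
no_intercept_pred = np.array([1000.0, 950.0, 120.0])

# Keep the Intercept Model prediction when it is positive,
# otherwise fall back to the No Intercept Model prediction
ens_pred = np.where(intercept_pred > 0, intercept_pred, no_intercept_pred)

def rmse(obs, pred):
    """Root mean squared error of the predictions."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

rmse_intercept = rmse(observed, intercept_pred)
rmse_ensemble = rmse(observed, ens_pred)

print("Ensemble predictions:", ens_pred)
print("RMSE of Intercept Model:", round(rmse_intercept, 2))
print("RMSE of Ensemble Model :", round(rmse_ensemble, 2))
```

Only the rows with a negative Intercept Model prediction change, so the ensemble RMSE differs from the Intercept Model RMSE only through those rows.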

Next Blog

In the next blog, we will learn about the Linear Regression assumptions. I hope you have enjoyed the Linear Regression series so far. Kindly leave your suggestions/comments in the comment section. Moreover, let me know if this model can be further improved.

