Multiple Logistic Regression is used to fit a model when the dependent variable is binary and there is more than one independent predictor variable. Before you proceed, I hope you have read our article on Single Variable Logistic Regression
Other concepts and data preparation steps we have covered so far are: Business Understanding, Data Understanding, Single Variable Logistic Regression, Training & Testing, Information Value & Weight of Evidence, Missing Value Imputation, Outlier Treatment, Visualization & Pattern Detection, and Variable Transformation
We will now build a multiple logistic regression model on the Development Dataset created in Development – Validation – Hold Out blog
Let us build the Multiple Logistic Regression model considering the following independent variables and alpha significance level at 0.0001.
Interpretation of Model Summary Output
- We have set the alpha for variable significance at 0.0001. The p-value of the Gender variable is more than 0.0001 as such we may accept the Null Hypothesis, i.e. the Gender variable may be considered as insignificant and should be dropped.
- The beta coefficient of the independent variables is in line with their correlation trend with the dependent variable.
- HP_Imputed variable has a negative correlation with the Target phenomenon, hence it’s beta coefficient is negative,
- Whereas the other variables have a positive correlation and accordingly their beta coefficients have come positive.
Re-run after dropping Gender Variable
The Gender variable was insignificant in the above model run. As such, we drop the Gender variable and re-run the logistic function.
All the variables are now significant.
Model Performance Measurement
How good is our model? Is the model usable? Is the model a good fit?
To answer these questions we have to measure the model performance.
(I suggested reading our blog on 7 Important Model Performance Measures before executing the below code)
Rank Order, KS, Gains Table & Lift Chart
Shown above is Gains Table output. The “rrate” (response rate) column shows that the model is Rank Ordering with a minor crack between decile number 5 & 4. The K-S statistic of the model is 0.2894 (28.94%). The Lift in the topmost decile is 3.20.
The AUC of the model is 0.70. Based on AUC’s interpretation, the model may be considered to be a poor or just fair model.
Hosmer-Lemeshow Goodness of Fit Test
The p-value of the HL-GOF test is above 0.05. We may accept the Null Hypothesis, i.e., the model fits the data very well.
- The KS, AUC, Gini Coefficient of the above model suggests that the model is just fair. It is not a very good model. Improving the above model is an exercise for the blog reader.
- Remember, in the above regression, we have not used the Balance, No. of Credit Transactions, and SCR variable. Try to bring these variables into the model and improve the overall model performance.
- We used the Occ_WoE variable in the above model, however, we could have used the Occ_Imputed or Occ_KNN_Imputed imputed variables as created in the Missing Value Imputation step.