Analytics: Fusion of Science & Art

This article is a continuation of my previous article, “How I Learnt Analytics as Science and Art Fusion”.

In this article, I would like to discuss it further with an example from banking. (Dear blog reader, this post is somewhat heavy on banking and analytics.)

One of the common customer behavioural variables created from TRANSACTION data in banking is the Ratio of Debit Amount to Credit Amount for a certain time period, which can be a month, a quarter, six months or any other period. Simply put, all the withdrawal and deposit transactions of each customer are aggregated separately over that time period, and then the ratio of total withdrawals (debits) to total deposits (credits) is computed.

RATIO_DR_CR = (SUM OF DEBITS AMT) / (SUM OF CREDITS AMT)
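
To make the aggregation concrete, here is a minimal sketch in Python of how this ratio could be computed per customer. The transaction layout, the "DR"/"CR" codes, the sample figures and the function name `ratio_dr_cr` are all assumptions made for illustration; a real banking extract would look different.

```python
from collections import defaultdict

# Hypothetical transactions for one period: (customer_id, type, amount).
# "DR" = debit (withdrawal), "CR" = credit (deposit).
transactions = [
    ("C1", "CR", 1000.0),
    ("C1", "DR", 400.0),
    ("C1", "DR", 100.0),
    ("C2", "CR", 500.0),
    ("C2", "DR", 750.0),
    ("C3", "DR", 200.0),  # no credits in the period
]

def ratio_dr_cr(transactions):
    """Return {customer_id: sum of debits / sum of credits}.

    The ratio is None when a customer has no credits in the period,
    since the division is undefined in that case.
    """
    debits = defaultdict(float)
    credits = defaultdict(float)
    for customer_id, txn_type, amount in transactions:
        if txn_type == "DR":
            debits[customer_id] += amount
        elif txn_type == "CR":
            credits[customer_id] += amount
    customers = set(debits) | set(credits)
    return {
        c: (debits[c] / credits[c]) if credits[c] > 0 else None
        for c in customers
    }

ratios = ratio_dr_cr(transactions)
```

Customers with no credits in the period are returned as `None` here instead of a very large number; how to treat them is itself a modelling decision.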

The business hypothesis behind this variable is that a customer who has more debits than credits is a better candidate for unsecured loan products or a credit card than for an investment product. A typical plot of this variable against the response rate for unsecured loans (i.e. the likelihood of a customer taking an unsecured loan) would look something like the graph shown below:

Ratio of Dr. to Cr. Amt. Vs. Response Rate
(The above graph is pretty standard in analytics and hence I am not explaining it.)

Quite often we stumble upon these kinds of patterns and, as analysts, we tend to quickly resort to various kinds of variable transformation. One can think of many transformations, such as outlier treatment, capping at 1, binning, creating a flag variable (above 1 vs. below 1), creating a derived variable that is a mirror image for values above 1, etc.
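
A few of the transformations listed above can be sketched in Python as follows. This is only an illustration: the function names and bin edges are hypothetical, and the "mirror image" is interpreted here as taking the reciprocal of values above 1, which is one possible reading and not necessarily the only one.

```python
import bisect

def cap_at_one(ratio):
    """Capping: any ratio above 1 is replaced by 1."""
    return min(ratio, 1.0)

def above_one_flag(ratio):
    """Flag variable: 1 if the ratio is above 1, otherwise 0."""
    return 1 if ratio > 1.0 else 0

def mirror_above_one(ratio):
    """Mirror image for values above 1.

    Assumption: the mirror is taken as the reciprocal, so 2.0 maps to 0.5.
    """
    return 1.0 / ratio if ratio > 1.0 else ratio

# Hypothetical bin edges; in practice these would be chosen from the data.
BIN_EDGES = [0.25, 0.5, 0.75, 1.0]

def bin_index(ratio):
    """Binning: return the index of the left-closed bin the ratio falls in.

    Index 0 is below 0.25, index 4 is 1.0 and above.
    """
    return bisect.bisect_right(BIN_EDGES, ratio)
```

Each of these changes the shape of the variable without changing what it measures, which is why the model statistics still need to be checked afterwards.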

Sometimes the trend would be genuine and a transformation would help. At other times the transformed variable may enter the model with side effects and have a negative impact on model statistics like CHI-SQ, KS, stability of the variable's beta, etc. As an analyst, it is important to be able to pick up these clues of instability in the model, interpret the model statistics and, at times, think harder about the variable from the domain perspective.

Now put on your domain cap with respect to the variable being discussed. The commonsensical thing to ask is: how can a customer withdraw more money than the amount being deposited? Logically, the maximum value should have been 1. The likely possibilities are:

A) Customer is withdrawing from his existing balance (funds already available in the account)
B) Customer has an overdraft facility (ability to withdraw more money from account despite not having any balance)

If you treat the existing balance as a dummy credit transaction, you can redefine your ratio variable as something like:

RATIO_DR_CR = (SUM OF DEBITS AMT) / (SUM OF CREDITS AMT + EXISTING BALANCE)

You may be surprised to find that the trend changes to something like the graph below, and that the upper bound of the variable is now around 1:

New Ratio Variable Vs. Response Rate

The trend of the new variable can be much sharper, and this new ratio variable may turn out to be a more stable and stronger predictor in the model than the previous variable. So, depending on the situation at hand, you may have to transform the variable or, at times, redefine the variable itself.
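
As a rough illustration of the redefined ratio, the sketch below compares the old and new definitions for one customer. It assumes that "existing balance" means the balance at the start of the period; the function name and the figures are hypothetical.

```python
def ratio_dr_cr_with_balance(total_debits, total_credits, opening_balance):
    """Debits divided by (credits + existing balance).

    Assumption: opening_balance is the balance at the start of the period.
    Returns None when no funds were available, since the ratio is undefined.
    """
    available = total_credits + opening_balance
    if available <= 0:
        return None
    return total_debits / available

# Case A: the customer withdraws from an existing balance.
old_ratio = 750.0 / 500.0                                    # 1.5 under the original definition
new_ratio = ratio_dr_cr_with_balance(750.0, 500.0, 1000.0)   # 0.5 under the new definition

# Case B: the customer uses an overdraft, so the ratio can still exceed 1.
overdraft_ratio = ratio_dr_cr_with_balance(1200.0, 500.0, 500.0)  # 1.2
```

Only overdraft usage pushes the new ratio above 1, which is why its upper bound sits around 1 and not exactly at 1.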

Sign-off note: The point I am trying to drive home is that a robust analytical solution/model can be built only with a good blend of statistical process and domain knowledge. I hope that you, as a blog reader, would concur that analytics is a fusion of science and art. If yes, click on LIKE and share the blog with your friends/colleagues.

