Darpan Saxena

Predictive Segmentation of Bank Customers


This assignment demonstrates predictive segmentation of bank customers: an analysis that predicts which segment a given customer will belong to.

This is achieved through predictors drawn from the company's historical data. The predictors are the independent variables used to compute the segment, which is the dependent variable in our analysis.

In the rest of this document, we describe how we carried out this exercise on the HDFC Bank data that was given to us.


Step 1 – Identifying the number of segments

The historical company data provided to us consisted of customers' responses to a questionnaire. Respondents answered each item on a Likert scale where:

1 – Strongly Agree

2 – Agree

3 – Neither Agree nor Disagree

4 – Disagree

5 – Strongly Disagree


The image presents a glimpse of the dataset

To identify the number of segments, we ran a hierarchical cluster analysis. In SPSS, this can be found under Analyze >> Classify >> Hierarchical Cluster.

The output includes a table called the Agglomeration Schedule, which shows how the respondents came together, or ‘agglomerated’, to form clusters. Here is a snippet of the agglomeration schedule we obtained.


From the agglomeration schedule, we could see a big jump in the coefficients between the 14th and the 15th stage. We therefore concluded that effective segmentation had been achieved by the 14th stage. With 20 respondents, stopping after stage 14 leaves 20 - 14 = 6 identifiable clusters.

The respondents belonging to each cluster are given below:
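The "big jump" rule can be sketched in a few lines of Python. This is only an illustration: the coefficient values below are made up (they are not the ones from our SPSS output), and the function name is hypothetical. It assumes one coefficient per stage, so n cases give n - 1 stages.

```python
def clusters_from_schedule(coefficients):
    """Suggest a stopping stage and cluster count from the largest jump.

    coefficients[i] is the agglomeration coefficient at stage i + 1.
    With n cases there are n - 1 stages, and n - s clusters remain
    after stage s.
    """
    n_cases = len(coefficients) + 1
    # Increase in the coefficient from each stage to the next one.
    jumps = [coefficients[i + 1] - coefficients[i]
             for i in range(len(coefficients) - 1)]
    largest = max(range(len(jumps)), key=jumps.__getitem__)
    stage_before_jump = largest + 1  # stages are numbered from 1
    return stage_before_jump, n_cases - stage_before_jump


# Hypothetical coefficients for 20 respondents (19 stages).
example_coefficients = [1.0, 1.5, 2.0, 2.4, 3.0, 3.5, 4.1, 4.6, 5.0, 5.7,
                        6.1, 6.8, 7.2, 8.0, 15.0, 16.0, 17.5, 19.0, 21.0]

stage, n_clusters = clusters_from_schedule(example_coefficients)
print(stage, n_clusters)
```

In this made-up schedule the largest jump falls between stages 14 and 15, so the sketch stops at stage 14 and reports 6 clusters, mirroring the reasoning above.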



Step 2 – Cluster Profiling

Next, we needed to identify the predictors that differentiate the clusters, and then profile the clusters based on their behavioral responses. This was achieved by K-Means clustering.

In SPSS, this can be found under Analyze >> Classify >> K-Means Cluster


Here, we have to enter the number of clusters required. The hierarchical clustering above gave 6 clusters, so that is what we entered.

The output includes an ANOVA table. The last column of this table gives the significance value (Sig.), which helps us determine which predictors are important in differentiating between the clusters.

We chose a cut-off significance value of 0.01, which corresponds to a 99% confidence level. We then looked for the predictors whose significance is less than 0.01. These predictors are the significant ones for differentiating between segments.

The following is a glimpse of that ANOVA table.
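As a rough sketch of what sits behind this step, the Python below computes a one-way ANOVA F statistic for one predictor (between-cluster mean square divided by within-cluster mean square) and then applies the 0.01 cut-off to a set of Sig. values. The scores, the predictor names Q1 to Q4, and the Sig. values are all hypothetical; in our analysis the Sig. column was read directly from the SPSS output rather than computed this way.

```python
def anova_f(groups):
    """Return the one-way ANOVA F statistic for one predictor.

    groups holds one list of Likert scores per cluster.
    """
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own cluster mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def significant_predictors(sig_values, cutoff=0.01):
    """Keep the predictors whose Sig. value is below the cut-off."""
    return [name for name, p in sig_values.items() if p < cutoff]


# Hypothetical scores for one predictor in three clusters.
f_value = anova_f([[1, 2, 1], [4, 5, 5], [3, 3, 2]])

# Hypothetical Sig. values, as they might be read off the ANOVA table.
sig_values = {"Q1": 0.000, "Q2": 0.034, "Q3": 0.008, "Q4": 0.210}
selected = significant_predictors(sig_values)
print(f_value, selected)
```

The sketch does not convert F into a p-value, and it assumes at least some variation within the clusters (otherwise the within-cluster mean square is zero).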


From there, we picked out the most relevant variables on which the differentiation can be done. In the table below, those variables are marked in green, and the Final Cluster Centers table has been copied into Excel.


In the Segment Profiling sheet of the Excel file attached to this document, we have also written inferences for each cluster on these variables, which are the most important in determining the clusters.

A sample of those inferences is given in the image below.

In that image, you can see that for Cluster 1 we wrote our inferences by interpreting its scores. These help us determine the choices and preferences of this cluster, which in turn help us name the clusters in the next step.

After doing this for each cluster, we gave each cluster a name; these are listed in the sheet titled Segment Label in our attached Excel file.

We named our segments as follows:

Cluster 1 – Savers, stable and independent

Cluster 2 – Carefree, Spenders

Cluster 3 – Misers, Defaulters, Borrowers

Cluster 4 – Self-sufficient strong savers

Cluster 5 – Mild Savers

Cluster 6 – Daily wager, dependent, poor

We consider this segmentation valid because it satisfies the conditions for Wilks' Lambda and the eigenvalues.



Step 3 – Predictive Segmentation

Refer to the sheet titled Predictive Model in our Excel file. There you can see the unstandardized Fisher's coefficients that make up our model. These are in B4:G18.

Below that table, we have transposed all 20 individual responses given in the data. The segment to which each response belongs is also noted.

In column I, in cells I4:I17, we entered one of the 20 responses. Any new response can be pasted into these cells to check which segment it belongs to.

To its right, we multiplied each segment's coefficients in B4:G17 by the corresponding responses in I4:I17 to obtain the values for each segment from our Fisher's coefficients.

Each column from J to O is then summed.

The predicted segment is the one whose column sum is the largest: the response in column I is assigned to that segment.
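The spreadsheet logic (multiply, sum per segment, pick the largest sum) can be sketched in Python as follows. The coefficients and the response are made up and cover only three predictors and two segments for brevity. The function name and the optional `constants` argument are assumptions for illustration: the steps above describe only the products of coefficients and responses, so no constant is added unless one is supplied.

```python
def predict_segment(coefficients, response, constants=None):
    """Score a response against each segment and return the best one.

    coefficients maps a segment name to its list of coefficients,
    one per predictor. response is the list of Likert scores.
    constants (optional, an assumption) maps a segment name to a
    constant term that is added to that segment's sum.
    """
    scores = {}
    for segment, coefs in coefficients.items():
        if len(coefs) != len(response):
            raise ValueError("one coefficient per predictor is required")
        # Same as multiplying the two columns cell by cell and summing.
        total = sum(c * r for c, r in zip(coefs, response))
        if constants is not None:
            total += constants[segment]
        scores[segment] = total
    # The segment with the largest sum is the prediction.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical coefficients (3 predictors, 2 segments) and response.
example_coefficients = {
    "Segment 1": [2.0, 0.5, 1.0],
    "Segment 2": [0.5, 2.0, 1.5],
}
example_response = [1, 5, 3]

predicted, sums = predict_segment(example_coefficients, example_response)
print(predicted, sums)
```

With these made-up numbers the sums are 7.5 for Segment 1 and 15.0 for Segment 2, so the response is assigned to Segment 2.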


Step 4 – Cross-checking

We can check whether the segment our model calls out is correct by comparing it with the segment already recorded for each response. As we can see, the model's predictions match the recorded segments.
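One simple way to express this check is a hit rate: the share of responses whose predicted segment equals the recorded one. The sketch below uses made-up labels, not our actual results, and the function name is hypothetical.

```python
def cross_check(predicted, actual):
    """Return the hit rate and the 1-based positions that disagree."""
    if len(predicted) != len(actual):
        raise ValueError("both lists must have the same length")
    mismatches = [i for i, (p, a) in enumerate(zip(predicted, actual), start=1)
                  if p != a]
    hit_rate = 1 - len(mismatches) / len(actual)
    return hit_rate, mismatches


# Hypothetical predicted and recorded segments for four responses.
example_predicted = [1, 2, 3, 3]
example_actual = [1, 2, 3, 4]

hit_rate, mismatches = cross_check(example_predicted, example_actual)
print(hit_rate, mismatches)
```

A hit rate of 1.0 with an empty mismatch list would correspond to every response being placed in its recorded segment.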



Individual Tasks undertaken:

The entire group worked together to make the project a success, although some members contributed more than others in particular parts. The data analysis was covered by Dhiraj and Darpan. The detailed analysis was developed by all the members of the group. The report was written by Ashish, Shahroz, Darpan, Rishabh and Tanay, with reviews by other members of the group.
