
Telecom Industry Customer Churn Prediction with K Nearest Neighbor


Rajas Sanjay Ubhare
How To Reduce Churn Using Customer Journey Analytics | Source: Pointillist

This blog aims to predict which customers are likely to churn, based on the company’s data from the previous month, so that those customers can be offered better services. This is a supervised learning problem. At the fundamental level, the task is to load IBM Watson Community’s Telco Customer Churn dataset, which contains many categorical variables and a few numerical ones. Since this is a supervised classification problem, we could apply any popular classification algorithm, such as a decision tree, logistic regression, SVM, or random forest. We have to preprocess the categorical data, run it through the chosen algorithm, make predictions, and record accuracy, sensitivity, specificity, and other measures.

However, in this article, we concentrate solely on the k-Nearest Neighbors (k-NN) algorithm, using it to classify customers based on a distance measure and a selection of significant variables.

The dataset is WA_Fn-UseC_-Telco-Customer-Churn.csv, taken from the IBM Watson Telecom customer churn collection (https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/). It contains 7043 rows and 21 columns. The dataset is imbalanced with respect to Churn (Yes/No): there is a higher percentage of “No” records. The input data consists of customer specifications and contract details, such as whether the customer is male or female, what kind of service he or she receives from the company, how and how often the bills are paid, whether the customer is a senior citizen, and so on. The output is a column of Yes/No values indicating whether a customer keeps using (and paying for) the company’s services or decides to leave. Because the output variable, Churn, is discrete and takes the binary form “Yes” or “No”, this is a classification problem in supervised machine learning.
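As a first step, the dataset can be loaded and inspected in R; a minimal sketch, assuming the CSV file sits in the working directory:

```r
# Load the IBM Watson Telco customer churn dataset (file path assumed)
telco <- read.csv("WA_Fn-UseC_-Telco-Customer-Churn.csv", stringsAsFactors = TRUE)

dim(telco)                       # expect 7043 rows and 21 columns
prop.table(table(telco$Churn))   # class imbalance: more "No" than "Yes"
```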

Since our data has many variables, we need to make a prudent, informed decision about which ones to keep, based on tools such as histograms and box plots. The aim of this process is to identify the significant variables, feed them to the algorithm, and then calculate accuracy, misclassification error, sensitivity, and specificity.
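For instance, plots like the following can help flag candidate variables (a hedged sketch using the dataset’s own column names):

```r
# Example EDA plots used to judge variable significance
hist(telco$tenure, main = "Distribution of tenure", xlab = "Tenure (months)")
boxplot(MonthlyCharges ~ Churn, data = telco,
        main = "Monthly charges by churn status", ylab = "Monthly charges")
```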

Out of 7043 records and 21 variables, the following variables were found to be significant:

  1. SeniorCitizen
  2. Partner
  3. Dependents
  4. Tenure
  5. PhoneService
  6. MultipleLines
  7. InternetService
  8. OnlineSecurity
  9. OnlineBackup
  10. DeviceProtection
  11. TechSupport
  12. StreamingMovies
  13. StreamingTV
  14. Contract
  15. PaymentMethod
  16. MonthlyCharges
  17. Churn (Output variable)

The k-nearest neighbors algorithm (k-NN) is a non-parametric approach used for both regression and classification. In both cases, the input consists of the k closest training samples in the feature space; the output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership: an object is categorized by a majority vote of its neighbors, being assigned to the most common class among its k nearest neighbors (k is a positive integer, usually small). If k = 1, the object is simply allocated to the class of its single closest neighbor. In k-NN regression, the output is the value of the object’s property, obtained as the average/mean of the values of its k nearest neighbors.
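To make the voting scheme concrete, here is a minimal illustration using the class package’s knn() function on toy data (separate from the churn pipeline itself):

```r
library(class)

# Toy 2-D training data with binary labels
train_x <- matrix(c(1, 1,
                    1, 2,
                    4, 4,
                    5, 5), ncol = 2, byrow = TRUE)
train_y <- factor(c("No", "No", "Yes", "Yes"))

# Classify a new point near the "Yes" cluster; its k = 3 nearest
# neighbors vote, and the majority class wins
knn(train = train_x, test = matrix(c(4.5, 4.5), ncol = 2),
    cl = train_y, k = 3)
# [1] Yes
```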

Why do we need a K-NN Algorithm | Source: javatpoint.com

The k-nearest neighbor method is the simplest classification method that classifies based on distance measures. The plus side of the k-NN approach is that it requires no model fitting: it is strictly data-driven rather than model-driven. Therefore, no distributional assumptions are needed for this method.

There are some advantages and disadvantages to k-NN.

Advantages: it is intuitive and straightforward, since we work only with a distance parameter; no assumptions are required about the dataset; and it can be powerful with an extensively large training dataset.

Disadvantages: the required size of the training dataset increases exponentially with the number of predictors, and an extensive training dataset takes a long time to search, since we must compute distances to all the samples and then identify the nearest one(s).

For training the model with the k-NN algorithm, we employ the caret package’s train() method. Before calling train(), we use the trainControl() method, which controls the computational nuances of train().

We will create dummy variables to be provided as input to the k-NN algorithm, and then we will train our model. In the code snippet below, we create these dummy variables using R as our programming language.
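A minimal sketch of this step using caret’s dummyVars(); the object names telco and telco_dm are illustrative assumptions:

```r
library(caret)

telco$customerID <- NULL   # drop the ID column before encoding

# Expand every categorical predictor into 0/1 dummy columns,
# keeping the Churn outcome as a factor
dummies  <- dummyVars(Churn ~ ., data = telco)
telco_dm <- data.frame(predict(dummies, newdata = telco),
                       Churn = telco$Churn)
```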

We divide the data into a training set and a testing set, with 70% of the data for training and 30% for testing, as shown in the code snippet below.
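A sketch of the stratified 70/30 split with caret’s createDataPartition(); the seed is an assumed choice:

```r
set.seed(123)   # assumed seed, for reproducibility

# Stratified 70/30 split on the Churn outcome
idx          <- createDataPartition(telco_dm$Churn, p = 0.7, list = FALSE)
Training_Set <- telco_dm[idx, ]
Testing_Set  <- telco_dm[-idx, ]
```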

After performing EDA (exploratory data analysis), we obtained a specific set of variables that were more prominent or impacted the algorithm significantly. Thus we dropped the following variables: PhoneService, MultipleLines, DeviceProtection, StreamingMovies, StreamingTV, PaperlessBilling, and PaymentMethod-ElectronicCheck.

From all the dummy variables, we then select only the significant ones to be given as input to the k-NN algorithm, as sketched below.
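A sketch of this selection step; the regular-expression patterns below, meant to match the dropped variables’ dummy-column names, are assumptions:

```r
# Keep only the significant columns: drop every dummy column whose name
# matches one of the variables eliminated during EDA (patterns assumed)
drop_pattern <- paste("PhoneService|MultipleLines|DeviceProtection",
                      "StreamingMovies|StreamingTV|PaperlessBilling",
                      "PaymentMethod.Electronic", sep = "|")
keep <- !grepl(drop_pattern, names(Training_Set))
Training_Set <- Training_Set[, keep]
Testing_Set  <- Testing_Set[,  keep]
```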

We set three parameters of the trainControl() method. The “method” parameter holds the details of the resampling method; for this project we use “repeatedcv”, i.e., repeated cross-validation. The “number” parameter contains the number of resampling iterations (folds), and the “repeats” parameter holds the number of complete sets of folds over which the repeated cross-validation is computed. We use number = 10 and repeats = 3. To train a k-NN classifier, the train() method must then be passed “knn” as its “method” parameter.
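In caret, that resampling setup looks as follows (the object name trctrl is illustrative):

```r
# Repeated 10-fold cross-validation, repeated 3 times
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
```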

As discussed earlier, preprocessing is a mandatory and crucial task for our data. We pass two values in the “preProcess” parameter: “center” and “scale”. These center and scale the data so that, after preprocessing, each predictor in the training data has a mean of approximately 0 and a standard deviation of 1.
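Putting the pieces together, the model can be trained roughly as follows; tuneLength = 20 is an assumed grid size, chosen so the search covers k values into the 40s:

```r
set.seed(123)
knn_fit <- train(Churn ~ ., data = Training_Set,
                 method     = "knn",                 # k-NN classifier
                 trControl  = trctrl,                # repeated CV from above
                 preProcess = c("center", "scale"),  # standardize predictors
                 tuneLength = 20)                    # assumed grid of k values

knn_fit   # prints Accuracy and Kappa for each candidate k
```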

Then we will analyze the results at three cutoff values, 0.3, 0.4, and 0.5 (a sketch of this thresholding step appears after the snippet headings below).

Implementation at Cutoff= 0.3
Implementation at Cutoff= 0.4
Implementation at Cutoff= 0.5
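Each implementation amounts to thresholding the predicted churn probabilities at the chosen cutoff and tabulating a confusion matrix; a sketch, assuming “Yes” is the positive class:

```r
# Predicted probability of churn ("Yes") on the test set
probs <- predict(knn_fit, newdata = Testing_Set, type = "prob")[, "Yes"]

# Threshold at each cutoff and tabulate a confusion matrix
for (cutoff in c(0.3, 0.4, 0.5)) {
  pred <- factor(ifelse(probs >= cutoff, "Yes", "No"),
                 levels = levels(Testing_Set$Churn))
  print(confusionMatrix(pred, Testing_Set$Churn, positive = "Yes"))
}
```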

Now we will compare the sensitivity, specificity, and accuracy at these three cutoff values. At the end, we will pick the optimum cutoff value so that our algorithm works efficiently.

We implemented the k-NN algorithm in two cases, and the outputs obtained in both cases are as follows.

Case 1: Considering all variables in the Training_Set and Testing_Set.

Case 1: Result Table
Case 1: Confusion Matrix according to Cut Off value

Case 2: Choosing the significant variables in Training_Set and Testing_Set.

Case 2: Result Table
Case 2: Confusion Matrix according to Cut Off value
Plot: Number of Neighbors vs. Accuracy (based on repeated cross-validation)

As we know, k-NN works well with a small number of input variables; thus the algorithm can benefit from the feature selection performed through EDA, which reduces the dimensionality of the input feature space. We implemented the algorithm on the specific set of variables found more prominent or effective during EDA. On these variables, our model reports Accuracy and Kappa metrics for different values of k, and the training procedure chooses k = 35 neighbors as its final value for the highest accuracy rate. We can see the variation of accuracy with respect to k by plotting it, as sketched below.
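The curve can be drawn directly from the fitted caret object:

```r
# Cross-validated accuracy for each candidate number of neighbors k
plot(knn_fit)
```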

So we conclude, from the confusion-matrix analysis of case 2, that k-NN works best at cutoff = 0.3: the optimum output is obtained with 72.76% accuracy and 82.43% sensitivity. Even though a cutoff of 0.4 gives 77% accuracy, we need high sensitivity for churn prediction, so we choose the cutoff of 0.3.

Source: https://chatbotslife.com/telecom-industry-customer-churn-prediction-with-k-nearest-neighbor-1d5784952c45?source=rss—-a49517e4c30b—4
