
Predictive Scenario for Classification





Step 1. Loading the data

Before we load the dataset into SAC, we cut a few records out of the original Bank dataset; we will use these records later to apply our model. From that hold-out set I also removed the target column ‘Y’. I now have two datasets.
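The split described above can be sketched in pandas. The column names and file names here are illustrative stand-ins for the actual Bank dataset, not its real schema:

```python
import pandas as pd

# Hypothetical stand-in for the Bank dataset; in practice you would read the source file.
df = pd.DataFrame({
    "age":     [25, 40, 35, 52, 31, 47],
    "balance": [1200, 500, 3000, 150, 2200, 900],
    "Y":       ["no", "yes", "no", "yes", "no", "yes"],
})

# Cut out a few records to apply the model to later, and drop the target 'Y' from them.
apply_df = df.tail(2).drop(columns=["Y"])  # hold-out set without the target
train_df = df.head(4)                      # training set keeps the target 'Y'

train_df.to_csv("bank_train.csv", index=False)
apply_df.to_csv("bank_apply.csv", index=False)
```

This leaves two files to upload to SAC: one for training and one for applying the model.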

Click on the Dataset as shown below.

Select Data upload from a file.

Click on Select Source File.

After selecting the file click on Import.

Give a name and save the Dataset.

Here I have created two datasets, as shown below. I will apply the model trained on the first dataset to the second dataset to predict the outcomes.

Step 2. Training the model

Let us now build the predictive scenario. This is where the models that predict loans (Yes/No) for customers will be built and trained.

Click on the Menu→ go to Browse → Predictive Scenario as shown below.

For this problem, the predicted entity takes the values Yes/No, so we will build a Classification model. Select Classification.

Give the scenario a suitable name and description.

Click on the Name to insert the dataset as shown below.

Select the Dataset.

Click on Edit variable metadata (under the input dataset field) to understand how SAC has interpreted the dataset.

Next, select the target you want to predict. Here I select the variable ‘Y’ so the prediction will be Yes or No.

You can also exclude columns that have no impact or influence on the prediction; this can improve your results.

Click on Train at the bottom and wait a while.

Here you see the Global Performance Indicators and the Target Statistics of the dataset.

Predictive Power: your main measure of predictive model accuracy. Here the value is 75.86%; the closer it is to 100%, the more confident you can be when you apply the predictive model to obtain predictions. You can improve this measure by adding more variables.

Prediction Confidence: your predictive model's ability to achieve the same degree of accuracy when you apply it to a new dataset that has the same characteristics as the training dataset. If it's greater than or equal to 95%, you can consider the predictive model robust. If it's less than 95%, you need to improve it, for example by adding new rows to your dataset.
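SAC computes these two indicators internally with its own formulas. As a rough, hedged analogue, you can train a classifier yourself and look at a Gini-style score (2·AUC − 1) on held-out data, using the gap between training and validation scores as a loose robustness check. The model type and data below are illustrative assumptions, not SAC's actual algorithm:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the bank data: two numeric features, binary target.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Gini-style score on held-out data: a rough analogue of "predictive power".
gini_val = 2 * roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]) - 1
# The train/validation gap loosely mirrors the idea behind "prediction confidence".
gini_tr = 2 * roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1]) - 1
print(f"validation Gini: {gini_val:.2f}, train Gini: {gini_tr:.2f}")
```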

Below, the report shows the key Influencer Contributions.

  • The Influencer Contributions show the relative importance of each variable used in the predictive model.
  • The Influencer Contributions view allows you to examine the influence on the target of each variable used in the predictive model.
  • The influencers are sorted by decreasing importance.
  • The most contributive ones are those that best explain the target.

Only the contributive influencers are displayed in the reports; variables with no contribution are hidden. The sum of the contributions equals 100%.
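SAC derives these contributions internally. A rough analogue with a tree-based model is normalized feature importances, rescaled to sum to 100% like the report; the feature names and data here are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 500
# Illustrative features: 'age' drives the target, 'noise' does not.
age = rng.integers(18, 70, size=n)
noise = rng.normal(size=n)
y = (age > 40).astype(int)

X = np.column_stack([age, noise])
model = RandomForestClassifier(random_state=0).fit(X, y)

# Normalize the importances so they sum to 100%, mirroring the report,
# and sort them by decreasing importance.
contrib = 100 * model.feature_importances_ / model.feature_importances_.sum()
for name, c in sorted(zip(["age", "noise"], contrib), key=lambda t: -t[1]):
    print(f"{name}: {c:.1f}%")
```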

Scroll to the influencer chart and select Age to see its contribution.

Influencer Contributions: here you can see which group has the most influence on the target. In Influencer Contributions, you can analyze the influence of the different categories of an influencer on the target.

Confusion Matrix: lets you assess the model performance in detail, using standard metrics such as sensitivity and specificity. With the Confusion Matrix you can quickly see the correctly detected positive cases (Yes) and the falsely detected cases (No).
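The metrics read off a confusion matrix can be sketched as follows; the actual and predicted classes below are made-up examples (1 = Yes, 0 = No), not output from this model:

```python
from sklearn.metrics import confusion_matrix

# Illustrative actual vs. predicted classes (1 = Yes, 0 = No).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # share of actual Yes cases correctly detected
specificity = tn / (tn + fp)   # share of actual No cases correctly detected
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```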

Profit Simulation: open the Profit Simulation tab and estimate the expected profit, based on the costs and profits associated with the predicted positive and actual positive targets.

Ex: Perform a Profit/Loss Simulation. We now need to identify which customers to focus our resources on. Enter 100 in the ‘Cost Per Predicted Positive’ field and 1000 in the ‘Profit Per Actual Positive’ field – a cost of 100 for each customer we contact and a profit of 1000 for each actual positive in return. Click on the ‘Maximize Profit’ button.
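The calculation behind this simulation can be sketched as a threshold sweep: each threshold decides who counts as a predicted positive, and the profit is the gain from actual positives minus the cost of everyone contacted. The probabilities and outcomes below are illustrative, and the sweep only mimics the idea of ‘Maximize Profit’, not SAC's exact procedure:

```python
import numpy as np

COST_PER_PREDICTED_POSITIVE = 100
PROFIT_PER_ACTUAL_POSITIVE = 1000

# Illustrative predicted probabilities and actual outcomes (1 = Yes).
proba  = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
actual = np.array([1,   1,   0,   1,   0,   0,   1,   0])

def profit(threshold):
    predicted_positive = proba >= threshold
    n_contacted = predicted_positive.sum()
    n_true_positive = (predicted_positive & (actual == 1)).sum()
    return (PROFIT_PER_ACTUAL_POSITIVE * n_true_positive
            - COST_PER_PREDICTED_POSITIVE * n_contacted)

# Sweep thresholds and keep the most profitable one.
thresholds = np.linspace(0, 1, 101)
best = max(thresholds, key=profit)
print(f"best threshold ~ {best:.2f}, profit = {profit(best)}")
```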

Performance Curves: to check whether the predictive model has errors or is producing accurate predictions, use the panel of performance curves in the Performance Curves tab to compare your predictive model to a random model and to a hypothetical perfect predictive model.

  • Determine the percentage of the population to contact to reach a specific percentage of the actual positive targets with the Detected Target Curve.
  • Check how much better your predictive model is than a random model with the Lift Curve.
  • Check how well your predictive model discriminates, in terms of the trade-off between sensitivity and specificity, with the Sensitivity Curve (ROC).
  • Check the values of [1 - Sensitivity] or of Specificity against the population with the Lorenz Curves.
  • Understand how positive and negative targets are distributed in your predictive model with the Density Curves.
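A lift curve like the one in this tab can be sketched by sorting the population by predicted probability and comparing the cumulative share of detected positives against a random model. The scores and outcomes below are illustrative:

```python
import numpy as np

# Illustrative predicted scores and actual positives (1 = Yes).
proba  = np.array([0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1])
actual = np.array([1,    1,   0,   1,   0,   1,   0,   0,   0,    0])

order = np.argsort(-proba)                            # contact highest scores first
hits = np.cumsum(actual[order])                       # positives detected so far
pct_population = np.arange(1, len(proba) + 1) / len(proba)
pct_detected = hits / actual.sum()

# Lift = model's detection rate / random model's detection rate at each depth.
lift = pct_detected / pct_population
print(np.round(lift, 2))
```

A lift above 1 means the model finds positives faster than random contacting; it always converges to 1 once the whole population is contacted.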

Apply the model to the other dataset by clicking on the ‘+’ sign.

After adding the other dataset, you can see prediction is 97%.

Now click on Apply the Model, as shown below.

Select the folder where you want to save the Output Dataset and other details.


Select the output variable and click on OK.

Fill all the details as shown below.

You can see the output dataset here; click on it.

Here you see the Predicted Category and the Predicted Probability.
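The two output columns relate in a simple way: the predicted category is the predicted probability cut at a decision threshold. The 0.5 cut-off below is an assumption for illustration; SAC chooses its own cut-off:

```python
# Illustrative predicted probabilities from the applied model.
predicted_probability = [0.92, 0.31, 0.67, 0.08]

# Map each probability to a category using an assumed 0.5 threshold.
predicted_category = ["Yes" if p >= 0.5 else "No" for p in predicted_probability]
print(list(zip(predicted_category, predicted_probability)))
```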

Now create a model on the dataset.

Click on the Menu → Create → Model → Get data from a dataset.

In the Acquire Data dialog, drill down to Dataset and click on it.

Select the Dataset.

Click on Create Model at the bottom.

Here is the model created on the Dataset.