**Introduction**

Hey folks. Welcome back to the series “Evaluating Deep Learning Models with Abacus.AI”. This is the second part of the four-part series, covering the evaluation of regression models. As before, the only prerequisite for this post is a basic familiarity with machine learning.

# Regression

In classification, the model output was a class label; in regression, we predict a continuous outcome variable from a set of input features (columns in your dataset). For example, instead of classifying someone as young or old, you might want to predict their age; framing the problem as regression gives you an age value as output. Now that the problem type is clear, here is a quick evaluation recipe for those who have businesses to run, or who just want a high-level read on model quality in a few seconds:

**Quick Evaluation Recipe**

1. Compare every metric score against its baseline score to see whether there is an improvement

2. Check that WAPE lies between 0 and 2; lower is better

3. Check R^{2}; any value between 0.6 and 0.9 is considered good, and the higher the better

4. Finally, go to the prediction dashboard and check whether the results generated by the model come close to the actual values. Check about 20-30 data points.

**If all of the above looks good, the model is ready to go and you have trained a world-class deep learning-based regression model in just a few minutes with only a few clicks.**

# Deep Dive into Evaluating Regression Models

We provide two dashboards to evaluate your regression model on the Abacus.AI platform: the metric dashboard and the prediction dashboard. Let’s start with the metric dashboard, which contains four accuracy measures for evaluating a regression model. To understand prediction quality, you mainly need a grip on WAPE and the coefficient of determination (R^{2}).

Let’s begin with WAPE. This metric can be construed as the average absolute error divided by the average actual quantity. It ranges from 0 to infinity, where 0 is a perfect result; as a general rule, the lower the value, the better. In practice, you will generally see WAPE values between 0 and 2.
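The definition above translates directly into a few lines of Python. This is a minimal sketch of the metric itself, not an Abacus.AI API; the sample numbers are made up for illustration:

```python
def wape(actual, predicted):
    """Weighted Absolute Percentage Error: the sum of absolute errors
    divided by the sum of (absolute) actual values, i.e. the average
    absolute error over the average actual quantity."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    total_actual = sum(abs(a) for a in actual)
    return total_error / total_actual

# Hypothetical house prices (in thousands) vs. model predictions
actual = [300, 450, 200, 500]
predicted = [320, 430, 210, 480]
print(wape(actual, predicted))  # ~0.048, close to 0, so a strong result
```

A WAPE of about 0.05 here means the model is off by roughly 5% of the average actual value.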

The second metric is the coefficient of determination, R^{2}. It is a crucial metric that tells you how well the features used to train the model explain the target. It typically ranges from 0 to 1, where 0 means the features explain none of the variation in the target, and 1 means they explain all of it. In general, the closer R^{2} is to 1, the better, because we want our features to explain most of the variation in the target column. However, a value above 0.9 is a red flag: it often means some dataset columns are highly correlated with the target column. In other words, the model is overfitting to the training dataset and won’t generate useful predictions on unseen data. In such cases, you should ignore the correlated columns as suggested by the system and train a new model.
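R^{2} can likewise be computed from first principles: one minus the ratio of residual variance to total variance. Again, a minimal sketch with made-up numbers, not platform code:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: the fraction of variance in the
    actual values that is explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1 - ss_res / ss_tot

actual = [300, 450, 200, 500]      # e.g. house prices in thousands
predicted = [320, 430, 210, 480]
print(r_squared(actual, predicted))
```

Note that on very bad predictions this formula can go below 0, which is simply worse than predicting the mean every time.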

Note: Depending upon your data, you might not get any recommendation to ignore a column which is absolutely fine.

Now that you have some grip on the metrics, keep in mind that the scores make more sense when you compare them against the corresponding baseline model scores. Secondly, check the “Correlation between target and column” graph on the metric dashboard to verify that the correlations make sense. For example, if the graph shows that house price depends mostly on the area of the house and its location, that makes a lot of sense; but if the top correlations are with features such as pool area and number of past owners, you may need to verify that the data is clean (and correct) and/or add more data.
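Under the hood, a graph like this is typically built from the Pearson correlation between each column and the target. Here is a hypothetical sanity check in plain Python; the column names and values are invented for the house-price example and are not from any real dataset:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: area should correlate strongly with price
area  = [1200, 1500, 900, 2000, 1700]   # sqft
price = [300, 450, 200, 500, 470]       # thousands
print(pearson(area, price))  # close to 1: a sensible top correlation
```

If an implausible column (say, number of past owners) showed a similarly high value, that would be your cue to inspect the data.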

In conjunction with the metric scores, you must also utilize the prediction dashboard to properly evaluate the trained models.

**Prediction Dashboard** – *Using Test Points To Evaluate Models*

The prediction dashboard has a button called ‘Experiment With Test Data’. This data comes from the test split and is held out exclusively for model evaluation; in other words, the trained model has *not* seen it. So, if you find the actual values close to the predicted values, you can safely conclude that the model is doing well. Otherwise, the model is performing poorly and changes need to be made.

We typically advise picking 50 to 100 test data points and repeating the process on the prediction dashboard to verify the predictions generated by the trained model.
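If you export those test points, the same spot check can be scripted. The helper below is hypothetical (not an Abacus.AI API) and the 10% tolerance is an arbitrary choice you would tune to your use case:

```python
def fraction_within_tolerance(actual, predicted, tolerance=0.10):
    """Return the fraction of test points whose prediction falls within
    `tolerance` relative error of the actual value."""
    hits = sum(
        abs(a - p) <= tolerance * abs(a)
        for a, p in zip(actual, predicted)
    )
    return hits / len(actual)

# Hypothetical test points exported from the prediction dashboard
actual = [300, 450, 200, 500]
predicted = [320, 430, 210, 480]
print(fraction_within_tolerance(actual, predicted))  # 1.0: all within 10%
```

A high fraction here tells the same story as eyeballing points one by one, just faster and more repeatably.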

**Prediction Explanation** – *Analyzing Feature Contribution Scores to Understand the Reasons Behind the Model’s Predictions*

Another awesome feature of the Abacus.AI platform is Machine Learning Model Explanation. You can get a sense of the most important features contributing to the model’s predictions with a quick glance at the Feature Importance Score graph on the metric dashboard, as discussed earlier. We show the top 20 features that contribute the most to the generated predictions. Further, you can use the prediction dashboard to observe the feature contribution scores for individual test data points and understand the reasons behind each prediction the model makes.

For example, suppose the model explanation graph shows that a person’s monthly expense is predicted to be high mostly because the person has a high annual income, while for another test data point the top reason is the luxury brand stores the person buys from. These explanations make a lot of sense. After analyzing a few test data points this way, you can be confident that the model is in fact learning something meaningful and provides valuable inferences. The model’s predictions no longer remain a complete black box, and you gain useful insight into the reasons behind the model’s behavior.
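To make the idea of per-point contribution scores concrete, here is the simplest possible version: for a linear model, each feature’s contribution to a prediction is just its coefficient times the feature value. The coefficients, feature names, and customer values below are all invented for the monthly-expense example, and real platforms generally use more sophisticated attribution methods than this:

```python
# Hypothetical linear model for predicting monthly expense
coefficients = {"annual_income": 0.004, "luxury_purchases": 0.8, "household_size": 5.0}

def contributions(point):
    """Per-feature contribution to a linear model's prediction:
    coefficient * feature value."""
    return {name: coefficients[name] * value for name, value in point.items()}

# One hypothetical test point
customer = {"annual_income": 120000, "luxury_purchases": 12, "household_size": 3}
contrib = contributions(customer)
top_reason = max(contrib, key=contrib.get)
print(top_reason, contrib[top_reason])  # annual_income dominates, as in the example
```

Reading contributions per test point, exactly as the prediction dashboard presents them, is what turns a single number into an explanation.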

## Additional Reading – Correlation with Target

As mentioned earlier, in many cases a strong correlation exists between some columns of the dataset and the target column. This inflates the accuracy measure scores, but the model is basically overfitting and will not generalize well. Based on the recommendations from our system and your judgment about the importance of each column, decide which column(s) to keep and which to ignore. Additionally, you can click the “Ignore correlated columns and train model” button to start training a new model with all suggested columns ignored. Once your new model is trained, analyze the accuracy measures again to evaluate it and see the difference from the earlier model, and use the prediction dashboard to further check the quality of its predictions.


**That concludes part 2. In the next part, we will walk through the next problem type: recommendation systems.**

**See you in part 3!**