Training Parameters And Accuracy Measures

Our platform provides the flexibility to adjust a set of training parameters. Both the general training parameters and the advanced training options can influence the model's predictions. Prediction quality is measured using a set of accuracy measures, or metrics, which are also discussed in this section.


Training Options

Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:

Training Option Name | Description | Possible Values
Name | The name you would like to give to the model to be trained. The system generates a default name based on the name of the project the model is part of. | Any alphanumeric characters; length from 5 to 60 characters.
Set Refresh Schedule (UTC) | The refresh schedule determines when your dataset is replaced by an updated copy of that dataset from your storage bucket location. The value is a CRON time string that describes the schedule in the UTC time zone (see the examples below). | A string in CRON format. If you're unfamiliar with CRON syntax, Crontab Guru can help translate it into natural language.
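
For example, the following CRON strings describe common refresh schedules; the values are illustrative only, and each line reads as minute, hour, day of month, month, day of week in UTC:

    0 4 * * 1     refresh every Monday at 04:00 UTC
    30 2 * * *    refresh every day at 02:30 UTC
    0 0 1 * *     refresh at 00:00 UTC on the first day of every month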

Advanced Training Options

For the advanced options, our AI engine automatically sets optimal values. We recommend overriding these options only if you are familiar with deep learning. An overview of the advanced options is given below, followed by a sketch of how they map onto a training configuration:

Training Option Name | API Configuration Name | Description | Possible Values
Test Split | TEST_SPLIT | Percentage of the dataset to use as test data. A range from 5% to 20% of the dataset is recommended. The test data is used to estimate the accuracy and generalization capability of the model on unseen (new) data. | 5 to 20
Dropout | DROPOUT | Dropout percentage in deep neural networks. Dropout is a regularization method used for better generalization over new data. The key idea is to randomly drop units (along with their connections) from the neural network during training, which prevents units from co-adapting too much and encourages them to learn independently. | 0 to 90
Batch Size | BATCH_SIZE | The number of data points provided to the model at once (in one batch) during training. The batch size affects how quickly the model learns and how stable the learning process is, making it an important hyperparameter to tune. For more details, please visit our Blog. | 16 / 32 / 64 / 128
Min User History Len Percentile | HISTORY_LENGTH_PERCENTILE | Users (in the user dataset) with a very large number of records in the user-item interaction dataset can bias the recommendations of the system. Users whose history length is above the entered percentile are discarded from the training data before the model is trained. | 95 to 100
Downsample Item Popularity Percentile | DOWNSAMPLE_PERCENTILE | Items that are interacted with very frequently can bias the recommender system toward recommending only those items to all users. Downsampling items that are more popular than the entered percentile can lead to better recommendations. | 0.99 to 1.00
Exclude Time Features | TIME_FEATURES | Discard time features when training the model. | Yes / No
Min Categorical Count | MIN_CATEGORICAL_COUNT | Minimum number of occurrences required for a categorical value to be treated as distinct from the unknown placeholder. | Integers from 1 to 100 in steps of 5
Sample Weight | SAMPLE_WEIGHT_COLUMN | Sets the weight of each training sample in the objective that is optimized. Metrics are also reported based on this weight column. | Any numeric column; samples with zero weight are discarded.
Rebalance Classes | REBALANCE_CLASSES | Applies a weight to each sample in inverse proportion to the frequency of its target class. This increases minority-class recall at the expense of precision. | Yes / No
Rare Class Augmentation Threshold | RARE_CLASS_AUGMENTATION_THRESHOLD | Augments any rare class whose relative frequency with respect to the most frequent class is below this threshold. | 0.01 to 0.5
Augmentation Strategy | AUGMENTATION_STRATEGY | Strategy for dealing with class imbalance and data augmentation. | resample / smote
Training Rows Downsample Ratio | TRAINING_ROWS_DOWNSAMPLE_RATIO | Trains on a sample of the provided dataset using this ratio. | 0.01 to 0.9
Ignore Datetime Features | IGNORE_DATETIME_FEATURES | Removes all datetime features from the model. Useful when generalizing to different time periods. | true / false
Use Pretrained Embeddings | USE_PRETRAINED_EMBEDDINGS | Whether to use pretrained embeddings. | true / false
Max Text Words | MAX_TEXT_WORDS | Maximum number of words to use from text fields. | 100 to 1000
AutoML Ensemble Size | AUTOML_ENSEMBLE_SIZE | Number of neural network architectures found by AutoML for the provided dataset(s) to combine by ensemble averaging. | 2 to 10
AutoML Initial Learning Rate | AUTOML_INITIAL_LEARNING_RATE | Initial learning rate for the seed architectures generated by AutoML. | Decimal in the range (0.0001, 0.01)
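
As a rough illustration of how these options fit together, the snippet below builds a training configuration keyed by the API configuration names above. This is a minimal sketch only: the values are arbitrary examples rather than recommendations, any key that is omitted falls back to the value chosen by the AI engine, and the call that actually submits the configuration to the platform is not shown.

    # Illustrative sketch only: a training configuration keyed by the API
    # configuration names from the table above. Values are examples, not
    # recommended settings.
    training_config = {
        "TEST_SPLIT": 15,                  # hold out 15% of the dataset as test data
        "DROPOUT": 20,                     # 20% dropout for regularization
        "BATCH_SIZE": 64,                  # one of 16 / 32 / 64 / 128
        "HISTORY_LENGTH_PERCENTILE": 99,   # drop users above the 99th percentile of history length
        "DOWNSAMPLE_PERCENTILE": 0.99,     # downsample items more popular than the 99th percentile
        "MIN_CATEGORICAL_COUNT": 5,        # rarer categorical values map to the unknown placeholder
        "AUTOML_ENSEMBLE_SIZE": 4,         # ensemble-average 4 AutoML architectures
        "AUTOML_INITIAL_LEARNING_RATE": 0.001,
    }
    # Any option left out of training_config is set automatically by the AI engine.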

Metrics

Our AI engine will calculate the following metrics for this use case; illustrative computations for these metrics are sketched after the table:

Metric Name | Description
Normalized Discounted Cumulative Gain (NDCG) | NDCG is the relative relevance of a list of items compared to an ideal ranked list of items. It ranges from 0 to 1; the closer it gets to 1, the better. The basic principle behind this metric is that some items are 'more' relevant than others: highly relevant items should come before less relevant items, which in turn should come before irrelevant items. NDCG rewards the recommender system according to the rank it gives relevant items: a high reward for relevant items predicted with a high rank and a low reward if relevant items are predicted with a low rank. The ideal range of NDCG values for a recommender system generally lies anywhere from 0.15 to 0.40. For more details, please visit this link.
NDCG @5 | The relative relevance of the list of top 5 items compared to an ideal ranked list of the same number of items. It ranges from 0 to 1; the closer it gets to 1, the better.
NDCG @10 | The relative relevance of the list of top 10 items compared to an ideal ranked list of the same number of items. It ranges from 0 to 1; the closer it gets to 1, the better.
NDCG @N | The relative relevance of the list of top N items compared to an ideal ranked list of the same number of items. It ranges from 0 to 1; the closer it gets to 1, the better. The principle is the same as in NDCG: each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the discounted cumulative gain (DCG) at N, the discounted relevance of each of the top N recommendations is summed. The normalized discounted cumulative gain at N (NDCG @N) is the DCG divided by the ideal DCG (the ideal DCG is obtained when the top N recommendations are sorted by relevance), so the result always lies between 0 and 1. The ideal range of NDCG @5 and NDCG @10 for a recommender system generally increases with higher N and lies anywhere from 0.15 to 0.40.
Mean Average Precision (MAP) | MAP is the relative relevance of a recommender system: it rewards ranking relevant recommendations higher and penalizes ranking them lower. It ranges from 0 to 1. Generally you can expect to see values ranging anywhere from 0.01 to 0.1, where higher values mean a better score. In recommendation systems, MAP computes the mean of the Average Precision (AP) over all the users. The AP takes a ranked list of recommendations and compares it to the true set of "correct" or "relevant" recommendations for that user. AP rewards having many "correct" (relevant) recommendations in the list, and also rewards putting the most likely correct recommendations at the top (it penalizes more when incorrect guesses are higher up in the ranking). For more details, please visit this link.
MAP @5 | The relative relevance of the top 5 items in the list of recommendations, where ranking relevant recommendations higher is rewarded and ranking them lower is penalized. Generally you can expect to see values ranging anywhere from 0.01 to 0.1, where higher values mean a better score. This metric considers only the top 5 items from the list of recommended items, while MAP considers all of them.
MAP @10 | The relative relevance of the top 10 items in the list of recommendations, where ranking relevant recommendations higher is rewarded and ranking them lower is penalized. Generally you can expect to see values ranging anywhere from 0.01 to 0.1, where higher values mean a better score. This metric considers only the top 10 items from the list of recommended items, while MAP considers all of them.
MAP @N | MAP @N is the same as MAP except that it considers only the top N items in the list of recommendations instead of all of them. In recommendation systems, MAP @N computes the mean of the Average Precision at N (AP @N) over all the users. AP @N takes a ranked list of N recommendations and compares it to the true set of "correct" or "relevant" recommendations for that user. It rewards having many "correct" (relevant) recommendations among the top N items, and also rewards putting the most likely correct recommendations at the top (it penalizes more when incorrect guesses are higher up in the ranking). The ideal range of MAP @N values for a recommender system generally lies anywhere from 0.001 to 0.01. For more details, please visit this link.
Mean Reciprocal Rank (MRR) | This metric measures the quality of recommendations by looking at the position of the most relevant item. It gives the highest reward when the most relevant item is at the top. It ranges from 0 to 1, where a higher value means better recommendations. To understand MRR, the reciprocal rank needs to be defined first: the reciprocal rank is the multiplicative inverse of the rank of the first correct item. MRR is the mean of the reciprocal rank over all users' recommendation lists. The ideal range of MRR values for a recommender system generally lies anywhere from 0.1 to 0.3. For more details, please visit this link.
Coverage | This metric measures the ratio of the number of items the system recommends to the total number of items in the catalog. It ranges from 0 to 1, and a higher coverage value is generally better. The ideal range of coverage for a recommender system generally lies anywhere from 0.001 to 0.1. If the coverage value is lower than the ideal range, the model is unable to recommend many of the items in the catalog (in some cases recommending only the popular items), which is usually caused by an insufficient number of ratings; this is popularly known as the cold start problem. It is a good idea to show your users a diverse set of products recommended according to their preferences and the similarities between the products, and coverage reflects the amount of diversity in the recommendations generated by the system. For more details, please visit the link.
Personalization @10 | This metric measures the level of personalization achieved by comparing the sets of top 10 items recommended to different users. The average pairwise overlap between the sets of 10 items recommended to different users is computed. It ranges from 0 to 1, where 0 means all users are recommended the same set of 10 items and 1 means no two users have overlapping sets of recommendations. In general, a score of around 0.4 or higher indicates a high level of personalization.
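
The ranking metrics above (NDCG @K, MAP @K, MRR) can be illustrated with a short computation over a single user's recommendation list. The sketch below is illustrative only and assumes binary relevance labels; it is not the exact implementation used by our engine. MAP @K and MRR are obtained by averaging the per-user values over all users.

    import numpy as np

    def dcg_at_k(relevance, k):
        """Discounted cumulative gain: each relevant item is discounted by its position."""
        rel = np.asarray(relevance, dtype=float)[:k]
        positions = np.arange(1, rel.size + 1)
        return float(np.sum(rel / np.log2(positions + 1)))

    def ndcg_at_k(relevance, k):
        """NDCG @K: DCG of the predicted ranking divided by the DCG of the ideal ranking."""
        ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), k)
        return dcg_at_k(relevance, k) / ideal_dcg if ideal_dcg > 0 else 0.0

    def average_precision_at_k(recommended, relevant, k):
        """AP @K: rewards placing relevant items near the top of the top-K recommendations."""
        score, hits = 0.0, 0
        for position, item in enumerate(recommended[:k], start=1):
            if item in relevant:
                hits += 1
                score += hits / position        # precision at this cut-off
        return score / min(len(relevant), k) if relevant else 0.0

    def reciprocal_rank(recommended, relevant):
        """Multiplicative inverse of the rank of the first relevant item (0 if there is none)."""
        for position, item in enumerate(recommended, start=1):
            if item in relevant:
                return 1.0 / position
        return 0.0

    # Binary relevance of one user's ranked list of 5 recommendations.
    relevance = [1, 0, 1, 0, 0]
    print(ndcg_at_k(relevance, 5))                                    # NDCG @5 for this user
    print(average_precision_at_k(["a", "b", "c"], {"a", "c"}, 5))     # AP @5 for this user
    print(reciprocal_rank(["b", "a", "c"], {"a", "c"}))               # reciprocal rank = 1/2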
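
Coverage and Personalization @10 operate on the recommendation lists of all users rather than on a single ranking. The sketch below is again an assumed, illustrative computation (here the pairwise overlap is normalized by the size of the larger set), not our engine's implementation.

    from itertools import combinations

    def coverage(recommendation_lists, catalog_size):
        """Fraction of the catalog that appears in at least one user's recommendation list."""
        recommended_items = set()
        for items in recommendation_lists:
            recommended_items.update(items)
        return len(recommended_items) / catalog_size

    def personalization_at_10(recommendation_lists):
        """1 minus the average pairwise overlap between users' top-10 recommendation sets."""
        overlaps = []
        for list_a, list_b in combinations(recommendation_lists, 2):
            set_a, set_b = set(list_a[:10]), set(list_b[:10])
            overlaps.append(len(set_a & set_b) / max(len(set_a), len(set_b)))
        return 1.0 - sum(overlaps) / len(overlaps) if overlaps else 0.0

    # Shortened top-10 lists for three users (item ids), kept short for readability.
    lists = [[1, 2, 3], [1, 4, 5], [6, 7, 8]]
    print(coverage(lists, catalog_size=100))   # 8 distinct recommended items out of 100 -> 0.08
    print(personalization_at_10(lists))        # close to 1: little overlap between users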

Note: In addition to the above metrics, our engine will train a baseline model and generate metrics for it. Typically, the metrics for your custom deep learning model should be better than those of the baseline model.