Training Parameters And Accuracy Measures

Our platform provides the flexibility to adjust a set of training parameters. There are general training parameters and advanced training options that can influence the model's predictions. Prediction quality is measured using a set of accuracy measures, or metrics, which are also discussed in this section.


Training Options

Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:

| Training Option Name | Description | Possible Values |
|---|---|---|
| Name | The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model belongs to. | Any string of alphanumeric characters between 5 and 60 characters long. |
| Set Refresh Schedule (UTC) | The refresh schedule is the schedule on which your dataset is replaced by an updated copy of that dataset from your storage bucket location. The value is a CRON time string that describes the schedule in the UTC time zone. | A string in CRON format (see the example below this table). If you're unfamiliar with CRON syntax, Crontab Guru can help translate it into natural language. |
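
For example, the CRON string `0 2 * * *` schedules a refresh every day at 02:00 UTC (the five fields are minute, hour, day of month, month, and day of week), while `30 3 * * 1` schedules a refresh every Monday at 03:30 UTC.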

Advanced Training Options

For the advanced options, our AI engine will automatically set the optimum values. We recommend overriding these options only if you are familiar with deep learning. The table below gives an overview of the advanced options; two illustrative sketches follow it.

| Training Option Name | API Configuration Name | Description | Possible Values |
|---|---|---|---|
| Test Split | TEST_SPLIT | Percentage of the dataset to use as test data. The test data is used to estimate the accuracy and generalization capability of the model on unseen (new) data. | A percentage of the dataset; 5% to 20% is recommended. |
| Dropout | DROPOUT | Dropout percentage in deep neural networks. Dropout is a regularization method used for better generalization over new data: units (along with their connections) are randomly dropped from the neural network during training, which prevents units from co-adapting too much. | 0 to 90 |
| Batch Size | BATCH_SIZE | The number of data points provided to the model at once (in one batch) during training. The batch size affects how quickly the model learns and how stable the learning process is, so it is an important hyperparameter that should be tuned carefully. For more details, please visit our Blog. | 16 / 32 / 64 / 128 |
| Min User History Len Percentile | HISTORY_LENGTH_PERCENTILE | Users (in the user dataset) with a very large number of records in the user-item interaction dataset can bias the recommendations of the system. Users whose history length is above the entered percentile are discarded from the training data before the model is trained. | 95 to 100 |
| Downsample Item Popularity Percentile | DOWNSAMPLE_PERCENTILE | Items that are interacted with very often can bias the recommender system towards recommending only those items to all users. Downsampling items that are more popular than the entered percentile can lead to better recommendations. | 0.99 to 1.00 |
| Exclude Time Features | TIME_FEATURES | Discard time features when training the model. | Yes / No |
| Candidate Items Split Column | CANDIDATE_ITEMS_SPLIT_COLUMN | Candidate selection will sample proportionally by the values of this column. | A column name (e.g., movie) |
| Candidate Items Split Min Fraction Hint | CANDIDATE_ITEMS_SPLIT_MIN_FRACTION_HINT | Minimum fraction of items within a candidate item split. | 0.0 to 0.5 |
| Candidate Selection Max Other Event Rate | CANDIDATE_SELECTION_MAX_OTHER_EVENT_RATE | To prevent selecting outliers, cap the rate of target events (which are not part of session events) for candidate items so that it does not exceed this value. | 0 to 1 (decimals allowed) |
| Candidate Selection Target Rate Scoring | CANDIDATE_SELECTION_TARGET_RATE_SCORING | Use the rate of target events to select the candidates instead of the total score. Use this to make sure new items are considered in the selection. | true / false |
| Dynamic Catalog | DYNAMIC_CATALOG | Indicates that the algorithm needs to account for a frequently changing catalog. | true / false |
| Explore Lookback Hours | EXPLORE_LOOKBACK_HOURS | Number of hours since an item's creation time during which the item is eligible for the explore fraction. | 1 to 168 |
| Max History Length | MAX_HISTORY_LENGTH | Maximum length of user-item interaction history to include in training examples. | 0 to 200 |
| Max User History Len Percentile | MAX_USER_HISTORY_LEN_PERCENTILE | Filter out users with a history length above this percentile. | 95 to 100 |
| Min Interactions | MIN_INTERACTIONS | Select candidate items with at least this many interactions. | 0 to 10000 |
| Min Target Interactions | MIN_TARGET_INTERACTIONS | Select candidate items with at least this many target interactions. | 0 to 10000 |
| Search Query Column | SEARCH_QUERY_COLUMN | Column which specifies a search query term that will be used for personalization. | A column name (e.g., rating) |
| Skip History Filtering | SKIP_HISTORY_FILTERING | Do not remove items with past interactions from the recommendations. | true / false |
| Test On User Split | TEST_ON_USER_SPLIT | Use user splits instead of time splits when validating and testing the model. | true / false |
| Training Candidate Items Limit | TRAINING_CANDIDATE_ITEMS_LIMIT | Limit the training data to this many "best performing" items, calculated from target events and weights. | 1000 to 150000 |
| Training Start Date | TRAINING_START_DATE | Only consider training interaction data after this date. Specified in the timezone of the dataset. | DateTime |
| Unordered History | UNORDERED_HISTORY | The order of user-item interactions is not important. | true / false |
| Use Item Attribute Bucketing | USE_ITEM_ATTRIBUTE_BUCKETING | Prefer recommending items with similar attributes. Useful when there are natural item categories that are related, like e-commerce categories. | true / false |
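
To illustrate how these options fit together, here is a minimal sketch of a training configuration expressed as a plain Python dictionary keyed by the API configuration names from the table above. The dictionary layout, the chosen values, and the way it would be passed to the training API are illustrative assumptions rather than the platform's actual request format; the values are simply examples within the documented ranges.

```python
# Illustrative only: a plain dictionary keyed by the API configuration names
# from the table above. The values are arbitrary examples within the
# documented ranges; how this dictionary is submitted to the training API is
# platform-specific and not shown here.
advanced_training_config = {
    "TEST_SPLIT": 10,                     # hold out 10% of the dataset as test data
    "DROPOUT": 20,                        # 20% dropout for regularization
    "BATCH_SIZE": 64,                     # one of 16 / 32 / 64 / 128
    "HISTORY_LENGTH_PERCENTILE": 99,      # drop users above the 99th history-length percentile
    "DOWNSAMPLE_PERCENTILE": 0.995,       # downsample items more popular than this percentile
    "TIME_FEATURES": "No",                # keep time features (do not exclude them)
    "EXPLORE_LOOKBACK_HOURS": 24,         # new items stay eligible for explore for 24 hours
    "MAX_HISTORY_LENGTH": 100,            # cap user-item history length at 100
    "MIN_INTERACTIONS": 5,                # candidate items need at least 5 interactions
    "TRAINING_START_DATE": "2023-01-01",  # only use interactions after this date
    "UNORDERED_HISTORY": False,           # interaction order matters
}
```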
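
To make the history-percentile and popularity-downsampling options more concrete, here is a small pandas sketch of the underlying idea: users whose interaction history length falls above a chosen percentile are dropped, and items more popular than a chosen percentile are flagged for downsampling. This is only an illustration of the concept on toy data, not the platform's implementation.

```python
import pandas as pd

# Toy user-item interaction table.
interactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u1", "u2", "u2", "u3"],
    "item_id": ["i1", "i2", "i3", "i1", "i1", "i4", "i1"],
})

# Idea behind HISTORY_LENGTH_PERCENTILE / MAX_USER_HISTORY_LEN_PERCENTILE:
# drop users whose history is longer than the chosen percentile.
history_len = interactions.groupby("user_id").size()
cutoff = history_len.quantile(0.95)                       # 95th percentile of history lengths
kept_users = history_len[history_len <= cutoff].index     # u1 has the longest history and is dropped
filtered = interactions[interactions["user_id"].isin(kept_users)]

# Idea behind DOWNSAMPLE_PERCENTILE: items more popular than the chosen
# percentile would be downsampled rather than dropped outright.
item_popularity = interactions.groupby("item_id").size()
popular_cutoff = item_popularity.quantile(0.99)
too_popular = item_popularity[item_popularity > popular_cutoff].index

print(sorted(kept_users), sorted(too_popular))
```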

Metrics

Our AI engine will calculate the following metrics for this use case. Illustrative sketches of how the ranking metrics and the catalog-level metrics could be computed follow the table.

| Metric Name | Description |
|---|---|
| Normalized Discounted Cumulative Gain (NDCG) | NDCG is the relative relevance of a list of recommended items compared to an ideally ranked list of the same items. It ranges from 0 to 1; the closer to 1, the better. The basic principle behind this metric is that some items are 'more' relevant than others: highly relevant items should come before less relevant items, which in turn should come before irrelevant items. NDCG rewards the recommender system according to the ranks it gives to relevant items: relevant items predicted with a high rank earn a high reward, while relevant items predicted with a low rank earn a low reward. The ideal range of NDCG values for a recommender system generally lies anywhere from 0.15 to 0.40. For more details, please visit this link. |
| NDCG @5 | The relative relevance of the list of the top 5 recommended items compared to an ideally ranked list of the same number of items. It ranges from 0 to 1; the closer to 1, the better. |
| NDCG @10 | The relative relevance of the list of the top 10 recommended items compared to an ideally ranked list of the same number of items. It ranges from 0 to 1; the closer to 1, the better. |
| NDCG @N | The relative relevance of the list of the top N recommended items compared to an ideally ranked list of the same number of items. It ranges from 0 to 1; the closer to 1, the better. The principle is the same as for NDCG: each recommendation is discounted (given a lower weight) by a factor dependent on its position. The discounted cumulative gain at N (DCG @N) sums the discounted gains of the relevant recommendations among the top N. NDCG @N is the DCG @N divided by the ideal DCG @N (where the top N recommendations are sorted by relevance), so the result always lies between 0 and 1. The ideal range of NDCG @5 and NDCG @10 for a recommender system generally increases with higher N and lies anywhere from 0.15 to 0.40. |
| Mean Average Precision (MAP) | MAP measures the relative relevance of a recommender system: it rewards ranking relevant recommendations higher and penalizes ranking them lower. It ranges from 0 to 1; you can generally expect values anywhere from 0.01 to 0.1, where a higher value means a better score. In recommendation systems, MAP is the mean of the Average Precision (AP) over all users. AP takes a ranked list of recommendations and compares it to the true set of "correct" or "relevant" recommendations for that user. AP rewards having many "correct" (relevant) recommendations in the list, and rewards placing the most likely correct recommendations at the top (it penalizes more when incorrect guesses appear higher up in the ranking). For more details, please visit this link. |
| MAP @5 | The relative relevance of the top 5 items in the list of recommendations, where ranking relevant recommendations higher is rewarded and ranking them lower is penalized. You can generally expect values anywhere from 0.01 to 0.1, where a higher value means a better score. This metric considers only the top 5 recommended items, whereas MAP considers all of them. |
| MAP @10 | The relative relevance of the top 10 items in the list of recommendations, where ranking relevant recommendations higher is rewarded and ranking them lower is penalized. You can generally expect values anywhere from 0.01 to 0.1, where a higher value means a better score. This metric considers only the top 10 recommended items, whereas MAP considers all of them. |
| MAP @N | MAP @N is the same as MAP except that it considers only the top N items in the list of recommendations instead of all of them: it computes the mean of the Average Precision at N (AP @N) over all users. AP @N takes a ranked list of N recommendations and compares it to the true set of "correct" or "relevant" recommendations for that user; it rewards having many relevant recommendations in the list and rewards placing the most likely correct recommendations at the top (penalizing more when incorrect guesses appear higher up in the ranking). The ideal range of MAP @N values for a recommender system generally lies anywhere from 0.001 to 0.01. For more details, please visit this link. |
| Mean Reciprocal Rank (MRR) | This metric measures the quality of recommendations by looking at the position of the most relevant item; it rewards the system most when the most relevant item is at the top. It ranges from 0 to 1, where a higher value means better recommendations. The reciprocal rank is the "multiplicative inverse" of the rank of the first correct item, and MRR is the mean of the reciprocal rank over all users' recommendation lists. The ideal range of MRR values for a recommender system generally lies anywhere from 0.1 to 0.3. For more details, please visit this link. |
| Coverage | This metric measures the ratio of the number of items the system recommends to the total number of items in the catalog. It ranges from 0 to 1, and generally a higher coverage value is better. The ideal range of coverage for a recommender system generally lies anywhere from 0.001 to 0.1. If the coverage value is below the ideal range, the model is unable to recommend many of the items in the catalog (in some cases recommending only the popular items), which is usually caused by an insufficient number of ratings and is popularly known as the cold start problem. It is a good idea to show your users a diverse set of products recommended according to their preferences and to the similarities between products; coverage reflects the amount of diversity in the recommendations generated by the system. For more details, please visit this link. |
| Personalization @10 | This metric measures the level of personalization achieved across users by comparing the sets of top 10 items recommended to different users, based on the pairwise overlap between those sets. It ranges from 0 to 1, where 0 means all users are recommended the same set of 10 items and 1 means no two users have overlapping sets of recommendations. In general, a score of around 0.4 or higher indicates a high level of personalization. |
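
To make the ranking metrics concrete, here is a minimal Python sketch of how NDCG @N, AP @N (the per-user building block of MAP @N), and reciprocal rank could be computed for a single user's ranked recommendations against that user's set of relevant items. This illustrates the standard formulas, not the platform's internal implementation; the function and variable names are ours.

```python
import math

def ndcg_at_n(recommended, relevant, n):
    """NDCG @N with binary relevance: DCG of the top-N recommendations
    divided by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)          # discount by position (rank 0 -> 1/log2(2) = 1)
        for rank, item in enumerate(recommended[:n])
        if item in relevant
    )
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision_at_n(recommended, relevant, n):
    """AP @N: precision is accumulated only at the positions of correct
    recommendations, so early mistakes are penalized more."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(recommended[:n]):
        if item in relevant:
            hits += 1
            precision_sum += hits / (rank + 1)
    return precision_sum / min(len(relevant), n) if relevant else 0.0

def reciprocal_rank(recommended, relevant):
    """Multiplicative inverse of the rank of the first relevant item."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

# Toy example: one user's top-5 recommendations and that user's relevant items.
recommended = ["i7", "i2", "i9", "i4", "i1"]
relevant = {"i2", "i4", "i5"}
print(ndcg_at_n(recommended, relevant, 5))              # position-discounted gain vs. the ideal ordering
print(average_precision_at_n(recommended, relevant, 5))
print(reciprocal_rank(recommended, relevant))           # first hit is at rank 2 -> 0.5
# NDCG @5, MAP @5, and MRR average these per-user values over all users.
```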
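
Coverage and Personalization @10 can be illustrated in a similar way at the catalog level. The sketch below (again an illustration of the general idea rather than the platform's exact formula) computes coverage as the fraction of catalog items that appear in at least one user's recommendations, and personalization as one minus the average pairwise overlap between users' recommended sets, which matches the interpretation above (0 for identical lists, 1 for fully disjoint lists).

```python
from itertools import combinations

def coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one user's recommendations."""
    recommended_items = set().union(*recommendations.values())
    return len(recommended_items & set(catalog)) / len(catalog)

def personalization(recommendations):
    """One minus the average pairwise overlap of users' recommendation sets:
    0 -> everyone gets the same list, 1 -> no two users share an item."""
    overlaps = []
    for user_a, user_b in combinations(recommendations, 2):
        set_a, set_b = set(recommendations[user_a]), set(recommendations[user_b])
        overlaps.append(len(set_a & set_b) / max(len(set_a), len(set_b)))
    if not overlaps:
        return 0.0
    return 1.0 - sum(overlaps) / len(overlaps)

# Toy example: top-3 lists for three users over a 10-item catalog.
catalog = [f"i{k}" for k in range(10)]
recommendations = {
    "u1": ["i1", "i2", "i3"],
    "u2": ["i1", "i4", "i5"],
    "u3": ["i6", "i7", "i8"],
}
print(coverage(recommendations, catalog))   # 8 of the 10 catalog items are recommended -> 0.8
print(personalization(recommendations))     # high value -> little overlap between users' lists
```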

Note: In addition to the above metrics, our engine will train a baseline model and generate the same metrics for it. Typically, the metrics for your custom deep learning model should be better than those of the baseline model.