Our platform provides the flexibility to adjust a set of training parameters. Both the general training parameters and the advanced training options influence the model's predictions, and prediction quality is assessed with the accuracy metrics discussed at the end of this section.
Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:
Training Option Name | Description | Possible Values |
---|---|---|
Name | The name you would like to give to the model to be trained. The system generates a default name based on the name of the project the model belongs to. | The name can contain any alphanumeric characters and must be between 5 and 60 characters long. |
Set Refresh Schedule (UTC) | The schedule on which your dataset is replaced by an updated copy of the dataset from your storage bucket location. The value is a CRON time string that describes the schedule in the UTC time zone. | A string in CRON format (see the examples after this table). If you're unfamiliar with CRON syntax, Crontab Guru can help translate it into natural language. |
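For example, the following standard CRON strings (shown purely as illustrations of the format, not values taken from any project) describe common refresh schedules in UTC:

```python
# Standard CRON expressions: minute hour day-of-month month day-of-week.
# Illustrative examples only.
DAILY_AT_6_UTC = "0 6 * * *"           # every day at 06:00 UTC
WEEKLY_MONDAY_0030_UTC = "30 0 * * 1"  # every Monday at 00:30 UTC
MONTHLY_FIRST_DAY_UTC = "0 0 1 * *"    # first day of every month at 00:00 UTC
```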
For Advanced Options, our AI engine will automatically set the optimum values. We recommend overriding these options only if you are familiar with deep learning. Overview of Advanced Options:
Training Option Name | API Configuration Name | Description | Possible Values |
---|---|---|---|
Test Split | TEST_SPLIT | Percentage of the dataset to use as test data. A range from 5% to 20% of the dataset is recommended. The test data is used to estimate the accuracy and generalizing capability of the model on unseen (new) data (see the illustrative sketch after this table). | A percentage of the dataset; 5 to 20 is recommended. |
Dropout | DROPOUT | Dropout percentage in deep neural networks. It is a regularization method used for better generalization over new data. The key idea is to randomly drop units (along with their connections) from the neural network during training, which prevents units from co-adapting too much. | 0 to 90 |
Batch Size | BATCH_SIZE | The number of data points provided to the model at once (in one batch) during training. The batch size impacts how quickly a model learns and the stability of the learning process. It is an important hyperparameter that should be well-tuned. For more details, please visit our Blog. | 16 / 32 / 64 / 128 |
Min User History Len Percentile | HISTORY_LENGTH_PERCENTILE | Users with a very large number of records in the user-item interaction dataset can bias the recommendations of the system. Users whose history length is above the entered percentile are therefore discarded from the training data before the model is trained (see the filtering sketch after this table). | 95 to 100 |
Downsample Item Popularity Percentile | DOWNSAMPLE_PERCENTILE | Items that users interact with very frequently can bias the recommender system toward recommending only those items to all users. Downsampling the items that are more popular than the entered percentile can lead to better recommendations. | 0.99 to 1.00 |
Exclude Time Features | TIME_FEATURES | Discard Time Features for training the model. | Yes / No |
Candidate Items Split Column | CANDIDATE_ITEMS_SPLIT_COLUMN | Candidate selection will sample proportionally by the values of this column. | A column name from the catalog (e.g., movie) |
Candidate Items Split Min Fraction Hint | CANDIDATE_ITEMS_SPLIT_MIN_FRACTION_HINT | Min fraction of items within a candidate item split. | 0.0 to 0.5 |
Candidate Selection Max Other Event Rate | CANDIDATE_SELECTION_MAX_OTHER_EVENT_RATE | To prevent selecting outliers, cap the rate of target events (which are not part of session events) for candidate items so that it does not exceed this value. | 0 to 1 (decimals allowed) |
Candidate Selection Target Rate Scoring | CANDIDATE_SELECTION_TARGET_RATE_SCORING | Use the rate of target events to select the candidates instead of the total score. Use this to make sure new items are considered in the selection. | true or false |
Dynamic Catalog | DYNAMIC_CATALOG | Use this option to indicate that the algorithm needs to account for a frequently changing catalog. | true or false |
Explore Lookback Hours | EXPLORE_LOOKBACK_HOURS | Number of hours since creation time that an item is eligible for explore fraction. | 1 to 168 |
Max History Length | MAX_HISTORY_LENGTH | Maximum length of user-item history to include user in training examples. | 0 to 200 |
Max User History Len Percentile | MAX_USER_HISTORY_LEN_PERCENTILE | Filter out users with history length above this percentile. | 95 to 100 |
Min Interactions | MIN_INTERACTIONS | Select candidate items with at least this many interactions. | 0 to 10000 |
Min Target Interactions | MIN_TARGET_INTERACTIONS | Select candidate items with at least this many target interactions. | 0 to 10000 |
Search Query Column | SEARCH_QUERY_COLUMN | Column which specifies a search query term that will be used for personalization. | A column name (e.g., rating) |
Skip History Filtering | SKIP_HISTORY_FILTERING | Do not remove items which have past interactions from recommendations. | true or false |
Test On User Split | TEST_ON_USER_SPLIT | Use user splits instead of using time splits, when validating and testing the model. | true or false |
Training Candidate Items Limit | TRAINING_CANDIDATE_ITEMS_LIMIT | Limit the training data to this many "best performing" items, calculated using target events and weights. | 1000 to 150000 |
Training Start Date | TRAINING_START_DATE | Only consider training interaction data after this date. Specified in the timezone of the dataset. | DateTime |
Unordered History | UNORDERED_HISTORY | Order of user item interactions is not important. | true or false |
Use Item Attribute Bucketing | USE_ITEM_ATTRIBUTE_BUCKETING | Prefer recommending items which have attribute similarity. Useful when we have natural item categories which are related, like e-commerce categories. | true or false |
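Several of the advanced options above correspond to standard deep-learning hyperparameters. The sketch below is a generic Keras illustration (not our engine's actual training code) of where the test split, dropout percentage, and batch size typically appear; the model architecture and data are made up for the example:

```python
import numpy as np
import tensorflow as tf

# Toy features and labels, purely for illustration.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000,)).astype("float32")

# TEST_SPLIT: hold out ~10% of the data to estimate generalization on unseen data.
split = int(len(X) * 0.9)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    # DROPOUT: randomly drop 20% of units during training to reduce co-adaptation.
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# BATCH_SIZE: number of examples fed to the model per gradient update.
model.fit(X_train, y_train, batch_size=32, epochs=5, verbose=0)
print(model.evaluate(X_test, y_test, verbose=0))
```

In practice our AI engine tunes these values automatically; the sketch only shows what each knob controls.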
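Likewise, the history-length percentile and popularity-downsampling options describe simple preprocessing of the user-item interaction data. A minimal pandas sketch of the idea, using hypothetical column names (`user_id`, `item_id`) and a deliberately simplistic downsampling policy, might look like this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy user-item interaction log, purely illustrative.
interactions = pd.DataFrame({
    "user_id": rng.integers(0, 100, size=5000),
    "item_id": rng.integers(0, 500, size=5000),
})

# MAX_USER_HISTORY_LEN_PERCENTILE: drop users whose history length is above the
# 95th percentile so that heavy users do not dominate training.
history_len = interactions.groupby("user_id").size()
keep_users = history_len[history_len <= history_len.quantile(0.95)].index
filtered = interactions[interactions["user_id"].isin(keep_users)]

# DOWNSAMPLE_PERCENTILE: downsample items more popular than the 0.99 quantile.
item_pop = filtered.groupby("item_id").size()
popular = item_pop[item_pop > item_pop.quantile(0.99)].index
mask = filtered["item_id"].isin(popular)
# Simplistic illustrative policy: keep only half of the rows for over-popular items.
downsampled = pd.concat([filtered[~mask],
                         filtered[mask].sample(frac=0.5, random_state=0)])
print(len(interactions), len(filtered), len(downsampled))
```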
Our AI engine will calculate the following metrics for this use case:
Metric Name | Description |
---|---|
Normalized Discounted Cumulative Gain (NDCG) | NDCG is the relative relevance of a list of items compared to an ideal ranked list of items. It ranges from 0 to 1. The closer it gets to 1, the better. The basic principle behind this metric is that some items are 'more' relevant than others. Highly relevant items should come before less relevant items, which in turn should come before irrelevant items. NDCG rewards the recommender system for predicting relevant items relative to the ranking it gives them. It gives a high reward for relevant items predicted with a high rank and a low reward if relevant items are predicted with a low rank. The ideal range of NDCG values for a recommender system generally lies anywhere from 0.15 to 0.40. For more details, please visit this link. |
NDCG @5 | It is defined as the relative relevance of a list of top 5 items compared to an ideal ranked list of the same number of items. It ranges from 0 to 1. The closer it gets to 1, the better. |
NDCG @10 | It is defined as the relevancy of a list of top 10 items compared to an ideal ranked list of the same number of items. It ranges from 0 to 1. The closer it gets to 1, the better. |
NDCG @N | NDCG @N is defined as the relevance of a list of the top N items compared to an ideal ranked list of the same number of items. It ranges from 0 to 1. The closer it gets to 1, the better. The principle followed is the same as in NDCG. Each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the discounted cumulative gain (DCG) at N, each relevant discounted recommendation in the top N recommendations is summed together. The normalized discounted cumulative gain at N (NDCG @N) is the DCG divided by the ideal DCG (the ideal DCG is where the top N recommendations are sorted by relevance), so the result always remains between 0 and 1. The ideal range of NDCG @N for a recommender system generally increases with higher N and lies anywhere from 0.15 to 0.40 (see the metric sketch after this table). |
Mean Average Precision (MAP) | MAP is defined as the relative relevance of a recommender system such that it rewards ranking the relevant recommendations higher and penalizes ranking them lower. It ranges from 0 to 1. Generally, you can expect to see values ranging anywhere from 0.01 to 0.1, where higher values mean a better score. In recommendation systems, MAP computes the mean of the Average Precision (AP) over all the users. The AP is a measure that takes in a ranked list of recommendations and compares it to the true set of "correct" or "relevant" recommendations for that user. AP rewards having many "correct" (relevant) recommendations in the list and also rewards putting the most likely correct recommendations at the top (penalizing more when incorrect guesses are higher up in the ranking). For more details, please visit this link. |
MAP @5 | MAP @5 is the relative relevance of the top 5 items in the list of recommendations, where ranking the relevant recommendations higher is rewarded and ranking them lower is penalized. Generally, you can expect to see values ranging anywhere from 0.01 to 0.1, where higher values mean a better score. This metric takes only the top 5 items into consideration from the list of recommended items, while MAP takes all of the items. |
MAP @10 | MAP @10 is the relative relevance of the top 10 items in the list of recommendations, where ranking the relevant recommendations higher is rewarded and ranking them lower is penalized. Generally, you can expect to see values ranging anywhere from 0.01 to 0.1, where higher values mean a better score. This metric takes only the top 10 items into consideration from the list of recommended items, while MAP takes all of the items. |
MAP @N | We define MAP as the relative relevance of a recommender system such that it rewards ranking the relevant recommendations higher. MAP @N is the same as MAP except that it considers only the top N items in the list of recommendations instead of all of them. To elaborate, in recommendation systems MAP @N computes the mean of the Average Precision (AP) of the top N items over all the users. AP @N is a measure that takes in a ranked list of N recommendations and compares it to the true set of "correct" or "relevant" recommendations for that user. AP @N rewards having many "correct" (relevant) recommendations in the top N and also rewards putting the most likely correct recommendations at the top (penalizing more when incorrect guesses are higher up in the ranking). The ideal range of MAP @N values for a recommender system generally lies anywhere from 0.001 to 0.01. For more details, please visit this link. |
Mean Reciprocal Rank (MRR) | This metric measures the quality of recommendations by looking at the position of the most relevant item. It rewards the most when the most relevant item is at the top. It ranges from 0 to 1, where a higher value means better recommendations. To understand MRR, reciprocal rank needs to be defined first: the reciprocal rank is the multiplicative inverse of the rank of the first correct item. MRR is calculated by taking the mean of the reciprocal rank over all users. The ideal range of MRR values for a recommender system generally lies anywhere from 0.1 to 0.3. For more details, please visit this link. |
Coverage | This metric measures the ratio of the items the system recommends to the total number of items present in the catalog. It ranges from 0 to 1. Generally, a higher coverage value is better. The ideal range of coverage for a recommender system generally lies anywhere from 0.001 to 0.1. If the coverage value is lower than the ideal range, it means that the model is unable to recommend many of the items present in the catalog (in some cases recommending only the popular items), which is usually caused by an insufficient number of ratings; this is popularly known as the cold start problem. It is a good idea to show your users a diverse set of products recommended according to their preferences and also based on the similarities between the products. Coverage shows the amount of diversity in the recommendations generated by the system (see the sketch after this table). For more details, please visit this link. |
Personalization @10 | This metric measures the level of personalization achieved for the users using different sets of top 10 items recommended to different users. The average pairwise overlap between the sets of 10 items recommended to different users is computed. It ranges from 0 to 1, where 0 means all users are recommended the same set of 10 items and 1 means no two users have overlapping sets of recommendations. In general, a score of around 0.4 or higher indicates a high level of user personalization. |
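To make the ranking metrics above concrete, here is a minimal Python sketch of one common way to compute NDCG @k, AP @k (averaged over users to obtain MAP @k), and reciprocal rank (averaged over users to obtain MRR). The exact formulations used by our engine may differ in their details:

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain for a ranked list of relevance scores."""
    rel = np.asarray(relevances, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..k -> log2(2..k+1)
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return (dcg_at_k(relevances, k) / ideal) if ideal > 0 else 0.0

def average_precision_at_k(recommended, relevant, k):
    """One common AP@k formulation: precision at each relevant hit, normalized."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return (score / min(len(relevant), k)) if relevant else 0.0

def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant recommendation (0 if none appear)."""
    for i, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

# Toy example for a single user.
relevances = [3, 2, 0, 0, 1]                 # graded relevance in predicted order
recommended = ["a", "b", "c", "d", "e"]      # predicted ranking
relevant = {"a", "b", "e"}                   # the user's "correct" items

print(ndcg_at_k(relevances, 5))                          # NDCG @5
print(average_precision_at_k(recommended, relevant, 5))  # AP @5; MAP @5 = mean over users
print(reciprocal_rank(recommended, relevant))            # RR; MRR = mean over users
```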
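Coverage and Personalization @10 can be sketched in the same spirit: coverage is the fraction of the catalog that appears in at least one user's recommendation list, and personalization is one minus the average pairwise overlap between users' top-10 sets (again, one common formulation rather than our engine's exact implementation):

```python
from itertools import combinations

def coverage(recommendation_lists, catalog_size):
    """Fraction of the catalog that appears in at least one user's recommendations."""
    recommended_items = set()
    for recs in recommendation_lists:
        recommended_items.update(recs)
    return len(recommended_items) / catalog_size

def personalization_at_k(recommendation_lists, k=10):
    """1 minus the average pairwise overlap between users' top-k sets
    (0 = identical lists for all users, 1 = no two users overlap)."""
    top_k = [set(recs[:k]) for recs in recommendation_lists]
    overlaps = [len(a & b) / k for a, b in combinations(top_k, 2)]
    return (1.0 - sum(overlaps) / len(overlaps)) if overlaps else 0.0

# Toy top-10 lists for three users over a 1000-item catalog.
recs = [
    list(range(0, 10)),
    list(range(5, 15)),
    list(range(100, 110)),
]
print(coverage(recs, catalog_size=1000))   # 25 distinct items / 1000 = 0.025
print(personalization_at_k(recs, k=10))
```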
Note: In addition to the above metrics, our engine will train a baseline model and generate metrics for it. Typically, the metrics for your custom deep learning model should be better than those of the baseline model.