Training config for the PERSONALIZATION problem type
KEY | TYPE | DESCRIPTION |
---|---|---|
TRAINING_MODE | PersonalizationTrainingMode | Whether to train in production or experimental mode. Defaults to EXP. |
ITEM_QUERY_COLUMN | str | Name of column in the item catalog that will be matched to the query column in the interactions table. |
ACTION_TYPES_EXCLUSION_DAYS | Dict[str, float] | Mapping from action type to the number of days for which previously interacted items are excluded from predictions. |
TEST_ROW_INDICATOR | str | Column indicating which rows to use for training (TRAIN), validation (VAL) and testing (TEST). |
ADD_TIME_FEATURES | bool | Include interaction time as a feature. |
USE_USER_ID_FEATURE | bool | Use user id as a feature in CTR models. |
SESSION_EVENT_TYPES | List[str] | List of event types to treat as occurrences of sessions. |
DOWNSAMPLE_ITEM_POPULARITY_PERCENTILE | float | Downsample items more popular than this percentile. |
COMPUTE_RERANK_METRICS | bool | Compute metrics based on rerank results. |
TEST_SPLIT_ON_LAST_K_ITEMS | bool | Use last k items instead of global timestamp splits, when validating and testing the model. |
MAX_HISTORY_LENGTH | int | Maximum length of a user's item history for that user to be included in training examples. |
SESSION_DEDUPE_MINS | float | Minimum number of minutes between two sessions for a user. |
FULL_DATA_RETRAINING | bool | Train models separately with all the data. |
SEQUENTIAL_TRAINING | bool | Train a model sequentially through time. |
FILTER_HISTORY | bool | Do not recommend items the user has already interacted with. |
TEST_ON_USER_SPLIT | bool | Use user splits instead of using time splits, when validating and testing the model. |
SORT_OBJECTIVE | PersonalizationObjective | Ranking scheme used to sort models on the metrics page. |
TEST_SPLIT | int | Percent of the dataset to use as test data. Supported values range from 6% to 20%. |
TARGET_ACTION_TYPES | List[str] | List of action types to use as targets for training. |
INCLUDE_ITEM_ID_FEATURE | bool | Add Item-Id to the input features of the model. Applicable for Embedding distance and CTR models. |
COMPUTE_SESSION_METRICS | bool | Evaluate models based on how well they are able to predict the next session of interactions. |
OPTIMIZED_EVENT_TYPE | str | The final event type to optimize for and compute metrics on. |
BATCH_SIZE | BatchSize | Batch size for neural network. |
DISABLE_TIMESTAMP_SCALAR_FEATURES | bool | Exclude timestamp scalar features. |
RECENT_DAYS_FOR_TRAINING | int | Limit training data to a certain latest number of days. |
TEST_LAST_ITEMS_LENGTH | int | Number of items to leave out for each user when using leave-k-out folds. |
DISABLE_TRANSFORMER | bool | Disable training the transformer algorithm. |
TEST_WINDOW_LENGTH_HOURS | int | Duration (in hours) of most recent time window to use when validating and testing the model. |
MIN_ITEM_HISTORY | int | Minimum number of interactions an item must have to be included in training. |
TARGET_ACTION_WEIGHTS | Dict[str, float] | Dictionary of action types to weights for training. |
QUERY_COLUMN | str | Name of column in the interactions table that represents a natural language query, e.g. 'blue t-shirt'. |
DISABLE_GPU | bool | Disable training on GPU. |
DATA_SPLIT_FEATURE_GROUP_TABLE_NAME | str | Specify the table name of the feature group to export training data with the fold column. |
EXPLICIT_TIME_SPLIT | bool | Sets an explicit time-based test boundary. |
TRAINING_START_DATE | str | Only consider training interaction data after this date. Specified in the timezone of the dataset. |
MAX_USER_HISTORY_LEN_PERCENTILE | int | Filter out users with history length above this percentile. |
OBJECTIVE | PersonalizationObjective | Ranking scheme used to select final best model. |
DROPOUT_RATE | int | Dropout rate for neural network. |
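
As a rough illustration of how a few of these keys fit together, the sketch below builds a plain Python dictionary and checks it against the types and the TEST_SPLIT range listed in the table. It is not the platform's own client API: the helper `validate_config`, the action-type names (`purchase`, `click`), and all values are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical values. The keys come from the table above; the action-type
# names and the numbers are made up for illustration.
training_config: Dict[str, Any] = {
    "TEST_SPLIT": 10,                                    # percent, 6 to 20
    "TARGET_ACTION_TYPES": ["purchase", "click"],
    "TARGET_ACTION_WEIGHTS": {"purchase": 1.0, "click": 0.2},
    "ACTION_TYPES_EXCLUSION_DAYS": {"purchase": 30.0},
    "FILTER_HISTORY": True,
    "DISABLE_GPU": False,
    "RECENT_DAYS_FOR_TRAINING": 90,
}

BOOL_KEYS = ("FILTER_HISTORY", "DISABLE_GPU", "ADD_TIME_FEATURES")
DICT_FLOAT_KEYS = ("TARGET_ACTION_WEIGHTS", "ACTION_TYPES_EXCLUSION_DAYS")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []

    # TEST_SPLIT is documented as a percentage between 6 and 20.
    test_split = config.get("TEST_SPLIT")
    if test_split is not None:
        if not isinstance(test_split, int) or not 6 <= test_split <= 20:
            problems.append("TEST_SPLIT must be an int between 6 and 20")

    # Keys typed as bool in the table.
    for key in BOOL_KEYS:
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} must be a bool")

    # Keys typed as Dict[str, float] in the table.
    for key in DICT_FLOAT_KEYS:
        value = config.get(key)
        if value is None:
            continue
        well_typed = isinstance(value, dict) and all(
            isinstance(k, str) and isinstance(v, (int, float))
            for k, v in value.items()
        )
        if not well_typed:
            problems.append(f"{key} must map str to float")

    return problems


if __name__ == "__main__":
    print(validate_config(training_config))
```

Checking a config locally like this catches type and range mistakes before a training job is submitted. Only a subset of keys is covered here, and the remaining keys can be checked in the same way.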