Training config for the PERSONALIZATION problem type
| KEY | TYPE | Description |
|---|---|---|
| TRAINING_START_DATE | str | Only consider training interaction data after this date. Specified in the timezone of the dataset. |
| DISABLE_TRANSFORMER | bool | Disable training the transformer algorithm. |
| TARGET_ACTION_WEIGHTS | Dict[str, float] | Dictionary of action types to weights for training. |
| SEQUENTIAL_TRAINING | bool | Train a model sequentially through time. |
| MIN_ITEM_HISTORY | int | Minimum number of interactions an item must have to be included in training. |
| ITEM_QUERY_COLUMN | str | Name of column in the item catalog that will be matched to the query column in the interactions table. |
| TEST_LAST_ITEMS_LENGTH | int | Number of items to leave out for each user when using leave-k-out folds. |
| FILTER_HISTORY | bool | Do not recommend items the user has already interacted with. |
| DISABLE_GPU | bool | Disable training on GPU. |
| TEST_SPLIT | int | Percentage of the dataset to use as test data. The supported range is 6% to 20%. |
| BATCH_SIZE | BatchSize | Batch size for neural network. |
| MAX_HISTORY_LENGTH | int | Maximum length of user-item history to include user in training examples. |
| RECENT_DAYS_FOR_TRAINING | int | Limit training data to this number of most recent days. |
| ADD_TIME_FEATURES | bool | Include interaction time as a feature. |
| OPTIMIZED_EVENT_TYPE | str | The final event type to optimize for and compute metrics on. |
| TEST_ON_USER_SPLIT | bool | Use user splits instead of time splits when validating and testing the model. |
| DROPOUT_RATE | int | Dropout rate for neural network. |
| DATA_SPLIT_FEATURE_GROUP_TABLE_NAME | str | Specify the table name of the feature group to export training data with the fold column. |
| FULL_DATA_RETRAINING | bool | Train models separately with all the data. |
| COMPUTE_RERANK_METRICS | bool | Compute metrics based on rerank results. |
| EXPLICIT_TIME_SPLIT | bool | Sets an explicit time-based test boundary. |
| INCLUDE_ITEM_ID_FEATURE | bool | Add Item-Id to the input features of the model. Applicable for Embedding distance and CTR models. |
| DOWNSAMPLE_ITEM_POPULARITY_PERCENTILE | float | Downsample items more popular than this percentile. |
| TARGET_ACTION_TYPES | List[str] | List of action types to use as targets for training. |
| TRAINING_MODE | PersonalizationTrainingMode | Whether to train in production or experimental mode. Defaults to EXP. |
| SESSION_EVENT_TYPES | List[str] | List of event types to treat as occurrences of sessions. |
| SESSION_DEDUPE_MINS | float | Minimum number of minutes between two sessions for a user. |
| ACTION_TYPES_EXCLUSION_DAYS | Dict[str, float] | Mapping from action type to the number of days for which previously interacted items are excluded from prediction. |
| SORT_OBJECTIVE | PersonalizationObjective | Ranking scheme used to sort models on the metrics page. |
| TEST_ROW_INDICATOR | str | Column indicating which rows to use for training (TRAIN), validation (VAL) and testing (TEST). |
| TEST_SPLIT_ON_LAST_K_ITEMS | bool | Use the last k items instead of global timestamp splits when validating and testing the model. |
| DISABLE_TIMESTAMP_SCALAR_FEATURES | bool | Exclude timestamp scalar features. |
| COMPUTE_SESSION_METRICS | bool | Evaluate models based on how well they are able to predict the next session of interactions. |
| QUERY_COLUMN | str | Name of column in the interactions table that represents a natural language query, e.g. 'blue t-shirt'. |
| TEST_WINDOW_LENGTH_HOURS | int | Duration (in hours) of most recent time window to use when validating and testing the model. |
| USE_USER_ID_FEATURE | bool | Use user id as a feature in CTR models. |
| OBJECTIVE | PersonalizationObjective | Ranking scheme used to select final best model. |
| MAX_USER_HISTORY_LEN_PERCENTILE | int | Filter out users with history length above this percentile. |
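
As an illustration of how a few of these keys fit together, here is a minimal sketch that builds a config as a plain Python dictionary and checks it against the types and the `TEST_SPLIT` range listed in the table above. The `validate_config` helper and `EXPECTED_TYPES` map are hypothetical and not part of any SDK, and the example values (the date format, the `purchase` action type) are assumptions chosen for illustration only.

```python
from typing import Any, Dict, List

# Hypothetical subset of the table above: key -> expected Python type.
EXPECTED_TYPES: Dict[str, type] = {
    "TRAINING_START_DATE": str,
    "TARGET_ACTION_TYPES": list,
    "TARGET_ACTION_WEIGHTS": dict,
    "MIN_ITEM_HISTORY": int,
    "FILTER_HISTORY": bool,
    "DISABLE_GPU": bool,
    "TEST_SPLIT": int,
}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    errors: List[str] = []
    for key, value in config.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            errors.append(f"{key}: not covered by this sketch")
        elif expected is int and isinstance(value, bool):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{key}: expected int, got bool")
        elif not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Range taken from the TEST_SPLIT row of the table (6% to 20%).
    test_split = config.get("TEST_SPLIT")
    if isinstance(test_split, int) and not isinstance(test_split, bool):
        if not 6 <= test_split <= 20:
            errors.append("TEST_SPLIT: must be between 6 and 20")
    return errors


# Example values are assumptions, not recommended settings.
training_config = {
    "TRAINING_START_DATE": "2023-01-01",
    "TARGET_ACTION_TYPES": ["purchase"],
    "TARGET_ACTION_WEIGHTS": {"purchase": 1.0},
    "MIN_ITEM_HISTORY": 5,
    "FILTER_HISTORY": True,
    "TEST_SPLIT": 10,
}

problems = validate_config(training_config)
print(problems if problems else "config looks consistent with the table")
```

Checking the config locally like this only catches type and range mistakes; how the resulting dictionary is passed to training depends on the client you use and is not shown here.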