PersonalizationTrainingConfig

Training config for the PERSONALIZATION problem type

KEY TYPE DESCRIPTION
TRAINING_MODE PersonalizationTrainingMode Whether to train in production or experimental mode. Defaults to EXP.
ITEM_QUERY_COLUMN str Name of column in the item catalog that will be matched to the query column in the interactions table.
ACTION_TYPES_EXCLUSION_DAYS Dict[str, float] Mapping from action type to the number of days for which previously interacted items are excluded from predictions.
TEST_ROW_INDICATOR str Column indicating which rows to use for training (TRAIN), validation (VAL) and testing (TEST).
ADD_TIME_FEATURES bool Include interaction time as a feature.
USE_USER_ID_FEATURE bool Use user id as a feature in CTR models.
SESSION_EVENT_TYPES List[str] List of event types to treat as occurrences of sessions.
DOWNSAMPLE_ITEM_POPULARITY_PERCENTILE float Downsample items more popular than this percentile.
COMPUTE_RERANK_METRICS bool Compute metrics based on rerank results.
TEST_SPLIT_ON_LAST_K_ITEMS bool Use last k items instead of global timestamp splits, when validating and testing the model.
MAX_HISTORY_LENGTH int Maximum length of user-item history to include user in training examples.
SESSION_DEDUPE_MINS float Minimum number of minutes between two sessions for a user.
FULL_DATA_RETRAINING bool Train models separately with all the data.
SEQUENTIAL_TRAINING bool Train a model sequentially through time.
FILTER_HISTORY bool Do not recommend items the user has already interacted with.
TEST_ON_USER_SPLIT bool Use user splits instead of using time splits, when validating and testing the model.
SORT_OBJECTIVE PersonalizationObjective Ranking scheme used to sort models on the metrics page.
TEST_SPLIT int Percent of dataset to use for test data. Supported values range from 6% to 20% of the dataset.
TARGET_ACTION_TYPES List[str] List of action types to use as targets for training.
INCLUDE_ITEM_ID_FEATURE bool Add Item-Id to the input features of the model. Applicable for Embedding distance and CTR models.
COMPUTE_SESSION_METRICS bool Evaluate models based on how well they are able to predict the next session of interactions.
OPTIMIZED_EVENT_TYPE str The final event type to optimize for and compute metrics on.
BATCH_SIZE BatchSize Batch size for neural network.
DISABLE_TIMESTAMP_SCALAR_FEATURES bool Exclude timestamp scalar features.
RECENT_DAYS_FOR_TRAINING int Limit training data to a certain latest number of days.
TEST_LAST_ITEMS_LENGTH int Number of items to leave out for each user when using leave k out folds.
DISABLE_TRANSFORMER bool Disable training the transformer algorithm.
TEST_WINDOW_LENGTH_HOURS int Duration (in hours) of most recent time window to use when validating and testing the model.
MIN_ITEM_HISTORY int Minimum number of interactions an item must have to be included in training.
TARGET_ACTION_WEIGHTS Dict[str, float] Dictionary of action types to weights for training.
QUERY_COLUMN str Name of column in the interactions table that represents a natural language query, e.g. 'blue t-shirt'.
DISABLE_GPU bool Disable training on GPU.
DATA_SPLIT_FEATURE_GROUP_TABLE_NAME str Specify the table name of the feature group to export training data with the fold column.
EXPLICIT_TIME_SPLIT bool Sets an explicit time-based test boundary.
TRAINING_START_DATE str Only consider training interaction data after this date. Specified in the timezone of the dataset.
MAX_USER_HISTORY_LEN_PERCENTILE int Filter out users with history length above this percentile.
OBJECTIVE PersonalizationObjective Ranking scheme used to select final best model.
DROPOUT_RATE int Dropout rate for neural network.
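
As a rough illustration of how a few of the keys above might fit together, the sketch below builds a plain Python dictionary keyed by the names in the table and checks it against the listed types. It is a minimal sketch, not the library's API: the action type names ("click", "purchase"), the chosen values, and the `check_config` helper are all hypothetical, and treating the 6% to 20% TEST_SPLIT range as inclusive is an assumption.

```python
from typing import Any, Dict, List

# Hypothetical example values; action type names are placeholders.
config: Dict[str, Any] = {
    "TRAINING_MODE": "EXP",  # the table lists EXP as the default
    "TARGET_ACTION_TYPES": ["click", "purchase"],
    "TARGET_ACTION_WEIGHTS": {"click": 1.0, "purchase": 5.0},
    "ACTION_TYPES_EXCLUSION_DAYS": {"purchase": 30.0},
    "FILTER_HISTORY": True,
    "TEST_SPLIT": 10,
    "DISABLE_GPU": False,
}

# Python types for a few keys, taken from the TYPE column of the table.
EXPECTED_TYPES = {
    "TARGET_ACTION_TYPES": list,
    "TARGET_ACTION_WEIGHTS": dict,
    "ACTION_TYPES_EXCLUSION_DAYS": dict,
    "FILTER_HISTORY": bool,
    "TEST_SPLIT": int,
    "DISABLE_GPU": bool,
}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems found in cfg."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key in cfg and not isinstance(cfg[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    split = cfg.get("TEST_SPLIT")
    # Assumption: the documented 6% to 20% range includes both endpoints.
    if isinstance(split, int) and not 6 <= split <= 20:
        problems.append("TEST_SPLIT: must be between 6 and 20")
    return problems


if __name__ == "__main__":
    print(check_config(config))
```

A check of this kind only catches type and range mistakes on the client side; which keys may be combined, and their defaults, are determined by the service itself.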