Training config for the PREDICTIVE_MODELING problem type
| Key | Type | Description |
|---|---|---|
| TARGET_ENCODE_CATEGORICALS | bool | Turn target encoding of categorical features on or off. |
| IGNORE_DATETIME_FEATURES | bool | Remove all datetime features from the model. Useful when the model must generalize to different time periods. |
| LOSS_FUNCTION | RegressionLossFunction | Loss function to be used as objective for model training. | 
| TIMESTAMP_BASED_SPLITTING_COLUMN | str | Timestamp column selected for splitting into test and train. | 
| TARGET_TRANSFORM | RegressionTargetTransform | Specify a transform (e.g. log, quantile) to apply to the target variable. | 
| FULL_DATA_RETRAINING | bool | Train models separately with all the data. | 
| DROP_ORIGINAL_CATEGORICALS | bool | Choose whether to also feed the original label-encoded categorical columns to the models along with their target-encoded versions. |
| NUM_CV_FOLDS | int | Specify the value of k in k-fold cross validation. | 
| CUSTOM_LOSS_FUNCTIONS | List[str] | Registered custom losses available for selection. | 
| DISABLE_TEST_VAL_FOLD | bool | Do not create a TEST_VAL set. All records that would otherwise be part of the TEST_VAL fold remain in the TEST fold. |
| MONOTONICALLY_DECREASING_FEATURES | List[str] | Constrain the model such that it behaves as if the target feature is monotonically decreasing with the selected features. |
| BATCH_SIZE | BatchSize | Batch size. | 
| DO_MASKED_LANGUAGE_MODEL_PRETRAINING | bool | Specify whether to run an unsupervised masked language model pretraining step before supervised training in supported algorithms that use BERT-like backbones. |
| SAMPLING_UNIT_KEYS | List[str] | Constrain the train/test split so that rows sharing the same values in these key columns are kept in the same partition. |
| PARTIAL_DEPENDENCE_ANALYSIS | PartialDependenceAnalysis | Specify whether to run partial dependence plots for all features or only some features. | 
| ACTIVE_LABELS_COLUMN | str | Specify a column to use as the active labels column in a multi-label setting. |
| MAX_TOKENS_IN_SENTENCE | int | Specify the max tokens to be kept in a sentence based on the truncation strategy. | 
| MONOTONICALLY_INCREASING_FEATURES | List[str] | Constrain the model such that it behaves as if the target feature is monotonically increasing with the selected features. |
| TREE_HPO_MODE | RegressionTreeHPOMode | Turning off Rapid Experimentation makes training take longer. |
| IS_MULTILINGUAL | bool | Enable algorithms which process text using pretrained multilingual NLP models. | 
| K_FOLD_CROSS_VALIDATION | bool | Use this to force k-fold cross validation bagging on or off. | 
| PRETRAINED_LLM_NAME | str | Enable algorithms which process text using pretrained large language models. | 
| TRUNCATION_STRATEGY | str | Strategy for handling text rows whose token count exceeds MAX_TOKENS_IN_SENTENCE. |
| PERFORM_FEATURE_SELECTION | bool | If enabled, additional algorithms which support feature selection as a pretraining step will be trained separately with the selected subset of features. The details about their selected features can be found in their respective logs. | 
| TEST_ROW_INDICATOR | str | Column indicating which rows to use for training (TRAIN) and testing (TEST). Validation (VAL) can also be specified. | 
| SAMPLE_WEIGHT | str | Specify a column to use as the weight of a sample for training and evaluation. |
| REBALANCE_CLASSES | bool | When this option is enabled, class weights are computed as the inverse of the class frequency in the training dataset. Useful when the classes in the dataset are unbalanced. Rebalancing classes generally boosts recall at the cost of precision on rare classes. |
| TEST_SPLIT | int | Percent of the dataset to use as test data. The supported range is 5% to 20%. |
| TRAINING_ROWS_DOWNSAMPLE_RATIO | float | Train on a sample of the provided dataset, drawn using this ratio. |
| PRETRAINED_MODEL_NAME | str | Enable algorithms which process text using pretrained multilingual NLP models. | 
| TIMESTAMP_BASED_SPLITTING_METHOD | RegressionTimeSplitMethod | Method of selecting the TEST set: by top percentile or after a given timestamp. |
| CUSTOM_METRICS | List[str] | Registered custom metrics available for selection. | 
| DATA_SPLIT_FEATURE_GROUP_TABLE_NAME | str | Specify the table name of the feature group to export training data with the fold column. | 
| FEATURE_SELECTION_INTENSITY | int | This determines the strictness with which features will be filtered out. 1 being very lenient (more features kept), 100 being very strict. | 
| TYPE_OF_SPLIT | RegressionTypeOfSplit | Type of data split into train and test sets (and validation, if used). |
| MIN_CATEGORICAL_COUNT | int | Minimum count threshold for a categorical value to be treated as distinct from the unknown placeholder. |
| RARE_CLASS_AUGMENTATION_THRESHOLD | float | Augments any rare class whose relative frequency with respect to the most frequent class is less than this threshold. Default = 0.1 for classification problems with rare classes. | 
| NUMERIC_CLIPPING_PERCENTILE | float | Clip the top and bottom x percentile of numeric feature columns, where x is the value of this option. |
| TEST_SPLITTING_TIMESTAMP | str | Rows with timestamp greater than this will be considered to be in the test set. | 
| MAX_TEXT_WORDS | int | Maximum number of words to use from text fields. | 
| OBJECTIVE | RegressionObjective | Ranking scheme used to select final best model. | 
| SORT_OBJECTIVE | RegressionObjective | Ranking scheme used to sort models on the metrics page. | 
| LOSS_PARAMETERS | str | Loss function params in format | 
| AUGMENTATION_STRATEGY | RegressionAugmentationStrategy | Strategy to deal with class imbalance and data augmentation. | 
| DROPOUT_RATE | int | Dropout percentage rate. |
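
The types and ranges in the table can be checked before a config is submitted. The following is a minimal sketch, assuming the config is held as a plain Python dictionary keyed by the names above. The `validate_config` helper, the `EXPECTED_TYPES` mapping, and the `customer_id` column are hypothetical illustrations and not part of any library; only a handful of keys are covered.

```python
from typing import Any, Dict, List

# Expected Python types for a few keys, taken from the table above.
# This mapping is illustrative and deliberately incomplete.
EXPECTED_TYPES: Dict[str, type] = {
    "TEST_SPLIT": int,
    "NUM_CV_FOLDS": int,
    "K_FOLD_CROSS_VALIDATION": bool,
    "PERFORM_FEATURE_SELECTION": bool,
    "FEATURE_SELECTION_INTENSITY": int,
    "SAMPLING_UNIT_KEYS": list,
    "TEST_SPLITTING_TIMESTAMP": str,
}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems found in `config`."""
    problems: List[str] = []

    # Type checks for the keys this sketch knows about.
    for key, value in config.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            continue  # key not covered by this sketch
        if expected is int:
            ok = _is_int(value)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Range checks described in the table.
    test_split = config.get("TEST_SPLIT")
    if _is_int(test_split) and not 5 <= test_split <= 20:
        problems.append("TEST_SPLIT: must be between 5 and 20 (percent)")

    intensity = config.get("FEATURE_SELECTION_INTENSITY")
    if _is_int(intensity) and not 1 <= intensity <= 100:
        problems.append("FEATURE_SELECTION_INTENSITY: must be between 1 and 100")

    return problems


# Example configs; "customer_id" is a made-up column name.
config = {
    "TEST_SPLIT": 10,
    "K_FOLD_CROSS_VALIDATION": True,
    "NUM_CV_FOLDS": 5,
    "SAMPLING_UNIT_KEYS": ["customer_id"],
}
bad_config = {"TEST_SPLIT": 50, "NUM_CV_FOLDS": "5"}

print(validate_config(config))
print(validate_config(bad_config))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a config at once.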