Our platform provides the flexibility to adjust a set of training parameters. There are general training parameters and advanced training options that can influence the model's predictions. Prediction quality is evaluated against a set of accuracy measures, or metrics, which are also discussed in this section.
Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:
Training Option Name | Description | Possible Values |
---|---|---|
Name | The name you would like to give the model to be trained. The system generates a default name based on the name of the project the model belongs to. | The name can contain any alphanumeric characters and must be between 5 and 60 characters long. |
Set Refresh Schedule (UTC) | The refresh schedule is the time at which your dataset is replaced by an updated copy of the dataset from your storage bucket location. Enter a CRON time string that describes the schedule in the UTC time zone. | A string in CRON format. If you're unfamiliar with CRON syntax, Crontab Guru can help translate it into natural language. |
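To make the CRON format concrete, here is a minimal sketch of how a five-field UTC CRON string breaks down. The schedule string `"0 4 * * 1"` (04:00 UTC every Monday) is a hypothetical example, not a value the platform requires:

```python
# A CRON string has 5 fields: minute, hour, day of month, month, day of week.
CRON_FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def parse_cron(expr: str) -> dict:
    """Split a CRON string into its five named fields."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError(f"Expected 5 CRON fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))

# "0 4 * * 1" -> minute 0, hour 4, any day of month, any month, Monday.
schedule = parse_cron("0 4 * * 1")
print(schedule)
```

Tools like Crontab Guru perform the reverse translation, from the field values back into natural language.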
For the Advanced Options, our AI engine automatically sets optimal values. We recommend overriding them only if you are familiar with deep learning. Overview of Advanced Options:
Training Option Name | API Configuration Name | Description | Possible Values |
---|---|---|---|
Test Split | TEST_SPLIT | Percentage of the dataset to use as test data. A range of 5% to 20% of the dataset is recommended. The test data is used to estimate the accuracy and generalizing capability of the model on unseen (new) data. | 5 to 20 |
Dropout | DROPOUT | Dropout percentage in deep neural networks. Dropout is a regularization method used for better generalization over new data. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much and forces each unit to learn more robust features on its own. | 0 to 90 |
Batch Size | BATCH_SIZE | The number of data points provided to the model at once (in one batch) during training. The batch size affects how quickly a model learns and the stability of the learning process. It is an important hyperparameter that should be tuned carefully. For more details, please visit our Blog. | 16 / 32 / 64 / 128 |
Spike Up | SPIKE_UP | Detect outliers with a high value. | true/false |
Value High | VALUE_HIGH | Detect unusually high values. | true/false |
Mixture of Gaussians | MIXTURE_OF_GAUSSIANS | Detect unusual combinations of values using mixture of Gaussians. | true/false |
Variational Autoencoder | VARIATIONAL_AUTOENCODER | Detect unusual combinations of values using variational autoencoder. | true/false |
Spike Down | SPIKE_DOWN | Detect outliers with a low value. | true/false |
Trend Change | TREND_CHANGE | Detect changes to the trend. | true/false |
Experiment Mode | EXPERIMENT_MODE | Switch to an experimental mode. | no_experiment / experiment_a |
Code Version | CODE_VERSION | Switch to a legacy version of the codebase. | 2.0 / 1.0 |
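To illustrate what the Dropout option does during training, here is a minimal pure-Python sketch of inverted dropout applied to a layer's activations. The activation values and the 25% rate are arbitrary illustrations, not recommended settings:

```python
import random

def apply_dropout(activations, dropout_pct, rng=None):
    """Randomly zero out roughly `dropout_pct`% of units and rescale the
    survivors by 1 / keep_prob (inverted dropout), so the expected
    activation magnitude is unchanged at inference time."""
    rng = rng or random.Random(0)
    keep_prob = 1.0 - dropout_pct / 100.0
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]

layer_output = [0.5, 1.2, -0.3, 0.8, 2.1, -1.4]
dropped = apply_dropout(layer_output, dropout_pct=25)
# Some units are zeroed; the survivors are scaled by 1 / 0.75.
print(dropped)
```

Because dropped units differ on every training batch, no single unit can rely on the presence of any other, which is what discourages co-adaptation.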
Our AI engine will calculate the following metrics for this use case:
Metric Name | Description |
---|---|
Mean of Reconstruction Error | While approximating complex data distributions to identify useful patterns in data, the objective is to find the optimal distribution from a set of many possible distributions. This objective is achieved by minimizing the difference between the distribution of the input data points and the target (approximated) distribution. This difference is known as the reconstruction error. The mean of reconstruction error is the average of the reconstruction errors over all test data points. To elaborate: variational inference is a statistical technique used to approximate complex distributions. The idea is to choose a parametrized family of distributions (for example, the family of Gaussians, whose parameters are the mean and the covariance) and to look for the best approximation of the target distribution within this family. The best element of the family is the one that minimizes a given approximation error measurement (most often the Kullback-Leibler divergence between approximation and target) and is found by gradient descent over the parameters that describe the family. This approximation error measurement is called the reconstruction error. For further details, please visit this link. |
Standard Deviation of Reconstruction Error | The standard deviation of reconstruction error is the square root of the average of the squared differences of reconstruction error value from the mean of reconstruction error. |
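The two metrics above follow directly from their definitions. A minimal pure-Python sketch, using made-up per-point reconstruction errors for illustration:

```python
import math

def mean_and_std(errors):
    """Mean of reconstruction error, and its standard deviation: the square
    root of the average squared deviation from that mean."""
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(variance)

# Hypothetical reconstruction errors from five test data points:
reconstruction_errors = [0.12, 0.08, 0.95, 0.10, 0.15]
mean_err, std_err = mean_and_std(reconstruction_errors)
# A point such as 0.95, far above the mean, is a candidate anomaly.
```

In anomaly detection, points whose reconstruction error lies several standard deviations above the mean are the ones flagged as unusual.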