Training Parameters And Accuracy Measures

Our platform provides the flexibility to adjust a set of training parameters. There are general training parameters and advanced training options that can influence the model's predictions. Prediction quality is evaluated against a set of accuracy measures, or metrics, which are also discussed in this section.


Training Options

Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:

Name
Description: The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model belongs to.
Possible Values: Any alphanumeric string between 5 and 60 characters long.

Set Refresh Schedule (UTC)
Description: The refresh schedule specifies when your dataset is to be replaced by an updated copy of that dataset from your storage bucket location. The value entered is a CRON time string describing the schedule in the UTC time zone.
Possible Values: A string in CRON format. If you're unfamiliar with cron syntax, Crontab Guru can help translate it into natural language.
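As a rough illustration of the CRON format (this sketch is not part of the platform's API, and the schedule shown is a made-up example, not a default), a standard cron expression has five fields that can be labeled like so:

```python
# Label the five fields of a standard 5-field CRON expression.
FIELDS = ["minute", "hour", "day of month", "month", "day of week"]

def describe_cron(expr):
    """Pair each field of a 5-field CRON expression with its meaning."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected a 5-field CRON expression")
    return dict(zip(FIELDS, parts))

# Hypothetical schedule: minute 0, hour 6, any day of month,
# any month, day-of-week 1 -> every Monday at 06:00 UTC.
print(describe_cron("0 6 * * 1"))
```

Crontab Guru performs the same translation interactively, including ranges and step values that this minimal sketch does not handle.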

Advanced Training Options

For the advanced options, our AI engine automatically sets optimal values. We recommend overriding these options only if you are familiar with deep learning. Overview of Advanced Options:

Test Split (API: TEST_SPLIT)
Description: Percentage of the dataset to use as test data. The test data is used to estimate the accuracy and generalization capability of the model on unseen (new) data.
Possible Values: A percentage of the dataset; a range from 5% to 20% is recommended.

Dropout (API: DROPOUT)
Description: Dropout percentage in deep neural networks. Dropout is a regularization method used for better generalization over new data. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much and enhances their isolated learning.
Possible Values: 0 to 90

Batch Size (API: BATCH_SIZE)
Description: The number of data points provided to the model at once (in one batch) during training. The batch size impacts how quickly a model learns and the stability of the learning process. It is an important hyperparameter that should be well tuned. For more details, please visit our Blog.
Possible Values: 16 / 32 / 64 / 128

Spike Up (API: SPIKE_UP)
Description: Detect outliers with a high value.
Possible Values: true / false

Value High (API: VALUE_HIGH)
Description: Detect unusually high values.
Possible Values: true / false

Mixture of Gaussians (API: MIXTURE_OF_GAUSSIANS)
Description: Detect unusual combinations of values using a mixture of Gaussians.
Possible Values: true / false

Variational Autoencoder (API: VARIATIONAL_AUTOENCODER)
Description: Detect unusual combinations of values using a variational autoencoder.
Possible Values: true / false

Spike Down (API: SPIKE_DOWN)
Description: Detect outliers with a low value.
Possible Values: true / false

Trend Change (API: TREND_CHANGE)
Description: Detect changes to the trend.
Possible Values: true / false

Experiment Mode (API: EXPERIMENT_MODE)
Description: Switch to an experimental mode.
Possible Values: no_experiment / experiment_a

Code Version (API: CODE_VERSION)
Description: Switch to a legacy version of the codebase.
Possible Values: 2.0 / 1.0
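To make the Dropout option above more concrete, here is a minimal sketch of inverted dropout in pure Python. Note that the platform's DROPOUT option is a percentage (0 to 90), while this sketch takes a fraction; the function name and scaling convention are illustrative assumptions, not the platform's implementation.

```python
import random

def dropout(activations, rate, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` (a fraction, e.g. 0.5 for a DROPOUT setting of 50) and scale
    survivors by 1/(1 - rate) so the expected activation is unchanged.
    At inference time (training=False), activations pass through as-is."""
    if not training or rate == 0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)
out = dropout([0.5, 1.2, -0.3, 0.8], rate=0.5)
```

Because dropped units change on every training batch, no unit can rely on a specific partner unit being present, which is the co-adaptation the description refers to.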

Metrics

Our AI engine will calculate the following metrics for this use case:

Mean of Reconstruction Error
Description: While approximating complex data distributions to identify useful patterns in data, the objective is to find the optimal distribution from a set of many possible distributions. This objective is achieved by minimizing the difference between the distribution of the input data points and the target (approximated) distribution. This difference is known as the reconstruction error. The mean of reconstruction error is the average of the reconstruction errors over all test data points.

To elaborate on reconstruction error: variational inference is a technique in statistics that is used to approximate complex distributions. The idea is to set a parametrized family of distributions (for example, the family of Gaussians, whose parameters are the mean and the covariance) and to look for the best approximation of the target distribution within this family. The best element of the family is the one that minimizes a given approximation error measurement (most of the time the Kullback-Leibler divergence between the approximation and the target) and is found by gradient descent over the parameters that describe the family. This approximation error measurement is called the reconstruction error. For further details, please visit this link.

Standard Deviation of Reconstruction Error
Description: The standard deviation of reconstruction error is the square root of the average squared difference between each reconstruction error value and the mean of reconstruction error.
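The two metrics above can be sketched in a few lines of pure Python. The per-point squared error used here is one common choice of reconstruction error; the platform's exact error definition is not specified in this section, so treat the helper names and the sample data as illustrative assumptions.

```python
import math

def reconstruction_errors(originals, reconstructions):
    """Per-point reconstruction error, taken here as the squared
    Euclidean distance between a test point and its reconstruction."""
    return [sum((x - r) ** 2 for x, r in zip(xs, rs))
            for xs, rs in zip(originals, reconstructions)]

def mean_and_std(errors):
    """Mean of the errors, and the square root of the average squared
    deviation of each error from that mean (population std)."""
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(var)

# Hypothetical test points and their model reconstructions.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X_hat = [[1.1, 1.9], [2.8, 4.2], [5.0, 5.5]]

errs = reconstruction_errors(X, X_hat)
mean, std = mean_and_std(errs)
```

A low mean indicates the model reconstructs typical test points well, while a large standard deviation indicates that reconstruction quality varies widely across points, which is often where anomalies show up.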