Training Parameters And Accuracy Measures

Our platform provides the flexibility to adjust a set of training parameters. There are general training parameters and advanced training options that can influence the model's predictions. The predictions are evaluated using a set of accuracy measures, or metrics, which are also discussed in this section.


Training Options

Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:

Name
Description: The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model is part of.
Possible Values: Any alphanumeric string, 5 to 60 characters long.

Cumulative Prediction Lengths
Description: None
Possible Values: None

Historical Frequency
Description: None
Possible Values: None

Set Refresh Schedule (UTC)
Description: The refresh schedule is the schedule on which your dataset is replaced by an updated copy of that dataset from your storage bucket location. The value is a CRON time string describing the schedule in the UTC time zone.
Possible Values: A string in CRON format. If you're unfamiliar with cron syntax, Crontab Guru can help translate it into natural language.
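The CRON string used by Set Refresh Schedule (UTC) has five whitespace-separated fields: minute, hour, day of month, month, and day of week. As a rough illustration of that shape (a hypothetical sketch, not part of the platform), a minimal Python check:

```python
def is_valid_cron(expr: str) -> bool:
    """Rudimentary shape check for a five-field CRON string: each field
    must be '*' or built from digits, commas, hyphens, and slashes."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    allowed = set("0123456789,-*/")
    return all(f and set(f) <= allowed for f in fields)

# "At 04:00 UTC every Monday"
print(is_valid_cron("0 4 * * 1"))      # True
print(is_valid_cron("every monday"))   # False
```

A real scheduler performs stricter validation (field ranges, month/day names); this only checks the overall shape.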

Advanced Training Options

For the advanced options, our AI engine automatically sets optimal values. We recommend overriding these options only if you are familiar with deep learning. Overview of the advanced options:

Test Split
API Configuration Name: TEST_SPLIT
Description: Percentage of the dataset to use as test data. The test data is used to estimate the accuracy and generalization capability of the model on unseen (new) data.
Possible Values: A percentage of the dataset; 5% to 20% is recommended.

Skip Input Transform
API Configuration Name: SKIP_INPUT_TRANSFORM
Description: Avoid numeric scaling transformations on the input.
Possible Values: Yes/No

Skip Target Transform
API Configuration Name: SKIP_TARGET_TRANSFORM
Description: Avoid numeric scaling transformations on the target.
Possible Values: Yes/No

Enable Neural Features
API Configuration Name: ENABLE_NEURAL_FEATURES
Description: Use the neural feature search algorithm.
Possible Values: Yes/No
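To illustrate what the Test Split and the scaling transforms refer to (a hypothetical sketch; the platform's internal implementation is not shown here), below is a minimal hold-out split and a z-score scaling of the kind that SKIP_INPUT_TRANSFORM and SKIP_TARGET_TRANSFORM would bypass:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Hold out a fraction of the rows as a test set (cf. TEST_SPLIT)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard
    deviation. This is the kind of numeric transform that the skip
    options above would bypass."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # guard against zero variance
    return [(v - mean) / std for v in values]

train, test = train_test_split(list(range(100)), test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

Holding out 20% of 100 rows leaves 80 for training; the standardized values always have mean 0.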

Metrics

Our AI engine will calculate the following metrics for this use case:

Average Coefficient of Determination: The coefficient of determination (R2), averaged across predictions; see the Coefficient of Determination entry below.
Average Mean Absolute Error: The mean absolute error (MAE), averaged across predictions; see the Mean Absolute Error entry below.
Average Root Mean Square Error: The root mean square error (RMSE), averaged across predictions; see the Root Mean Square Error entry below.
Average Weighted Mean Percentage Error: The weighted mean percentage error, averaged across predictions.
Cumulative Prediction Length: None.
Coefficient of Determination (R2): The coefficient of determination (denoted R2) is a key output of regression analysis. To understand R2, we first need to distinguish dependent and independent variables. The dependent variable is the target column in the dataset (the value being predicted), while the independent variables are the features (columns) whose changes contribute toward changes in the target. R2 is the percentage of the variance in the dependent variable that the independent variables explain collectively. R2 ranges from 0 to 1, where 0 means the independent variables explain none of the variation in the target and 1 means they explain all of it. Thus, in general, the closer R2 is to 1, the better.
Root Mean Square Error (RMSE): The square root of the average of the squared differences between the predicted and actual values. In other words, each difference between a predicted and an actual value is squared, the squared differences are averaged, and the square root of that average is the RMSE. RMSE is therefore non-negative, and a score of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSE is better. However, comparisons across different types of data are invalid because the metric depends on the scale of the numbers used. Because the errors are squared before they are averaged, RMSE gives relatively high weight to large errors. This makes it more useful when large errors are particularly undesirable; for example, in camera calibration, being off by 5 degrees is more than twice as bad as being off by 2.5 degrees. It also makes RMSE sensitive to outliers. RMSE is harder to understand and interpret than the Mean Absolute Error (MAE): each error influences MAE in direct proportion to its absolute value, which is not the case for RMSE.
Mean Absolute Error (MAE): The average of the absolute differences between the predicted and actual values. The lower the value, the better; a score of 0 means the model's predictions are perfect. MAE uses the same scale as the data being measured, so it is a scale-dependent accuracy measure and cannot be used to compare models that use different scales. It measures the average magnitude of the errors in a set of predictions without considering their direction. It is a common measure of forecast error in time series analysis and is easier to interpret than Root Mean Square Error.
Mean Absolute Percentage Error (MAPE): The average, over all data points, of the absolute difference between the predicted and observed values, expressed as a percentage of the observed value. A score of 0% means the model forecasted perfectly, while larger scores indicate less accurate forecasts (MAPE is not capped at 100%). In other words, it is a statistical measure of forecast accuracy: for each time period, the actual value is subtracted from the forecasted value and the difference is divided by the actual value. MAPE is commonly used as a loss function for regression problems and in model evaluation because of its very intuitive interpretation in terms of relative error. That simple interpretation makes it a very common measure of forecast error, but it has drawbacks in practical application: it works best when the data contain no extreme values and no zeros (division by zero).
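The formulas behind R2, RMSE, MAE, and MAPE can be sketched in a few lines of Python (illustrative only; the platform computes these metrics internally):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5

def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be nonzero."""
    n = len(actual)
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

actual    = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(rmse(actual, predicted))  # 10.0
print(mae(actual, predicted))   # 10.0
```

Here every error has magnitude 10, so RMSE and MAE agree; replace one error with a much larger one and RMSE rises faster than MAE, illustrating its sensitivity to outliers.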

Note: In addition to the above metrics, our engine will train a baseline model and generate metrics for it. Typically, the metrics for your custom deep learning model should be better than those of the baseline model.
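As an illustration of such a comparison (a hypothetical sketch; the platform's actual baseline model is not specified here), consider a naive baseline that always predicts the training-set mean:

```python
def mae(actual, predicted):
    """Mean absolute error, as defined in the metrics section above."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

train_targets = [10.0, 12.0, 14.0, 16.0]
test_actual = [18.0, 20.0]

# Naive baseline: always predict the training-set mean.
baseline_pred = [sum(train_targets) / len(train_targets)] * len(test_actual)

# Hypothetical predictions from a trained model.
model_pred = [17.5, 19.0]

print(mae(test_actual, baseline_pred))  # 6.0
print(mae(test_actual, model_pred))     # 0.75
```

A trained model that cannot beat such a trivial predictor on the test set is a sign that something is wrong with the features or the setup, which is why the baseline metrics are reported alongside the model's.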