Our platform lets you adjust a set of training parameters. These are split into general training parameters and advanced training options, both of which can influence the model's predictions. Predictions are evaluated against a set of accuracy metrics, which are also described in this section.
Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:
Training Option Name | Description | Possible Values |
---|---|---|
Name | The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model belongs to. | The name may contain any alphanumeric characters and must be 5 to 60 characters long. |
For Advanced Options, our AI engine will automatically set the optimum values. We recommend overriding these options only if you are familiar with deep learning. Overview of Advanced Options:
Training Option Name | API Configuration Name | Description | Possible Values |
---|---|---|---|
Test Split | TEST_SPLIT | Percentage of the dataset to hold out as test data. The test data is used to estimate the model's accuracy and its ability to generalize to unseen (new) data. A value between 5% and 20% is recommended. | Percentage of the dataset (5 to 20 recommended) |
Dropout | DROPOUT | Dropout percentage in deep neural networks. Dropout is a regularization method used to improve generalization to new data. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much and encourages each unit to learn useful features on its own. | 0 to 90 |
Batch Size | BATCH_SIZE | The number of data points provided to the model at once (in one batch) during training. The batch size impacts how quickly a model learns and the stability of the learning process. It is an important hyperparameter that should be well-tuned. For more details, please visit our Blog. | 16 / 32 / 64 / 128 |
Active Labels Column | ACTIVE_LABELS_COLUMN | A column in the dataset that marks the labels whose presence/absence is known for each data point in the dataset. | Automatic / active_labels |
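To make the value ranges in the table above concrete, here is a minimal Python sketch that checks a set of advanced options before training. The dictionary keys are the API Configuration Names from the table; the helper `validate_training_config` is hypothetical and is not part of the platform's API, and treating the recommended Test Split range as a check is an assumption made for illustration.

```python
# Hypothetical helper, not part of the platform: it only mirrors the
# "Possible Values" column of the advanced options table.
ALLOWED_BATCH_SIZES = {16, 32, 64, 128}


def validate_training_config(config):
    """Return a list of problems found in a dict keyed by API Configuration Name."""
    problems = []

    # TEST_SPLIT: 5 to 20 is the recommended range (assumed here to be a warning).
    test_split = config.get("TEST_SPLIT")
    if test_split is not None and not 5 <= test_split <= 20:
        problems.append("TEST_SPLIT is outside the recommended 5 to 20 range")

    # DROPOUT: a percentage from 0 to 90.
    dropout = config.get("DROPOUT")
    if dropout is not None and not 0 <= dropout <= 90:
        problems.append("DROPOUT must be between 0 and 90")

    # BATCH_SIZE: one of the four listed values.
    batch_size = config.get("BATCH_SIZE")
    if batch_size is not None and batch_size not in ALLOWED_BATCH_SIZES:
        problems.append("BATCH_SIZE must be one of 16, 32, 64, 128")

    return problems


# Example values chosen for illustration only.
example_config = {
    "TEST_SPLIT": 10,
    "DROPOUT": 20,
    "BATCH_SIZE": 64,
    "ACTIVE_LABELS_COLUMN": "active_labels",
}
print(validate_training_config(example_config))
```

Options left out of the dictionary are skipped by the check, which matches the default behavior described above, where the AI engine chooses the values automatically.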
Our AI engine will calculate the following metrics for this use case:
Metric Name | Description |
---|---|
Accuracy | The percentage of all predictions made by the model that were correct. 100% means that the model completed the task with no errors. |
Area Under ROC Curve (AUC) | AUC, the area under the ROC curve, describes a model's ability to distinguish between classes. It ranges from 0 to 1: a higher AUC means the model more reliably ranks positive instances above negative ones, a value near 0.5 is no better than random guessing, and a value close to 0 means the model is systematically classifying negative instances as positive and vice versa. A value between 0.6 and 1 signifies that the model has learned meaningful patterns rather than making random guesses. AUC aggregates performance across all classification thresholds. It is desirable because it is scale-invariant (it measures how well predictions are ranked rather than their absolute values) and classification-threshold-invariant (it measures prediction quality regardless of the chosen threshold). For more details, please visit the link. |
Class Label | This is the name of the class for which we are computing the metrics. |
Support | The number of occurrences of a class label in the dataset. For example, if the 'car' class appears 1,000 times in a dataset of 10,000 data points, the support for the 'car' class is 1,000. |
Precision | Precision is the fraction of the model's positive predictions that are actually correct. In other words, it is the fraction of relevant results among the retrieved results. It ranges from 0 to 1. The closer it gets to 1, the better. For further details, please visit this link. |
Recall | Recall is the fraction of all relevant instances that the model correctly identifies. In other words, it is the fraction of the total number of relevant instances that were actually retrieved. It ranges from 0 to 1. The closer it gets to 1, the better. For further details, please visit this link. |
Note: In addition to the above metrics, our engine will train a baseline model and generate the same metrics for it. Typically, the metrics for your custom deep learning model should be better than those of the baseline model.
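As an illustration of how the metrics in the table relate to one another, the following Python sketch computes accuracy, per-class support, precision, and recall, plus a pairwise AUC, on a small made-up example. It is not the engine's implementation: the function names and data are hypothetical, and for simplicity it assumes a single predicted label per data point.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Support, precision, and recall for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    support = tp + fn  # occurrences of the label in the true data
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / support if support else 0.0
    return {"support": support, "precision": precision, "recall": recall}


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This equals the area under the ROC curve.
    """
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Made-up labels and scores, for illustration only.
y_true = ["car", "car", "bus", "car", "bus", "bike"]
y_pred = ["car", "bus", "bus", "car", "car", "bike"]

print(accuracy(y_true, y_pred))                  # 4 of 6 predictions are correct
print(per_class_metrics(y_true, y_pred, "car"))  # support 3, precision 2/3, recall 2/3
print(pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```

The pairwise form of AUC makes the scale-invariance property visible: only the ordering of the scores matters, so multiplying every score by a positive constant leaves the result unchanged.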