Our platform lets you adjust a set of training parameters. These are split into general training parameters and advanced training options, both of which can influence the model's predictions. Predictions are evaluated with a set of accuracy metrics, which are also described in this section.
Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:
Training Option Name | Description | Possible Values |
---|---|---|
Name | The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model belongs to. | The name can contain any alphanumeric characters and must be 5 to 60 characters long. |
Image Table | Serves as input for vision models, incorporating image data and metadata to provide the essential structure required during the training of vision models. | Documents can be in formats like plain text, CSV, JSON, PDF, docx, or zip with IMAGE and DOCUMENT ID columns set |
Set Refresh Schedule (UTC) | The schedule on which your dataset is replaced by an updated copy of that dataset from your storage bucket location. Enter the value as a CRON time string that describes the schedule in the UTC time zone. | A string in CRON format. If you're unfamiliar with CRON syntax, Crontab Guru can help translate it into natural language. |
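
The constraints on the general options can be checked before you submit a training job. The following is a minimal sketch, not part of the platform: the helper names `is_valid_model_name` and `looks_like_cron` are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits only, and the refresh schedule is assumed to use the standard five-field CRON layout that Crontab Guru describes.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only,
# with a total length of 5 to 60 characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]{5,60}")


def is_valid_model_name(name: str) -> bool:
    """Return True if the name fits the documented length and character rules."""
    return NAME_PATTERN.fullmatch(name) is not None


def looks_like_cron(schedule: str) -> bool:
    """Shallow check only: a standard CRON string has five whitespace-separated fields.

    This does not verify that each field holds a legal value.
    """
    return len(schedule.split()) == 5


print(is_valid_model_name("VisionModel01"))
# "0 2 * * 1" reads as 02:00 every Monday, interpreted in UTC.
print(looks_like_cron("0 2 * * 1"))
```

A check like this only catches obviously malformed input; the platform remains the authority on what it accepts.
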
For Advanced Options, our AI engine will automatically set the optimum values. We recommend overriding these options only if you are familiar with deep learning. Overview of Advanced Options:
Training Option Name | API Configuration Name | Description | Possible Values |
---|---|---|---|
Test Split | TEST_SPLIT | Percentage of the dataset to use as test data. A range from 5% to 20% of the dataset is recommended. The test data is used to estimate the model's accuracy and how well it generalizes to unseen (new) data. | A percentage of the dataset; 5 to 20 is the recommended range. |
Dropout | DROPOUT | Dropout percentage in deep neural networks. It is a regularization method used for better generalization over new data. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much and pushes each unit to learn features that are useful on their own. | 0 to 90 |
Batch Size | BATCH_SIZE | The number of data points provided to the model at once (in one batch) during training. The batch size impacts how quickly a model learns and the stability of the learning process. It is an important hyperparameter that should be well-tuned. For more details, please visit our Blog. | 16 / 32 / 64 / 128 |
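
If you do override the advanced options, it can help to check the values against the ranges in the table first. The sketch below is illustrative only: `validate_advanced_options` is a hypothetical helper, not a platform function, and it assumes `TEST_SPLIT` and `DROPOUT` are passed as plain percentage numbers. Only the key names and ranges come from the table above.

```python
ALLOWED_BATCH_SIZES = {16, 32, 64, 128}


def validate_advanced_options(options: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found.

    Options that are missing are skipped, since the AI engine sets them automatically.
    """
    problems = []

    test_split = options.get("TEST_SPLIT")
    if test_split is not None and not 5 <= test_split <= 20:
        problems.append("TEST_SPLIT is outside the recommended 5 to 20 range")

    dropout = options.get("DROPOUT")
    if dropout is not None and not 0 <= dropout <= 90:
        problems.append("DROPOUT must be between 0 and 90")

    batch_size = options.get("BATCH_SIZE")
    if batch_size is not None and batch_size not in ALLOWED_BATCH_SIZES:
        problems.append("BATCH_SIZE must be one of 16, 32, 64, 128")

    return problems


# Assumed representation: percentages as plain numbers.
options = {"TEST_SPLIT": 10, "DROPOUT": 20, "BATCH_SIZE": 64}
print(validate_advanced_options(options))
```

Note that the 5 to 20 range for the test split is a recommendation in the table, so the first message is worded as a warning, not a hard error.
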
Our AI engine will calculate the following metrics for this use case:
Metric Name | Description |
---|---|
Accuracy | The percentage of the model's predictions that were correct, out of all predictions it made. 100% means that the model completed the task with no errors. |
Log Loss | Log loss, or logarithmic loss, is a measure of how well a probabilistic model predicts the likelihood of true outcomes, with lower values indicating better performance. |
Class Label | This is the name of the class for which we are computing the metrics. |
Support | This is the number of occurrences of a class label in the dataset. For example, if the 'car' class appears 1,000 times in a dataset of 10,000 data points, the support for the 'car' class is 1,000. |
Precision | Precision is the fraction of retrieved results that are relevant. In other words, of all the items the model assigned to a class, it is the share that truly belong to that class. It ranges from 0 to 1. The closer it gets to 1, the better. For further details, please visit this link. |
Recall | Recall is the fraction of all relevant instances that the model actually retrieved. In other words, of all the items that truly belong to a class, it is the share the model correctly identified. It ranges from 0 to 1. The closer it gets to 1, the better. For further details, please visit this link. |
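
To make the metric definitions above concrete, here is a small sketch that computes them by hand on a toy set of labels. It follows the textbook definitions, not necessarily the exact implementation our AI engine uses. The function names and the toy data are made up for illustration, and the log loss helper assumes each prediction is a dictionary mapping class labels to probabilities.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label (multiply by 100 for a percentage)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class. Lower is better."""
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        # Clip to avoid log(0) when the model assigns probability 0 to the true class.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


def per_class_report(y_true, y_pred, label):
    """Support, precision, and recall for a single class label."""
    true_positives = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)   # items the model assigned to this class
    support = sum(t == label for t in y_true)     # occurrences of the class in the data
    return {
        "class_label": label,
        "support": support,
        "precision": true_positives / predicted if predicted else 0.0,
        "recall": true_positives / support if support else 0.0,
    }


# Toy data, invented for this example.
y_true = ["car", "car", "bike", "car", "bike"]
y_pred = ["car", "bike", "bike", "car", "car"]

print(accuracy(y_true, y_pred))
print(per_class_report(y_true, y_pred, "car"))
print(per_class_report(y_true, y_pred, "bike"))
```

In this toy data the 'car' class has a support of 3: the model labels three items as 'car', two of them correctly, so both precision and recall for 'car' come out to 2/3.
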