Training Parameters And Accuracy Measures

Our platform provides the flexibility to adjust a set of training parameters. There are general training parameters and advanced training options that can influence the model's predictions. Prediction quality is evaluated against a set of accuracy measures, or metrics, which are also discussed in this section.


Training Options

Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:

Name
  Description: The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model is a part of.
  Possible Values: Any alphanumeric string between 5 and 60 characters long.
Set Refresh Schedule (UTC)
  Description: The refresh schedule is when your dataset is set to be replaced by an updated copy of the particular dataset from your storage bucket location. The value is a CRON time string that describes the schedule in the UTC time zone.
  Possible Values: A string in CRON format. If you're unfamiliar with cron syntax, Crontab Guru can help translate it back into natural language.
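For example, the CRON string "0 6 * * 1" means 06:00 UTC every Monday. The minimal sketch below previews the refresh times a schedule implies; it assumes the third-party croniter Python package, which is purely illustrative tooling and not part of the platform.

# Sketch: previewing when a CRON refresh schedule would fire (croniter assumed).
from datetime import datetime
from croniter import croniter

schedule = "0 6 * * 1"  # minute hour day-of-month month day-of-week: 06:00 UTC Mondays
runs = croniter(schedule, datetime(2024, 1, 1))

# Print the next three refresh times implied by the schedule.
for _ in range(3):
    print(runs.get_next(datetime))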
Timeseries Frequency
  Description: Frequency of the timeseries data, used to fill in missing timestamp values.
  Possible Values: Hourly / Daily / Weekly (Monday to next Monday) / Weekly / Monthly / Yearly
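To illustrate what a frequency setting implies, the sketch below aggregates an irregular series to a daily frequency with pandas (an assumption made for illustration; the platform performs this aggregation internally). Resampling surfaces the missing timestamps that the fill options described under Advanced Training Options can then populate.

# Sketch: aggregating an irregular series to a daily frequency (pandas assumed).
import pandas as pd

raw = pd.Series(
    [10.0, 12.0, 9.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
)

# Resampling to daily frequency inserts the missing days (Jan 3-4) as NaN.
daily = raw.resample("D").mean()
print(daily)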

Advanced Training Options

For Advanced Options, our AI engine will automatically set the optimum values. We recommend overriding these options only if you are familiar with deep learning. Overview of Advanced Options:

Type of Split
  API Configuration Name: TYPE_OF_SPLIT_TIMESERIES_ANOMALY
  Description: Defines the underlying method used to split the data into train and test sets. We support the following options:
  (i) Automatic Time Based: A time-based split where the train/val/test splits are decided automatically. By default, 10% of the data is used for testing, but you can override this with the Test Split option.
  (ii) Fixed Timestamp Based: A timestamp can be set so that the model is trained on one sequence of data and then predicts on the next sequence; data points before the selected timestamp go into the train set, and data points after it go into the test set (see the sketch below).
  Possible Values: Automatic Time Based / Fixed Timestamp Based
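As a sketch of the Fixed Timestamp Based behavior, written against pandas for illustration (not the platform's actual implementation):

# Sketch: a fixed-timestamp split; rows before the chosen timestamp train
# the model, rows at or after it form the test set.
import pandas as pd

df = pd.DataFrame(
    {"value": range(6)},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

split_ts = pd.Timestamp("2024-01-05")  # hypothetical split timestamp
train = df[df.index < split_ts]
test = df[df.index >= split_ts]
print(len(train), len(test))  # 4 train rows, 2 test rows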
Test Split
  API Configuration Name: TEST_SPLIT
  Description: Percentage of the dataset to use as test data. A range from 5% to 20% of the dataset is recommended. The test data is used to estimate the accuracy and generalization capability of the model on unseen (new) data.
  Possible Values: A percentage of the dataset; 5% to 20% is recommended.
Test Start
  API Configuration Name: TEST_START
  Description: Limit training data to dates before the provided test start.
  Possible Values: datetime
Fill Missing Values
  API Configuration Name: FILL_MISSING_VALUES_TIMESERIES_ANOMALY
  Description: Provides different methods to fill in missing data so you can eliminate sparsity. We support three filling regions:
  1. BACK: Fills in any missing values between the global start date and the item start date.
  2. MIDDLE: Fills in any missing values between the item start date and the item end date.
  3. FUTURE: Fills in any missing values between the item end date and the global end date.
  We support filling these values using the AVERAGE, MEDIAN, MIN, or MAX of all item values, or a CUSTOM float value. BACK FILL, FORWARD FILL, LINEAR, and NEAREST interpolation methods are also supported (illustrated in the sketch below). Note that missing values are filled only after the data has been aggregated according to the Timeseries Frequency setting.
  Possible Values: BACK / MIDDLE / FUTURE
Handle Zeros as Missing Values
  API Configuration Name: HANDLE_ZEROES_AS_MISSING_VALUES
  Description: When this option is selected, the Abacus.AI platform considers zero values to be missing values, so any techniques applied to fill missing values also apply to these zeros.
  Possible Values: true / false
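The sketch below combines the two options above, using pandas purely for illustration: zeros are first treated as missing, and the gaps are then filled with the series median or by linear interpolation.

# Sketch: treat zeros as missing, then fill the gaps (pandas assumed).
import numpy as np
import pandas as pd

s = pd.Series(
    [5.0, 0.0, np.nan, 7.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

s = s.replace(0.0, np.nan)                      # Handle Zeros as Missing Values
median_filled = s.fillna(s.median())            # analogous to MEDIAN filling
linear_filled = s.interpolate(method="linear")  # analogous to LINEAR interpolation
print(median_filled, linear_filled, sep="\n")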
Min Samples in Normal Region
  API Configuration Name: MIN_SAMPLES_IN_NORMAL_REGION
  Description: Adjust this to fine-tune the number of anomalies identified. The number of detected anomalies increases as the minimum number of samples in the normal region increases.
  Possible Values: Integers from 1 to 1000
Anomaly Type
  API Configuration Name: ANOMALY_TYPE
  Description: Selects which kind of peaks to detect as anomalies. Only takes effect if there is exactly one unignored numerical column in the training feature group.
  Possible Values: High peak / Low peak
Threshold Score
  API Configuration Name: THRESHOLD_SCORE
  Description: Threshold score used for anomaly detection, a value between 0 and 1. The higher the threshold score, the more stringent the anomaly detection.
  Possible Values: Floating-point value from 0 to 1
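To see how a threshold score typically gates detections, consider the illustrative numpy sketch below (not the platform's internal logic): raising the threshold flags fewer, higher-confidence points as anomalies.

# Sketch: applying a threshold score to per-point anomaly scores (numpy assumed).
import numpy as np

scores = np.array([0.12, 0.55, 0.81, 0.97, 0.40])  # hypothetical scores in [0, 1]

lenient = scores >= 0.5    # flags 3 points
stringent = scores >= 0.9  # flags 1 point: a higher threshold is more stringent
print(lenient.sum(), stringent.sum())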

Metrics

Our AI engine will calculate the following metrics for this use case:

Recall
  Description: Recall is the percentage of total relevant results correctly classified by the model; in other words, it is the fraction of all relevant instances that were actually retrieved. It ranges from 0 to 1, and the closer it gets to 1, the better.
Area Under ROC Curve (AUC)
  Description: AUC, or Area Under the Curve, describes a model's ability to distinguish between two or more classes; a higher AUC indicates better performance at predicting positive instances as positive and negative instances as negative, while an AUC close to 0 suggests the model is classifying negative instances as positive and vice versa. A value between 0.6 and 1 signifies that the model has learned meaningful patterns rather than making random guesses. AUC serves as an aggregate measure of performance for classification problems across all possible threshold settings. It is desirable because it is scale-invariant (it assesses the ranking quality of predictions) and classification-threshold-invariant (it evaluates prediction quality regardless of the chosen classification threshold).
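As a sketch of how these two metrics can be computed from labeled test data, using scikit-learn for illustration (the platform computes them internally):

# Sketch: computing recall and ROC AUC (scikit-learn assumed).
from sklearn.metrics import recall_score, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]               # hypothetical ground-truth anomaly labels
y_score = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]  # hypothetical model anomaly scores
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # thresholded predictions

print(recall_score(y_true, y_pred))    # fraction of true anomalies retrieved
print(roc_auc_score(y_true, y_score))  # threshold-independent ranking quality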
Total Anomaly Count
  Description: Total number of anomalies detected in the test data.

Fraction of Anomalies Detected
  Description: Fraction of anomalies detected in the test data.

Average Anomalies Per Series
  Description: Average number of anomalies detected per series in the test data.

Average Anomalies Per Series Per Dataset Frequency
  Description: Average number of anomalies detected per series, per frequency of the dataset, in the test data. The frequency (e.g., 8HR, 12HR, 1DAY, 1WEEK) is determined from the input data.

Average Anomaly Score
  Description: Mean of the anomaly scores of all data points in the test data.
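The count- and score-based metrics above are simple aggregations over per-point results. The sketch below derives them with pandas from a hypothetical results table (the column names are assumptions for illustration, not the platform's schema):

# Sketch: deriving the count/score metrics from per-point test results (pandas assumed).
import pandas as pd

results = pd.DataFrame({                  # hypothetical per-point test results
    "series_id": ["a", "a", "b", "b"],
    "is_anomaly": [True, False, True, True],
    "anomaly_score": [0.91, 0.12, 0.88, 0.95],
})

total_anomaly_count = results["is_anomaly"].sum()
fraction_detected = results["is_anomaly"].mean()
avg_anomalies_per_series = results.groupby("series_id")["is_anomaly"].sum().mean()
avg_anomaly_score = results["anomaly_score"].mean()
print(total_anomaly_count, fraction_detected, avg_anomalies_per_series, avg_anomaly_score)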