Our platform provides the flexibility to adjust a set of training parameters. Both the general training parameters and the advanced training options can influence the model's predictions, and prediction quality is measured with a set of accuracy metrics that are also discussed in this section.
Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:
Training Option Name | Description | Possible Values |
---|---|---|
Name | The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model belongs to. | The name can contain any alphanumeric characters and must be between 5 and 60 characters long. |
List of documents | A collection of text-based documents (articles, reviews, and other written material) used to train or evaluate NLP models. | Documents can be in formats such as plain text, CSV, JSON, PDF, DOCX, or ZIP |
Evaluation | Pairs of search queries and ground-truth answers used to assess the model's performance on the project. While optional, including an evaluation set is highly recommended (an illustrative example follows the table). | Pairs of questions and answers stored in JSON or CSV |
Document Retrievers | List of document retriever names to use for the feature stores this model was trained with. | List of document retriever names |
Number of Completion Tokens | Default maximum number of tokens for chat answers. Reducing this value yields faster, more succinct responses. | Integers from 1 to the maximum token limit for the model. |
Temperature | Regulates the randomness, or creativity, of the LLM responses. Increasing it increases the randomness. | Floats from 0.0 to 2.0 |
Metadata Columns | Metadata columns to include in the retrieved search results. | List of columns |
Include General Knowledge | Allow the LLM to rely not just on search results, but to fall back on general knowledge. | true/false |
Behavior Instructions | Customize the overall role instructions for the LLM. | String |
Response Instructions | Customize instructions for what the LLM responses should look like. | String |
Max Search Results | Maximum number of search results used in the retrieval augmentation step. If the questions are likely to match snippets in the documents closely, a lower number can improve accuracy. | Integers greater than 0 |
Data Feature Group IDs | List of feature group IDs that the ChatLLM can query. | List of feature group IDs |
Data Prompt Context | Prompt context for the data feature group IDs. | String |
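To make the options above concrete, the sketch below shows how they might be laid out as a plain configuration object alongside a JSON evaluation set. It is only illustrative: the `training_config` dictionary, its field names, and the `evaluation_set.json` file name are hypothetical placeholders that mirror the table above, not the platform's actual SDK or API.

```python
import json

# Hypothetical training configuration mirroring the options in the table above.
# Field names are illustrative only; they are not the platform's actual API.
training_config = {
    "name": "support-docs-chatllm-v1",        # 5-60 alphanumeric characters
    "document_retrievers": ["support_docs_retriever"],
    "number_of_completion_tokens": 512,       # lower = faster, more succinct answers
    "temperature": 0.2,                       # 0.0-2.0; higher = more random output
    "metadata_columns": ["source_url", "doc_title"],
    "include_general_knowledge": False,       # answer only from retrieved context
    "behavior_instructions": "You are a helpful support assistant.",
    "response_instructions": "Answer concisely and cite the source document.",
    "max_search_results": 5,                  # integer > 0
    "data_feature_group_ids": ["fg_123"],     # hypothetical feature group IDs
    "data_prompt_context": "Customer support knowledge base.",
}

# Optional evaluation set: pairs of questions and ground-truth answers (JSON or CSV).
evaluation_set = [
    {"question": "How do I reset my password?",
     "answer": "Use the 'Forgot password' link on the sign-in page."},
    {"question": "What file formats can I upload?",
     "answer": "Plain text, CSV, JSON, PDF, DOCX, or ZIP."},
]

with open("evaluation_set.json", "w") as f:
    json.dump(evaluation_set, f, indent=2)

print(json.dumps(training_config, indent=2))
```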
Our AI engine will calculate the following metrics for this use case:
Metric Name | Description |
---|---|
Bert F1 Score | BERTScore is an automatic evaluation metric for assessing the quality of text generation systems. Unlike popular methods that compute token-level syntactic similarity, BERTScore computes semantic similarity between the tokens of the reference and the hypothesis. BERT F1 mirrors the standard F1 score as the harmonic mean of BERT Precision and BERT Recall, and therefore ranges between 0 and 1. |
METEOR Score | METEOR (Metric for Evaluation of Translation with Explicit ORdering) is a metric for the evaluation of machine translation output. It takes into account stemming and synonymy matching, along with the standard exact word matching. The metric is based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision. |
BLEU Score | BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. It ranges from 0 to 1, with 1 being the highest attainable score. Quality is considered to be the correspondence between a machine's output and that of a human: the central idea behind BLEU is that the closer a machine translation is to a professional human translation, the better it is. |
ROUGE-L Score | ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares an automatically produced summary or translation against a reference (human-produced) summary or translation, or a set of references. ROUGE-L is based on Longest Common Subsequence (LCS) statistics: the longest common subsequence naturally takes sentence-level structural similarity into account and automatically identifies the longest in-sequence co-occurring n-grams. |
Count | Number of samples used to compute the metric. |
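For intuition, the snippet below computes these metrics for a single prediction/reference pair using common open-source packages (`bert-score`, `nltk`, and `rouge-score`). This is only a sketch of the metric definitions above, not the platform's internal implementation, which may differ in tokenization, smoothing, and model choices.

```python
# Illustrative metric computation with open-source packages
# (pip install bert-score nltk rouge-score). The platform computes the same
# metrics internally; its implementation details may differ.
from bert_score import score as bert_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer
import nltk

nltk.download("wordnet", quiet=True)  # needed for METEOR's synonymy matching

prediction = "The quick brown fox jumps over the lazy dog."
reference = "A quick brown fox jumped over the lazy dog."

# BERT F1: harmonic mean of BERT Precision and Recall (semantic similarity).
_, _, f1 = bert_score([prediction], [reference], lang="en")
print("BERT F1:", f1.mean().item())

# METEOR: unigram precision/recall with stemming and synonymy matching.
print("METEOR:", meteor_score([reference.split()], prediction.split()))

# BLEU: n-gram overlap between prediction and reference, smoothed for short texts.
print("BLEU:", sentence_bleu([reference.split()], prediction.split(),
                             smoothing_function=SmoothingFunction().method1))

# ROUGE-L: F-measure based on the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print("ROUGE-L:", scorer.score(reference, prediction)["rougeL"].fmeasure)
```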