Our platform provides the flexibility to adjust a set of training parameters. Both the general training parameters and the advanced training options can influence the model's predictions, and prediction quality is measured with a set of accuracy metrics that are also discussed in this section.
Once you have fulfilled all the feature group requirements for the use case, you can set the following general and advanced training configuration options to train your ML model:
Training Option Name | Description | Possible Values |
---|---|---|
Name | The name you would like to give to the model that is going to be trained. The system generates a default name based on the name of the project the model belongs to. | The name can contain any alphanumeric characters and must be between 5 and 60 characters long. |
List of documents | A collection of text-based documents (articles, reviews, and other written material) used to train or evaluate NLP models. | Documents can be in formats such as plain text, CSV, JSON, PDF, DOCX, or ZIP |
Evaluation | Pairs of search queries and ground-truth answers used to assess the model's performance on the project. While optional, including an evaluation set is highly recommended (an illustrative example follows the table). | Pairs of questions and answers stored in JSON or CSV |
Document Retrievers | List of document retriever names to use for the feature stores this model was trained with. | List of document retriever names |
Number of Completion Tokens | Default maximum number of tokens for chat answers. Reducing this value yields faster, more succinct responses. | Integers from 1 to the maximum token limit for the model. |
Temperature | Regulates the randomness, or creativity, of the LLM responses. Increasing it increases the randomness. | Floats from 0.0 to 2.0 |
Metadata Columns | Metadata columns to include in the retrieved search results. | List of columns |
Include General Knowledge | Allow the LLM to rely not just on search results, but to fall back on general knowledge. | true/false |
Behavior Instructions | Customize the overall role instructions for the LLM. | String |
Response Instructions | Customize instructions for what the LLM responses should look like. | String |
Max Search Results | Maximum number of search results used in the retrieval augmentation step. If the questions are likely to match snippets in the documents closely, a lower number can improve accuracy. | Integers greater than 0 |
Data Feature Group IDs | List of feature group IDs that the ChatLLM can query. | List of feature group IDs |
Data Prompt Context | Prompt context for the data feature group IDs. | String |
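To make the options above concrete, the sketch below shows how they might be laid out as a plain configuration object alongside a JSON evaluation set. It is only illustrative: the `training_config` dictionary, its field names, and the `evaluation_set.json` file name are hypothetical placeholders that mirror the table above, not the platform's actual SDK or API.

```python
import json

# Hypothetical training configuration mirroring the options in the table above.
# Field names are illustrative only; they are not the platform's actual API.
training_config = {
    "name": "support-docs-chatllm-v1",        # 5-60 alphanumeric characters
    "document_retrievers": ["support_docs_retriever"],
    "number_of_completion_tokens": 512,       # lower = faster, more succinct answers
    "temperature": 0.2,                       # 0.0-2.0; higher = more random output
    "metadata_columns": ["source_url", "doc_title"],
    "include_general_knowledge": False,       # answer only from retrieved context
    "behavior_instructions": "You are a helpful support assistant.",
    "response_instructions": "Answer concisely and cite the source document.",
    "max_search_results": 5,                  # integer > 0
    "data_feature_group_ids": ["fg_123"],     # hypothetical feature group IDs
    "data_prompt_context": "Customer support knowledge base.",
}

# Optional evaluation set: pairs of questions and ground-truth answers (JSON or CSV).
evaluation_set = [
    {"question": "How do I reset my password?",
     "answer": "Use the 'Forgot password' link on the sign-in page."},
    {"question": "What file formats can I upload?",
     "answer": "Plain text, CSV, JSON, PDF, DOCX, or ZIP."},
]

with open("evaluation_set.json", "w") as f:
    json.dump(evaluation_set, f, indent=2)

print(json.dumps(training_config, indent=2))
```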
Our AI engine will calculate the following metrics for this use case:
Metric Name | Description |
---|---|
Bert F1 Score | BERTScore is an automatic evaluation metric for assessing the quality of text generation systems. Unlike popular methods that compute token-level syntactic similarity, BERTScore computes semantic similarity between the tokens of the reference and the hypothesis. BERT F1 mirrors the standard F1 score as the harmonic mean of BERT Precision and BERT Recall, and therefore ranges between 0 and 1. |
METEOR Score | METEOR (Metric for Evaluation of Translation with Explicit ORdering) is a metric for the evaluation of machine translation output. It takes into account stemming and synonymy matching, along with the standard exact word matching. The metric is based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision. |
BLEU Score | BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. It ranges from 0 to 1, with 1 being the highest attainable score. Quality is considered to be the correspondence between a machine's output and that of a human: the central idea behind BLEU is that the closer a machine translation is to a professional human translation, the better it is. |
ROUGE-L Score | ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compares an automatically produced summary or translation against a reference (human-produced) summary or translation, or a set of references. ROUGE-L is based on Longest Common Subsequence (LCS) statistics: the longest common subsequence naturally takes sentence-level structural similarity into account and automatically identifies the longest in-sequence co-occurring n-grams. |
Count | Number of samples used to compute the metric. |
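For intuition, the snippet below computes these metrics for a single prediction/reference pair using common open-source packages (`bert-score`, `nltk`, and `rouge-score`). This is only a sketch of the metric definitions above, not the platform's internal implementation, which may differ in tokenization, smoothing, and model choices.

```python
# Illustrative metric computation with open-source packages
# (pip install bert-score nltk rouge-score). The platform computes the same
# metrics internally; its implementation details may differ.
from bert_score import score as bert_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer
import nltk

nltk.download("wordnet", quiet=True)  # needed for METEOR's synonymy matching

prediction = "The quick brown fox jumps over the lazy dog."
reference = "A quick brown fox jumped over the lazy dog."

# BERT F1: harmonic mean of BERT Precision and Recall (semantic similarity).
_, _, f1 = bert_score([prediction], [reference], lang="en")
print("BERT F1:", f1.mean().item())

# METEOR: unigram precision/recall with stemming and synonymy matching.
print("METEOR:", meteor_score([reference.split()], prediction.split()))

# BLEU: n-gram overlap between prediction and reference, smoothed for short texts.
print("BLEU:", sentence_bleu([reference.split()], prediction.split(),
                             smoothing_function=SmoothingFunction().method1))

# ROUGE-L: F-measure based on the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print("ROUGE-L:", scorer.score(reference, prediction)["rougeL"].fmeasure)
```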