Problem Type: ChatLLM

Use ChatLLM to create your own custom ChatGPT that answers questions based on your specialized document corpus. ChatLLM models combine the best of both fine-tuning and Retrieval-Augmented Generation (RAG) to build a custom chatbot on your knowledge base. The model can be trained on documents in a variety of formats and will then answer questions quickly and accurately. After training your first model, you can use an evaluation set to assess its performance on your most important questions. The quick recipe below gives you a high-level read on your model's performance.


Quick Evaluation Recipe

  1. Train Your Model with an Evaluation Feature Group
    Ensure you have trained your model with an Evaluation Feature Group: a set of questions and answers that you expect the LLM to answer accurately, based on your own document corpus.
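An evaluation feature group is just question/answer pairs. The sketch below shows one illustrative way to assemble and serialize such a set; the column names (`question`, `answer`) and the CSV format are assumptions for this example, so check your platform's documentation for the exact schema it expects.

```python
import csv
import io

# Hypothetical evaluation set: each row pairs a question with the
# ground-truth answer you expect the trained model to produce.
eval_rows = [
    {"question": "What is our refund window?",
     "answer": "Refunds are accepted within 30 days of purchase."},
    {"question": "Which regions do we ship to?",
     "answer": "We ship to the US, Canada, and the EU."},
]

def to_csv(rows):
    """Serialize the evaluation set to CSV text for upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["question", "answer"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the evaluation set small but representative of your most important questions makes the metric scores in the next steps easier to interpret.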


  2. Compare Metric Scores Across Models
    Compare metric scores between different models to determine the ideal model to use. For all scores, the ideal value is 1.0. The higher the score, the more closely the LLM response matches the ground truth. If the evaluation questions are well-matched to the documents provided, you should expect a BERT F1 score between 0.7 and 1.0.
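Comparing models then reduces to ranking them on a chosen metric. The snippet below is a minimal sketch with made-up scores (the model names and numbers are hypothetical, not output from any real evaluation); it only illustrates the "higher is better, 1.0 is ideal" comparison described above.

```python
# Hypothetical metric scores for three candidate models, as you might
# read them off an evaluation dashboard. For every metric here,
# higher is better and 1.0 means a perfect match with the ground truth.
model_scores = {
    "model_a": {"bert_f1": 0.82, "rouge_1": 0.41},
    "model_b": {"bert_f1": 0.74, "rouge_1": 0.35},
    "model_c": {"bert_f1": 0.68, "rouge_1": 0.28},
}

def best_model(scores, metric="bert_f1"):
    """Return the name of the model with the highest score on `metric`."""
    return max(scores, key=lambda name: scores[name][metric])
```

In practice you would check that the winner leads on more than one metric before committing to it, since a single score can be skewed by response style.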

  3. Check the ROUGE Score
    Examine the ROUGE score to see the direct overlap between the LLM and human responses. A ROUGE score above 0.3 is usually considered good, given the variation in how responses are formed. These scores are highly dependent on the specific ground truth you use and can be skewed by more or less verbose responses.
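To make the length sensitivity concrete, here is a minimal pure-Python ROUGE-1 (unigram overlap) sketch. It is a simplified illustration, not the exact implementation any evaluation platform uses; production ROUGE also includes stemming and n-gram variants like ROUGE-2 and ROUGE-L.

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """ROUGE-1: unigram overlap between a candidate and a reference.
    A verbose candidate tends to inflate recall while deflating
    precision, which is why the score is sensitive to response length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: overlapping unigram count.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, a candidate that contains the whole reference plus extra filler scores perfect recall but reduced precision, so looking at both components tells you whether a low score reflects a wrong answer or just a verbose one.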

  4. Examine Individual Questions
    Review individual questions and compare the answers from different LLMs. Verify that the LLM-generated answers are accurate and align with the ground truth.
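A cheap first pass before reading every answer by hand is to flag the question/model pairs whose answers differ from the ground truth. The sketch below uses hypothetical per-question results; note that exact string comparison will also flag valid paraphrases, so flagged rows still need human review.

```python
# Hypothetical per-question results for two models, used to spot
# where the generated answers diverge from the ground truth.
results = [
    {"question": "What is our refund window?",
     "ground_truth": "Refunds are accepted within 30 days.",
     "answers": {"model_a": "Refunds are accepted within 30 days.",
                 "model_b": "Refunds are accepted within 60 days."}},
]

def flag_mismatches(rows):
    """Return (question, model) pairs whose answer differs verbatim
    from the ground truth. A mismatch is a prompt for human review,
    not proof of an error: paraphrases are flagged too."""
    flagged = []
    for row in rows:
        for model, answer in row["answers"].items():
            if answer.strip() != row["ground_truth"].strip():
                flagged.append((row["question"], model))
    return flagged
```

Here only model_b is flagged, and reading its answer shows a genuine factual error (60 days instead of 30), exactly the kind of discrepancy this step is meant to catch.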


Once you have verified that the LLM responses are consistent with the ground truth, you have officially trained your own world-class ChatLLM on your customized documents. You can now deploy the model and continue asking it specific questions in the Predictions Dashboard.