Use ChatLLM to create your custom ChatGPT and answer questions based on your specialized document corpus. ChatLLM models leverage the best of both finetuning and Retrieval Augmented Generation (RAG) techniques to build a custom chatbot on your knowledge base. The model can be trained on documents in a variety of formats and will then be able to answer questions quickly and accurately. After you train your first model, you can use an evaluation set to accurately assess the model's performance on your most important questions. Here's a quick evaluation recipe to give you a high-level understanding of your model's performance.
Compare metric scores across models to determine which one best fits your use case. For all scores, the ideal value is 1.0; the higher the score, the more closely the LLM response matches the ground truth. If the evaluation questions are well matched to the documents provided, you should expect a BERT F1 score between 0.7 and 1.0.
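As a rough illustration of what the BERT F1 metric measures, here is a minimal sketch using the open-source bert-score package to compare LLM responses against ground-truth answers. The example responses and references are placeholders, and ChatLLM's own evaluation pipeline may compute this metric differently.

```python
# Minimal sketch: computing BERT F1 between LLM responses and ground truth
# using the open-source `bert-score` package (pip install bert-score).
# The responses and references below are illustrative placeholders.
from bert_score import score

llm_responses = [
    "Refunds are processed within 5 business days of approval.",
    "The warranty covers manufacturing defects for two years.",
]
ground_truth = [
    "Approved refunds are issued within five business days.",
    "Manufacturing defects are covered by a two-year warranty.",
]

# score() returns precision, recall, and F1 tensors, one value per pair.
precision, recall, f1 = score(llm_responses, ground_truth, lang="en", verbose=False)

for question_idx, f in enumerate(f1.tolist()):
    print(f"Q{question_idx + 1}: BERT F1 = {f:.3f}")
print(f"Mean BERT F1: {f1.mean().item():.3f}")
```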
Check the ROUGE score to see the direct overlap between the LLM and human responses. A ROUGE score above 0.3 is usually considered good, given the variation in how responses can be phrased. These scores are highly dependent on the specific ground truth you use and can be skewed by more or less verbose responses.
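For reference, this is a minimal sketch of how ROUGE overlap can be computed with the open-source rouge-score package; the response and reference strings are placeholders, and the exact ROUGE variant reported by ChatLLM may differ.

```python
# Minimal sketch: computing ROUGE overlap with the `rouge-score` package
# (pip install rouge-score). The response and reference are placeholders.
from rouge_score import rouge_scorer

llm_response = "Refunds are processed within 5 business days of approval."
ground_truth = "Approved refunds are issued within five business days."

# ROUGE-1 measures unigram overlap; ROUGE-L measures the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(ground_truth, llm_response)

for name, result in scores.items():
    print(f"{name}: precision={result.precision:.3f} "
          f"recall={result.recall:.3f} f1={result.fmeasure:.3f}")
```

Note how a more verbose response can inflate recall while lowering precision, which is why the overall ROUGE score can be skewed by response length.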
Examine individual questions and compare the answers from different LLMs. Verify that the LLM-generated answers are accurate to the ground truth.
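One simple way to spot-check individual questions is to print the question, the ground-truth answer, and each model's answer side by side. The sketch below assumes a hypothetical CSV export with "question", "ground_truth", and one column per model; the file name and column names are placeholders, not part of ChatLLM itself.

```python
# Minimal sketch: spot-checking individual questions side by side.
# Assumes a hypothetical CSV export; file and column names are placeholders.
import csv

with open("evaluation_results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

model_columns = ["model_a_answer", "model_b_answer"]  # placeholder column names

for row in rows:
    print(f"Question:     {row['question']}")
    print(f"Ground truth: {row['ground_truth']}")
    for col in model_columns:
        print(f"{col}: {row[col]}")
    print("-" * 60)
```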
Once you have verified that the LLM responses are accurate to the ground truth, you have officially trained your own world-class ChatLLM to answer questions about your customized documents. You can now deploy your model and continue to ask specific questions in the Predictions dashboard.