Training config for the CHAT_LLM problem type
| Key | Type | Description |
|---|---|---|
| ENABLE_LLM_REWRITE | bool | If enabled, an LLM will rewrite the RAG queries sent to the document retriever. Disabled by default. |
| ENABLE_WEB_SEARCH | bool | Allow the LLM to use Web Search Engines to retrieve information for better results. |
| DATABASE_CONNECTOR_ID | str | Database connector ID to use for connecting an external database that gives the LLM access to structured data. |
| INCLUDE_GENERAL_KNOWLEDGE | bool | Allow the LLM to rely not just on RAG search results, but to fall back on general knowledge. Disabled by default. |
| BEHAVIOR_INSTRUCTIONS | str | Customize the overall behavior of the model. This controls things like when to execute code (if enabled), write SQL queries, or search the web (if enabled). |
| COLUMN_FILTERING_INSTRUCTIONS | str | Instructions for an LLM call to automatically generate filter expressions on document metadata to retrieve relevant documents for the conversation. |
| DATABASE_CONNECTOR_IDS | List[str] | List of database connector IDs to use for connecting external databases that give access to structured data to the LLM. |
| LOOKUP_REWRITE_INSTRUCTIONS | None | None |
| ENABLE_INLINE_SOURCE_CITATIONS | bool | Enable inline citations of the sources in the response. |
| MASK_PII | bool | Mask PII in the prompts and uploaded documents before sending them to the LLM. Only available for Enterprise users; setting this to True for ChatLLM Teams users will cause validation errors. |
| JSON_RESPONSE_INSTRUCTIONS | str | Instructions to be followed while generating the json_response if `response_format` is set to "JSON". This can include the schema information if the schema is dynamic and its keys cannot be pre-determined. |
| DATABASE_CONNECTOR_TABLES | List[str] | List of tables to use from the database connector for the ChatLLM. |
| HIDE_SQL_AND_CODE | bool | When running data queries, hide the generated SQL and code in the response. |
| DATA_PROMPT_CONTEXT | str | Prompt context for the data feature group IDs. |
| NUM_COMPLETION_TOKENS | int | Default for maximum number of tokens for chat answers. Reducing this will get faster responses which are more succinct. |
| ENABLE_CODE_EXECUTION | bool | Enable python code execution in the ChatLLM. This equips the LLM with a python kernel in which all its code is executed. |
| BUILTIN_TOOLS | List[SystemConnectorTool] | List of builtin system connector tools to use in the ChatLLM. Using builtin tools does not require enabling tool bar (enable_tool_bar flag). |
| SEARCH_SCORE_CUTOFF | float | Minimum search score to consider a document as a valid search result. |
| ENABLE_TOOL_BAR | bool | Enable the tool bar in Enterprise ChatLLM to provide additional functionalities like tool_use, web_search, image_gen, etc. Enabling this requires enable_web_search to be enabled. |
| UNKNOWN_ANSWER_PHRASE | str | Fallback response when the LLM can't find an answer. |
| AGENTIC_LOOP_MODE | bool | Enables an agentic loop that makes a series of tool calls when needed to respond. If set to False, the agentic loop will not be used. If not set, or set to Auto, the agentic loop is used automatically based on conditions such as the presence of tools in the model. |
| CONFIG_CONNECTORS | List[str] | List of names of config connectors to use in the ChatLLM. This should not be used with document_retrievers. |
| RETRIEVAL_COLUMNS | list | Include the metadata column values in the retrieved search results. |
| TEMPERATURE | float | The generative LLM temperature. |
| DATA_PROMPT_COLUMN_CONTEXT | Dict[str, str] | Dict of 'table_name.column_name' and 'column_context' pairs to provide column context for some selected columns in the selected structured data table. This replaces the default auto-generated information about the column data. |
| DISABLE_DATA_SUMMARIZATION | bool | After executing a query, skip summarizing the response and reply with only the table and the query that was run. |
| DATA_FEATURE_GROUP_IDS | List[str] | List of feature group IDs the ChatLLM may query. The created ChatLLM is commonly referred to as DataLLM. |
| DISABLE_DATA_FETCH_FOR_TRAINING | bool | Train using only table and column schema metadata without fetching sample data. This speeds up training but may result in less context for the model. |
| INCLUDE_BM25_RETRIEVAL | bool | Combine BM25 search score with vector search using reciprocal rank fusion. |
| FILTER_COLUMNS | list | Allow users to filter the document retrievers on these metadata columns. |
| DATA_PROMPT_TABLE_CONTEXT | Dict[str, str] | Dict of table name and table context pairs to provide table wise context for each structured data table. |
| JSON_RESPONSE_SCHEMA | str | Specifies the JSON schema that the model should adhere to if `response_format` is set to "JSON". This should be a json-formatted string where each field of the expected schema is mapped to a dictionary containing the fields 'type', 'required' and 'description'. For example - '{"sample_field": {"type": "integer", "required": true, "description": "Sample Field"}}' |
| MCP_SERVERS | List[str] | List of names of MCP servers to use in the ChatLLM. This should not be used with document_retrievers. |
| RESPONSE_INSTRUCTIONS | str | Customized instructions for how the model should respond, including the format, persona, and tone of the answers. |
| DOCUMENT_RETRIEVERS | List[str] | List of names or IDs of document retrievers to use as vector stores of information for RAG responses. |
| KEYWORD_REQUIREMENT_INSTRUCTIONS | str | Instructions for an LLM call to automatically generate keyword requirements to retrieve relevant documents for the conversation. |
| QUERY_REWRITE_INSTRUCTIONS | str | Special instructions for the LLM which rewrites the RAG query. |
| ENABLE_RESPONSE_CACHING | bool | Enable caching of LLM responses to speed up response times and improve reproducibility. |
| METADATA_COLUMNS | None | None |
| RESPONSE_FORMAT | str | When set to 'JSON', the LLM will generate a JSON-formatted string. |
| DATA_COLUMNS_TO_IGNORE | List[str] | Columns to ignore while encoding information about structured data tables in context for the LLM. A list of strings in the format "table_name.column_name". |
| MAX_SEARCH_RESULTS | int | Maximum number of search results in the retrieval augmentation step. If we know that the questions are likely to have snippets which are easily matched in the documents, then a lower number will help with accuracy. |
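
For illustration, here is a minimal sketch of how a few of these keys could be combined into a single training configuration. The dict representation, the example values, and the retriever name are assumptions made for this example only; the key names and their semantics come from the table above, and the call that consumes this config depends on your client or SDK.

```python
# A hedged sketch of a CHAT_LLM training config built from keys in the table above.
# Every value below is an illustrative assumption; adapt the structure to however
# your client or SDK expects the training config to be supplied.
chat_llm_training_config = {
    "DOCUMENT_RETRIEVERS": ["support_docs_retriever"],  # hypothetical retriever name
    "ENABLE_LLM_REWRITE": True,            # let an LLM rewrite RAG queries before retrieval
    "MAX_SEARCH_RESULTS": 10,              # fewer results can help accuracy for easily matched snippets
    "SEARCH_SCORE_CUTOFF": 0.3,            # drop low-scoring search results
    "INCLUDE_GENERAL_KNOWLEDGE": False,    # answer only from retrieved context
    "NUM_COMPLETION_TOKENS": 1024,         # cap answer length for faster, more succinct replies
    "TEMPERATURE": 0.2,                    # low temperature for more deterministic answers
    "UNKNOWN_ANSWER_PHRASE": "I could not find this in the provided documents.",
    "RESPONSE_FORMAT": "JSON",             # ask for a JSON-formatted response
    "JSON_RESPONSE_SCHEMA": (
        '{"answer": {"type": "string", "required": true, "description": "Answer text"}, '
        '"confidence": {"type": "number", "required": false, "description": "Score from 0 to 1"}}'
    ),
}
```

Note that some keys interact: for example, `JSON_RESPONSE_SCHEMA` and `JSON_RESPONSE_INSTRUCTIONS` only apply when `RESPONSE_FORMAT` is set to "JSON", and `ENABLE_TOOL_BAR` requires `ENABLE_WEB_SEARCH` to be enabled.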