Training config for the NAMED_ENTITY_EXTRACTION problem type
KEY | TYPE | DESCRIPTION |
---|---|---|
DOCUMENT_FORMAT | NLPDocumentFormat | Format of the input documents. |
TEST_ROW_INDICATOR | str | Column indicating which rows to use for training (TRAIN) and testing (TEST). |
ADDITIONAL_EXTRACTION_INSTRUCTIONS | str | Additional instructions to guide the LLM in extracting the entities. Only used with LLM algorithms. |
SAVE_PREDICTED_PDF | bool | Whether to save predicted PDF documents. |
MINIMUM_BOUNDING_BOX_OVERLAP_RATIO | float | A token is considered to belong to an annotation if a user bounding box is provided and the ratio area(token_bounding_box ∩ annotation_bounding_box) / area(token_bounding_box) is greater than this value. |
ENHANCED_OCR | bool | Whether to use enhanced text extraction on predicted digital documents. |
LLM_FOR_NER | NERForLLM | LLM to use for NER, chosen from the available LLMs. |
TEST_SPLIT | int | Percentage of the dataset to use as test data. Supported values range from 5 (i.e. 5%) to 20 (i.e. 20%). |
ACTIVE_LABELS_COLUMN | str | Column listing the entities that have been marked in a particular text. |
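
The overlap test described for MINIMUM_BOUNDING_BOX_OVERLAP_RATIO can be illustrated with a minimal sketch. It is not the library's implementation: the `Box` type, its `(x0, y0, x1, y1)` corner layout, and the helper names `area`, `overlap_ratio`, and `token_belongs_to_annotation` are all assumptions made for illustration. It only shows how an intersection-over-token-area ratio might be compared against the configured threshold.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical layout: two opposite corners)."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(box: Box) -> float:
    # Width and height are clamped at zero so an empty intersection has area 0.
    return max(0.0, box.x1 - box.x0) * max(0.0, box.y1 - box.y0)


def overlap_ratio(token: Box, annotation: Box) -> float:
    """Return area(token ∩ annotation) / area(token)."""
    intersection = Box(
        max(token.x0, annotation.x0),
        max(token.y0, annotation.y0),
        min(token.x1, annotation.x1),
        min(token.y1, annotation.y1),
    )
    token_area = area(token)
    if token_area == 0.0:
        # A degenerate token box cannot overlap anything meaningfully.
        return 0.0
    return area(intersection) / token_area


def token_belongs_to_annotation(token: Box, annotation: Box, min_ratio: float) -> bool:
    # The description says "greater than", so the comparison is strict.
    return overlap_ratio(token, annotation) > min_ratio


if __name__ == "__main__":
    token = Box(0, 0, 10, 10)
    annotation = Box(5, 0, 20, 10)
    print(overlap_ratio(token, annotation))  # half of the token lies inside: 0.5
    print(token_belongs_to_annotation(token, annotation, 0.4))
```

Because the comparison is strict, a token whose ratio exactly equals the configured value would not be assigned to the annotation under this reading.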