NamedEntityExtractionTrainingConfig

Training config for the NAMED_ENTITY_EXTRACTION problem type

KEY TYPE Description
SAVE_PREDICTED_PDF bool Whether to save the predicted PDF documents.
MINIMUM_BOUNDING_BOX_OVERLAP_RATIO float A token is considered to belong to an annotation if a user bounding box is provided and the ratio area(token_bounding_box ∩ annotation_bounding_box) / area(token_bounding_box) is greater than this value.
LLM_FOR_NER NERForLLM LLM to use for NER, chosen from the available LLMs.
DOCUMENT_FORMAT NLPDocumentFormat Format of the input documents.
ENHANCED_OCR bool Whether to use enhanced text extraction for predicted digital documents.
ADDITIONAL_EXTRACTION_INSTRUCTIONS str Additional instructions to guide the LLM in extracting the entities. Only used with LLM algorithms.
ACTIVE_LABELS_COLUMN str Name of the column containing the entities that have been marked in each text.
TEST_ROW_INDICATOR str Name of the column indicating which rows to use for training (TRAIN) and testing (TEST).
TEST_SPLIT int Percentage of the dataset to use as test data. Supported values range from 5 (i.e., 5%) to 20 (i.e., 20%).
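To make the MINIMUM_BOUNDING_BOX_OVERLAP_RATIO rule concrete, here is a minimal Python sketch of the ratio it describes: the area of the token box that overlaps the annotation box, divided by the token box's own area, compared against the configured value. It assumes axis-aligned boxes given as (x0, y0, x1, y1). That box format and the names Box, token_overlap_ratio and token_belongs_to_annotation are hypothetical illustrations, not part of this config or of any SDK.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned box; assumed format (x0, y0, x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp negative widths/heights to 0 so an empty box has zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def token_overlap_ratio(token: Box, annotation: Box) -> float:
    """Hypothetical helper: area(token ∩ annotation) / area(token)."""
    # The intersection of two axis-aligned boxes is itself a box (possibly empty).
    intersection = Box(
        max(token.x0, annotation.x0),
        max(token.y0, annotation.y0),
        min(token.x1, annotation.x1),
        min(token.y1, annotation.y1),
    )
    token_area = token.area()
    if token_area == 0:
        return 0.0  # Degenerate token box: treat as no overlap.
    return intersection.area() / token_area


def token_belongs_to_annotation(token: Box, annotation: Box, minimum_ratio: float) -> bool:
    # Mirrors the documented rule: the ratio must be strictly greater than the value.
    return token_overlap_ratio(token, annotation) > minimum_ratio


token = Box(0, 0, 10, 10)
annotation = Box(5, 0, 20, 10)
print(token_overlap_ratio(token, annotation))               # 0.5
print(token_belongs_to_annotation(token, annotation, 0.4))  # True
print(token_belongs_to_annotation(token, annotation, 0.5))  # False (0.5 is not > 0.5)
```

Because the denominator is the token's area rather than the union of both boxes, a small token lying entirely inside a large annotation box scores 1.0, which suits assigning tokens to annotations better than a symmetric IoU would.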