Document processing configuration.
Key | Type | Description |
---|---|---|
documentType | DocumentType | Type of document, for example Text, Tables and Forms, or Embedded Images. If not specified, the type is determined automatically. |
convertToMarkdown | bool | Whether to convert extracted text to markdown. Defaults to False. This option only takes effect when extractBoundingBoxes is True. |
removeWatermarks | bool | Whether to remove watermarks. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True. |
highlightRelevantText | bool | Whether to extract bounding boxes and highlight relevant text in search results. Defaults to False. |
maskPii | bool | Whether to mask personally identifiable information (PII) in the document text/tokens. Defaults to False. |
useFullOcr | bool | Whether to perform full OCR. If True, OCR is performed on the full page. If False, OCR is performed on the non-text regions only. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True. |
extractBoundingBoxes | bool | Whether to perform OCR and extract bounding boxes. If False, no OCR is performed and only the embedded text of digital documents is extracted. Defaults to False. |
ocrMode | OcrMode | OCR mode. Different OCR modes are available for different kinds of documents and use cases. This option only takes effect when extractBoundingBoxes is True. |
removeHeaderFooter | bool | Whether to remove headers and footers. Defaults to False. This option only takes effect when extractBoundingBoxes is True. |
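
Several options in the table only take effect when extractBoundingBoxes is True, which is easy to overlook when assembling a configuration by hand. The following is a minimal sketch, not part of the API itself: it builds a configuration as a plain dictionary using the keys from the table and reports options that would be silently ignored. The helper name `ineffective_options` and the constant `OCR_DEPENDENT_KEYS` are hypothetical and introduced only for illustration.

```python
import json

# Keys that, per the table above, only take effect when
# extractBoundingBoxes is True. (Hypothetical constant for this sketch.)
OCR_DEPENDENT_KEYS = {
    "convertToMarkdown",
    "removeWatermarks",
    "useFullOcr",
    "ocrMode",
    "removeHeaderFooter",
}


def ineffective_options(config):
    """Return the OCR-dependent keys that are set while extractBoundingBoxes is off.

    Hypothetical helper: it only inspects the dictionary and calls no API.
    """
    # extractBoundingBoxes defaults to False according to the table.
    if config.get("extractBoundingBoxes", False):
        return []
    return sorted(key for key in config if key in OCR_DEPENDENT_KEYS)


# Example configuration: OCR is not enabled, so two options have no effect.
config = {
    "extractBoundingBoxes": False,
    "convertToMarkdown": True,
    "removeHeaderFooter": True,
    "maskPii": True,
}

ignored = ineffective_options(config)
if ignored:
    print("Ignored unless extractBoundingBoxes is True:", ", ".join(ignored))

# Serialize the configuration as JSON, e.g. for a request body.
print(json.dumps(config, indent=2))
```

Setting `"extractBoundingBoxes": True` in the example makes the helper return an empty list, since all dependent options then apply. Options such as maskPii and highlightRelevantText are not listed as dependent in the table, so the sketch never flags them.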