| Key | Type | Description |
| --- | --- | --- |
| documentType | DocumentType | Type of document. Can be one of Text, Tables and Forms, Embedded Images, etc. If not specified, the type is decided automatically. |
| convertToMarkdown | bool | Whether to convert extracted text to markdown. Defaults to False. This option only takes effect when extractBoundingBoxes is True. |
| removeWatermarks | bool | Whether to remove watermarks. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True. |
| highlightRelevantText | bool | Whether to extract bounding boxes and highlight relevant text in search results. Defaults to False. |
| maskPii | bool | Whether to mask personally identifiable information (PII) in the document text/tokens. Defaults to False. |
| useFullOcr | bool | Whether to perform full OCR. If True, OCR is performed on the full page. If False, OCR is performed on the non-text regions only. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True. |
| extractBoundingBoxes | bool | Whether to perform OCR and extract bounding boxes. If False, no OCR is done and only the embedded text from digital documents is extracted. Defaults to False. |
| ocrMode | OcrMode | OCR mode. Different OCR modes are available for different kinds of documents and use cases. This option only takes effect when extractBoundingBoxes is True. |
| removeHeaderFooter | bool | Whether to remove headers and footers. Defaults to False. This option only takes effect when extractBoundingBoxes is True. |
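
Several of the options above only take effect when bounding-box extraction is enabled, which is easy to overlook when assembling a configuration. The following is a minimal sketch of a client-side check, not part of any SDK: `build_config`, `KNOWN_KEYS`, `DEFAULTS`, and `DEPENDENT_KEYS` are hypothetical names introduced here for illustration, and the defaults are copied from the table. Options whose default is "decided automatically" are left out of the result unless set explicitly.

```python
# Hypothetical helper for illustration only; it is not part of any SDK.
# Key names and defaults are taken from the table above.

KNOWN_KEYS = {
    "documentType", "convertToMarkdown", "removeWatermarks",
    "highlightRelevantText", "maskPii", "useFullOcr",
    "extractBoundingBoxes", "ocrMode", "removeHeaderFooter",
}

# Only options with a documented fixed default are listed here.
DEFAULTS = {
    "convertToMarkdown": False,
    "highlightRelevantText": False,
    "maskPii": False,
    "extractBoundingBoxes": False,
    "removeHeaderFooter": False,
}

# Options documented as taking effect only when extractBoundingBoxes is True.
DEPENDENT_KEYS = {
    "convertToMarkdown", "removeWatermarks", "useFullOcr",
    "ocrMode", "removeHeaderFooter",
}


def build_config(**options):
    """Return (config, ignored): the merged config and a sorted list of
    explicitly set options that will have no effect."""
    unknown = set(options) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")

    config = {**DEFAULTS, **options}

    ignored = []
    if not config["extractBoundingBoxes"]:
        # Dependent options were set, but OCR/bounding boxes are disabled.
        ignored = sorted(k for k in options if k in DEPENDENT_KEYS)
    return config, ignored


if __name__ == "__main__":
    config, ignored = build_config(convertToMarkdown=True, maskPii=True)
    print(config)
    print("No effect without extractBoundingBoxes:", ignored)
```

Returning the ignored options instead of raising an error keeps the helper permissive; a stricter variant could reject such combinations outright.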