DocumentProcessingConfig

Document processing configuration.

KEY TYPE DESCRIPTION
documentType DocumentType Type of document, for example Text, Tables and Forms, or Embedded Images. If not specified, the type is determined automatically.
convertToMarkdown bool Whether to convert extracted text to Markdown. Defaults to False. This option only takes effect when extractBoundingBoxes is True.
removeWatermarks bool Whether to remove watermarks. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True.
highlightRelevantText bool Whether to extract bounding boxes and highlight relevant text in search results. Defaults to False.
maskPii bool Whether to mask personally identifiable information (PII) in the document text/tokens. Defaults to False.
useFullOcr bool Whether to perform full OCR. If True, OCR is performed on the full page. If False, OCR is performed on the non-text regions only. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True.
extractBoundingBoxes bool Whether to perform OCR and extract bounding boxes. If False, no OCR is performed and only the embedded text of digital documents is extracted. Defaults to False.
ocrMode OcrMode OCR mode. Different OCR modes are available for different kinds of documents and use cases. This option only takes effect when extractBoundingBoxes is True.
removeHeaderFooter bool Whether to remove headers and footers. Defaults to False. This option only takes effect when extractBoundingBoxes is True.
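
Several of the keys above are documented as taking effect only when bounding-box extraction is enabled, which is easy to overlook when assembling a config by hand. The following is a minimal sketch of a client-side check, assuming the config is passed as a plain dictionary keyed by the names in the table. The helper inactive_options and the constant OCR_DEPENDENT_KEYS are hypothetical names introduced for illustration and are not part of any API.

```python
# Keys that the table describes as only taking effect when
# extractBoundingBoxes is True (hypothetical constant, for illustration).
OCR_DEPENDENT_KEYS = {
    "convertToMarkdown",
    "removeWatermarks",
    "useFullOcr",
    "ocrMode",
    "removeHeaderFooter",
}


def inactive_options(config: dict) -> list:
    """Return the keys set in `config` that would have no effect
    because extractBoundingBoxes is missing or False."""
    # extractBoundingBoxes defaults to False per the table.
    if config.get("extractBoundingBoxes", False):
        return []
    return sorted(key for key in config if key in OCR_DEPENDENT_KEYS)


# Example config: two OCR-dependent options are set, but OCR is off.
config = {
    "extractBoundingBoxes": False,
    "convertToMarkdown": True,
    "removeHeaderFooter": True,
    "maskPii": True,
}

print(inactive_options(config))
# → ['convertToMarkdown', 'removeHeaderFooter']
```

Setting extractBoundingBoxes to True in the same dictionary makes the helper return an empty list. maskPii is never reported, since the table does not tie it to bounding-box extraction.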