Abacus.AI - DocumentProcessingConfig

Key	Type	Description
ocrMode	OcrMode	OCR mode. Different OCR modes are available for different kinds of documents and use cases. This option only takes effect when extractBoundingBoxes is True.
extractBoundingBoxes	bool	Whether to perform OCR and extract bounding boxes. If False, no OCR is performed and only the embedded text of digital documents is extracted. Defaults to False.
useFullOcr	bool	Whether to perform full OCR. If True, OCR is performed on the full page. If False, OCR is performed on the non-text regions only. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True.
documentType	DocumentType	Type of document, such as Text, Tables and Forms, or Embedded Images. If not specified, the type is decided automatically.
maskPii	bool	Whether to mask personally identifiable information (PII) in the document text/tokens. Defaults to False.
removeWatermarks	bool	Whether to remove watermarks. By default, this is decided automatically based on the OCR mode and the document type. This option only takes effect when extractBoundingBoxes is True.
removeHeaderFooter	bool	Whether to remove headers and footers. Defaults to False. This option only takes effect when extractBoundingBoxes is True.
highlightRelevantText	bool	Whether to extract bounding boxes and highlight relevant text in search results. Defaults to False.
extractImages	bool	Whether to extract images from the document, for example diagrams in a PDF page. Defaults to False.
convertToMarkdown	bool	Whether to convert extracted text to Markdown. Defaults to False. This option only takes effect when extractBoundingBoxes is True.
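
Several rows in the table depend on the bounding-box option, so a request can silently carry settings that do nothing. The sketch below shows one way to assemble such a configuration as a plain dictionary and to report which options would have no effect. It does not call the Abacus.AI SDK. The helper name `build_document_processing_config` and the constant names are hypothetical; the keys, dependencies, and defaults are taken from the table above.

```python
# Keys listed in the table above.
ALL_KEYS = {
    "ocrMode", "extractBoundingBoxes", "useFullOcr", "documentType", "maskPii",
    "removeWatermarks", "removeHeaderFooter", "highlightRelevantText",
    "extractImages", "convertToMarkdown",
}

# Keys the table marks as taking effect only when extractBoundingBoxes is True.
REQUIRES_BOUNDING_BOXES = {
    "ocrMode", "useFullOcr", "removeWatermarks",
    "removeHeaderFooter", "convertToMarkdown",
}

# Defaults the table states explicitly. Keys without a stated default
# (ocrMode, useFullOcr, documentType, removeWatermarks) are left out so
# they can be decided automatically.
DOCUMENTED_DEFAULTS = {
    "extractBoundingBoxes": False,
    "maskPii": False,
    "removeHeaderFooter": False,
    "highlightRelevantText": False,
    "extractImages": False,
    "convertToMarkdown": False,
}


def build_document_processing_config(**options):
    """Hypothetical helper: return (config, ignored_keys).

    config is the documented defaults overlaid with the caller's options.
    ignored_keys lists explicitly passed options that, per the table,
    have no effect because extractBoundingBoxes is not True.
    """
    unknown = set(options) - ALL_KEYS
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")

    config = {**DOCUMENTED_DEFAULTS, **options}

    ignored = []
    if config["extractBoundingBoxes"] is not True:
        ignored = sorted(set(options) & REQUIRES_BOUNDING_BOXES)
    return config, ignored


# Markdown conversion requested without bounding boxes: reported as ignored.
config, ignored = build_document_processing_config(convertToMarkdown=True)
print(ignored)

# Same request with bounding boxes enabled: nothing is ignored.
config, ignored = build_document_processing_config(
    extractBoundingBoxes=True, convertToMarkdown=True, useFullOcr=True
)
print(ignored)
```

Returning the ignored keys rather than raising an error keeps the helper permissive, which matches the table's wording that such options simply do not take effect.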