Method
createDatasetVersionFromDocumentReprocessing POST

Creates a new dataset version for a source docstore dataset using the provided document processing configuration. The data is not re-imported: the method reuses the data imported in the latest dataset version and only reruns document processing on it.

Arguments:

REQUIRED KEY TYPE DESCRIPTION
Yes datasetId str The unique ID associated with the dataset to use as the source dataset.
No documentProcessingConfig DatasetDocumentProcessingConfig The document processing configuration to use for the new dataset version. If not specified, the document processing configuration from the source dataset will be used.
KEY TYPE DESCRIPTION
pageTextColumn str Name of the output column that contains the extracted text for each page. If not provided, no column will be created.
Note: The API method arguments use camelCase, while the Python SDK uses underscore_case for the same arguments.
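
The argument table above can be sketched as a request body. This is a minimal illustration, not the SDK's implementation: the helper names `build_reprocessing_body` and `camel_to_snake` and the ID and column values are hypothetical. How the body is sent (endpoint URL, authentication) is not covered here because this page does not specify it.

```python
import json
import re
from typing import Optional


def camel_to_snake(name: str) -> str:
    """Convert an API-style camelCase key to underscore_case (hypothetical helper)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_reprocessing_body(dataset_id: str, page_text_column: Optional[str] = None) -> dict:
    """Build the JSON arguments for createDatasetVersionFromDocumentReprocessing.

    Hypothetical helper: it only assembles the documented keys.
    """
    body = {"datasetId": dataset_id}  # required argument
    if page_text_column is not None:
        # Optional argument. When documentProcessingConfig is omitted, the
        # configuration from the source dataset is used instead.
        body["documentProcessingConfig"] = {"pageTextColumn": page_text_column}
    return body


# Placeholder values, chosen only for illustration.
body = build_reprocessing_body("example_dataset_id", page_text_column="page_text")
print(json.dumps(body, indent=2))

# The same argument names as they would be spelled in underscore_case.
print([camel_to_snake(key) for key in body])
```

Leaving `page_text_column` unset produces a body with only `datasetId`, which matches the documented fallback to the source dataset's configuration.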

Response:

KEY TYPE DESCRIPTION
success Boolean true if the call succeeded, false if there was an error.
result DatasetVersion
KEY TYPE DESCRIPTION
datasetVersion str The unique identifier of the dataset version.
status str The current status of the dataset version.
datasetId str A reference to the Dataset this dataset version belongs to.
size int The size in bytes of the file.
rowCount int Number of rows in the dataset version.
fileInspectMetadata dict Metadata from the inspection of the file, for example the detected delimiter for CSV files.
createdAt str The timestamp this dataset version was created.
error str If status is FAILED, this field will be populated with an error.
incrementalQueriedAt str If the dataset version is from an incremental dataset, this is the last entry of the timestamp column at the time the dataset version was created.
uploadId str If the dataset version is being uploaded, this is the reference to the Upload.
mergeFileSchemas bool Whether the merge file schemas policy is enabled.
databaseConnectorConfig dict The database connector query used to retrieve data for this version.
applicationConnectorConfig dict The application connector used to retrieve data for this version.
invalidRecords str Invalid records in the dataset version.
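
A minimal sketch of reading the response described above, assuming it has already been decoded into a Python dict. The function `extract_dataset_version`, the exception class `DatasetVersionError`, and every value in the sample response are made up for illustration. The only status value this page names is FAILED, so the other status shown is a placeholder.

```python
from typing import Any, Dict


class DatasetVersionError(Exception):
    """Hypothetical exception for this sketch; not part of any SDK."""


def extract_dataset_version(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the DatasetVersion fields, or raise if the call or the version failed."""
    if not response.get("success"):
        raise DatasetVersionError("API call did not succeed")
    result = response.get("result") or {}
    if result.get("status") == "FAILED":
        # Per the table, `error` is populated when status is FAILED.
        raise DatasetVersionError(result.get("error") or "dataset version failed")
    return result


# Made-up sample response; IDs and the status value are placeholders.
sample = {
    "success": True,
    "result": {
        "datasetVersion": "example_version_id",
        "status": "PLACEHOLDER_STATUS",
        "datasetId": "example_dataset_id",
        "rowCount": 0,
    },
}

version = extract_dataset_version(sample)
print(version["datasetVersion"], version["status"])
```

Checking `success` before `status` keeps the two failure modes separate: a failed call has no usable `result`, while a failed dataset version carries its reason in `error`.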

Exceptions:

TYPE WHEN
DataNotFoundError datasetId is not found.
