| Required | Key | Type | Description |
|---|---|---|---|
| Yes | tableName | str | Organization-unique table name for this dataset. |
| No | fileFormat | str | The file format of the dataset. |
| No | csvDelimiter | str | The delimiter to use when the file format is CSV. |
| No | isDocumentset | bool | Signifies whether the dataset is a docstore dataset. A docstore dataset contains documents such as images, PDFs, or audio files, or is tabular data with links to such files. |
| No | extractBoundingBoxes | bool | Signifies whether to extract bounding boxes from the documents. Only valid if isDocumentset is True. |
| No | parsingConfig | ParsingConfig | Custom configuration for dataset parsing. Its fields are listed in the ParsingConfig table below. |
| No | mergeFileSchemas | bool | Signifies whether to merge the schemas of all files in the dataset. If isDocumentset is True, this defaults to True. |
| No | documentProcessingConfig | DatasetDocumentProcessingConfig | The document processing configuration. Only valid if isDocumentset is True. Its fields are listed in the DatasetDocumentProcessingConfig table below. |
| No | versionLimit | int | The number of recent versions to preserve for the dataset (minimum 30). |

ParsingConfig fields:

| Key | Type | Description |
|---|---|---|
| csvDelimiter | str | Delimiter for CSV files. Defaults to None. |
| escape | str | Escape character for CSV files. Defaults to `'"'`. |
| filePathWithSchema | str | Path to the file with schema. Defaults to None. |

DatasetDocumentProcessingConfig fields:

| Key | Type | Description |
|---|---|---|
| pageTextColumn | str | Name of the output column that contains the extracted text for each page. If not provided, no column is created. |
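As a rough illustration of the constraints in the table above, here is a minimal Python sketch that checks an API-style argument dict (camelCase keys) before a request is sent. `validate_dataset_args` and `MIN_VERSION_LIMIT` are hypothetical names, not part of any official SDK, and the sketch covers only the rules the table states: `tableName` is required, `extractBoundingBoxes` and `documentProcessingConfig` require `isDocumentset`, `mergeFileSchemas` defaults to True for docstore datasets, and `versionLimit` must be at least 30.

```python
from __future__ import annotations

from typing import Any

MIN_VERSION_LIMIT = 30  # minimum stated for versionLimit


def validate_dataset_args(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-flight check for API-style (camelCase) dataset arguments.

    Returns a copy of ``args`` with the documented default applied,
    or raises ValueError if a stated constraint is violated.
    """
    if not args.get("tableName"):
        raise ValueError("tableName is required")

    is_docset = bool(args.get("isDocumentset", False))

    # These options are only meaningful for docstore datasets.
    for key in ("extractBoundingBoxes", "documentProcessingConfig"):
        if args.get(key) and not is_docset:
            raise ValueError(f"{key} is only valid when isDocumentset is True")

    limit = args.get("versionLimit")
    if limit is not None and limit < MIN_VERSION_LIMIT:
        raise ValueError(f"versionLimit must be at least {MIN_VERSION_LIMIT}")

    result = dict(args)
    # For docstore datasets, mergeFileSchemas defaults to True unless set explicitly.
    if is_docset:
        result.setdefault("mergeFileSchemas", True)
    return result


if __name__ == "__main__":
    print(validate_dataset_args({"tableName": "invoices", "isDocumentset": True}))
```

Running the script prints `{'tableName': 'invoices', 'isDocumentset': True, 'mergeFileSchemas': True}`. The check returns a copy rather than mutating the caller's dict, so the original arguments stay untouched if validation fails partway.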
Note: API method arguments use camelCase, while the Python SDK uses snake_case (for example, `tableName` becomes `table_name`).
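To make the naming rule in the note concrete, the following sketch converts camelCase API keys into snake_case names. `to_snake_case` and `to_sdk_kwargs` are hypothetical helpers written for illustration; they only apply the mechanical rule from the note and do not consult any real SDK signature.

```python
import re

# Matches the empty position before each uppercase letter, except at the start.
_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Hypothetical helper: insert '_' before each inner uppercase letter, then lowercase."""
    return _BOUNDARY.sub("_", name).lower()


def to_sdk_kwargs(api_args: dict) -> dict:
    """Hypothetical helper: rename every camelCase key in an API-style dict."""
    return {to_snake_case(key): value for key, value in api_args.items()}


if __name__ == "__main__":
    print(to_sdk_kwargs({"tableName": "invoices", "extractBoundingBoxes": True}))
```

This prints `{'table_name': 'invoices', 'extract_bounding_boxes': True}`. The simple regex is enough for the keys in this table, but it would split runs of capitals letter by letter (for example, an acronym such as `URL`), so keys like that would need special handling.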