createDatasetFromDatabaseConnector POST

Creates a dataset from a Database Connector.


REQUIRED KEY TYPE Description
Yes tableName str Organization-unique table name.
Yes databaseConnectorId str Unique String Identifier of the Database Connector to import the dataset from.
No objectName str If applicable, the name/ID of the object in the service to query.
No columns str The columns to query from the external service object.
No queryArguments str Additional query arguments to filter the data.
No refreshSchedule str The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.
No sqlQuery str The full SQL query to use when fetching data. If present, this parameter overrides `objectName`, `columns`, `timestampColumn`, and `queryArguments`.
No incremental bool Signifies if the dataset is an incremental dataset.
No attachmentParsingConfig AttachmentParsingConfig The attachment parsing configuration. Only valid when attachments are being imported (e.g. importing attachments via Salesforce). Provide either a feature group name and a column name, or a list of URLs to import.
KEY TYPE Description
featureGroupName str Feature group name.
columnName str Column name.
urls str List of URLs.
No incrementalDatabaseConnectorConfig IncrementalDatabaseConnectorConfig The config for incremental datasets. Only valid if incremental is True.
KEY TYPE Description
timestampColumn str If dataset is incremental, this is the column name of the required column in the dataset. This column must contain timestamps in descending order which are used to determine the increments of the incremental dataset.
No documentProcessingConfig DatasetDocumentProcessingConfig The document processing configuration. Only valid when documents are being imported (e.g. importing KnowledgeArticleDescriptions via Salesforce).
KEY TYPE Description
pageTextColumn str Name of the output column which contains the extracted text for each page. If not provided, no column will be created.
No versionLimit int The number of recent versions to preserve for the dataset (minimum 30).
Note: API method arguments use camelCase, while the Python SDK uses underscore_case (snake_case) for the same arguments.
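
As a rough illustration of how the arguments above fit together, the following Python sketch assembles a camelCase request body from underscore_case keyword arguments and applies two constraints stated in the table (`versionLimit` has a minimum of 30, and `incrementalDatabaseConnectorConfig` is only valid when `incremental` is True). The helpers `snake_to_camel` and `build_create_dataset_payload` and all example values are hypothetical and not part of the API or the SDK. Sending the request (URL, authentication) is left out because it is not described here.

```python
import json


def snake_to_camel(name: str) -> str:
    """Convert an underscore_case name to the camelCase form used by the API."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_create_dataset_payload(table_name, database_connector_id, **optional):
    """Assemble a request body (hypothetical helper, not part of the SDK)."""
    payload = {
        "tableName": table_name,
        "databaseConnectorId": database_connector_id,
    }
    # Optional arguments: skip unset values and convert names to camelCase.
    for key, value in optional.items():
        if value is not None:
            payload[snake_to_camel(key)] = value

    # Documented constraint: versionLimit has a minimum of 30.
    if "versionLimit" in payload and payload["versionLimit"] < 30:
        raise ValueError("versionLimit must be at least 30")

    # Documented constraint: the incremental config is only valid
    # if incremental is True.
    if "incrementalDatabaseConnectorConfig" in payload and not payload.get("incremental"):
        raise ValueError("incrementalDatabaseConnectorConfig requires incremental=True")

    return payload


payload = build_create_dataset_payload(
    "orders_raw",                      # example table name
    "connector_123",                   # placeholder connector ID
    object_name="orders",
    columns="id, amount, updated_at",
    refresh_schedule="0 6 * * *",      # cron string, interpreted in UTC
    incremental=True,
    incremental_database_connector_config={"timestampColumn": "updated_at"},
    version_limit=30,
)
print(json.dumps(payload, indent=2))
```

Validating on the client side like this is only a convenience; the service remains the authority on which combinations of arguments are accepted.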


success Boolean true if the call succeeded, false if there was an error
result Dataset
KEY TYPE Description
datasetId str The unique identifier of the dataset.
sourceType str The source of the Dataset. EXTERNAL_SERVICE, UPLOAD, or STREAMING.
dataSource str Location of data. It may be a URI such as an s3 bucket or the database table.
createdAt str The timestamp at which this dataset was created.
ignoreBefore str The timestamp at which all previous events are ignored when training.
ephemeral bool The dataset is ephemeral and not used for training.
lookbackDays int Specific to streaming datasets, this specifies how many days' worth of data to include when generating a snapshot. A value of 0 leaves this selection to the system.
databaseConnectorId str The Database Connector used.
databaseConnectorConfig dict The database connector query used to retrieve data.
connectorType str The type of connector used to get this dataset: FILE or DATABASE.
featureGroupTableName str The table name of the dataset's feature group.
applicationConnectorId str The Application Connector used.
applicationConnectorConfig dict The application connector query used to retrieve data.
incremental bool If dataset is an incremental dataset.
isDocumentset bool If dataset is a documentset.
extractBoundingBoxes bool Signifies whether to extract bounding boxes out of the documents. Only valid if is_documentset is True.
mergeFileSchemas bool If the merge file schemas policy is enabled.
referenceOnlyDocumentset bool Signifies whether to save the data reference only. Only valid if is_documentset is True.
versionLimit int Version limit for the dataset.
latestDatasetVersion DatasetVersion The latest version of this dataset.
KEY TYPE Description
datasetVersion str The unique identifier of the dataset version.
status str The current status of the dataset version
datasetId str A reference to the Dataset this dataset version belongs to.
size int The size in bytes of the file.
rowCount int Number of rows in the dataset version.
fileInspectMetadata dict Metadata information about file's inspection. For example - the detected delimiter for CSV files.
createdAt str The timestamp this dataset version was created.
error str If status is FAILED, this field will be populated with an error.
incrementalQueriedAt str If the dataset version is from an incremental dataset, this is the last entry of the timestamp column when the dataset version was created.
uploadId str If the dataset version is being uploaded, this is the reference to the Upload.
mergeFileSchemas bool If the merge file schemas policy is enabled.
databaseConnectorConfig dict The database connector query used to retrieve data for this version.
applicationConnectorConfig dict The application connector used to retrieve data for this version.
invalidRecords str Invalid records in the dataset version
schema DatasetColumn List of resolved columns.
KEY TYPE Description
name str The unique name of the column.
dataType str The underlying data type of each column.
detectedDataType str The detected data type of the column.
featureType str Feature type of the column.
detectedFeatureType str The detected feature type of the column.
originalName str The original name of the column.
validDataTypes List[str] The valid data type options for this column.
timeFormat str The detected time format of the column.
timestampFrequency str The detected frequency of the timestamps in the dataset.
refreshSchedules RefreshSchedule List of schedules that determines when the next version of the dataset will be created.
KEY TYPE Description
refreshPolicyId str The unique identifier of the refresh policy
nextRunTime str The next run time of the refresh policy. If null, the policy is paused.
cron str A cron-style string that describes when this refresh policy is to be executed, in UTC.
refreshType str The type of refresh that will be run
error str An error message for the last pipeline run of a policy
parsingConfig ParsingConfig The parsing config used for dataset.
KEY TYPE Description
filePathWithSchema str Path to the file with schema. Defaults to None.
escape str Escape character for CSV files. Defaults to '"'.
csvDelimiter str Delimiter for CSV files. Defaults to None.
documentProcessingConfig DocumentProcessingConfig The document processing config used for dataset (when is_documentset is True).
KEY TYPE Description
removeWatermarks bool Whether to remove watermarks. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.
extractBoundingBoxes bool Whether to perform OCR and extract bounding boxes. If False, no OCR will be done but only the embedded text from digital documents will be extracted. Defaults to False.
maskPii bool Whether to mask personally identifiable information (PII) in the document text/tokens. Defaults to False.
documentType DocumentType Type of document. Can be one of Text, Tables and Forms, Embedded Images, etc. If not specified, type will be decided automatically.
removeHeaderFooter bool Whether to remove headers and footers. Defaults to False. This option only takes effect when extract_bounding_boxes is True.
highlightRelevantText bool Whether to extract bounding boxes and highlight relevant text in search results. Defaults to False.
ocrMode OcrMode OCR mode. There are different OCR modes available for different kinds of documents and use cases. This option only takes effect when extract_bounding_boxes is True.
convertToMarkdown bool Whether to convert extracted text to markdown. Defaults to False. This option only takes effect when extract_bounding_boxes is True.
extractImages bool Whether to extract images from the document e.g. diagrams in a PDF page. Defaults to False.
useFullOcr bool Whether to perform full OCR. If True, OCR will be performed on the full page. If False, OCR will be performed on the non-text regions only. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.
attachmentParsingConfig AttachmentParsingConfig The attachment parsing config used for the dataset (e.g. for Salesforce attachment parsing).
KEY TYPE Description
featureGroupName str Feature group name.
columnName str Column name.
urls str List of URLs.
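
To show how the response fields relate to each other, here is a small Python sketch that reads an already-decoded JSON response and pulls out the dataset ID, the latest version's status and error, and any paused refresh policies. The `sample_response` dictionary is made-up illustrative data shaped after the table above, not real API output, and `summarize_response` is a hypothetical helper. How a failed call is represented beyond `success` being false is not described here, so the sketch simply raises an exception in that case.

```python
def summarize_response(response: dict) -> dict:
    """Extract a few fields from a decoded response (hypothetical helper)."""
    if not response.get("success"):
        raise RuntimeError("API call did not succeed")

    dataset = response["result"]
    latest = dataset.get("latestDatasetVersion") or {}
    schedules = dataset.get("refreshSchedules") or []

    return {
        "dataset_id": dataset["datasetId"],
        "incremental": bool(dataset.get("incremental")),
        "latest_version": latest.get("datasetVersion"),
        "status": latest.get("status"),
        # Per the table, error is populated when status is FAILED.
        "error": latest.get("error") if latest.get("status") == "FAILED" else None,
        # Per the table, a null nextRunTime means the policy is paused.
        "paused_policies": [
            s["refreshPolicyId"] for s in schedules if s.get("nextRunTime") is None
        ],
    }


# Illustrative data only; identifiers and messages are invented.
sample_response = {
    "success": True,
    "result": {
        "datasetId": "ds_example",
        "incremental": True,
        "latestDatasetVersion": {
            "datasetVersion": "dv_example",
            "status": "FAILED",
            "error": "example error message",
        },
        "refreshSchedules": [
            {"refreshPolicyId": "rp_1", "nextRunTime": None, "cron": "0 6 * * *"},
            {"refreshPolicyId": "rp_2", "nextRunTime": "2024-01-01T06:00:00", "cron": "0 6 * * *"},
        ],
    },
}

summary = summarize_response(sample_response)
print(summary)
```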



