A dataset reference
| Key | Type | Description |
|---|---|---|
| datasetId | str | The unique identifier of the dataset. |
| sourceType | str | The source of the dataset: EXTERNAL_SERVICE, UPLOAD, or STREAMING. |
| dataSource | str | Location of the data. It may be a URI, such as an S3 bucket, or a database table. |
| createdAt | str | The timestamp at which this dataset was created. |
| ignoreBefore | str | The cutoff timestamp; events before it are ignored when training. |
| ephemeral | bool | Whether the dataset is ephemeral and not used for training. |
| lookbackDays | int | Specific to streaming datasets, this specifies how many days' worth of data to include when generating a snapshot. A value of 0 leaves this selection to the system. |
| databaseConnectorId | str | The Database Connector used. |
| databaseConnectorConfig | dict | The database connector query used to retrieve data. |
| connectorType | str | The type of connector used to get this dataset: FILE or DATABASE. |
| featureGroupTableName | str | The table name of the dataset's feature group. |
| applicationConnectorId | str | The Application Connector used. |
| applicationConnectorConfig | dict | The application connector query used to retrieve data. |
| incremental | bool | Whether the dataset is an incremental dataset. |
| isDocumentset | bool | Whether the dataset is a documentset. |
| extractBoundingBoxes | bool | Whether to extract bounding boxes from the documents. Only valid if is_documentset is True. |
| mergeFileSchemas | bool | Whether the merge file schemas policy is enabled. |
| referenceOnlyDocumentset | bool | Whether to save only a reference to the data. Only valid if is_documentset is True. |
| versionLimit | int | Version limit for the dataset. |
| latestDatasetVersion | DatasetVersion | The latest version of this dataset. |
| schema | DatasetColumn | List of resolved columns. |
| refreshSchedules | RefreshSchedule | List of schedules that determine when the next version of the dataset will be created. |
| parsingConfig | ParsingConfig | The parsing config used for the dataset. |
| documentProcessingConfig | DocumentProcessingConfig | The document processing config used for the dataset (when is_documentset is True). |
| attachmentParsingConfig | AttachmentParsingConfig | The attachment parsing config used for the dataset (e.g., for Salesforce attachment parsing). |
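
As a rough illustration of how a few of these fields relate, the sketch below reads a plain dict keyed by the names in the table into a small typed container. The `DatasetSummary` class, the `parse_dataset` helper, and the sample values are hypothetical and are not part of any client library; the only things taken from the table are the key names, their types, and the rule that `extractBoundingBoxes` applies only to documentsets.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DatasetSummary:
    """Hypothetical container for a handful of the fields listed above."""
    dataset_id: str
    source_type: str
    connector_type: Optional[str] = None
    lookback_days: int = 0
    is_documentset: bool = False
    extract_bounding_boxes: bool = False


def parse_dataset(payload: Dict[str, Any]) -> DatasetSummary:
    """Build a DatasetSummary from a dict that uses the camelCase keys above."""
    is_docset = bool(payload.get("isDocumentset", False))
    return DatasetSummary(
        dataset_id=payload["datasetId"],
        source_type=payload["sourceType"],
        connector_type=payload.get("connectorType"),
        # A missing or null lookbackDays is treated as 0 (system decides).
        lookback_days=int(payload.get("lookbackDays") or 0),
        is_documentset=is_docset,
        # extractBoundingBoxes is only valid when the dataset is a documentset.
        extract_bounding_boxes=is_docset
        and bool(payload.get("extractBoundingBoxes", False)),
    )


# Made-up sample payload, for illustration only.
example = {
    "datasetId": "example-dataset-id",
    "sourceType": "STREAMING",
    "lookbackDays": 0,
    "isDocumentset": False,
    "extractBoundingBoxes": True,
}
summary = parse_dataset(example)
print(summary.source_type, summary.lookback_days, summary.extract_bounding_boxes)
# prints: STREAMING 0 False
```

Here `extract_bounding_boxes` comes out as `False` even though the payload sets it, because the sample dataset is not a documentset. Dropping the flag silently is one possible design choice; a stricter reader could raise an error instead.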