Method

createDocumentRetriever POST

Returns a document retriever that stores embeddings for document chunks in a feature group.

Arguments:

REQUIRED	KEY	TYPE	DESCRIPTION
Yes	projectId	str	The ID of the project in which the Document Retriever is created.
Yes	name	str	The name of the Document Retriever. Can be up to 120 characters long and can contain only alphanumeric characters and underscores.
Yes	featureGroupId	str	The ID of the feature group that the Document Retriever is associated with.
No	documentRetrieverConfig	DocumentRetrieverConfig	The configuration for document retrieval, including chunk_size and chunk_overlap_fraction.

KEY	TYPE	Description
chunkSize	int	The size of text chunks in the vector store.
standaloneDeployment	bool	If True, the document retriever will be deployed as a standalone deployment.
chunkOverlapFraction	float	The fraction of overlap between chunks.
summaryInstructions	str	Instructions for the LLM to generate the document summary.
scoreMultiplierColumn	str	If provided, will use the values in this metadata column to modify the relevance score of returned chunks for all queries.
useDocumentSummary	bool	If True, uses the summary of the document in addition to chunks of the document for indexing and querying.
chunkSizeFactors	list	Chunks the data at multiple sizes. The specified list of factors is used to calculate additional sizes, in addition to `chunk_size`.
pruneVectors	bool	Transforms vectors using SVD so that the average component of the vectors in the corpus is removed.
indexMetadataColumns	bool	If True, metadata columns of the feature group will also be used for indexing and querying.
textEncoder	VectorStoreTextEncoder	Encoder used to index texts from the documents.
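
The arguments above can be assembled into a request body as in the following minimal sketch. It only builds and validates the JSON payload and does not send it. The helper name `build_create_request`, the placeholder IDs, and the config values are hypothetical; the name check assumes "alphanumeric" means ASCII letters and digits and that an empty name is not allowed, and `documentRetrieverConfig` is treated as optional because it is not marked as required.

```python
import json
import re

# Rule from the `name` argument: up to 120 characters, alphanumerics and underscores only.
# Assumption: ASCII characters only, and at least one character.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,120}")


def build_create_request(project_id, name, feature_group_id, config=None):
    """Assemble a camelCase JSON body for createDocumentRetriever (hypothetical helper)."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must be 1-120 alphanumeric or underscore characters")
    body = {
        "projectId": project_id,
        "name": name,
        "featureGroupId": feature_group_id,
    }
    if config:  # documentRetrieverConfig is not marked as required
        body["documentRetrieverConfig"] = dict(config)
    return body


payload = build_create_request(
    "example_project_id",        # placeholder, not a real ID
    "support_docs_retriever",
    "example_feature_group_id",  # placeholder, not a real ID
    {"chunkSize": 512, "chunkOverlapFraction": 0.1},  # illustrative values only
)
print(json.dumps(payload, indent=2))
```

Validating the name locally is a convenience; the service remains the authority on what it accepts.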

Note: Arguments for the API methods use camelCase, whereas the Python SDK uses underscore_case.
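
As a small illustration of the naming note, the sketch below converts camelCase API keys into underscore_case names. The helpers `camel_to_snake` and `to_sdk_kwargs` are hypothetical and not part of any SDK; only top-level keys are converted, and the values are placeholders.

```python
import re


def camel_to_snake(key):
    """Convert a camelCase key to underscore_case by inserting '_' before each capital."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def to_sdk_kwargs(api_args):
    """Rename the top-level keys of an API-style dict (nested dicts are left as-is)."""
    return {camel_to_snake(key): value for key, value in api_args.items()}


api_args = {
    "projectId": "example_project_id",              # placeholder
    "featureGroupId": "example_feature_group_id",   # placeholder
    "chunkOverlapFraction": 0.1,                    # illustrative value
}
print(to_sdk_kwargs(api_args))
```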

Response:

KEY	TYPE	DESCRIPTION
success	Boolean	true if the call succeeded, false if there was an error
result	DocumentRetriever	The document retriever that was created.

KEY	TYPE	Description
name	str	The name of the document retriever.
documentRetrieverId	str	The unique identifier of the document retriever.
createdAt	str	When the document retriever was created.
featureGroupId	str	The ID of the feature group associated with the document retriever.
featureGroupName	str	The name of the feature group associated with the document retriever.
indexingRequired	bool	Whether the document retriever needs to be indexed because of changes in the underlying data.
latestDocumentRetrieverVersion	DocumentRetrieverVersion	The latest version of the document retriever.

KEY	TYPE	Description
documentRetrieverId	str	The unique identifier of the Document Retriever.
documentRetrieverVersion	str	The unique identifier of the Document Retriever version.
createdAt	str	When the Document Retriever was created.
status	str	The status of the Document Retriever version. It represents the indexing status until indexing is complete, and the deployment status after indexing is complete.
deploymentStatus	str	The status of deploying the Document Retriever version.
featureGroupId	str	The ID of the feature group associated with the document retriever.
featureGroupVersion	str	The unique identifier of the feature group version from which the Document Retriever version was created.
error	str	The error message if creating the document retriever version failed.
numberOfChunks	int	The number of chunks for the document retriever.
embeddingFileSize	int	The size of the embedding file for the document retriever.
warnings	list	The warning messages produced when creating the document retriever.
resolvedConfig	VectorStoreConfig	The resolved configurations, such as default settings, for indexing documents.

KEY	TYPE	Description
chunkSize	int	The size of text chunks in the vector store.
standaloneDeployment	bool	If True, the document retriever will be deployed as a standalone deployment.
chunkOverlapFraction	float	The fraction of overlap between chunks.
summaryInstructions	str	Instructions for the LLM to generate the document summary.
scoreMultiplierColumn	str	If provided, will use the values in this metadata column to modify the relevance score of returned chunks for all queries.
useDocumentSummary	bool	If True, uses the summary of the document in addition to chunks of the document for indexing and querying.
chunkSizeFactors	list	Chunks the data at multiple sizes. The specified list of factors is used to calculate additional sizes, in addition to `chunk_size`.
pruneVectors	bool	Transforms vectors using SVD so that the average component of the vectors in the corpus is removed.
indexMetadataColumns	bool	If True, metadata columns of the feature group will also be used for indexing and querying.
textEncoder	VectorStoreTextEncoder	Encoder used to index texts from the documents.

documentRetrieverConfig	VectorStoreConfig	The config used to create the document retriever version.

KEY	TYPE	Description
chunkSize	int	The size of text chunks in the vector store.
standaloneDeployment	bool	If True, the document retriever will be deployed as a standalone deployment.
chunkOverlapFraction	float	The fraction of overlap between chunks.
summaryInstructions	str	Instructions for the LLM to generate the document summary.
scoreMultiplierColumn	str	If provided, will use the values in this metadata column to modify the relevance score of returned chunks for all queries.
useDocumentSummary	bool	If True, uses the summary of the document in addition to chunks of the document for indexing and querying.
chunkSizeFactors	list	Chunks the data at multiple sizes. The specified list of factors is used to calculate additional sizes, in addition to `chunk_size`.
pruneVectors	bool	Transforms vectors using SVD so that the average component of the vectors in the corpus is removed.
indexMetadataColumns	bool	If True, metadata columns of the feature group will also be used for indexing and querying.
textEncoder	VectorStoreTextEncoder	Encoder used to index texts from the documents.

documentRetrieverConfig	VectorStoreConfig	The config for vector store creation.

KEY	TYPE	Description
chunkSize	int	The size of text chunks in the vector store.
standaloneDeployment	bool	If True, the document retriever will be deployed as a standalone deployment.
chunkOverlapFraction	float	The fraction of overlap between chunks.
summaryInstructions	str	Instructions for the LLM to generate the document summary.
scoreMultiplierColumn	str	If provided, will use the values in this metadata column to modify the relevance score of returned chunks for all queries.
useDocumentSummary	bool	If True, uses the summary of the document in addition to chunks of the document for indexing and querying.
chunkSizeFactors	list	Chunks the data at multiple sizes. The specified list of factors is used to calculate additional sizes, in addition to `chunk_size`.
pruneVectors	bool	Transforms vectors using SVD so that the average component of the vectors in the corpus is removed.
indexMetadataColumns	bool	If True, metadata columns of the feature group will also be used for indexing and querying.
textEncoder	VectorStoreTextEncoder	Encoder used to index texts from the documents.
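
A caller would typically check `success` first and then read the fields of `result`. The sketch below does this on an already-parsed response dictionary. The helper `summarize_response` and every value in `sample_response` are hypothetical placeholders (including the status string, which is not a documented status value); the shape of an error response is not described here, so the failure branch raises a generic error rather than reading any specific field.

```python
def summarize_response(response):
    """Pull a few fields out of a parsed createDocumentRetriever response (hypothetical helper)."""
    if not response.get("success"):
        # The error payload format is not documented here, so no field is assumed.
        raise RuntimeError("createDocumentRetriever call did not succeed")
    retriever = response["result"]
    version = retriever.get("latestDocumentRetrieverVersion") or {}
    return {
        "documentRetrieverId": retriever["documentRetrieverId"],
        "indexingRequired": retriever.get("indexingRequired"),
        "status": version.get("status"),
        "warnings": version.get("warnings") or [],
    }


# Placeholder data shaped like the response tables; none of these values are real.
sample_response = {
    "success": True,
    "result": {
        "name": "support_docs_retriever",
        "documentRetrieverId": "example_retriever_id",
        "featureGroupId": "example_feature_group_id",
        "indexingRequired": False,
        "latestDocumentRetrieverVersion": {
            "documentRetrieverVersion": "example_version_id",
            "status": "EXAMPLE_STATUS",
            "warnings": [],
        },
    },
}

summary = summarize_response(sample_response)
print(summary)
```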

Exceptions:

TYPE	WHEN
DataNotFoundError	`projectId` is not found.
DataNotFoundError	`featureGroupId` is not found.
