Method
getRelevantSnippets POST

Retrieves snippets relevant to a given query from specified documents. This function supports flexible input options: documents can be supplied as document store IDs (docIds), as blobs, or as plain text.

Arguments:

REQUIRED KEY TYPE DESCRIPTION
No docIds List[str] A list of document store IDs to retrieve the snippets from.
No blobs dict A dictionary mapping document names to the blob data.
No query str Query string to find relevant snippets in the documents.
No documentRetrieverConfig DocumentRetrieverConfig If provided, configures retrieval steps such as chunking for embeddings.
Fields of DocumentRetrieverConfig:
KEY TYPE DESCRIPTION
indexMetadataColumns bool If True, metadata columns of the feature group (FG) will also be used for indexing and querying.
chunkSize int The size of text chunks in the vector store.
standaloneDeployment bool If True, the document retriever will be deployed as a standalone deployment.
scoreMultiplierColumn str If provided, will use the values in this metadata column to modify the relevance score of returned chunks for all queries.
chunkOverlapFraction float The fraction of overlap between chunks.
pruneVectors bool Transform vectors using SVD so that the average component of vectors in the corpus is removed.
chunkSizeFactors list Chunks data at multiple sizes. The specified list of factors is used to calculate additional sizes, in addition to `chunk_size`.
textEncoder VectorStoreTextEncoder Encoder used to index texts from the documents.
summaryInstructions str Instructions for the LLM to generate the document summary.
useDocumentSummary bool If True, uses the summary of the document in addition to chunks of the document for indexing and querying.
No honorSentenceBoundary bool If provided, will honor sentence boundary when returning the snippets.
No numRetrievalMarginWords int If provided, will add this number of words to the left and right of the returned snippets.
No maxWordsPerSnippet int If provided, will limit the number of words in each snippet to the value specified.
No maxSnippetsPerDocument int If provided, will limit the number of snippets retrieved from each document to the value specified.
No startWordIndex int If provided, will start the snippet at the index (of words in the document) specified.
No endWordIndex int If provided, will end the snippet at the index (of words in the document) specified.
No includingBoundingBoxes bool If true, will include the bounding boxes of the snippets if they are available.
No text str Plain text from which to retrieve snippets.
No documentProcessingConfig DocumentProcessingConfig The document processing configuration used to extract text when doc_ids or blobs are provided. If provided, this will override the including_bounding_boxes parameter.
Fields of DocumentProcessingConfig:
KEY TYPE DESCRIPTION
removeWatermarks bool Whether to remove watermarks. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.
convertToMarkdown bool Whether to convert extracted text to markdown. Defaults to False. This option only takes effect when extract_bounding_boxes is True.
extractBoundingBoxes bool Whether to perform OCR and extract bounding boxes. If False, no OCR will be done but only the embedded text from digital documents will be extracted. Defaults to False.
ocrMode OcrMode OCR mode. There are different OCR modes available for different kinds of documents and use cases. This option only takes effect when extract_bounding_boxes is True.
removeHeaderFooter bool Whether to remove headers and footers. Defaults to False. This option only takes effect when extract_bounding_boxes is True.
documentType DocumentType Type of document. Can be one of Text, Tables and Forms, Embedded Images, etc. If not specified, type will be decided automatically.
extractImages bool Whether to extract images from the document e.g. diagrams in a PDF page. Defaults to False.
highlightRelevantText bool Whether to extract bounding boxes and highlight relevant text in search results. Defaults to False.
useFullOcr bool Whether to perform full OCR. If True, OCR will be performed on the full page. If False, OCR will be performed on the non-text regions only. By default, it will be decided automatically based on the OCR mode and the document type. This option only takes effect when extract_bounding_boxes is True.
maskPii bool Whether to mask personally identifiable information (PII) in the document text/tokens. Defaults to False.
Note: Arguments for the API methods use camelCase; the Python SDK uses the underscore_case (snake_case) equivalents.
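The naming convention in the note above can be sketched with a small helper that turns Python-style keyword arguments into a camelCase request body. This is only an illustration: `snake_to_camel` and `build_payload` are hypothetical helper names, not part of the SDK, and the document ID and query are placeholders.

```python
import json


def snake_to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase (e.g. doc_ids -> docIds)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_payload(**kwargs) -> dict:
    """Build a camelCase request body, dropping arguments left as None."""
    payload = {}
    for key, value in kwargs.items():
        if value is None:
            continue  # every argument is optional, so unset ones are omitted
        if isinstance(value, dict) and key != "blobs":
            # Nested config objects use camelCase keys too; `blobs` is left
            # alone because its keys are document names, not argument names.
            value = build_payload(**value)
        payload[snake_to_camel(key)] = value
    return payload


payload = build_payload(
    doc_ids=["doc_id_1"],  # placeholder document store ID
    query="termination clause",  # placeholder query
    max_snippets_per_document=3,
    honor_sentence_boundary=True,
    document_retriever_config={"chunk_size": 512, "chunk_overlap_fraction": 0.1},
    text=None,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary could then be sent as the JSON body of the POST request; how the request is authenticated and which URL it targets are not covered in this section.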

Response:

KEY TYPE DESCRIPTION
success Boolean true if the call succeeded, false if there was an error
result list[DocumentRetrieverLookupResult]
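A caller would typically check `success` before reading `result`. The sketch below assumes the response has already been decoded into a Python dictionary. The `unwrap_result` helper and the `error` key are assumptions made for illustration, since this section does not document the fields returned on failure.

```python
def unwrap_result(response: dict) -> list:
    """Return the `result` list, or raise if `success` is false."""
    if not response.get("success", False):
        # Assumption: an "error" key may carry a message; it is not documented here.
        raise RuntimeError(response.get("error", "getRelevantSnippets call failed"))
    return response.get("result") or []


# Placeholder response; real entries are DocumentRetrieverLookupResult objects.
sample = {"success": True, "result": [{"placeholder": "lookup result fields"}]}
snippets = unwrap_result(sample)
print(len(snippets))
```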

Exceptions:

TYPE WHEN
DataNotFoundError docIds is not found.
