DocumentData

Data extracted from a Docstore document.

KEY TYPE DESCRIPTION
docId str Unique Docstore string identifier for the document.
mimeType str The MIME type of the document.
pageCount int The number of pages for which data is available. This is generally the same as the total number of pages in the document, but may be lower if only selected pages were processed.
totalPageCount int The total number of pages in the document.
extractedText str The extracted text in the document obtained from OCR.
embeddedText str The embedded text in the document. Only available for digital documents.
pages list List of embedded text for each page in the document. Only available for digital documents.
tokens list List of extracted tokens in the document obtained from OCR.
metadata list List of metadata for each page in the document.
pageMarkdown list List of Markdown text for each page in the document.
extractedPageText list List of extracted text for each page in the document obtained from OCR. Available when the return_extracted_page_text parameter is set to True in the document data retrieval API.
augmentedPageText list List of extracted text for each page in the document obtained from OCR, augmented with the links embedded in the document.
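
The sketch below illustrates one way these fields might be consumed. It assumes DocumentData is available as a plain Python dict keyed by the names in the table, that fields which do not apply (for example pages on a scanned document) may be absent or empty, and that each entry of pages and extractedPageText is a string. The helper names and the sample values are hypothetical and are not part of the API.

```python
from typing import Any, Dict, List


def is_partially_processed(doc: Dict[str, Any]) -> bool:
    """Return True when data covers fewer pages than the document contains."""
    return doc["pageCount"] < doc["totalPageCount"]


def per_page_text(doc: Dict[str, Any]) -> List[str]:
    """Prefer embedded per-page text (digital documents), else OCR page text."""
    pages = doc.get("pages")
    if pages:
        return list(pages)
    # Assumption: extractedPageText is only present when it was requested.
    return list(doc.get("extractedPageText") or [])


# Hypothetical sample record; every value here is made up for illustration.
sample = {
    "docId": "doc-123",
    "mimeType": "application/pdf",
    "pageCount": 2,
    "totalPageCount": 5,
    "extractedPageText": ["Invoice 42", "Total: 10.00"],
}

if is_partially_processed(sample):
    print("Only", sample["pageCount"], "of", sample["totalPageCount"], "pages processed")

for number, text in enumerate(per_page_text(sample), start=1):
    print(number, text)
```

Comparing pageCount with totalPageCount before iterating helps avoid treating a partially processed document as complete.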