PageData

Data extracted from a Docstore page.

Key                   Type   Description
docId                 str    Unique Docstore string identifier for the document.
page                  int    The page number, starting from 0.
height                int    The height of the page in pixels.
width                 int    The width of the page in pixels.
pageCount             int    The total number of pages in the document.
pageText              str    The text extracted from the page.
pageTokenStartOffset  int    The offset of the first token in the page.
tokenCount            int    The number of tokens in the page.
tokens                list   The tokens in the page.
extractedText         str    The text extracted from the page by OCR.
rotationAngle         float  The detected rotation angle of the page in degrees. Positive values indicate clockwise rotation and negative values indicate anti-clockwise rotation from the original orientation.
pageMarkdown          str    The markdown text for the page.
embeddedText          str    The embedded text in the page. Only available for digital documents.
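
The keys above can be mirrored in a typed dictionary for client-side checks. The following is a minimal sketch, not the library's own definition: the class layout, the helper names `is_last_page` and `token_range`, and all sample values are assumptions made for illustration. It also assumes that `pageTokenStartOffset` is an index into a document-wide token sequence, which the table does not state explicitly.

```python
from typing import TypedDict


class PageData(TypedDict, total=False):
    """Hypothetical typed view of the PageData keys (names taken from the table)."""
    docId: str
    page: int
    height: int
    width: int
    pageCount: int
    pageText: str
    pageTokenStartOffset: int
    tokenCount: int
    tokens: list
    extractedText: str
    rotationAngle: float
    pageMarkdown: str
    embeddedText: str


def is_last_page(data: PageData) -> bool:
    """Page numbers start from 0, so the last page has index pageCount - 1."""
    return data["page"] == data["pageCount"] - 1


def token_range(data: PageData) -> range:
    """Token indices covered by this page.

    Assumption: pageTokenStartOffset counts tokens from the start of the document.
    """
    start = data["pageTokenStartOffset"]
    return range(start, start + data["tokenCount"])


# Made-up sample values; the real structure of each token is not specified here.
sample: PageData = {
    "docId": "example-doc-id",
    "page": 2,
    "pageCount": 3,
    "pageTokenStartOffset": 120,
    "tokenCount": 3,
    "tokens": ["tok0", "tok1", "tok2"],
    "rotationAngle": 0.0,
}

print(is_last_page(sample))
print(list(token_range(sample)))
```

Declaring the class with `total=False` keeps every key optional, which suits fields such as `embeddedText` that are only available for digital documents.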