Data extracted from a Docstore page.
Key | Type | Description |
---|---|---|
docId | str | Unique Docstore string identifier for the document. |
page | int | The zero-based page number (the first page is 0). |
height | int | The height of the page in pixels. |
width | int | The width of the page in pixels. |
pageCount | int | The total number of pages in the document. |
pageText | str | The text extracted from the page. |
pageTokenStartOffset | int | The offset of the first token in the page. |
tokenCount | int | The number of tokens in the page. |
tokens | list | The tokens in the page. |
extractedText | str | The text extracted from the page by OCR. |
rotationAngle | float | The detected rotation angle of the page in degrees. Positive values indicate clockwise rotation and negative values indicate anti-clockwise rotation, relative to the original orientation. |
pageMarkdown | str | The Markdown text for the page. |
embeddedText | str | The embedded text in the page. Only available for digital documents. |
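
The fields above can be mirrored in a typed Python structure. The following is a minimal sketch, not an official client API: the names `PageData` and `type_errors` are hypothetical, and every key is treated as optional because the table does not state which keys are always present (`embeddedText`, for example, is only available for digital documents).

```python
from typing import TypedDict, get_type_hints


class PageData(TypedDict, total=False):
    """Hypothetical typed view of the page fields listed in the table."""

    docId: str
    page: int
    height: int
    width: int
    pageCount: int
    pageText: str
    pageTokenStartOffset: int
    tokenCount: int
    tokens: list
    extractedText: str
    rotationAngle: float
    pageMarkdown: str
    embeddedText: str


def type_errors(record: dict) -> list:
    """Return one message per present key whose value has an unexpected type."""
    errors = []
    for key, expected in get_type_hints(PageData).items():
        if key not in record:
            continue  # absent keys are allowed in this sketch
        value = record[key]
        if expected is float:
            # Accept ints for float fields (e.g. a rotation of 0), but not bools.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif expected is int:
            # bool is a subclass of int in Python, so reject it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

Calling `type_errors(record)` on a page record returns an empty list when every key that is present matches the type in the table, which makes it a cheap sanity check before reading fields such as `page` or `rotationAngle`.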