| Key | Type | Description |
| --- | --- | --- |
| docId | str | Unique Docstore string identifier for the document. |
| page | int | The page number. Starts from 0. |
| height | int | The height of the page in pixels. |
| width | int | The width of the page in pixels. |
| pageCount | int | The total number of pages in the document. |
| pageText | str | The text extracted from the page. |
| pageTokenStartOffset | int | The offset of the first token in the page. |
| tokenCount | int | The number of tokens in the page. |
| tokens | list | The tokens in the page. |
| extractedText | str | The extracted text in the page obtained from OCR. |
| rotationAngle | float | The detected rotation angle of the page in degrees. Positive values indicate clockwise rotation and negative values indicate anti-clockwise rotation from the original orientation. |
| pageMarkdown | str | The markdown text for the page. |
| embeddedText | str | The embedded text in the page. Only available for digital documents. |
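
As a rough illustration of how these keys might be consumed, the sketch below checks a page record against the types listed in the table. It assumes the record arrives as a plain Python dict keyed by the names above. The helper `check_page_record`, the sample values, and the representation of `tokens` as simple strings are all hypothetical and are not part of any documented API.

```python
from typing import Any, Dict, List, Tuple

# Accepted Python types for each key, taken from the table above.
# rotationAngle also accepts int, since JSON-style payloads often carry 0 or 90.
PAGE_FIELD_TYPES: Dict[str, Tuple[type, ...]] = {
    "docId": (str,),
    "page": (int,),
    "height": (int,),
    "width": (int,),
    "pageCount": (int,),
    "pageText": (str,),
    "pageTokenStartOffset": (int,),
    "tokenCount": (int,),
    "tokens": (list,),
    "extractedText": (str,),
    "rotationAngle": (float, int),
    "pageMarkdown": (str,),
    "embeddedText": (str,),
}


def check_page_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a page record (empty if none)."""
    problems: List[str] = []
    for key, expected in PAGE_FIELD_TYPES.items():
        if key not in record:
            # Missing keys are tolerated; embeddedText, for example,
            # is only available for digital documents.
            continue
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            names = " or ".join(t.__name__ for t in expected)
            problems.append(
                f"{key}: expected {names}, got {type(value).__name__}"
            )

    # page is zero-based, so it must be smaller than pageCount.
    page, page_count = record.get("page"), record.get("pageCount")
    if type(page) is int and type(page_count) is int:
        if not 0 <= page < page_count:
            problems.append(f"page: {page} is outside 0..{page_count - 1}")
    return problems


# Hypothetical sample record; every value here is made up for illustration.
example_page = {
    "docId": "doc-0001",
    "page": 0,
    "height": 1100,
    "width": 850,
    "pageCount": 3,
    "pageText": "Invoice 2024",
    "pageTokenStartOffset": 0,
    "tokenCount": 2,
    "tokens": ["Invoice", "2024"],  # real token structure is not specified
    "extractedText": "Invoice 2024",
    "rotationAngle": 0.0,
    "pageMarkdown": "# Invoice 2024",
}

print(check_page_record(example_page))
```

Because `page` starts from 0, the last page of a three-page document has `page == 2`. The range check in the sketch flags `page == 3` as invalid for that reason.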