To train a model under this use case, you will need to create feature groups of the following type(s):
Feature Group Type | API Configuration Name | Required | Description |
---|---|---|---|
Labeled document data | DOCUMENTS | True | For detailed guidelines on the format of documents and annotations, please refer to the "Named Entity Recognition Guidelines" use case documentation. |
Note: Once you upload datasets that comply with the required schema for each Feature Group Type, you will need to create the machine learning (ML) features that will be used to train your ML model(s). We use the term "Feature Group" for a group of ML features (dataset columns) under a specific Feature Group Type. Our system supports extensible schemas that enable you to provide any number of additional columns/features that you think are relevant to that Feature Group Type.
Feature Mapping | Feature Type | Required | Description |
---|---|---|---|
DOCUMENT | | Y | Document text. Represents either the full content or tokenized segments of the document. For example: {content: "sample text 1"} or {tokens: [{content: "sample", start_offset: "00:00", end_offset: "00:07"}]} |
ANNOTATIONS | | N | Lists of labels for document text. Includes text-based annotations with start and end offsets, and supports bounding box annotations for precise labeling. For example: [{text_extraction: {text_segment: {end_offset: 6, start_offset: 0}}, display_name: "Sample annotation label 1"}] |
DOCUMENT_ID | | N | The unique identifier of the document. |
STATUS | | N | The status of the review document. |
COMMENTS | | N | Comments on the review document. |
METADATA | | N | Metadata of the annotation. |
ROW_ID | | N | The unique identifier of the feature group row. |
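
As an illustration of the DOCUMENT and ANNOTATIONS shapes described in the table, the following minimal Python sketch builds one row for a labeled document. The helper names `make_annotation` and `make_row` are hypothetical and not part of any API. The sketch also makes two assumptions that the table does not state: that dataset columns are named after the feature mappings, and that `end_offset` is exclusive (which matches the example above, where offsets 0 to 6 cover the word "sample").

```python
import json


def make_annotation(text, segment, label):
    """Build one annotation in the shape of the ANNOTATIONS example.

    Assumption: end_offset is exclusive, as in Python slicing.
    """
    start = text.find(segment)
    if start < 0:
        raise ValueError(f"{segment!r} not found in document text")
    return {
        "text_extraction": {
            "text_segment": {
                "start_offset": start,
                "end_offset": start + len(segment),
            }
        },
        "display_name": label,
    }


def make_row(document_id, text, labeled_segments):
    """Assemble a row keyed by feature mapping names (an assumed column naming)."""
    return {
        "DOCUMENT_ID": document_id,
        "DOCUMENT": {"content": text},
        "ANNOTATIONS": [
            make_annotation(text, segment, label)
            for segment, label in labeled_segments
        ],
    }


row = make_row(
    "doc-1",
    "sample text 1",
    [("sample", "Sample annotation label 1")],
)
print(json.dumps(row, indent=2))
```

Only DOCUMENT is marked as required in the table, so a row without ANNOTATIONS or DOCUMENT_ID would still fit the schema; they are included here to show how the offsets relate to the document text.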