Generate synthetic data with a model, for use in fine-tuning an LLM.
| Key | Type | Description |
|---|---|---|
| idCol | str | Name of the identifier column. |
| completionCol | str | Name of the output (completion) column. |
| frequencyPenalty | float | Penalty that discourages tokens in proportion to how often they have already appeared. |
| tokenBudget | int | Token budget for generation. |
| generationInstructions | str | Instructions passed to the data-generation model. |
| descriptionCol | str | Name of the description column. |
| promptCol | str | Name of the input prompt column. |
| fewshotExamples | int | Number of few-shot examples used to prompt the model. |
| examplesPerTarget | int | Number of examples to generate per target. |
| verifyResponse | bool | Whether to verify the model's response. |
| concurrency | int | Number of concurrent processes. |
| documentationCharLimit | int | Character limit for documentation. |
| seed | Optional[int] | Seed for random number generation. |
| subsetSize | Optional[int] | Size of the data subset used for generation. |
| model | str | Model used for data generation. |
| temperature | float | Sampling temperature for the model. |
| oversample | bool | Whether to oversample the data. |
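The parameters above can be mirrored as a typed configuration object. What follows is a minimal sketch, assuming the parameters are supplied as a flat mapping keyed by the names in the table. The class `SyntheticDataConfig`, its `validate` checks, and every example value (column names, model name, numbers) are hypothetical and do not come from the tool itself.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SyntheticDataConfig:
    """Hypothetical typed mirror of the parameter table (not the tool's own class)."""

    idCol: str
    completionCol: str
    frequencyPenalty: float
    tokenBudget: int
    generationInstructions: str
    descriptionCol: str
    promptCol: str
    fewshotExamples: int
    examplesPerTarget: int
    verifyResponse: bool
    concurrency: int
    documentationCharLimit: int
    seed: Optional[int]
    subsetSize: Optional[int]
    model: str
    temperature: float
    oversample: bool

    def validate(self) -> None:
        """Assumed sanity checks; the real tool may enforce different rules."""
        positive_fields = (
            "tokenBudget",
            "concurrency",
            "documentationCharLimit",
            "examplesPerTarget",
        )
        for name in positive_fields:
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")
        if self.fewshotExamples < 0:
            raise ValueError("fewshotExamples must be non-negative")
        if self.subsetSize is not None and self.subsetSize <= 0:
            raise ValueError("subsetSize must be positive when set")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")


# Illustrative values only; column names and the model name are placeholders.
example = SyntheticDataConfig(
    idCol="id",
    completionCol="completion",
    frequencyPenalty=0.0,
    tokenBudget=1000,
    generationInstructions="Generate a realistic user prompt for the given description.",
    descriptionCol="description",
    promptCol="prompt",
    fewshotExamples=3,
    examplesPerTarget=2,
    verifyResponse=True,
    concurrency=4,
    documentationCharLimit=2000,
    seed=42,
    subsetSize=None,  # Optional fields may be left as None
    model="your-model-name",
    temperature=0.7,
    oversample=False,
)
example.validate()  # raises ValueError if any assumed check fails

params = asdict(example)  # flat dict keyed by the table's parameter names
```

Keeping the field names identical to the table's keys means `asdict` yields a mapping that can be compared directly against the documented parameters.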