Generate synthetic data with a model, for use in fine-tuning an LLM.
Key | Type | Description |
---|---|---|
subsetSize | Optional[int] | Size of the subset to use for generation. |
fewshotExamples | int | Number of few-shot examples used to prompt the model. |
model | str | Model to use for data generation. |
promptCol | str | Name of the input prompt column. |
concurrency | int | Number of concurrent processes. |
oversample | bool | Whether to oversample the data. |
temperature | float | Sampling temperature for the model. |
tokenBudget | int | Token budget for generation. |
verifyResponse | bool | Whether to verify the response. |
documentationCharLimit | int | Character limit for documentation. |
frequencyPenalty | float | Penalty for frequency of token appearance. |
descriptionCol | str | Name of the description column. |
seed | Optional[int] | Seed for random number generation. |
idCol | str | Name of the identifier column. |
generationInstructions | str | Instructions for the data generation model. |
completionCol | str | Name of the output completion column. |
examplesPerTarget | int | Number of examples per target. |
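
As a rough illustration of how these keys fit together, the sketch below encodes the table as a small schema and checks a config dictionary against it. The key names and types come from the table; everything else is an assumption made for the example: the `validate_config` helper, the choice to let `Optional` keys be omitted or set to `None`, the rule that a `float` field also accepts an integer, and every value in `example_config` (including the placeholder model name). It is not the tool's own loader or validator.

```python
# Schema copied from the table: key -> (expected type, is_optional).
SCHEMA = {
    "subsetSize": (int, True),
    "fewshotExamples": (int, False),
    "model": (str, False),
    "promptCol": (str, False),
    "concurrency": (int, False),
    "oversample": (bool, False),
    "temperature": (float, False),
    "tokenBudget": (int, False),
    "verifyResponse": (bool, False),
    "documentationCharLimit": (int, False),
    "frequencyPenalty": (float, False),
    "descriptionCol": (str, False),
    "seed": (int, True),
    "idCol": (str, False),
    "generationInstructions": (str, False),
    "completionCol": (str, False),
    "examplesPerTarget": (int, False),
}


def _matches(value, expected):
    """Return True if value is acceptable for the expected type."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it for numeric fields.
        return False
    if expected is float:
        # Assumption: an integer such as 1 is acceptable where a float is expected.
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_config(config):
    """Hypothetical helper: return a list of problems (empty if none)."""
    errors = []
    for key, (expected, optional) in SCHEMA.items():
        value = config.get(key)
        if value is None:
            # Assumption: Optional keys may be omitted or set to None.
            if not optional:
                errors.append(f"{key}: required")
            continue
        if not _matches(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for key in sorted(set(config) - set(SCHEMA)):
        errors.append(f"{key}: unknown key")
    return errors


# Illustrative values only; none of these are documented defaults.
example_config = {
    "subsetSize": 100,
    "fewshotExamples": 3,
    "model": "example-model",  # placeholder, not a real model name
    "promptCol": "prompt",
    "concurrency": 4,
    "oversample": False,
    "temperature": 0.7,
    "tokenBudget": 2048,
    "verifyResponse": True,
    "documentationCharLimit": 4000,
    "frequencyPenalty": 0.0,
    "descriptionCol": "description",
    "seed": None,
    "idCol": "id",
    "generationInstructions": "Write one question and answer per target.",
    "completionCol": "completion",
    "examplesPerTarget": 5,
}

if __name__ == "__main__":
    print(validate_config(example_config))
```

Checking the config before starting a run surfaces a mistyped key or a wrongly typed value up front rather than partway through generation.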