DataGenerationConfig

Configuration for generating synthetic data with a model, for use in fine-tuning an LLM.

Key                     Type           Description
temperature             float          Sampling temperature for the model.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
concurrency             int            Number of concurrent processes.
subsetSize              Optional[int]  Size of the subset to use for generation.
promptCol               str            Name of the input prompt column.
model                   str            Model to use for data generation.
frequencyPenalty        float          Penalty for frequency of token appearance.
tokenBudget             int            Token budget for generation.
completionCol           str            Name of the output completion column.
descriptionCol          str            Name of the description column.
oversample              bool           Whether to oversample the data.
documentationCharLimit  int            Character limit for documentation.
examplesPerTarget       int            Number of examples per target.
verifyResponse          bool           Whether to verify the response.
idCol                   str            Name of the identifier column.
seed                    Optional[int]  Seed for random number generation.
generationInstructions  str            Instructions for the data generation model.
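
As an illustration of how the keys and types above fit together, the following is a minimal sketch of a config written as a Python dict, plus a small type check against the table. Everything here is hypothetical: validate_config, SCHEMA, and EXAMPLE_CONFIG are not part of any documented API, every value (including the model name) is a placeholder, and the sketch assumes that the two Optional[int] keys may be omitted or set to None.

```python
from typing import Any, Dict, List

# Key -> type, copied from the table above. Illustrative only.
SCHEMA: Dict[str, type] = {
    "temperature": float,
    "fewshotExamples": int,
    "concurrency": int,
    "subsetSize": int,
    "promptCol": str,
    "model": str,
    "frequencyPenalty": float,
    "tokenBudget": int,
    "completionCol": str,
    "descriptionCol": str,
    "oversample": bool,
    "documentationCharLimit": int,
    "examplesPerTarget": int,
    "verifyResponse": bool,
    "idCol": str,
    "seed": int,
    "generationInstructions": str,
}

# Assumption: keys typed Optional[int] may be missing or None.
OPTIONAL_KEYS = {"subsetSize", "seed"}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in config (empty if none)."""
    errors: List[str] = []
    for key, expected in SCHEMA.items():
        if key not in config:
            if key not in OPTIONAL_KEYS:
                errors.append(f"missing key: {key}")
            continue
        value = config[key]
        if value is None:
            if key not in OPTIONAL_KEYS:
                errors.append(f"{key}: must not be None")
            continue
        # bool is a subclass of int in Python, so exclude it from numeric fields.
        if expected is bool:
            ok = isinstance(value, bool)
        elif expected is float:
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif expected is int:
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for key in config:
        if key not in SCHEMA:
            errors.append(f"unknown key: {key}")
    return errors


# Placeholder values chosen only to satisfy the types in the table.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "temperature": 0.7,
    "fewshotExamples": 3,
    "concurrency": 4,
    "subsetSize": None,
    "promptCol": "prompt",
    "model": "example-model",  # hypothetical model name
    "frequencyPenalty": 0.0,
    "tokenBudget": 100000,
    "completionCol": "completion",
    "descriptionCol": "description",
    "oversample": False,
    "documentationCharLimit": 2000,
    "examplesPerTarget": 5,
    "verifyResponse": True,
    "idCol": "id",
    "seed": 42,
    "generationInstructions": "Write one realistic user request per example.",
}

print(validate_config(EXAMPLE_CONFIG))  # prints []
```

The check accepts an int where a float is expected (for example, temperature set to 1) but rejects a bool in any numeric field, since a stray True would otherwise pass as the integer 1.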