DataGenerationConfig

Generate synthetic data with a model, for use in fine-tuning an LLM.

Key                     Type           Description
examplesPerTarget       int            Number of examples to generate per target.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
generationInstructions  str            Instructions for the data generation model.
seed                    Optional[int]  Seed for random number generation.
subsetSize              Optional[int]  Size of the subset to use for generation.
promptCol               str            Name of the input prompt column.
temperature             float          Sampling temperature for the model.
documentationCharLimit  int            Character limit for documentation.
oversample              bool           Whether to oversample the data.
frequencyPenalty        float          Penalty based on how often a token has already appeared.
concurrency             int            Number of concurrent processes.
model                   str            Model to use for data generation.
verifyResponse          bool           Whether to verify the response.
completionCol           str            Name of the output completion column.
tokenBudget             int            Token budget for generation.
descriptionCol          str            Name of the description column.
idCol                   str            Name of the identifier column.
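As a rough illustration, the keys above can be mirrored in a Python dataclass so that a configuration is type-checked against the table before it is used. This is a minimal sketch, assuming the configuration is ultimately handled as a dict keyed by the documented camelCase names. The class, its `validate` helper, and every example value (including the placeholder model name `"example-model"` and the column names) are hypothetical and not part of any library's API. The sketch also assumes that the two `Optional[int]` keys, `seed` and `subsetSize`, may be omitted, so it defaults them to `None`.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional, get_type_hints


@dataclass
class DataGenerationConfig:
    """Hypothetical local mirror of the documented DataGenerationConfig keys."""

    examplesPerTarget: int
    fewshotExamples: int
    generationInstructions: str
    promptCol: str
    temperature: float
    documentationCharLimit: int
    oversample: bool
    frequencyPenalty: float
    concurrency: int
    model: str
    verifyResponse: bool
    completionCol: str
    tokenBudget: int
    descriptionCol: str
    idCol: str
    # Assumption: the two Optional[int] keys may be omitted, so they default to None.
    seed: Optional[int] = None
    subsetSize: Optional[int] = None

    def validate(self) -> None:
        """Raise TypeError if any field does not match the type listed in the table."""
        hints = get_type_hints(type(self))
        for f in fields(self):
            value = getattr(self, f.name)
            expected = hints[f.name]
            # bool is a subclass of int, so exclude it from integer checks.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if expected == Optional[int]:
                ok = value is None or is_int
            elif expected is int:
                ok = is_int
            elif expected is float:
                ok = is_int or isinstance(value, float)  # allow ints where a float is expected
            else:
                ok = isinstance(value, expected)  # str and bool fields
            if not ok:
                raise TypeError(
                    f"{f.name}: expected {expected}, got {type(value).__name__}"
                )


# Example usage: every value is a placeholder, not a recommended setting.
config = DataGenerationConfig(
    examplesPerTarget=3,
    fewshotExamples=2,
    generationInstructions="Write a question and answer grounded in the description.",
    promptCol="prompt",
    temperature=0.7,
    documentationCharLimit=4000,
    oversample=False,
    frequencyPenalty=0.0,
    concurrency=4,
    model="example-model",  # placeholder, not a real model identifier
    verifyResponse=True,
    completionCol="completion",
    tokenBudget=100000,
    descriptionCol="description",
    idCol="id",
    seed=42,
)
config.validate()
payload = asdict(config)  # dict keyed by the documented camelCase names
```

Keeping the field names identical to the documented keys lets `dataclasses.asdict` build the dict without a renaming step; the shape a real client expects may differ.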
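The `temperature` and `frequencyPenalty` keys correspond to sampling controls that many LLM APIs share. The sketch below assumes the common OpenAI-style formulation: each token's logit is reduced by the frequency penalty times the number of times that token has already been generated, and the adjusted logits are divided by the temperature before a softmax. Whether the data generation model applies these settings exactly this way is an assumption, and the function name `adjusted_probs` and the toy logits are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def adjusted_probs(
    logits: Dict[str, float],
    generated: List[str],
    temperature: float,
    frequency_penalty: float,
) -> Dict[str, float]:
    """Apply an OpenAI-style frequency penalty, then a temperature-scaled softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for this softmax sketch")
    counts = Counter(generated)  # how often each token has already appeared
    # Subtract count * penalty from each logit (Counter returns 0 for unseen tokens).
    penalized = {tok: l - counts[tok] * frequency_penalty for tok, l in logits.items()}
    # Divide by temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: v / temperature for tok, v in penalized.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy example: "the" has already been generated twice.
logits = {"the": 2.0, "a": 1.0, "cat": 0.5}
history = ["the", "the"]
unpenalized = adjusted_probs(logits, history, temperature=1.0, frequency_penalty=0.0)
penalized = adjusted_probs(logits, history, temperature=1.0, frequency_penalty=0.5)
sharper = adjusted_probs(logits, history, temperature=0.5, frequency_penalty=0.0)
```

With a positive penalty, the repeated token "the" becomes less likely; lowering the temperature instead concentrates probability on the highest-logit token.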