DataGenerationConfig

Configuration for generating synthetic data with a model, for use in fine-tuning an LLM.

Key                     Type           Description
seed                    Optional[int]  Seed for random number generation.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
oversample              bool           Whether to oversample the data.
idCol                   str            Name of the identifier column.
frequencyPenalty        float          Penalty for frequency of token appearance.
generationInstructions  str            Instructions for the data generation model.
temperature             float          Sampling temperature for the model.
subsetSize              Optional[int]  Size of the subset to use for generation.
concurrency             int            Number of concurrent processes.
examplesPerTarget       int            Number of examples per target.
promptCol               str            Name of the input prompt column.
descriptionCol          str            Name of the description column.
model                   str            Model to use for data generation.
verifyResponse          bool           Whether to verify the response.
tokenBudget             int            Token budget for generation.
documentationCharLimit  int            Character limit for documentation.
completionCol           str            Name of the output completion column.
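
To make the table concrete, the sketch below writes the keys out as a plain Python dict and checks each value against the listed type. It is only an illustration: the helper names (SCHEMA, matches, type_errors), every example value, and the rule that Optional keys may be omitted are assumptions of this sketch. None of them are defaults or behavior taken from the library itself.

```python
NoneType = type(None)

# Allowed Python types per key, copied from the table above.
# Optional[int] is written as (int, NoneType).
SCHEMA = {
    "seed": (int, NoneType),
    "fewshotExamples": (int,),
    "oversample": (bool,),
    "idCol": (str,),
    "frequencyPenalty": (float,),
    "generationInstructions": (str,),
    "temperature": (float,),
    "subsetSize": (int, NoneType),
    "concurrency": (int,),
    "examplesPerTarget": (int,),
    "promptCol": (str,),
    "descriptionCol": (str,),
    "model": (str,),
    "verifyResponse": (bool,),
    "tokenBudget": (int,),
    "documentationCharLimit": (int,),
    "completionCol": (str,),
}


def matches(value, allowed):
    """Return True if value is an instance of one of the allowed types."""
    # bool is a subclass of int in Python, so reject it unless bool is listed.
    if isinstance(value, bool) and bool not in allowed:
        return False
    return isinstance(value, allowed)


def type_errors(config):
    """Return a list of problems; an empty list means the config fits SCHEMA."""
    errors = []
    for key, allowed in SCHEMA.items():
        if key not in config:
            # Assumption of this sketch: only Optional keys may be omitted.
            if NoneType not in allowed:
                errors.append(f"missing key: {key}")
        elif not matches(config[key], allowed):
            errors.append(f"{key}: unexpected type {type(config[key]).__name__}")
    for key in config:
        if key not in SCHEMA:
            errors.append(f"unknown key: {key}")
    return errors


# Hypothetical values chosen only to exercise every key.
example_config = {
    "seed": 42,
    "fewshotExamples": 3,
    "oversample": False,
    "idCol": "id",
    "frequencyPenalty": 0.0,
    "generationInstructions": "Write one question per description.",
    "temperature": 0.7,
    "subsetSize": None,
    "concurrency": 4,
    "examplesPerTarget": 5,
    "promptCol": "prompt",
    "descriptionCol": "description",
    "model": "example-model",  # placeholder, not a real model name
    "verifyResponse": True,
    "tokenBudget": 100000,
    "documentationCharLimit": 2000,
    "completionCol": "completion",
}

print(type_errors(example_config))
```

A check like this catches mistakes such as passing a string for temperature before any generation run starts. Whether the real configuration loader accepts integers for float keys, or requires Optional keys to be present, is not stated in the table and would need to be confirmed separately.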