DataGenerationConfig

Configuration for generating synthetic data with a model. The generated data is intended for fine-tuning an LLM.

Key                     Type           Description
model                   str            Model to use for data generation.
temperature             float          Sampling temperature for the model.
completionCol           str            Name of the output completion column.
descriptionCol          str            Name of the description column.
promptCol               str            Name of the input prompt column.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
oversample              bool           Whether to oversample the data.
concurrency             int            Number of concurrent processes.
frequencyPenalty        float          Penalty for frequency of token appearance.
seed                    Optional[int]  Seed for random number generation.
tokenBudget             int            Token budget for generation.
examplesPerTarget       int            Number of examples per target.
idCol                   str            Name of the identifier column.
generationInstructions  str            Instructions for the data generation model.
documentationCharLimit  int            Character limit for documentation.
verifyResponse          bool           Whether to verify the response.
subsetSize              Optional[int]  Size of the subset to use for generation.
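
The keys and types listed above can be pictured as a typed record. The sketch below is illustrative only: DataGenerationConfigSketch is a hypothetical stand-in written for this example, not the library's actual class, and the import path, constructor signature, and defaults of the real DataGenerationConfig are not stated here. Every value is a placeholder, and the None defaults for the two Optional[int] keys are an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataGenerationConfigSketch:
    """Hypothetical stand-in that mirrors the documented keys and types."""

    model: str
    temperature: float
    completionCol: str
    descriptionCol: str
    promptCol: str
    fewshotExamples: int
    oversample: bool
    concurrency: int
    frequencyPenalty: float
    tokenBudget: int
    examplesPerTarget: int
    idCol: str
    generationInstructions: str
    documentationCharLimit: int
    verifyResponse: bool
    # Assumption: the Optional[int] keys may be left unset (None).
    seed: Optional[int] = None
    subsetSize: Optional[int] = None


# All values are placeholders chosen for illustration, not documented defaults.
config = DataGenerationConfigSketch(
    model="example-model",
    temperature=0.7,
    completionCol="completion",
    descriptionCol="description",
    promptCol="prompt",
    fewshotExamples=3,
    oversample=False,
    concurrency=4,
    frequencyPenalty=0.0,
    tokenBudget=2048,
    examplesPerTarget=5,
    idCol="id",
    generationInstructions="Write one realistic prompt/completion pair.",
    documentationCharLimit=4000,
    verifyResponse=True,
    seed=42,
)

# Convert to a plain dict keyed by the documented key names.
config_dict = asdict(config)
```

A plain dict like config_dict is convenient if the configuration is ultimately supplied as YAML or JSON, since the camelCase keys carry over unchanged.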