DataGenerationConfig

Configuration for generating synthetic data with a model. The generated data is used to fine-tune an LLM.

Key                     Type           Description
subsetSize              Optional[int]  Size of the subset to use for generation.
fewshotExamples         int            Number of few-shot examples used to prompt the model.
model                   str            Model to use for data generation.
promptCol               str            Name of the input prompt column.
concurrency             int            Number of concurrent processes.
oversample              bool           Whether to oversample the data.
temperature             float          Sampling temperature for the model.
tokenBudget             int            Token budget for generation.
verifyResponse          bool           Whether to verify the response.
documentationCharLimit  int            Character limit for documentation.
frequencyPenalty        float          Penalty for frequency of token appearance.
descriptionCol          str            Name of the description column.
seed                    Optional[int]  Seed for random number generation.
idCol                   str            Name of the identifier column.
generationInstructions  str            Instructions for the data generation model.
completionCol           str            Name of the output completion column.
examplesPerTarget       int            Number of examples per target.
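
The keys and types listed above can be mirrored in a small Python dataclass to make their shape concrete. This is only a sketch: the class name DataGenerationConfigSketch is hypothetical and is not the library's own class, every value in the usage example is a placeholder rather than a documented default, and treating the two Optional[int] keys as defaulting to None (with every other key required) is an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DataGenerationConfigSketch:
    """Hypothetical mirror of the documented keys; not the library's class."""

    # Keys with non-optional types in the table (assumed to be required).
    fewshotExamples: int
    model: str
    promptCol: str
    concurrency: int
    oversample: bool
    temperature: float
    tokenBudget: int
    verifyResponse: bool
    documentationCharLimit: int
    frequencyPenalty: float
    descriptionCol: str
    idCol: str
    generationInstructions: str
    completionCol: str
    examplesPerTarget: int
    # Keys typed Optional[int]; None as "not set" is an assumption.
    subsetSize: Optional[int] = None
    seed: Optional[int] = None


# Every value below is an illustrative placeholder, not a documented default.
config = DataGenerationConfigSketch(
    fewshotExamples=3,
    model="example-model",
    promptCol="prompt",
    concurrency=4,
    oversample=False,
    temperature=0.7,
    tokenBudget=100_000,
    verifyResponse=True,
    documentationCharLimit=2_000,
    frequencyPenalty=0.0,
    descriptionCol="description",
    idCol="id",
    generationInstructions="Write one question and answer per target.",
    completionCol="completion",
    examplesPerTarget=5,
    seed=42,
)

# Plain dict form, e.g. for serializing to a config file.
config_dict = asdict(config)
```

A dataclass keeps the key names and types in one place and fails at construction time if a required key is missing, which is useful when a configuration has this many fields.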