DataGenerationConfig

Configuration for generating synthetic data with a model, for use in fine-tuning an LLM.

KEY                     TYPE           DESCRIPTION
verifyResponse          bool           Whether to verify the response.
completionCol           str            Name of the output completion column.
temperature             float          Sampling temperature for the model.
model                   str            Model to use for data generation.
generationInstructions  str            Instructions for the data generation model.
frequencyPenalty        float          Penalty for frequency of token appearance.
seed                    Optional[int]  Seed for random number generation.
oversample              bool           Whether to oversample the data.
examplesPerTarget       int            Number of examples per target.
concurrency             int            Number of concurrent processes.
promptCol               str            Name of the input prompt column.
subsetSize              Optional[int]  Size of the subset to use for generation.
documentationCharLimit  int            Character limit for documentation.
tokenBudget             int            Token budget for generation.
fewshotExamples         int            Number of fewshot examples used to prompt the model.
idCol                   str            Name of the identifier column.
descriptionCol          str            Name of the description column.
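
The keys and types listed above can be mirrored in a small Python dataclass to see how a complete configuration fits together. This is a minimal sketch, not the library's own class: the name DataGenerationConfigSketch, every example value, the None defaults for the two Optional[int] keys, and the sanity checks in __post_init__ are all assumptions made for illustration. The source documents no default values or validation rules.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DataGenerationConfigSketch:
    """Hypothetical stand-in that mirrors the documented keys and types."""

    # Required fields: the source documents no defaults, so none are assumed.
    model: str
    generationInstructions: str
    promptCol: str
    completionCol: str
    idCol: str
    descriptionCol: str
    temperature: float
    frequencyPenalty: float
    examplesPerTarget: int
    fewshotExamples: int
    concurrency: int
    tokenBudget: int
    documentationCharLimit: int
    verifyResponse: bool
    oversample: bool
    # Optional[int] fields: defaulting to None is an assumption of this sketch.
    seed: Optional[int] = None
    subsetSize: Optional[int] = None

    def __post_init__(self) -> None:
        # Illustrative sanity checks only; the real validation rules are unknown.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.concurrency < 1:
            raise ValueError("concurrency must be at least 1")
        if self.subsetSize is not None and self.subsetSize < 1:
            raise ValueError("subsetSize must be at least 1 when set")


# Every value below is a placeholder chosen for illustration.
config = DataGenerationConfigSketch(
    model="example-model",
    generationInstructions="Write one question per documented function.",
    promptCol="prompt",
    completionCol="completion",
    idCol="id",
    descriptionCol="description",
    temperature=0.7,
    frequencyPenalty=0.0,
    examplesPerTarget=5,
    fewshotExamples=3,
    concurrency=4,
    tokenBudget=2048,
    documentationCharLimit=4000,
    verifyResponse=True,
    oversample=False,
    seed=42,
)

# Convert to a plain dict keyed by the documented camelCase names,
# for example before writing the configuration to JSON or YAML.
config_dict = asdict(config)
print(sorted(config_dict))
```

Keeping the camelCase field names identical to the documented keys means the dict produced by asdict can be compared against a configuration file key by key, at the cost of departing from Python's usual snake_case naming.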