DataGenerationConfig

Configuration for generating synthetic data with a model, for use in fine-tuning an LLM.

KEY                     TYPE           DESCRIPTION
completionCol           str            Name of the output completion column.
documentationCharLimit  int            Character limit for documentation.
idCol                   str            Name of the identifier column.
seed                    Optional[int]  Seed for random number generation.
descriptionCol          str            Name of the description column.
tokenBudget             int            Token budget for generation.
verifyResponse          bool           Whether to verify the response.
fewshotExamples         int            Number of fewshot examples used to prompt the model.
subsetSize              Optional[int]  Size of the subset to use for generation.
model                   str            Model to use for data generation.
generationInstructions  str            Instructions for the data generation model.
oversample              bool           Whether to oversample the data.
promptCol               str            Name of the input prompt column.
examplesPerTarget       int            Number of examples per target.
concurrency             int            Number of concurrent processes.
frequencyPenalty        float          Penalty for frequency of token appearance.
temperature             float          Sampling temperature for the model.
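
To show how these keys fit together, here is a minimal Python sketch that mirrors the table as a dataclass and serializes one instance to JSON. The class below is an illustrative stand-in, not the library's own definition. Every value (including the model name and column names) is a hypothetical placeholder rather than a documented default, and defaulting the two Optional keys to None is an assumption.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataGenerationConfig:
    """Illustrative stand-in that mirrors the documented keys and types."""

    completionCol: str
    documentationCharLimit: int
    idCol: str
    descriptionCol: str
    tokenBudget: int
    verifyResponse: bool
    fewshotExamples: int
    model: str
    generationInstructions: str
    oversample: bool
    promptCol: str
    examplesPerTarget: int
    concurrency: int
    frequencyPenalty: float
    temperature: float
    # Assumption: the two Optional keys may be omitted and then default to None.
    seed: Optional[int] = None
    subsetSize: Optional[int] = None


# All values below are hypothetical placeholders, not documented defaults.
config = DataGenerationConfig(
    completionCol="completion",
    documentationCharLimit=4000,
    idCol="id",
    descriptionCol="description",
    tokenBudget=100_000,
    verifyResponse=True,
    fewshotExamples=3,
    model="example-model",
    generationInstructions="Write one prompt/completion pair per target.",
    oversample=False,
    promptCol="prompt",
    examplesPerTarget=5,
    concurrency=4,
    frequencyPenalty=0.0,
    temperature=0.7,
    seed=42,
)

# Serialize to JSON, e.g. to store the settings alongside the generated data.
config_json = json.dumps(asdict(config), indent=2)
print(config_json)
```

Setting seed is what makes a run repeatable on the sampling side; subsetSize is left unset here, so it serializes as null.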