Synthetic Data Generator

intermediate · data · Min 16K context

Generates realistic synthetic datasets that preserve the statistical properties and relationships of source data without exposing real records. Covers schema-aware generation, correlated and time-series fields, class balancing for ML training, and constraint preservation, with code for tools like SDV, Faker, or custom generators.

Use Cases

  • Generating synthetic training data that mirrors real distributions
  • Balancing minority classes for ML models
  • Producing correlated multi-column and time-series data
  • Preserving referential integrity across tables
  • Creating privacy-safe datasets for sharing
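Two of the use cases above, correlated columns and minority-class balancing, can be sketched with the standard library alone. This is a minimal illustration, not how SDV or any other tool does it internally: the function names (`correlated_pairs`, `pearson`, `oversample_minority`) are hypothetical, the correlated fields are assumed to be standard normals, and the balancing step is plain random oversampling (duplicating existing rows), which is the simplest option rather than necessarily the best one.

```python
import math
import random


def correlated_pairs(n, rho, seed=0):
    """Draw n (x, y) pairs of standard normals with target correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # 2x2 Cholesky construction: y mixes x with independent noise z2.
        pairs.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)


def oversample_minority(rows, label_key, seed=0):
    """Randomly duplicate rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced
```

Comparing `pearson(correlated_pairs(5000, 0.7))` with the requested 0.7 is a quick fidelity check; for real data the target correlation would be estimated from the source columns rather than chosen by hand.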

Example Prompt

Generate synthetic data for an e-commerce schema for ML experimentation.

Tables:
- customers(id, age, region, signup_date)
- orders(id, customer_id, order_date, amount, category)

Requirements:
- Preserve realistic age/amount distributions and customer-order relationships
- 10,000 customers, ~3 orders each
- No real PII

Provide generation code and a short fidelity check.
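A response to this prompt might start from something like the sketch below. It is a custom generator using only the Python standard library, and every distribution parameter in it is an assumption made for illustration (ages roughly normal around 38, log-normal order amounts, 1 to 5 orders per customer, placeholder region and category values, an arbitrary date range). A real answer would fit these from the source data or hand the job to a tool such as SDV. The function names `generate` and `fidelity_report` are hypothetical.

```python
import random
import statistics
from datetime import date, timedelta


def generate(n_customers=10_000, mean_orders=3, seed=42):
    """Build customers and orders tables as lists of dicts."""
    rng = random.Random(seed)
    regions = ["north", "south", "east", "west"]            # placeholder values
    categories = ["books", "electronics", "home", "toys"]   # placeholder values
    start = date(2022, 1, 1)                                # arbitrary start date

    customers, orders = [], []
    order_id = 0
    for cid in range(1, n_customers + 1):
        # Assumed age distribution: normal(38, 12), clipped to 18..90.
        age = min(90, max(18, round(rng.gauss(38, 12))))
        signup = start + timedelta(days=rng.randrange(365))
        customers.append({"id": cid, "age": age,
                          "region": rng.choice(regions),
                          "signup_date": signup})
        # Uniform on 1..(2 * mean_orders - 1) averages mean_orders per customer.
        for _ in range(rng.randint(1, 2 * mean_orders - 1)):
            order_id += 1
            orders.append({
                "id": order_id,
                "customer_id": cid,  # always refers to an existing customer
                # Constraint: an order never precedes the customer's signup.
                "order_date": signup + timedelta(days=rng.randrange(365)),
                # Assumed right-skewed amounts; median is about exp(3.5).
                "amount": round(rng.lognormvariate(3.5, 0.6), 2),
                "category": rng.choice(categories),
            })
    return customers, orders


def fidelity_report(customers, orders):
    """Summary statistics plus referential and temporal integrity counts."""
    ids = {c["id"] for c in customers}
    signup = {c["id"]: c["signup_date"] for c in customers}
    return {
        "orders_per_customer": len(orders) / len(customers),
        "mean_age": statistics.mean(c["age"] for c in customers),
        "median_amount": statistics.median(o["amount"] for o in orders),
        "orphan_orders": sum(o["customer_id"] not in ids for o in orders),
        "orders_before_signup": sum(
            o["order_date"] < signup[o["customer_id"]]
            for o in orders if o["customer_id"] in ids
        ),
    }


if __name__ == "__main__":
    customers, orders = generate()
    print(fidelity_report(customers, orders))
```

Because nothing here is derived from real records and the schema holds no names or contact fields, the output contains no real PII. Once the parameters are fitted to source data instead, the same report can be run on both the real and synthetic tables and the numbers compared side by side.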

Compatible Tools

claude-code · cursor · kiro · any

Modalities

Input: text, code
Output: code, text

Author

OpenModels Community

@openmodelsrun