Synthetic Data Generator

Name: OpenModels
Creator: OpenModels
License: https://github.com/openmodelsrun/openmodels

intermediate, data, Min 16K context

Generates realistic synthetic datasets that preserve the statistical properties and relationships of source data without exposing real records. Covers schema-aware generation, correlated and time-series fields, class balancing for ML training, and constraint preservation, with code for tools like SDV, Faker, or custom generators.
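As an illustration of the "correlated fields" part of that description, here is a minimal sketch that draws two numeric columns with a chosen correlation using only the Python standard library. The function names (correlated_pairs, pearson) and the target value 0.7 are assumptions made for this example and are not part of SDV or Faker; a real generator would estimate the correlation from source data first.

```python
import math
import random


def correlated_pairs(n, rho, seed=0):
    """Draw n (x, y) pairs of standard normals with target correlation rho.

    Uses the two-variable form of the Cholesky construction:
    y = rho * z1 + sqrt(1 - rho^2) * z2, with z1 and z2 independent.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation, used here as a simple fidelity check."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    sample = correlated_pairs(20_000, rho=0.7, seed=1)
    print(f"target rho=0.7, observed rho={pearson(sample):.3f}")
```

The standard normals can then be shifted and scaled (or pushed through another transform) to match each column's own distribution; note that a nonlinear transform may change the observed correlation somewhat.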

synthetic-data data-generation machine-learning privacy sampling simulation

Use Cases

Generating synthetic training data that mirrors real distributions
Balancing minority classes for ML models
Producing correlated multi-column and time-series data
Preserving referential integrity across tables
Creating privacy-safe datasets for sharing
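
For the class-balancing use case, the simplest baseline is random oversampling: minority-class rows are duplicated until every class matches the largest one. The sketch below assumes rows are plain dictionaries and uses a hypothetical helper name, oversample. It copies existing rows rather than synthesizing new ones, so it is a baseline to compare against, not a privacy-preserving method.

```python
import random
from collections import Counter


def oversample(rows, label_key, seed=0):
    """Balance classes by randomly duplicating rows of the smaller classes."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Copy each duplicated row so later edits do not alias the original.
        balanced.extend(dict(rng.choice(group)) for _ in range(target - len(group)))

    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical toy data: 10 "fraud" rows and 90 "ok" rows.
    rows = [{"x": i, "label": "fraud" if i % 10 == 0 else "ok"} for i in range(100)]
    print(Counter(r["label"] for r in oversample(rows, "label")))
```

Oversampling should be applied to the training split only, after the train/test split, so duplicated rows do not leak into evaluation data.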

Example Prompt

Generate synthetic data for an e-commerce schema for ML experimentation.

Tables:
- customers(id, age, region, signup_date)
- orders(id, customer_id, order_date, amount, category)

Requirements:
- Preserve realistic age/amount distributions and customer-order relationships
- 10,000 customers, ~3 orders each
- No real PII

Provide generation code and a short fidelity check.
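
A response to this prompt might resemble the following minimal sketch, which uses only the Python standard library. Every distribution parameter, the region and category values, the signup window, and the names generate and fidelity_check are assumptions chosen for illustration; none of them come from real data. Because no source dataset exists here, the fidelity check compares the output against the assumed parameters and the schema constraints rather than against real records.

```python
import random
import statistics
from datetime import date, timedelta

REGIONS = ["north", "south", "east", "west"]           # hypothetical values
CATEGORIES = ["books", "electronics", "home", "toys"]  # hypothetical values


def generate(n_customers=10_000, seed=0):
    """Return (customers, orders) as lists of dicts matching the prompt schema."""
    rng = random.Random(seed)
    start = date(2022, 1, 1)  # assumed start of a two-year signup window
    customers, orders = [], []
    for cid in range(1, n_customers + 1):
        # Age: normal(40, 12) clipped to [18, 90] (assumed shape).
        age = min(90, max(18, round(rng.gauss(40, 12))))
        signup = start + timedelta(days=rng.randrange(730))
        customers.append(
            {"id": cid, "age": age, "region": rng.choice(REGIONS), "signup_date": signup}
        )
        # Orders per customer: Binomial(6, 0.5), which averages 3.
        n_orders = sum(rng.random() < 0.5 for _ in range(6))
        for _ in range(n_orders):
            orders.append(
                {
                    "id": len(orders) + 1,
                    "customer_id": cid,  # always an existing customer id
                    "order_date": signup + timedelta(days=rng.randrange(365)),
                    # Amount: right-skewed log-normal (assumed shape).
                    "amount": round(rng.lognormvariate(3.5, 0.6), 2),
                    "category": rng.choice(CATEGORIES),
                }
            )
    return customers, orders


def fidelity_check(customers, orders):
    """Summarize constraint violations and a few headline statistics."""
    signup = {c["id"]: c["signup_date"] for c in customers}
    return {
        "orphan_orders": sum(o["customer_id"] not in signup for o in orders),
        "orders_before_signup": sum(
            o["customer_id"] in signup and o["order_date"] < signup[o["customer_id"]]
            for o in orders
        ),
        "orders_per_customer": len(orders) / len(customers),
        "mean_age": statistics.mean(c["age"] for c in customers),
        "median_amount": statistics.median(o["amount"] for o in orders),
    }


if __name__ == "__main__":
    customers, orders = generate()
    for name, value in fidelity_check(customers, orders).items():
        print(f"{name}: {value}")
```

All values are drawn from parametric distributions and no names, emails, or addresses are generated, so the output contains no real PII by construction. With real source data available, a library such as SDV could be fitted instead, and the same check could compare synthetic statistics against the source.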

Recommended Models

gpt-5 claude-opus-4-6 deepseek-v4

Compatible Tools

claude-code cursor kiro any

Modalities

Input: text, code

Output: code, text

Author

OpenModels Community

@openmodelsrun