Synthetic Data Generator
intermediate, data, Min 16K context
Generates realistic synthetic datasets that preserve the statistical properties and relationships of source data without exposing real records. Covers schema-aware generation, correlated and time-series fields, class balancing for ML training, and constraint preservation, with code for tools like SDV, Faker, or custom generators.
Use Cases
- Generating synthetic training data that mirrors real distributions
- Balancing minority classes for ML models
- Producing correlated multi-column and time-series data
- Preserving referential integrity across tables
- Creating privacy-safe datasets for sharing
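Two of the use cases above, correlated columns and minority-class balancing, can be sketched with a small custom generator. This is a minimal illustration under stated assumptions, not the SDV or Faker approach: it uses only the Python standard library, the correlation target (0.7), the labeling rule, and the helper names `correlated_pairs`, `pearson`, and `oversample` are all hypothetical, and balancing is done by plain random oversampling (duplicating minority rows) rather than by a learned synthesizer.

```python
import math
import random
from collections import Counter


def correlated_pairs(n, rho, rng):
    """Draw n (x, y) pairs of standard normals with target correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent noise term
        # Mixing x with independent noise gives Corr(x, y) = rho in expectation.
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        pairs.append((x, y))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation, used here as a simple fidelity check."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)


def oversample(rows, label_of, rng):
    """Random oversampling: duplicate rows until every class matches the largest."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rng = random.Random(42)  # fixed seed so the run is reproducible
pairs = correlated_pairs(5000, 0.7, rng)

# Hypothetical labeling rule that makes class 1 a small minority.
rows = [(x, y, 1 if x + y > 2.5 else 0) for x, y in pairs]
before = Counter(r[2] for r in rows)

balanced = oversample(rows, lambda r: r[2], rng)
counts = Counter(r[2] for r in balanced)

print("sample correlation:", round(pearson(pairs), 3))
print("class counts before:", dict(before))
print("class counts after:", dict(counts))
```

Random oversampling only repeats existing rows, so it balances class counts without adding new information; a generator that samples fresh minority records would be the next step if duplicates cause overfitting.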
Example Prompt
Generate synthetic data for an e-commerce schema for ML experimentation.
Tables:
- customers(id, age, region, signup_date)
- orders(id, customer_id, order_date, amount, category)
Requirements:
- Preserve realistic age/amount distributions and customer-order relationships
- 10,000 customers, ~3 orders each
- No real PII
Provide generation code and a short fidelity check.
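One possible response to this prompt, sketched with the Python standard library only. Every distribution parameter here is an assumption chosen for illustration, since the prompt supplies no source data to fit: ages are drawn from a clipped normal, amounts from a log-normal, and order counts from a Poisson with mean 3. The region and category values, the date window, and the helper names `poisson` and `random_day` are likewise hypothetical.

```python
import math
import random
from datetime import date, timedelta

rng = random.Random(7)  # fixed seed for reproducibility
REGIONS = ["north", "south", "east", "west"]           # hypothetical values
CATEGORIES = ["books", "electronics", "home", "toys"]  # hypothetical values
START, END = date(2022, 1, 1), date(2024, 12, 31)      # assumed date window


def poisson(lam):
    """Knuth's Poisson sampler; adequate for small lam such as 3."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def random_day(lo, hi):
    """Uniform random date between lo and hi, inclusive."""
    return lo + timedelta(days=rng.randint(0, (hi - lo).days))


# customers(id, age, region, signup_date): no names, emails, or other PII.
customers = []
for cid in range(1, 10_001):
    age = min(90, max(18, round(rng.gauss(38, 12))))  # assumed age distribution
    customers.append({
        "id": cid,
        "age": age,
        "region": rng.choice(REGIONS),
        "signup_date": random_day(START, END),
    })

# orders(id, customer_id, order_date, amount, category)
orders = []
for c in customers:
    for _ in range(poisson(3.0)):  # about 3 orders per customer on average
        orders.append({
            "id": len(orders) + 1,
            "customer_id": c["id"],  # foreign key into customers
            "order_date": random_day(c["signup_date"], END),  # never before signup
            "amount": round(rng.lognormvariate(3.5, 0.6), 2),  # assumed right-skewed amounts
            "category": rng.choice(CATEGORIES),
        })

# Short fidelity check: structure and target parameters, not a fit to real data.
ids = {c["id"] for c in customers}
signup = {c["id"]: c["signup_date"] for c in customers}
orders_per_customer = len(orders) / len(customers)
fk_ok = all(o["customer_id"] in ids for o in orders)
dates_ok = all(o["order_date"] >= signup[o["customer_id"]] for o in orders)

print("customers:", len(customers))
print("orders per customer:", round(orders_per_customer, 2))
print("referential integrity holds:", fk_ok)
print("order dates on or after signup:", dates_ok)
```

With real source tables available, the fidelity check would instead compare synthetic and real marginals (for example, age and amount quantiles) and the real orders-per-customer distribution.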
Compatible Tools
claude-code, cursor, kiro, any
Modalities
Input: text, code
Output: code, text
Author
OpenModels Community