Data Pipeline Builder

advanced, data, Min 64K context

Designs and generates data pipeline configurations for ETL/ELT workflows. Supports Apache Airflow DAGs, dbt models, Spark jobs, and streaming pipelines with Kafka or Flink. Creates data quality checks, schema evolution strategies, and monitoring dashboards for pipeline health.

Use Cases

  • Generating Airflow DAGs for complex ETL workflows
  • Creating dbt models with proper staging, intermediate, and mart layers
  • Designing Kafka streaming pipelines with schema registry
  • Building data quality validation rules with Great Expectations
  • Setting up incremental data loading patterns
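
The incremental loading use case can be illustrated with a minimal sketch of one common pattern, a high-watermark query. This is not log-based CDC and not necessarily what the skill generates. It assumes an `orders` table with a hypothetical `updated_at` column holding ISO 8601 timestamps, and it uses an in-memory SQLite database as a stand-in for the real source. The function name `extract_incremental` and the `status` column are also illustrative.

```python
import sqlite3


def extract_incremental(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Assumes updated_at is an ISO 8601 string, so lexicographic order
    matches chronological order.
    """
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Keep the old watermark when nothing changed, so the next run
    # starts from the same point.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical sample data standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "paid", "2024-01-01T10:00:00"),
        (2, "shipped", "2024-01-01T10:05:00"),
        (3, "paid", "2024-01-01T10:20:00"),
    ],
)

# The previous run stopped at 10:00:00, so only later rows are extracted.
first_batch, wm = extract_incremental(conn, "2024-01-01T10:00:00")
```

Persisting the returned watermark between runs is what makes the load incremental. A strict `>` comparison can miss rows that share the watermark timestamp, so a real pipeline might prefer log-based CDC or an overlap window combined with deduplication.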

Example Prompt

Design a data pipeline to ingest e-commerce order data.

Source: PostgreSQL (orders, order_items, customers, products tables)
Destination: Snowflake data warehouse
Schedule: Every 15 minutes (near real-time)

Requirements:
1. Incremental extraction using CDC (Change Data Capture)
2. dbt transformation layer with:
   - Staging models (1:1 source mapping)
   - Intermediate models (joins, deduplication)
   - Mart models (fact_orders, dim_customers, dim_products)
3. Data quality checks after each layer
4. Schema evolution handling (new columns, type changes)
5. Alerting on pipeline failures or quality issues

Generate:
- Airflow DAG for orchestration
- dbt models (SQL + schema.yml)
- Data quality assertions
- Monitoring dashboard queries
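
As a rough illustration of requirement 4 in the prompt above (schema evolution handling), the following sketch compares an expected column-to-type mapping against the one observed in the source and classifies the differences. The function name `diff_schema`, the column names, and the type labels are all hypothetical. A real pipeline would read both mappings from the source catalog and the warehouse instead of hard-coding them.

```python
def diff_schema(expected, observed):
    """Classify differences between two {column: type} mappings."""
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    # Columns present on both sides whose declared type differs.
    type_changed = {
        col: (expected[col], observed[col])
        for col in sorted(set(expected) & set(observed))
        if expected[col] != observed[col]
    }
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Hypothetical schemas for the orders table.
expected = {"order_id": "integer", "total": "numeric", "status": "text"}
observed = {
    "order_id": "bigint",
    "total": "numeric",
    "status": "text",
    "coupon_code": "text",
}

drift = diff_schema(expected, observed)
```

One plausible policy is to accept added columns automatically and to route removed or retyped columns to an alert, since those are more likely to break downstream models.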

Compatible Tools

claude-code, cursor, kiro, any

Modalities

Input: text, code
Output: code, text

Author

OpenModels Community

@openmodelsrun