PDF Data Extraction

intermediate, data, Min 32K context

Extracts structured data from PDFs and scanned documents — invoices, receipts, forms, contracts, reports, and tables. Returns clean, typed output (JSON, CSV, or Markdown tables), handles multi-page layouts and nested tables, and flags low-confidence fields for review. Uses vision-capable models for image-based and scanned PDFs.

Use Cases

  • Extracting line items and totals from invoices and receipts
  • Converting PDF tables into clean CSV or JSON
  • Pulling structured fields from forms and applications
  • Digitizing scanned documents into machine-readable data
  • Extracting key terms and clauses from contracts
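
As an illustration of the table-to-CSV use case, the sketch below serializes rows that have already been extracted into CSV text. It assumes each row arrives as a dictionary with the same keys; the function name `rows_to_csv` and the sample row are hypothetical and not part of the skill itself.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize a list of row dictionaries to CSV text.

    Column order follows the keys of the first row (an assumption of this sketch).
    """
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical extracted row; the comma in the description forces quoting.
sample_rows = [{"description": "Widget, large", "quantity": 2}]
csv_text = rows_to_csv(sample_rows)
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas or quotes are escaped correctly. A row with keys missing from the first row is written with empty cells, while a row with extra keys raises `ValueError`.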

Example Prompt

Extract structured data from the attached invoice PDF. Return JSON with:
- vendor name, invoice number, issue date, due date
- line items (description, quantity, unit price, amount)
- subtotal, tax, and total
- currency

Flag any field you are less than 90% confident about under a "needs_review" key.
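
A response to this prompt can be sanity-checked before it is trusted downstream. The sketch below is one possible approach, not a fixed schema: the prompt does not dictate key names, so `vendor_name`, `line_items`, `unit_price`, and the others are assumptions, as is the choice to return monetary values as strings. The sample response is invented for illustration.

```python
import json
from decimal import Decimal

# Hypothetical model response; key names and values are assumptions.
SAMPLE_RESPONSE = """
{
  "vendor_name": "Acme Supplies",
  "invoice_number": "INV-001",
  "issue_date": "2024-01-15",
  "due_date": "2024-02-14",
  "line_items": [
    {"description": "Widget", "quantity": 2, "unit_price": "10.00", "amount": "20.00"},
    {"description": "Gadget", "quantity": 1, "unit_price": "5.50", "amount": "5.50"}
  ],
  "subtotal": "25.50",
  "tax": "2.55",
  "total": "28.05",
  "currency": "USD",
  "needs_review": ["due_date"]
}
"""


def check_invoice(raw: str) -> list[str]:
    """Return a list of problems found in the extracted invoice JSON."""
    data = json.loads(raw)
    problems = []
    line_sum = Decimal("0")
    for i, item in enumerate(data["line_items"]):
        # Decimal avoids the rounding drift that floats introduce for money.
        expected = Decimal(str(item["quantity"])) * Decimal(item["unit_price"])
        if expected != Decimal(item["amount"]):
            problems.append(f"line item {i}: amount does not equal quantity * unit price")
        line_sum += Decimal(item["amount"])
    if line_sum != Decimal(data["subtotal"]):
        problems.append("subtotal does not equal the sum of line item amounts")
    if Decimal(data["subtotal"]) + Decimal(data["tax"]) != Decimal(data["total"]):
        problems.append("total does not equal subtotal + tax")
    # Surface the fields the model itself marked as low confidence.
    for field in data.get("needs_review", []):
        problems.append(f"flagged for review: {field}")
    return problems


problems = check_invoice(SAMPLE_RESPONSE)
```

Arithmetic checks like these catch misread digits that the model may not flag on its own, so they complement the `needs_review` list and do not replace it.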

Recommended Models

Compatible Tools

claude-code, kiro, any

Modalities

Input: file, image, text
Output: text, file

Related Skills

Author

OpenModels Community

@openmodelsrun