PDF Data Extraction

Name: OpenModels
Creator: OpenModels
License: https://github.com/openmodelsrun/openmodels

Intermediate, data, Minimum 32K context

Extracts structured data from PDFs and scanned documents — invoices, receipts, forms, contracts, reports, and tables. Returns clean, typed output (JSON, CSV, or Markdown tables), handles multi-page layouts and nested tables, and flags low-confidence fields for review. Uses vision-capable models for image-based and scanned PDFs.

pdf extraction documents ocr tables structured-data forms

Use cases

Extracting line items and totals from invoices and receipts
Converting PDF tables into clean CSV or JSON
Pulling structured fields from forms and applications
Digitizing scanned documents into machine-readable data
Extracting key terms and clauses from contracts
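
For the table-to-CSV use case, a minimal sketch of the final conversion step, assuming the model has already returned the table as a JSON array of row objects. The function name rows_to_csv and the row shape are illustrative assumptions, not part of this listing. It uses only the Python standard library.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize a list of row dicts to CSV text.

    Columns are the union of all keys, in first-seen order, so rows with
    missing cells (common in extracted tables) still line up.
    """
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    # restval fills cells that a row lacks; lineterminator avoids "\r\n"
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Values containing commas or quotes are escaped by the csv module, so descriptions copied from a PDF do not break the column layout.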

Example prompt

Extract structured data from the attached invoice PDF. Return JSON with:
- vendor name, invoice number, issue date, due date
- line items (description, quantity, unit price, amount)
- subtotal, tax, and total
- currency

Flag any field you are less than 90% confident about under a "needs_review" key.
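
A response to this prompt can be sanity-checked before it is trusted. The sketch below is one possible check, not part of the prompt itself. It assumes the model returns snake_case keys (line_items, unit_price, subtotal, and so on) and that "needs_review" is a list of field names; the prompt does not fix either detail, so adjust to the actual output. The sample invoice is made up for illustration.

```python
import json
from decimal import Decimal


def check_invoice(raw: str, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return a list of human-readable problems found in an extracted invoice."""
    # Parse numbers as Decimal so money arithmetic is exact
    data = json.loads(raw, parse_float=Decimal, parse_int=Decimal)
    problems: list[str] = []

    items = data.get("line_items", [])
    for i, item in enumerate(items):
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["amount"]) > tolerance:
            problems.append(f"line_items[{i}]: quantity * unit_price != amount")

    items_total = sum((item["amount"] for item in items), Decimal("0"))
    if abs(items_total - data["subtotal"]) > tolerance:
        problems.append("subtotal does not match sum of line items")

    if abs(data["subtotal"] + data["tax"] - data["total"]) > tolerance:
        problems.append("subtotal + tax != total")

    # Assumed shape: "needs_review" is a list of field names
    for field in data.get("needs_review", []):
        problems.append(f"model flagged '{field}' for review")

    return problems


# Made-up sample with a deliberate arithmetic error in "total"
SAMPLE = """{
  "vendor_name": "Acme Ltd", "invoice_number": "INV-001",
  "issue_date": "2024-01-15", "due_date": "2024-02-14",
  "line_items": [
    {"description": "Widget", "quantity": 2, "unit_price": 10.50, "amount": 21.00},
    {"description": "Shipping", "quantity": 1, "unit_price": 5.00, "amount": 5.00}
  ],
  "subtotal": 26.00, "tax": 2.60, "total": 28.50,
  "currency": "USD", "needs_review": ["due_date"]
}"""
```

Calling check_invoice(SAMPLE) reports the mismatched total and the flagged due date; an empty list means the arithmetic is consistent and nothing was flagged.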

Recommended models

claude-opus-4-8 gpt-5-5 gemini-3-1-pro gemini-3-flash

Compatible tools

claude-code kiro any

Modalities

Input: file, image, text

Output: text, file

Author

OpenModels Community

@openmodelsrun