PDF Data Extraction
intermediate · data · Min 32K context
Extracts structured data from PDFs and scanned documents — invoices, receipts, forms, contracts, reports, and tables. Returns clean, typed output (JSON, CSV, or Markdown tables), handles multi-page layouts and nested tables, and flags low-confidence fields for review. Uses vision-capable models for image-based and scanned PDFs.
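The page does not say how low-confidence fields are detected. A minimal sketch, assuming the model (or a post-processing step) attaches a confidence score between 0.0 and 1.0 to each field, could keep every value and collect the uncertain field names under a review key. `ExtractedField`, `to_output`, and the sample values are hypothetical illustrations, not part of this skill.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ExtractedField:
    """Hypothetical record: one extracted value plus an assumed confidence in [0.0, 1.0]."""
    name: str
    value: Any
    confidence: float

def to_output(fields: list[ExtractedField], threshold: float = 0.9) -> dict[str, Any]:
    """Keep every extracted value, and list the names of fields whose
    confidence falls below the threshold under a "needs_review" key."""
    output: dict[str, Any] = {f.name: f.value for f in fields}
    output["needs_review"] = [f.name for f in fields if f.confidence < threshold]
    return output

# Hypothetical sample data for illustration only.
fields = [
    ExtractedField("vendor_name", "Acme Corp", 0.98),
    ExtractedField("due_date", "2024-07-01", 0.72),
]
print(to_output(fields))
# {'vendor_name': 'Acme Corp', 'due_date': '2024-07-01', 'needs_review': ['due_date']}
```

Flagged values are kept rather than dropped, so a reviewer sees the model's best guess next to the flag.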
Use Cases
- Extracting line items and totals from invoices and receipts
- Converting PDF tables into clean CSV or JSON
- Pulling structured fields from forms and applications
- Digitizing scanned documents into machine-readable data
- Extracting key terms and clauses from contracts
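For the table-to-CSV case, one plausible pipeline is to ask for the table as a JSON array of row objects and convert it locally. The sketch below assumes that row-object shape (an assumption, not something this page specifies) and uses only Python's standard `csv` and `json` modules; `rows_to_csv` is a hypothetical helper.

```python
import csv
import io
import json

def rows_to_csv(rows: list[dict]) -> str:
    """Serialize a list of row dicts (one dict per table row) to CSV text.
    Columns follow first-seen key order across all rows; missing cells are left empty."""
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical model output: the second row has no "quantity" cell.
table_json = (
    '[{"description": "Widget, large", "quantity": 2, "unit_price": "9.50"},'
    ' {"description": "Shipping", "unit_price": "5.00"}]'
)
print(rows_to_csv(json.loads(table_json)), end="")
# description,quantity,unit_price
# "Widget, large",2,9.50
# Shipping,,5.00
```

Letting the `csv` module handle quoting avoids breaking rows when a cell contains a comma, which is common in line-item descriptions.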
Example Prompt
Extract structured data from the attached invoice PDF. Return JSON with:
- vendor name, invoice number, issue date, due date
- line items (description, quantity, unit price, amount)
- subtotal, tax, and total
- currency
Flag any field you are less than 90% confident about under a "needs_review" key.
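Model output for the example prompt above can be cross-checked arithmetically before it is trusted. This is a minimal sketch that assumes snake_case keys (`line_items`, `quantity`, `unit_price`, `amount`, `subtotal`, `tax`, `total`); the prompt names the fields but not the exact keys, so adjust them to whatever the model actually returns. `check_invoice_totals` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding in currency math.

```python
import json
from decimal import Decimal

def check_invoice_totals(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Cross-check line-item amounts, subtotal, tax, and total (assumed key names).
    Returns human-readable problems; an empty list means the numbers are consistent."""
    problems: list[str] = []
    computed_subtotal = Decimal("0")
    for i, item in enumerate(invoice.get("line_items", [])):
        qty = Decimal(str(item["quantity"]))
        price = Decimal(str(item["unit_price"]))
        amount = Decimal(str(item["amount"]))
        if abs(qty * price - amount) > tolerance:
            problems.append(f"line_items[{i}]: quantity * unit_price != amount")
        computed_subtotal += amount
    subtotal = Decimal(str(invoice["subtotal"]))
    tax = Decimal(str(invoice["tax"]))
    total = Decimal(str(invoice["total"]))
    if abs(computed_subtotal - subtotal) > tolerance:
        problems.append("subtotal != sum of line item amounts")
    if abs(subtotal + tax - total) > tolerance:
        problems.append("subtotal + tax != total")
    return problems

# Hypothetical model response; parse_float=Decimal keeps exact decimal values.
raw = (
    '{"line_items": [{"description": "Widget", "quantity": 2,'
    ' "unit_price": 9.50, "amount": 19.00}],'
    ' "subtotal": 19.00, "tax": 1.52, "total": 20.52,'
    ' "currency": "USD", "needs_review": []}'
)
print(check_invoice_totals(json.loads(raw, parse_float=Decimal)))  # → []
```

Any problems found could be appended to the response's `needs_review` list, so arithmetic mismatches go through the same review path as low-confidence fields.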
Compatible Tools
claude-code, kiro, any
Modalities
Input: file, image, text → Output: text, file
Author
OpenModels Community