thepi.pe

Beta

Extract clean data from tricky documents.

Extract clean markdown or structured data from PDFs, Word documents, webpages, and more. Powered by vision-language models like GPT-4o. All open source.

Get clean data, fast.

Deploy an AI-native ETL pipeline in minutes, powered by state-of-the-art vision-language models.

Scrape complex documents.
Extract markdown, data tables, figures, and equations from complex documents and webpages. Our models are trained on a diverse range of tricky layouts and data sources.
Extract structured data.
Deploy an AI document pipeline in 5 minutes. Get clean markdown and structured data using top vision-language models. Output to Markdown, JSON, CSV, or SQL.
Double-check every extraction.
Use agentic "double-checking" with a human-in-the-loop (HITL) workflow for mission-critical data extraction and auditing.
Secure by design.
Easily deploy on-prem with local models. We take privacy seriously and never store your data.

API Pricing

We offer a hosted API and cloud platform with pricing based on tokens scraped or extracted. One token is roughly one word.
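As a rough illustration of the "one token is roughly one word" rule, the sketch below estimates a document's token count by counting whitespace-separated words and picks the smallest plan whose allowance covers it. The function names and the `PLANS` table are hypothetical helpers written for this example, not part of the thepi.pe API; the prices and allowances are copied from the tiers listed on this page, and actual billing may count tokens differently.

```python
# Assumption: one token ~ one whitespace-separated word, per the pricing note.
# PLANS is a hypothetical table: (name, price in USD, token allowance),
# copied from the tiers on this page.
PLANS = [
    ("Starter", 0, 100_000),
    ("Hobby", 15, 1_000_000),
    ("Scale", 130, 10_000_000),
]


def estimate_tokens(text: str) -> int:
    """Approximate the token count as the number of words in the text."""
    return len(text.split())


def smallest_plan(tokens: int) -> str:
    """Return the first plan whose allowance covers the estimated tokens."""
    for name, _price, allowance in PLANS:
        if tokens <= allowance:
            return name
    # Anything above the largest fixed allowance falls to the custom tier.
    return "Enterprise"


def usd_per_million(price: float, allowance: int) -> float:
    """Price per one million tokens for a fixed-allowance plan."""
    return price / allowance * 1_000_000


if __name__ == "__main__":
    sample = "Extract clean data from tricky documents."
    print(estimate_tokens(sample), "estimated tokens")
    print("250K tokens fits:", smallest_plan(250_000))
    print("Scale costs about $%.2f per 1M tokens" % usd_per_million(130, 10_000_000))
```

By this arithmetic the Hobby plan works out to about $15 per million tokens and the Scale plan to about $13 per million.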

Starter

Free

Instantly extract clean data from documents and webpages with our hosted API.
✔️ 100K tokens
✔️ API access
✔️ Platform access
✔️ Files up to 20 MB
❌ Official support

Hobby

$15 one-time

Instantly extract clean data from documents and webpages with our hosted API.
✔️ 1M tokens
✔️ API access
✔️ Platform access
✔️ Files up to 20 MB
✔️ Official support

Scale

$130 one-time

Instantly extract clean data from documents and webpages with our hosted API.
✔️ 10M tokens
✔️ API access
✔️ Platform access
✔️ Files up to 50 MB
✔️ Official support

Enterprise

Contact us

Book a chat to discuss custom projects, integrations, on-prem support, and pricing.
✔️ Unlimited tokens
✔️ No file size limits
✔️ On-premise support
We offer a 30-day money-back guarantee. If you are not satisfied, we will refund your payment.
Subscriptions can be cancelled at any time via the settings menu or by emailing emmett@thepi.pe.
Machine learning has a large environmental impact, so we contribute 2% of our revenue to CO₂ removal. Ask how our offset program can help you meet your sustainability goals.

Frequently Asked Questions