thepi.pe

Beta
Extract clean data from tricky documents.

Extract clean markdown or structured data from PDFs, Word documents, webpages, and more. Powered by vision-language models and fully open source.

Get clean data, fast.

Deploy an AI-native ETL pipeline in minutes, powered by state-of-the-art vision-language models.

Scrape complex documents.
Extract markdown, data tables, figures, and equations from complex documents and webpages. Our models are trained on a diverse range of tricky layouts and data sources.
Extract structured data.
Deploy an AI document pipeline in 5 minutes. Get clean markdown and structured data using top vision-language models. Output to Markdown, JSON, CSV, or SQL.
Double-check every extraction.
Use agentic "double checking" with a human-in-the-loop (HITL) workflow for mission-critical data extraction and auditing.
Secure by design.
Easily set up your own on-premises, air-gapped deployment with local AI models.
Trusted by
Delta Insurance, Cal.com, Infosys, MongoDB, Vestone Capital, Houston Chronicle, NYU

API Pricing

We offer a hosted API and cloud platform to scrape and extract data. One token is roughly one word.

Self-Hosted

Free

Deploy on your own infrastructure with full control and privacy.
✔️ Bring your own key
✔️ Open source
✔️ Community support

Hobby

$25/month

Instantly extract clean data from documents and webpages with our hosted platform.
✔️ 1M tokens/month
✔️ Platform access
✔️ Files up to 20 MB
✔️ Official support

Scale

$210/month

Instantly extract clean data from documents and webpages with our hosted platform.
✔️ 10M tokens/month
✔️ Platform access
✔️ Files up to 50 MB
✔️ Official support

Business

Contact us

Book a chat to discuss custom projects, integrations, on-prem support, and pricing.
✔️ Custom limits
✔️ Custom features
✔️ Unlimited upload
✔️ On-premise support
Machine learning has a large environmental impact, so we contribute 2% of our revenue to CO₂ removal. Ask how our offset program can help you meet your sustainability goals.
Cancel anytime. If you are not satisfied within the first 30 days, contact emmett@thepi.pe for a full refund.
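
To compare the hosted plans, the prices and token allowances listed above can be turned into an effective cost per million tokens. The sketch below is illustrative only: the `Plan` class and both helper functions are hypothetical names, not part of any thepi.pe API, and it assumes the stated rule of thumb that one token is roughly one word. No overage pricing is listed, so volumes above the Scale allowance are treated as needing a Business conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: float
    tokens_per_month: int


# Figures copied from the Hobby and Scale tiers above.
PLANS = [
    Plan("Hobby", 25.0, 1_000_000),
    Plan("Scale", 210.0, 10_000_000),
]


def usd_per_million_tokens(plan: Plan) -> float:
    """Effective price per million tokens if the full allowance is used."""
    return plan.monthly_usd / (plan.tokens_per_month / 1_000_000)


def smallest_plan_for(words_per_month: int) -> Optional[Plan]:
    """Pick the smallest listed plan that covers the volume.

    Assumes 1 token is roughly 1 word. Returns None when the volume
    exceeds every listed allowance (i.e. custom limits are needed).
    """
    tokens = words_per_month
    for plan in sorted(PLANS, key=lambda p: p.tokens_per_month):
        if tokens <= plan.tokens_per_month:
            return plan
    return None


if __name__ == "__main__":
    for plan in PLANS:
        print(f"{plan.name}: ${usd_per_million_tokens(plan):.2f} per 1M tokens")
    choice = smallest_plan_for(3_000_000)
    print(choice.name if choice else "Business (contact us)")
```

At full utilization, Hobby works out to $25 per million tokens and Scale to $21 per million tokens.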

Frequently Asked Questions