Extract clean markdown or structured data from PDFs, Word documents, webpages, and more. Powered by vision-language models, and fully open source.
Get clean data, fast.
Deploy an AI-native ETL pipeline in minutes, powered by state-of-the-art vision-language models.
Scrape complex documents.
Extract markdown, data tables, figures, and equations from complex documents and webpages. Our models are trained on a diverse range of tricky layouts and data sources.
Extract structured data.
Deploy an AI document pipeline in 5 minutes. Get clean markdown and structured data using top vision-language models. Output to Markdown, JSON, CSV, or SQL.
Double-check every extraction.
Use agentic "double checking" with a human-in-the-loop (HITL) workflow for mission-critical data extraction and auditing.
Secure by design.
Easily set up your own on-premises, air-gapped deployment with local AI models.
We offer a hosted API and cloud platform to scrape and extract data. Hosted plans are metered in tokens, where one token is roughly one word.
Self-Hosted
Free
Deploy on your own infrastructure with full control and privacy.
✔️ Bring your own key
✔️ Open source
✔️ Community support
Hobby
$25/month
Instantly extract clean data from documents and webpages with our hosted platform.
✔️ 1M tokens/month
✔️ Platform access
✔️ Files up to 20 MB
✔️ Official support
Scale
$210/month
Instantly extract clean data from documents and webpages with our hosted platform.
✔️ 10M tokens/month
✔️ Platform access
✔️ Files up to 50 MB
✔️ Official support
Business
Contact us
Book a chat to discuss custom projects, integrations, on-prem support, and pricing.
✔️ Custom limits
✔️ Custom features
✔️ Unlimited upload
✔️ On-premise support
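As a rough guide to choosing a hosted plan, the arithmetic behind the listed quotas can be sketched in a few lines of Python. This is only an illustration: the prices and token quotas are copied from the Hobby and Scale plans above, the helper names (`usd_per_million_tokens`, `cheapest_plan`) are hypothetical and not part of any API, and the estimate assumes the stated approximation of one token per word.

```python
from typing import Optional

# Prices and quotas copied from the hosted plans listed above.
# The helpers below are hypothetical and exist only for this estimate.
PLANS = {
    "Hobby": {"price_usd": 25, "tokens_per_month": 1_000_000},
    "Scale": {"price_usd": 210, "tokens_per_month": 10_000_000},
}


def usd_per_million_tokens(plan: str) -> float:
    """Effective price per one million tokens if the full quota is used."""
    p = PLANS[plan]
    return p["price_usd"] / (p["tokens_per_month"] / 1_000_000)


def cheapest_plan(words_per_month: int) -> Optional[str]:
    """Cheapest hosted plan whose quota covers the estimated volume.

    Assumes 1 token is roughly 1 word. Returns None when the volume
    exceeds every listed quota, which is a Business-tier conversation.
    """
    fitting = [
        (p["price_usd"], name)
        for name, p in PLANS.items()
        if p["tokens_per_month"] >= words_per_month
    ]
    return min(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name in PLANS:
        print(name, usd_per_million_tokens(name), "USD per 1M tokens")
    print(cheapest_plan(400_000))
    print(cheapest_plan(3_000_000))
```

At full quota usage, Hobby works out to $25 per million tokens and Scale to $21 per million. Actual token counts depend on the documents being processed, so treat the word-based estimate as a starting point.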
Machine learning has a large environmental impact, so we contribute 2% of our revenue to CO₂ removal. Ask how our offset program can help you meet your sustainability goals.
Cancel anytime. If you are not satisfied within the first 30 days, contact emmett@thepi.pe for a full refund.