thepi.pe

Beta

Connect multimodal LLMs to real-world data.

Extract clean markdown or structured data from PDFs, webpages, videos, and more. Use your favorite vision-language models like GPT-4o. All open source.

Get clean data, fast.

Deploy an AI-native ETL pipeline in minutes, powered by state-of-the-art vision-language models.

Scrape complex visuals.
Extract markdown, data tables, figures, and equations from complex documents and webpages. Our models are trained on a diverse range of tricky layouts and data sources.
Get LLM-ready prompts.
Get cleanly scraped data as chunked prompts ready for any multimodal LLM, vector database, or RAG framework.
Extract structured data.
Use our cheap and accurate vision-language models to deploy an intelligent structured data extraction pipeline in 10 minutes, with JSON, CSV, or SQL output.
Secure by design.
Easily deploy on-prem with local models. We take privacy seriously and never store your data.

API Pricing

We offer a hosted API and cloud platform with pricing based on tokens scraped or extracted. One token is roughly one word.
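As a rough illustration of token-based pricing, the sketch below estimates a document's token count using the "one token is roughly one word" rule and compares it against the Hobby and Scale quotas listed on this page. The function names are hypothetical and not part of any thepi.pe API, and the word-count estimate is only an approximation of how the hosted service may actually count tokens.

```python
from typing import Optional

# Plan prices and token quotas as listed on this page.
# The billing period is not stated here, so none is assumed.
PLANS = {
    "Hobby": {"price_usd": 25, "tokens": 1_000_000},
    "Scale": {"price_usd": 220, "tokens": 10_000_000},
}


def estimate_tokens(text: str) -> int:
    """Approximate token count: one whitespace-separated word per token."""
    return len(text.split())


def price_per_million(plan: str) -> float:
    """Effective price in USD per one million tokens for a listed plan."""
    p = PLANS[plan]
    return p["price_usd"] / (p["tokens"] / 1_000_000)


def smallest_plan_for(token_count: int) -> Optional[str]:
    """Return the smallest listed plan whose quota covers token_count.

    Returns None when the count exceeds every listed quota, in which case
    the self-hosted or Enterprise options would be the ones to look at.
    """
    for name, p in sorted(PLANS.items(), key=lambda kv: kv[1]["tokens"]):
        if token_count <= p["tokens"]:
            return name
    return None


if __name__ == "__main__":
    sample = "Extract clean markdown from PDFs"
    print(estimate_tokens(sample))
    print(price_per_million("Hobby"), price_per_million("Scale"))
    print(smallest_plan_for(5_000_000))
```

Under these figures, the Scale plan costs less per token than Hobby, so the quota you expect to use matters more than the headline price.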

Self-hosted

Free

Instantly extract clean data from documents and webpages by hosting your own API.
✔️ Unlimited tokens
✔️ Host your own API
❌ Platform access
❌ Official support

Hobby

$25

Instantly extract clean data from documents and webpages with our hosted API.
✔️ 1M tokens
✔️ API access
✔️ Platform access
✔️ Files up to 20 MB
✔️ Email support

Scale

$220

Instantly extract clean data from documents and webpages with our hosted API.
✔️ 10M tokens
✔️ API access
✔️ Platform access
✔️ Files up to 20 MB
✔️ Email support

Enterprise

Contact us

Book a chat to discuss custom projects, integrations, on-prem support, and pricing.
✔️ Unlimited tokens
✔️ No file size limits
✔️ On-premise support
Machine learning has a large environmental impact, so we contribute 2% of our revenue to CO₂ removal. Ask how our offset program can help you meet your sustainability goals.

Frequently Asked Questions