Platform Quickstart

⚠️ Note: The cloud-based API is being deprecated in favour of local installation. Upgrade to the latest version of thepipe-api and follow the instructions in the README to set up the local API. The cloud-based API will be removed in the next release.

The thepi.pe platform provides a user-friendly interface for scraping and extracting data from various sources. This guide will walk you through the main features of the platform.

Scraping

The scraping interface allows you to extract data from websites, PDFs, and other sources.

1. Upload files or enter URLs in the designated area.
2. Choose your scraping options:
   - Text Only: Extract only text content
   - AI Extraction: Use AI to analyze layout and extract structured content
3. Select a chunking method:
   - By Document
   - By Page
   - By Section
   - Semantic
4. Click "Scrape" to start the process.

The scraped data will appear in the table on the right. You can view the full API response by clicking "View API Response".
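
To make the chunking options more concrete, here is a minimal sketch of what three of them could mean for a document already split into page strings. The function names and the heading-based section rule are illustrative assumptions, not the platform's actual implementation, and semantic chunking is left out because it depends on an embedding model.

```python
import re

def chunk_by_document(pages):
    """Join every page into one chunk (hypothetical helper)."""
    return ["\n".join(pages)]

def chunk_by_page(pages):
    """Keep one chunk per non-empty page (hypothetical helper)."""
    return [page for page in pages if page.strip()]

def chunk_by_section(pages):
    """Start a new chunk at each Markdown-style heading line.

    Assumption: sections are marked by lines starting with 1-6 '#' characters.
    """
    chunks, current = [], []
    for line in "\n".join(pages).splitlines():
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

pages = ["# Intro\nHello.", "More intro.\n# Methods\nDetails."]
print(len(chunk_by_document(pages)))  # 1
print(len(chunk_by_page(pages)))      # 2
print(len(chunk_by_section(pages)))   # 2
```

Coarser chunks keep more context together, while finer chunks produce more, smaller rows in the results table.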

Structured Extraction

The extraction interface helps you extract structured data from your scraped content.

1. Upload files or enter URLs as in the scraping interface.
2. Define your schema:
   - Add fields and specify their types (string, int, float, bool)
   - View the JSON schema by clicking "View Schema"
3. Configure advanced options:
   - Select an AI model to use for extraction
   - Choose a chunking method. Each chunk will be passed through the LLM as a prompt.
   - Enable/disable Text Only and AI Extraction
   - Allow multiple extractions per chunk if you expect to extract multiple items from each chunk
4. Click "Extract" to start the process.
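
As an illustration of the schema step, the sketch below turns a mapping of field names to the four supported types into a standard JSON Schema object. The exact shape shown by "View Schema" is not specified in this guide, so the output format, the `build_schema` helper, and the `allow_multiple` flag (wrapping the object in an array when several items per chunk are expected) are all assumptions.

```python
import json

# The platform's field types mapped to standard JSON Schema type names.
JSON_TYPES = {
    "string": "string",
    "int": "integer",
    "float": "number",
    "bool": "boolean",
}

def build_schema(fields, allow_multiple=False):
    """Build a JSON Schema dict from {field_name: field_type} (hypothetical helper)."""
    item = {
        "type": "object",
        "properties": {name: {"type": JSON_TYPES[kind]} for name, kind in fields.items()},
        "required": list(fields),
    }
    # Assumption: multiple extractions per chunk are modelled as an array of items.
    return {"type": "array", "items": item} if allow_multiple else item

fields = {"name": "string", "price": "float", "in_stock": "bool"}
print(json.dumps(build_schema(fields), indent=2))
```

An unsupported type name raises a `KeyError`, which surfaces schema typos before any extraction job is started.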

The extracted data will appear in the table on the right. You can download the results as a CSV file.
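
Once downloaded, the CSV can be loaded back into typed records. This is a sketch under assumptions: it supposes the file has a header row whose column names match the schema fields and that booleans are written as `true`/`false`; neither detail is confirmed by this guide, and `load_results` is a hypothetical helper.

```python
import csv
import io

# Converters for the four schema field types.
CASTS = {
    "string": str,
    "int": int,
    "float": float,
    "bool": lambda value: value.strip().lower() == "true",
}

def load_results(csv_text, field_types):
    """Parse CSV text into dicts, casting each column to its schema type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: CASTS[field_types[name]](value)
         for name, value in row.items() if name in field_types}
        for row in reader
    ]

# Made-up sample data standing in for a downloaded file.
sample = "name,price,in_stock\nWidget,9.5,true\nGadget,12,false\n"
rows = load_results(sample, {"name": "string", "price": "float", "in_stock": "bool"})
print(rows[0])
```

For a file on disk, pass the result of reading it as text, for example `load_results(open(path, newline="").read(), field_types)`.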

Job History

The job history section in the dashboard shows your recent scraping and extraction jobs. For each job, you can see:

- Endpoint used (scrape or extract)
- Source (file or URL)
- Date and time
- Tokens used
- Status code
- Any errors encountered
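
If you copy job history entries into your own tooling, the fields above map naturally onto a small record type. The `Job` class, its attribute names, and the sample values below are hypothetical; the dashboard does not necessarily expose its data in this form.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Job:
    endpoint: str            # "scrape" or "extract"
    source: str              # file name or URL
    timestamp: datetime      # date and time of the job
    tokens_used: int
    status_code: int
    error: Optional[str] = None

def failed_jobs(jobs):
    """Return jobs with an error message or a 4xx/5xx status code."""
    return [job for job in jobs if job.error or job.status_code >= 400]

def total_tokens(jobs):
    """Sum the tokens used across all jobs."""
    return sum(job.tokens_used for job in jobs)

# Made-up entries for illustration only.
jobs = [
    Job("scrape", "report.pdf", datetime(2024, 1, 5, 9, 30), 1200, 200),
    Job("extract", "https://example.com", datetime(2024, 1, 5, 9, 45), 800, 500, "timeout"),
]
print(total_tokens(jobs))                    # 2000
print([j.source for j in failed_jobs(jobs)])  # ['https://example.com']
```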

API Integration

While the platform provides a user-friendly interface, you can also integrate thepi.pe directly into your applications using our API. Refer to the API documentation for details on how to make requests programmatically.