Platform Quickstart

The thepi.pe platform provides a user-friendly interface for scraping content and extracting structured data from websites, PDFs, and other sources. This guide walks you through its main features.

Scraping

The scraping interface allows you to extract data from websites, PDFs, and other sources.

  1. Upload files or enter URLs in the designated area.
  2. Choose your scraping options:
    • Text Only: Extract only text content
    • AI Extraction: Use AI to analyze layout and extract structured content
  3. Select a chunking method:
    • By Document
    • By Page
    • By Section
    • Semantic
  4. Click "Scrape" to start the process.

The scraped data will appear in the table on the right. You can view the full API response by clicking "View API Response".
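
To make the chunking options in step 3 more concrete, here is a minimal sketch of how "By Document", "By Page", and "By Section" might split the same input. This is not the platform's implementation: the `Page` class, the function names, and the rule that a section starts at a line beginning with "# " are all assumptions made for illustration. Semantic chunking is left out because it depends on a language or embedding model.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for one scraped page."""
    number: int
    text: str


def chunk_by_document(pages):
    """Return a single chunk holding the text of every page."""
    return ["\n".join(p.text for p in pages)]


def chunk_by_page(pages):
    """Return one chunk per page."""
    return [p.text for p in pages]


def chunk_by_section(pages, marker="# "):
    """Return one chunk per section.

    Assumption: a new section starts at each line beginning with `marker`.
    Sections may span page boundaries.
    """
    chunks, current = [], []
    for line in "\n".join(p.text for p in pages).splitlines():
        if line.startswith(marker) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


pages = [
    Page(1, "# Intro\nWelcome."),
    Page(2, "More intro.\n# Methods\nWe did things."),
]

for name, fn in [("document", chunk_by_document),
                 ("page", chunk_by_page),
                 ("section", chunk_by_section)]:
    print(name, len(fn(pages)))
```

In this toy input the section boundary does not line up with the page boundary, which is the kind of case where the choice of chunking method changes what ends up in each row of the results table.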

Structured Extraction

The extraction interface helps you extract structured data from your scraped content.

  1. Upload files or enter URLs as in the scraping interface.
  2. Define your schema:
    • Add fields and specify their types (string, int, float, bool)
    • View the JSON schema by clicking "View Schema"
  3. Configure advanced options:
    • Select an AI model to use for extraction
    • Choose a chunking method. Each chunk is sent to the LLM as a separate prompt.
    • Enable/disable Text Only and AI Extraction
    • Allow multiple extractions per chunk if you expect each chunk to contain more than one item
  4. Click "Extract" to start the process.
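
As an illustration of step 2, the sketch below turns a set of field names and types into a JSON Schema document. The exact schema shown by "View Schema" is not specified in this guide, so the output shape, the `build_json_schema` helper, the example field names, and the mapping of `int`, `float`, and `bool` to the JSON Schema types `integer`, `number`, and `boolean` are assumptions.

```python
import json

# Assumed mapping from the four field types offered in the UI
# to standard JSON Schema type names.
TYPE_MAP = {
    "string": "string",
    "int": "integer",
    "float": "number",
    "bool": "boolean",
}


def build_json_schema(fields):
    """Build a JSON Schema object from a {field_name: field_type} mapping."""
    unknown = sorted({t for t in fields.values() if t not in TYPE_MAP})
    if unknown:
        raise ValueError(f"unsupported field types: {unknown}")
    return {
        "type": "object",
        "properties": {name: {"type": TYPE_MAP[t]} for name, t in fields.items()},
        "required": list(fields),
    }


# Hypothetical fields for an invoice-style document.
fields = {
    "invoice_number": "string",
    "quantity": "int",
    "total": "float",
    "paid": "bool",
}

schema = build_json_schema(fields)
print(json.dumps(schema, indent=2))
```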

The extracted data will appear in the table on the right. You can download the results as a CSV file.
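
Values in a CSV file are plain strings, so a downloaded result usually needs to be cast back to the types declared in the schema. The following sketch shows one way to do that with the standard library. The column names, the presence of a header row, and the `True`/`False` spelling of booleans are assumptions; check them against an actual export.

```python
import csv
import io

# Converters for the four schema field types.
CASTS = {
    "string": str,
    "int": int,
    "float": float,
    "bool": lambda v: v.strip().lower() == "true",
}


def load_results(csv_text, fields):
    """Parse CSV text into a list of dicts typed according to `fields`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: CASTS[ftype](row[name]) for name, ftype in fields.items()}
        for row in reader
    ]


# Hypothetical export; with a real file, read its contents instead.
sample = (
    "invoice_number,quantity,total,paid\n"
    "A-100,3,59.97,True\n"
    "A-101,1,19.99,False\n"
)
fields = {
    "invoice_number": "string",
    "quantity": "int",
    "total": "float",
    "paid": "bool",
}

rows = load_results(sample, fields)
for row in rows:
    print(row)
```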

Job History

The job history section in the dashboard shows your recent scraping and extraction jobs. For each job, you can see:

  • Endpoint used (scrape or extract)
  • Source (file or URL)
  • Date and time
  • Tokens used
  • Status code
  • Any errors encountered
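
If you copy job history entries into your own tooling, a small record type makes them easier to filter and total. The sketch below models the six fields listed above. The `Job` class, its attribute names, and the sample values are hypothetical, and treating status codes of 400 and above as failures follows the usual HTTP convention rather than anything stated in this guide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Job:
    """Hypothetical record mirroring one row of the job history."""
    endpoint: str            # "scrape" or "extract"
    source: str              # file name or URL
    timestamp: datetime
    tokens_used: int
    status_code: int
    error: Optional[str] = None


def failed_jobs(jobs):
    """Return jobs with an error message or a status code of 400 or above."""
    return [j for j in jobs if j.status_code >= 400 or j.error]


def total_tokens(jobs):
    """Sum the tokens used across all jobs."""
    return sum(j.tokens_used for j in jobs)


# Made-up sample data for illustration only.
jobs = [
    Job("scrape", "report.pdf", datetime(2024, 1, 5, 9, 30), 1200, 200),
    Job("extract", "https://example.com", datetime(2024, 1, 5, 9, 45), 800, 500, "timeout"),
    Job("extract", "invoice.pdf", datetime(2024, 1, 6, 14, 0), 450, 200),
]

print([j.source for j in failed_jobs(jobs)])
print(total_tokens(jobs))
```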

API Integration

While the platform provides a user-friendly interface, you can also integrate thepi.pe directly into your applications using our API. Refer to the API documentation for details on how to make requests programmatically.