Transform

Extract structured data from PDFs using schema definitions


Introduction

The transform endpoint extracts structured data from a PDF document. You describe the fields you need in a simple schema, and the API returns them as JSON.

Instead of writing PDF parsing logic for each document type, you define a schema once and apply it consistently across thousands of documents. The API performs the extraction and returns clean, structured JSON ready for your application.

How It Works

1. Define Your Schema
   Create a YAML-like schema that describes what data you want to extract from your PDF.

2. Upload Your PDF
   Send your PDF file along with the schema to the transform endpoint.

3. Get Structured Data
   Receive a structured JSON response that matches your schema definition.

4. Integrate & Scale
   Use the extracted data in your application and process thousands of documents with the same schema.

Endpoint Details

POST https://pdf-toolkit-apis.p.rapidapi.com/transform

Headers

Key              Value
Content-Type     multipart/form-data
x-rapidapi-host  pdf-toolkit-apis.p.rapidapi.com
x-rapidapi-key   YOUR_RAPIDAPI_KEY

Request Body

Parameter   Type     Description                                Constraints
file        File     The PDF file to extract data from          Size: 0-10240 KB
schema      String   The schema definition in YAML-like format  Required
start_page  Integer  Page to start extraction from              Min: 0, Optional
end_page    Integer  Page to end extraction at                  Min: 0, Optional
language    String   The language for extraction                Values: 'en' or 'es', Default: 'en', Optional
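The headers and parameters above can be assembled into a request with Python's standard library. The following is a minimal sketch, not an official client: the function name `build_transform_request`, the default filename, and the client-side validation are illustrative assumptions, and the function only builds the request without sending it. Note that when the multipart body is built by hand, the Content-Type header must carry the boundary string.

```python
import urllib.request
import uuid

TRANSFORM_URL = "https://pdf-toolkit-apis.p.rapidapi.com/transform"
RAPIDAPI_HOST = "pdf-toolkit-apis.p.rapidapi.com"
MAX_FILE_BYTES = 10240 * 1024  # 10240 KB limit from the Request Body table
ALLOWED_LANGUAGES = {"en", "es"}


def build_transform_request(pdf_bytes, schema, api_key, filename="document.pdf",
                            start_page=None, end_page=None, language="en"):
    """Check the documented constraints, then build (but do not send) the request.

    Hypothetical helper: the name and the validation are assumptions for illustration.
    """
    if len(pdf_bytes) > MAX_FILE_BYTES:
        raise ValueError("file exceeds 10240 KB")
    if not schema.strip():
        raise ValueError("schema is required")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError("language must be 'en' or 'es'")
    for name, page in (("start_page", start_page), ("end_page", end_page)):
        if page is not None and page < 0:
            raise ValueError(f"{name} must be >= 0")

    # Text fields; optional page bounds are only sent when provided.
    fields = {"schema": schema, "language": language}
    if start_page is not None:
        fields["start_page"] = str(start_page)
    if end_page is not None:
        fields["end_page"] = str(end_page)

    # Encode the multipart/form-data body by hand.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + pdf_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        TRANSFORM_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "x-rapidapi-host": RAPIDAPI_HOST,
            "x-rapidapi-key": api_key,
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the response body as JSON.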

Schema Definition

The schema defines what data to extract from your PDF document. It uses a simple YAML-like format where each line represents a field to extract, and indentation indicates the hierarchy of nested fields.

Schema Format
- root_field: Description of the root field
  - nested_field_1: Description of nested field 1
  - nested_field_2: Description of nested field 2
    - deeply_nested_field: Description of deeply nested field
- another_root_field: Description of another root field

For lists or arrays of items, you can define the structure once, and it will be applied to all matching items in the document.
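If schemas are generated in code rather than written by hand, a small helper can render the format described above. This is a sketch under assumptions: `build_schema` is a hypothetical client-side helper, and the convention of string, dict, and `(description, dict)` values is chosen here for illustration, not defined by the API. It emits two spaces per nesting level, matching the Schema Format example.

```python
def build_schema(fields, indent=0):
    """Render a nested dict as schema text.

    A string value is a field description, a dict value is a group of nested
    fields without a description, and a (description, dict) tuple is a group
    with a description. These conventions are assumptions of this sketch.
    """
    lines = []
    pad = "  " * indent  # two spaces per nesting level
    for name, value in fields.items():
        if isinstance(value, tuple):
            description, children = value
        elif isinstance(value, dict):
            description, children = "", value
        else:
            description, children = value, {}
        lines.append(f"{pad}- {name}: {description}".rstrip())
        if children:
            lines.append(build_schema(children, indent + 1))
    return "\n".join(lines)


schema = build_schema({
    "root_field": ("Description of the root field", {
        "nested_field_1": "Description of nested field 1",
        "nested_field_2": ("Description of nested field 2", {
            "deeply_nested_field": "Description of deeply nested field",
        }),
    }),
    "another_root_field": "Description of another root field",
})
print(schema)
```

Printing `schema` reproduces the Schema Format example line for line.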

Example Usage

Here's how to use the transform endpoint to extract data from an invoice PDF:

cURL Example
curl --request POST \
  --url https://pdf-toolkit-apis.p.rapidapi.com/transform \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-rapidapi-host: pdf-toolkit-apis.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_RAPIDAPI_KEY' \
  --form 'file=@/path/to/invoice.pdf' \
  --form 'schema=- invoice:
  - number: Invoice number or ID
  - date: Invoice date
  - items: List of invoice line items
    - description: Item description
    - quantity: Quantity purchased
    - price: Price per unit
    - amount: Total amount
  - subtotal: Pre-tax amount
  - tax: Tax amount
  - total: Invoice total amount'

Response (JSON)
{
  "invoice": {
    "number": "INV-2023-456",
    "date": "2023-10-15",
    "items": [
      {
        "description": "Software License",
        "quantity": 2,
        "price": 750.00,
        "amount": 1500.00
      },
      {
        "description": "Support Hours",
        "quantity": 5,
        "price": 100.00,
        "amount": 500.00
      }
    ],
    "subtotal": 2000.00,
    "tax": 180.00,
    "total": 2180.00
  }
}
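Extracted figures are worth sanity-checking before they enter an accounting workflow. The sketch below assumes the response has exactly the shape of the example above, with numeric values. `check_invoice_totals` is a hypothetical helper, and a real response might need extra handling for missing or non-numeric fields. Numbers are parsed as `Decimal` to avoid floating-point rounding in the comparisons.

```python
import json
from decimal import Decimal


def check_invoice_totals(response_text):
    """Return a list of arithmetic inconsistencies found in an invoice response."""
    # Parse every JSON number as Decimal so that 750.00 * 2 == 1500.00 exactly.
    data = json.loads(response_text, parse_float=Decimal, parse_int=Decimal)
    invoice = data["invoice"]
    problems = []

    for index, item in enumerate(invoice["items"]):
        if item["quantity"] * item["price"] != item["amount"]:
            problems.append(f"items[{index}]: quantity * price != amount")

    items_total = sum(item["amount"] for item in invoice["items"])
    if items_total != invoice["subtotal"]:
        problems.append("sum of item amounts != subtotal")

    if invoice["subtotal"] + invoice["tax"] != invoice["total"]:
        problems.append("subtotal + tax != total")

    return problems
```

An empty list means every line item, the subtotal, and the total agree with one another.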

Use Cases

Financial Documents

Extract data from invoices, receipts, and purchase orders to automate your accounts payable workflow.

  • Invoice numbers, dates, and payment terms
  • Line items with descriptions, quantities, prices, and amounts
  • Subtotals, taxes, and total amounts
  • Vendor and customer information

Invoice Schema Example
- invoice:
  - number: Invoice number or ID
  - date: Invoice date
  - due_date: Payment due date
  - items: List of invoice line items
    - description: Item description
    - quantity: Quantity purchased
    - unit_price: Price per unit
    - amount: Total item amount
  - subtotal: Pre-tax amount
  - tax: Tax amount
  - total: Invoice total amount
  - vendor:
    - name: Vendor name
    - address: Vendor address
    - contact: Contact information
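With a larger schema like this one, it helps to confirm that a response actually contains every field you asked for. The following sketch parses schema text into a tree of field names and lists the paths missing from a response. Both functions are hypothetical client-side helpers: the parsing relies only on relative indentation and is an assumption about the format, not the API's own parser, and the sample response is made up for illustration.

```python
def parse_schema(schema_text):
    """Parse schema lines into a nested dict of field names (descriptions dropped)."""
    root = {}
    stack = [(-1, root)]  # (indentation, children dict) of the open ancestors
    for line in schema_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        text = line.strip()
        if text.startswith("- "):
            text = text[2:]
        name = text.split(":", 1)[0].strip()
        # Close every ancestor that is not shallower than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        children = {}
        stack[-1][1][name] = children
        stack.append((indent, children))
    return root


def missing_fields(tree, value, path=""):
    """Return the schema paths that are absent from a response value."""
    missing = []
    if isinstance(value, list):
        # Apply the same field tree to every item of a list.
        for index, item in enumerate(value):
            missing += missing_fields(tree, item, f"{path}[{index}]")
        return missing
    for name, children in tree.items():
        here = f"{path}.{name}" if path else name
        if not isinstance(value, dict) or name not in value:
            missing.append(here)
        elif children:
            missing += missing_fields(children, value[name], here)
    return missing


schema_text = """- invoice:
  - number: Invoice number or ID
  - items: List of invoice line items
    - description: Item description
    - unit_price: Price per unit
  - vendor:
    - name: Vendor name"""

# Made-up response for illustration: the second item lacks unit_price
# and the vendor block is absent.
response = {"invoice": {"number": "INV-1", "items": [
    {"description": "A", "unit_price": 1.0},
    {"description": "B"},
]}}
print(missing_fields(parse_schema(schema_text), response))
```

Reported paths can be logged or used to route a document for manual review.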

Legal Documents

Parse contracts and legal documents to extract key information for analysis and comparison.

  • Contract parties and signatories
  • Key clauses and provisions
  • Effective dates and term lengths
  • Obligations and requirements

Contract Schema Example
- contract:
  - title: Contract title or name
  - effective_date: When the contract begins
  - termination_date: When the contract ends
  - parties:
    - party_name: Name of each party
    - party_type: Type (client, provider, etc.)
  - key_provisions:
    - provision_type: Type of provision
    - provision_text: Text of provision
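Because every contract processed with the same schema yields the same JSON structure, two versions of a contract can be compared field by field. The sketch below is a generic recursive diff. `diff_extractions` is a hypothetical helper, and the two sample extractions are invented for illustration, including the assumption that `parties` comes back as a list of objects.

```python
def diff_extractions(old, new, path=""):
    """Return (path, old_value, new_value) for every value that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new)):
            here = f"{path}.{key}" if path else key
            changes += diff_extractions(old.get(key), new.get(key), here)
        return changes
    if isinstance(old, list) and isinstance(new, list):
        changes = []
        # Compare lists position by position; a missing item is reported as None.
        for index in range(max(len(old), len(new))):
            a = old[index] if index < len(old) else None
            b = new[index] if index < len(new) else None
            changes += diff_extractions(a, b, f"{path}[{index}]")
        return changes
    return [] if old == new else [(path, old, new)]


# Invented sample data, shaped after the contract schema above.
version_1 = {"contract": {
    "title": "MSA",
    "termination_date": "2024-12-31",
    "parties": [{"party_name": "Acme", "party_type": "client"}],
}}
version_2 = {"contract": {
    "title": "MSA",
    "termination_date": "2025-12-31",
    "parties": [
        {"party_name": "Acme", "party_type": "client"},
        {"party_name": "Globex", "party_type": "provider"},
    ],
}}

for change in diff_extractions(version_1, version_2):
    print(change)
```

Lists are compared by position, so reordered parties or provisions would show up as changes. Matching items by a key such as `party_name` would be a reasonable refinement.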

Research & Data Science

Transform charts, tables, and research findings into structured data for analysis.

  • Data from tables and charts
  • Statistical findings and results
  • Methodology descriptions
  • Research conclusions

Research Schema Example
- research:
  - title: Research paper title
  - authors: List of authors
  - data_points:
    - chart_title: Title of chart or table
    - labels: X-axis or row labels
    - values: Data values
    - units: Units of measurement
  - findings: Key research findings