The transform endpoint allows you to extract structured data from any PDF document by defining what information you need using a simple schema.
Instead of building custom PDF parsing logic, you define your extraction needs once and apply the same schema to thousands of documents consistently. The API performs the extraction and returns clean, structured JSON ready for your application.
1. Create a YAML-like schema that describes what data you want to extract from your PDF.
2. Send your PDF file along with the schema to the transform endpoint.
3. Receive a structured JSON response that matches your schema definition.
4. Use the extracted data in your application and process thousands of documents with the same schema.
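The workflow above, one schema reused across many documents, can be sketched as a small batch loop. This is only an illustration: `process_batch` and the injected `extract` callable are hypothetical names, and `extract` stands in for whatever function actually calls the transform endpoint and returns the parsed JSON.

```python
from typing import Callable, Dict, Iterable, Tuple


def process_batch(
    paths: Iterable[str],
    schema: str,
    extract: Callable[[str, str], dict],
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Apply one schema to many PDFs, collecting results and failures separately.

    `extract(path, schema)` is a hypothetical callable that sends the PDF and
    schema to the transform endpoint and returns the decoded JSON response.
    """
    results: Dict[str, dict] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = extract(path, schema)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            errors[path] = str(exc)
    return results, errors
```

Keeping failures in a separate mapping makes it easy to retry only the documents that did not process.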
Key | Value |
---|---|
Content-Type | multipart/form-data |
x-rapidapi-host | pdf-toolkit-apis.p.rapidapi.com |
x-rapidapi-key | YOUR_RAPIDAPI_KEY |
Parameter | Type | Description | Constraints |
---|---|---|---|
file | File | The PDF file to extract data from | Max size: 10240 KB (10 MB) |
schema | String | The schema definition in YAML-like format | Required |
start_page | Integer | Page to start extraction from | Min: 0, Optional |
end_page | Integer | Page to end extraction at | Min: 0, Optional |
language | String | The language for extraction | Values: 'en' or 'es', Default: 'en', Optional |
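As a rough sketch, the constraints in the table can be checked on the client before a request is sent. The function name `build_form_fields` is hypothetical, the sketch assumes 1 KB means 1024 bytes, and the API itself may validate differently.

```python
from typing import Dict, Optional

MAX_FILE_KB = 10240                # upper bound from the parameter table
ALLOWED_LANGUAGES = {"en", "es"}   # documented values for `language`


def build_form_fields(
    file_size_bytes: int,
    schema: str,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    language: str = "en",
) -> Dict[str, str]:
    """Validate inputs against the documented constraints and return the text form fields."""
    # Assumption: 1 KB = 1024 bytes; the table does not say which convention applies.
    if file_size_bytes > MAX_FILE_KB * 1024:
        raise ValueError(f"file exceeds {MAX_FILE_KB} KB")
    if not schema.strip():
        raise ValueError("schema is required")
    for name, page in (("start_page", start_page), ("end_page", end_page)):
        if page is not None and page < 0:
            raise ValueError(f"{name} must be >= 0")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError("language must be 'en' or 'es'")

    fields = {"schema": schema, "language": language}
    # Optional parameters are only included when the caller sets them.
    if start_page is not None:
        fields["start_page"] = str(start_page)
    if end_page is not None:
        fields["end_page"] = str(end_page)
    return fields
```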
The schema defines what data to extract from your PDF document. It uses a simple YAML-like format where each line represents a field to extract, and indentation indicates the hierarchy of nested fields.
- root_field: Description of the root field
  - nested_field_1: Description of nested field 1
  - nested_field_2: Description of nested field 2
    - deeply_nested_field: Description of deeply nested field
- another_root_field: Description of another root field
For lists or arrays of items, you can define the structure once, and it will be applied to all matching items in the document.
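If schemas are built programmatically, a small helper can render a nested Python structure into this line-per-field format. This is a minimal sketch: `render_schema` is a hypothetical helper, and the two-space indentation step is an assumption, since the exact indentation width the API expects is not stated here.

```python
def render_schema(fields: dict, indent: int = 0, step: int = 2) -> str:
    """Render a nested mapping as a YAML-like schema string.

    Each value is one of:
      - str: a leaf field with a description
      - dict: a parent field with children and no description
      - (str, dict): a parent field with both a description and children
    """
    lines = []
    for name, spec in fields.items():
        description, children = "", {}
        if isinstance(spec, str):
            description = spec
        elif isinstance(spec, dict):
            children = spec
        else:
            description, children = spec

        line = " " * indent + f"- {name}:"
        if description:
            line += f" {description}"
        lines.append(line)

        # Children are indented one step deeper than their parent
        # (step=2 is an assumed width, not a documented requirement).
        if children:
            lines.append(render_schema(children, indent + step, step))
    return "\n".join(lines)


schema = render_schema({
    "invoice": {
        "number": "Invoice number or ID",
        "items": ("List of invoice line items", {"price": "Price per unit"}),
        "total": "Invoice total amount",
    }
})
print(schema)
```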
Here's how to use the transform endpoint to extract data from an invoice PDF:
curl --request POST \
  --url https://pdf-toolkit-apis.p.rapidapi.com/transform \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-rapidapi-host: pdf-toolkit-apis.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_RAPIDAPI_KEY' \
  --form 'file=@/path/to/invoice.pdf' \
  --form 'schema=- invoice:
  - number: Invoice number or ID
  - date: Invoice date
  - items: List of invoice line items
    - description: Item description
    - quantity: Quantity purchased
    - price: Price per unit
    - amount: Total amount
  - subtotal: Pre-tax amount
  - tax: Tax amount
  - total: Invoice total amount'
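For applications that do not shell out to curl, the same request can be assembled with Python's standard library. The sketch below only builds the request object; `encode_multipart` and `build_transform_request` are hypothetical helper names, and the `application/pdf` part type is an assumption rather than a documented requirement.

```python
import urllib.request
import uuid

API_HOST = "pdf-toolkit-apis.p.rapidapi.com"
API_URL = f"https://{API_HOST}/transform"


def encode_multipart(fields: dict, pdf_name: str, pdf_bytes: bytes):
    """Encode text fields plus one PDF as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{pdf_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"  # assumed part type
    )
    parts.append(file_header.encode("utf-8") + pdf_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transform_request(api_key: str, schema: str, pdf_name: str, pdf_bytes: bytes):
    """Build (but do not send) a POST request equivalent to the curl example."""
    body, content_type = encode_multipart({"schema": schema}, pdf_name, pdf_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "x-rapidapi-host": API_HOST,
            "x-rapidapi-key": api_key,
        },
    )

# To send it (requires network access and a valid key):
#   import json
#   with urllib.request.urlopen(request) as response:
#       data = json.load(response)
```

Separating request construction from sending keeps the encoding logic easy to test without network access.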
{
  "invoice": {
    "number": "INV-2023-456",
    "date": "2023-10-15",
    "items": [
      {
        "description": "Software License",
        "quantity": 2,
        "price": 750.00,
        "amount": 1500.00
      },
      {
        "description": "Support Hours",
        "quantity": 5,
        "price": 100.00,
        "amount": 500.00
      }
    ],
    "subtotal": 2000.00,
    "tax": 180.00,
    "total": 2180.00
  }
}
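Because the response mirrors the schema, it can be consumed directly once parsed. The sketch below runs a few arithmetic sanity checks on the sample invoice above. `check_invoice_totals` is a hypothetical helper, and the checks are an assumption about how well-formed invoices behave, not something the API is stated to guarantee.

```python
import json
from decimal import Decimal

SAMPLE_RESPONSE = """
{
  "invoice": {
    "number": "INV-2023-456",
    "date": "2023-10-15",
    "items": [
      {"description": "Software License", "quantity": 2, "price": 750.00, "amount": 1500.00},
      {"description": "Support Hours", "quantity": 5, "price": 100.00, "amount": 500.00}
    ],
    "subtotal": 2000.00,
    "tax": 180.00,
    "total": 2180.00
  }
}
"""


def _dec(value) -> Decimal:
    # Convert via str so float values like 750.0 become exact decimals.
    return Decimal(str(value))


def check_invoice_totals(invoice: dict) -> list:
    """Return a list of inconsistencies found in an extracted invoice (empty if none)."""
    problems = []
    items = invoice.get("items", [])
    for index, item in enumerate(items):
        if _dec(item["quantity"]) * _dec(item["price"]) != _dec(item["amount"]):
            problems.append(f"item {index}: quantity * price != amount")
    subtotal = _dec(invoice["subtotal"])
    if sum((_dec(item["amount"]) for item in items), Decimal(0)) != subtotal:
        problems.append("sum of item amounts != subtotal")
    if subtotal + _dec(invoice["tax"]) != _dec(invoice["total"]):
        problems.append("subtotal + tax != total")
    return problems


invoice = json.loads(SAMPLE_RESPONSE)["invoice"]
print(check_invoice_totals(invoice))
```

Checks like these are a cheap way to flag documents that may need manual review before the data enters downstream systems.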
Extract data from invoices, receipts, and purchase orders to automate your accounts payable workflow.
- invoice:
  - number: Invoice number or ID
  - date: Invoice date
  - due_date: Payment due date
  - items: List of invoice line items
    - description: Item description
    - quantity: Quantity purchased
    - unit_price: Price per unit
    - amount: Total item amount
  - subtotal: Pre-tax amount
  - tax: Tax amount
  - total: Invoice total amount
  - vendor:
    - name: Vendor name
    - address: Vendor address
    - contact: Contact information
Parse contracts and legal documents to extract key information for analysis and comparison.
- contract:
  - title: Contract title or name
  - effective_date: When the contract begins
  - termination_date: When the contract ends
  - parties:
    - party_name: Name of each party
    - party_type: Type (client, provider, etc.)
  - key_provisions:
    - provision_type: Type of provision
    - provision_text: Text of provision
Transform charts, tables, and research findings into structured data for analysis.
- research:
  - title: Research paper title
  - authors: List of authors
  - data_points:
    - chart_title: Title of chart or table
    - labels: X-axis or row labels
    - values: Data values
    - units: Units of measurement
  - findings: Key research findings