PDF Toolkit

Introduction

The extract endpoint extracts structured data from a PDF document according to a JSON schema that you define.

Instead of writing PDF parsing logic, you describe the fields you need in standard JSON Schema format and reuse the same schema across thousands of documents. The API performs the extraction and returns structured JSON that follows your schema, ready for your application.

How It Works

1. Define Your Schema

Create a JSON schema that describes what data you want to extract from your PDF.

2. Upload Your PDF

Send your PDF file along with the schema to the extract endpoint.

3. Get Structured Data

Receive a structured JSON response that matches your schema definition.

4. Integrate & Scale

Use the extracted data in your application and process thousands of documents with the same schema.

Endpoint Details

POST https://pdf-toolkit-apis.p.rapidapi.com/extract

Headers

Key	Value
Content-Type	multipart/form-data
x-rapidapi-host	pdf-toolkit-apis.p.rapidapi.com
x-rapidapi-key	YOUR_RAPIDAPI_KEY
Authorization	Bearer YOUR_AUTH_TOKEN

Request Body

Parameter	Type	Description	Constraints
file	File	The PDF file to extract data from	Size: 0-10240 KB, Required
schema	String	The JSON schema definition	Valid JSON, Required
start_page	Integer	Page to start extraction from	Min: 0, Optional
end_page	Integer	Page to end extraction at	Min: 0, Optional
language	String	The language for extraction	Values: 'en' or 'es', Default: 'en', Optional
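
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not part of the API: the function name validate_extract_params is hypothetical, and treating 1 KB as 1024 bytes is an assumption, since the table does not say how the size limit is measured.

```python
import json

MAX_FILE_KB = 10240               # upper bound from the constraints table
ALLOWED_LANGUAGES = {"en", "es"}  # values listed for `language`


def validate_extract_params(file_bytes, schema, start_page=None,
                            end_page=None, language="en"):
    """Return a list of constraint violations (empty if none were found).

    Hypothetical client-side helper; the API performs its own validation.
    """
    errors = []
    # Assumption: 1 KB is counted as 1024 bytes.
    if len(file_bytes) > MAX_FILE_KB * 1024:
        errors.append("file exceeds 10240 KB")
    try:
        json.loads(schema)  # `schema` must be a string holding valid JSON
    except (TypeError, ValueError):
        errors.append("schema is not valid JSON")
    for name, value in (("start_page", start_page), ("end_page", end_page)):
        if value is not None and (not isinstance(value, int) or value < 0):
            errors.append(f"{name} must be an integer >= 0")
    if language not in ALLOWED_LANGUAGES:
        errors.append("language must be 'en' or 'es'")
    return errors
```

An empty list means the parameters satisfy every constraint listed in the table; it does not guarantee that the API will accept the request.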

JSON Schema Definition

The JSON schema defines what data to extract from your PDF document. It uses the standard JSON Schema format (draft-07) to specify the structure of data you expect to receive.

Schema Format

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "property_name": {
      "type": "string | number | object | array",
      "description": "Optional description of the property"
    }
  },
  "required": ["property_name"]
}

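You can define nested objects and arrays and specify a data type for each field so that the extracted data matches your requirements. In the template above, the "type" value lists the alternatives; use exactly one of them per property, for example "string".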
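
When schemas are generated in code rather than written by hand, the top-level wrapper can be produced by a small helper. This is a sketch under assumptions: build_schema and the property names title and page_count are hypothetical, and the check that every required name is also defined under properties is a convenience added here, not a rule stated by the API.

```python
import json

DRAFT_07 = "http://json-schema.org/draft-07/schema#"


def build_schema(properties, required=()):
    """Wrap property definitions in the top-level object schema format."""
    # Convenience check (an assumption, not an API rule): a required name
    # with no definition is most likely a typo.
    missing = [name for name in required if name not in properties]
    if missing:
        raise ValueError(f"required names not defined in properties: {missing}")
    return {
        "$schema": DRAFT_07,
        "type": "object",
        "properties": dict(properties),
        "required": list(required),
    }


# Hypothetical example properties.
schema = build_schema(
    {
        "title": {"type": "string", "description": "Document title"},
        "page_count": {"type": "number"},
    },
    required=["title"],
)

# The `schema` request parameter is a string, so serialize before sending.
schema_string = json.dumps(schema)
```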

Example Usage

The following request extracts the line items and the total from an invoice PDF:

cURL Example

curl --request POST \
  --url https://pdf-toolkit-apis.p.rapidapi.com/extract \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-rapidapi-host: pdf-toolkit-apis.p.rapidapi.com' \
  --header 'x-rapidapi-key: YOUR_RAPIDAPI_KEY' \
  --header 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  --form 'file=@/path/to/invoice.pdf' \
  --form 'schema={
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
      "invoice": {
        "type": "object",
        "properties": {
          "items": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "total": { "type": "number" },
                "name": { "type": "string" }
              },
              "required": ["total", "name"]
            }
          },
          "total": { "type": "number" }
        },
        "required": ["items", "total"]
      }
    },
    "required": ["invoice"]
  }'
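
The same request can be assembled in Python. The sketch below uses only the standard library and encodes the multipart body by hand; the helper names encode_multipart and build_extract_request are hypothetical, and the request is built but not sent, so the code can be inspected without credentials.

```python
import json
import uuid
import urllib.request

EXTRACT_URL = "https://pdf-toolkit-apis.p.rapidapi.com/extract"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one PDF file as multipart/form-data.

    Returns a (body, content_type) pair.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(file_name, file_bytes, schema, api_key, auth_token):
    """Build (but do not send) the POST request for the extract endpoint."""
    body, content_type = encode_multipart(
        {"schema": json.dumps(schema)}, file_name, file_bytes
    )
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "x-rapidapi-host": "pdf-toolkit-apis.p.rapidapi.com",
            "x-rapidapi-key": api_key,
            "Authorization": f"Bearer {auth_token}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body with json.load. Optional parameters such as start_page can be added to the fields dictionary in the same way as schema.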

Response (JSON)

{
  "invoice": {
    "items": [
      {
        "name": "Web Design Services",
        "total": 2500.00
      },
      {
        "name": "Hosting (Annual)",
        "total": 1200.00
      },
      {
        "name": "SEO Package",
        "total": 4200.00
      },
      {
        "name": "Content Creation",
        "total": 1800.00
      },
      {
        "name": "Arabic Ceramic Vase - Arabic Ceramic Vase",
        "total": 3200.00
      }
    ],
    "total": 12900.00
  }
}
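
Extracted figures can be sanity-checked before they are stored. The sketch below tests whether the line-item totals in the response above add up to the invoice total. It assumes the invoice total is the plain sum of its items, which holds for this sample but not for invoices with tax or discounts; the function name items_match_total is hypothetical.

```python
import json
import math

# The sample response shown above.
response_text = """
{
  "invoice": {
    "items": [
      {"name": "Web Design Services", "total": 2500.00},
      {"name": "Hosting (Annual)", "total": 1200.00},
      {"name": "SEO Package", "total": 4200.00},
      {"name": "Content Creation", "total": 1800.00},
      {"name": "Arabic Ceramic Vase - Arabic Ceramic Vase", "total": 3200.00}
    ],
    "total": 12900.00
  }
}
"""


def items_match_total(invoice, tolerance=0.01):
    """Return True if the line-item totals add up to the invoice total."""
    item_sum = sum(item["total"] for item in invoice["items"])
    return math.isclose(item_sum, invoice["total"], abs_tol=tolerance)


data = json.loads(response_text)
print(items_match_total(data["invoice"]))  # prints True
```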

Use Cases

Financial Documents

Extract data from invoices, receipts, and financial reports with precise JSON schema definitions.

Invoice data including items, prices, and totals
Financial statements with structured data fields
Purchase orders and payment information

Invoice Schema Example

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "invoice": {
      "type": "object",
      "properties": {
        "number": { "type": "string" },
        "date": { "type": "string" },
        "due_date": { "type": "string" },
        "items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "price": { "type": "number" },
              "amount": { "type": "number" }
            }
          }
        },
        "subtotal": { "type": "number" },
        "tax": { "type": "number" },
        "total": { "type": "number" }
      }
    }
  }
}
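
This schema has no "required" list, so client code should not assume that every field is present in the result. The sketch below reads one extracted line item defensively. The function name line_item_issues is hypothetical, and the rule that amount equals quantity times price is an assumption about typical invoices, not something the API enforces.

```python
def line_item_issues(item, tolerance=0.01):
    """Describe problems with one extracted line item (empty list if none)."""
    issues = []
    # No field is marked required in the schema, so check each one.
    for key in ("description", "quantity", "price", "amount"):
        if item.get(key) is None:
            issues.append(f"missing {key}")
    # Assumption: amount should equal quantity * price.
    if not issues and abs(item["quantity"] * item["price"] - item["amount"]) > tolerance:
        issues.append("amount != quantity * price")
    return issues


# Hypothetical line items, shaped like the "items" entries in the schema.
complete = {"description": "Hosting", "quantity": 2, "price": 600.0, "amount": 1200.0}
partial = {"description": "Hosting", "quantity": 2, "amount": 1200.0}

print(line_item_issues(complete))  # prints []
print(line_item_issues(partial))   # prints ['missing price']
```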

Data Visualization

Extract categories, labels, and data from charts and graphs in PDF documents.

Chart categories and labels
Numerical data from visualizations
Data series and trends

Chart Data Schema Example

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "categories": {
      "type": "array",
      "items": {
        "type": "string",
        "description": "Unique label of the data, without percentages"
      }
    }
  },
  "required": [
    "categories"
  ]
}

Text Documents

Extract titles, sections, and paragraphs from text-heavy documents.

Document title and subtitle
Section headers and content
Paragraphs and text blocks

Document Schema Example

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "title": {
      "type": "string"
    },
    "subtitle": {
      "type": "string"
    },
    "paragraphs": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": [
    "title",
    "subtitle",
    "paragraphs"
  ]
}
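
A response can be checked against the schema that produced it. The sketch below is a deliberately small checker that understands only the "type", "properties", "required", and "items" keywords used in the examples on this page. It is not a full JSON Schema validator (a dedicated library such as the third-party jsonschema package is the better choice for that), and the name check_against_schema and the sample result are hypothetical.

```python
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
}


def check_against_schema(value, schema, path="$"):
    """Collect mismatches between a value and a schema.

    Handles only `type`, `properties`, `required`, and `items`.
    """
    errors = []
    expected = schema.get("type")
    if expected in JSON_TYPES:
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, JSON_TYPES[expected]) and not isinstance(value, bool)
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: missing required property")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(
                    check_against_schema(value[name], sub_schema, f"{path}.{name}")
                )
    elif isinstance(value, list) and "items" in schema:
        for index, element in enumerate(value):
            errors.extend(
                check_against_schema(element, schema["items"], f"{path}[{index}]")
            )
    return errors


# The document schema from above, without the "$schema" line.
document_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "subtitle": {"type": "string"},
        "paragraphs": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "subtitle", "paragraphs"],
}

# Hypothetical extraction result with a missing subtitle.
result = {"title": "Annual Report", "paragraphs": ["First paragraph."]}
print(check_against_schema(result, document_schema))
# prints ['$.subtitle: missing required property']
```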
