Suparse
API Documentation

Document Extraction API Overview

This page provides a comprehensive overview of the API, including authentication, available endpoints, and a step-by-step guide to processing documents.

Introduction

Welcome to the Suparse Document Processing API! Our goal is to provide a powerful yet simple interface to automate the extraction of structured data from your documents. Whether you're processing invoices, receipts, or bank statements, this API is designed to handle the entire lifecycle, from upload to data retrieval.

This guide will walk you through the essential concepts, including authentication, the available endpoints, and the asynchronous workflow you'll use to process your documents.

Authentication

All API requests must be authenticated using an API key. You can generate and manage your API keys from your user dashboard.

Provide your API key in the X-API-Key header with every request.

X-API-Key: pk_abcd1234_secretsecretsecretsecretsecret

Requests without a valid API key will fail with a 401 Unauthorized error.
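
As a minimal sketch of attaching the header, the snippet below builds (but does not send) an authenticated request using only Python's standard library. The `build_request` helper and the `SUPARSE_API_KEY` environment variable name are illustrative assumptions, not part of the API; the base URL is the one used in the code example at the end of this page.

```python
import os
import urllib.request

BASE_URL = "https://api.suparse.com/api/v1"

def build_request(path, api_key=None, method="GET"):
    """Return an unsent Request that carries the X-API-Key header."""
    # Assumption: the key lives in an environment variable of our own choosing.
    key = api_key or os.environ.get("SUPARSE_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-API-Key": key}, method=method)

# Placeholder key copied from the header example above.
req = build_request("documents/", api_key="pk_abcd1234_secretsecretsecretsecretsecret")
print(req.get_method(), req.full_url)
```

Reading the key from the environment rather than hard-coding it keeps it out of source control; any secret store would serve the same purpose.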

API Endpoints

Here is a summary of the primary endpoints for managing your documents:

Method  Endpoint                                 Description
POST    /api/v1/documents/{doc_type}             Uploads a new document for processing.
GET     /api/v1/documents/                       Lists all your accessible documents with pagination and filtering.
GET     /api/v1/documents/{document_id}/result   Retrieves the extracted data and status for a specific processed document.
POST    /api/v1/documents/download_selected      Downloads extracted data for multiple documents in JSON, CSV, XLSX, or QuickBooks CSV format.
DELETE  /api/v1/documents/{document_id}          Deletes a document permanently.
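
To show how the path parameters in these endpoints are filled in, here is a small sketch that turns the summary into a lookup table. The `ENDPOINTS` dictionary, its short names, and the `endpoint` helper are hypothetical conveniences; only the methods and paths come from the summary above.

```python
BASE_URL = "https://api.suparse.com/api/v1"

# (method, path template) pairs copied from the endpoint summary.
ENDPOINTS = {
    "upload": ("POST", "/documents/{doc_type}"),
    "list": ("GET", "/documents/"),
    "result": ("GET", "/documents/{document_id}/result"),
    "download_selected": ("POST", "/documents/download_selected"),
    "delete": ("DELETE", "/documents/{document_id}"),
}

def endpoint(name, **path_params):
    """Return (method, full URL) for a named endpoint."""
    method, template = ENDPOINTS[name]
    return method, BASE_URL + template.format(**path_params)

print(endpoint("upload", doc_type="invoice"))
print(endpoint("list"))
```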

The Asynchronous Workflow

Document processing is an asynchronous operation. You upload a file and then check for the result later. This ensures that you get a fast response and that intensive processing happens in the background.

Here’s the typical flow:

[Client] --1. Upload File--> [API: 202 Accepted]

[Client] --2. Poll for Result--> [API: 202 Accepted (still processing) / 200 OK / 404 Not Found]

Step 1: Upload the Document

Begin by making a POST request to the Upload Document endpoint with your file. If the request is valid, the API will immediately respond with a 202 Accepted status, a unique document_id, and a status of "queued".
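
The upload response can be handled along these lines. This is only a sketch: `parse_upload_response` is a hypothetical helper, `document_id` is the field read by the code example at the end of this page, and the `status` key name is an assumption based on the description above. The sample body is illustrative, not real API output.

```python
def parse_upload_response(status_code, body):
    """Return (document_id, status) from an upload response, or raise."""
    if status_code != 202:
        raise RuntimeError(f"Upload failed with HTTP {status_code}: {body}")
    # The "status" key name is assumed; it may differ in the real response.
    return body["document_id"], body.get("status")

# Placeholder ID for illustration only.
sample = {"document_id": "123e4567-e89b-12d3-a456-426614174000", "status": "queued"}
doc_id, status = parse_upload_response(202, sample)
print(doc_id, status)
```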

Step 2: Poll for Processing Results

Because processing happens in the background, you must periodically check for the result using the Get Document Result endpoint.

Best Practices for Polling:

  • Check the Status Code:
    • A 202 Accepted response means the document is still processing. You should wait and try again.
    • A 200 OK response means processing is complete and the body contains your extracted data.
    • A 404 Not Found response indicates either the document ID is invalid or processing failed. The detail field will provide more information.
  • To avoid excessive requests, we recommend making the first request 5 seconds after submitting the file, then polling every 3 seconds.
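
The polling rules above can be sketched as a small loop that is independent of any HTTP library. `poll_until_done` and its parameters are hypothetical names; `fetch` stands for any callable that returns a `(status_code, body)` pair, and the sleep function is injectable so the schedule can be checked without waiting. The attempt limit of 20 mirrors the code example at the end of this page.

```python
import time

def poll_until_done(fetch, sleep=time.sleep, max_attempts=20,
                    first_delay=5, interval=3):
    """Poll fetch() until it returns 200, fails, or attempts run out."""
    sleep(first_delay)  # first check 5 seconds after the upload
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # processing complete
        if status != 202:
            # e.g. 404: invalid document ID or processing failed
            raise RuntimeError(f"Polling failed with HTTP {status}: {body}")
        if attempt < max_attempts - 1:
            sleep(interval)  # still processing, wait 3 seconds
    raise TimeoutError(f"No result after {max_attempts} attempts")

# Demonstration with a fake fetch and a sleep that only records the waits.
responses = iter([(202, None), (202, None), (200, {"ok": True})])
waits = []
result = poll_until_done(lambda: next(responses), sleep=waits.append)
print(result, waits)  # {'ok': True} [5, 3, 3]
```

In real use, `fetch` would wrap a GET request to the Get Document Result endpoint and `sleep` would be left at its default.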

Step 3: Download or Use the Results

Once the status is 200 OK, the response body will contain your structured data. You can use this data directly or download it in bulk using the Download Selected Documents endpoint.

Step 4: Delete the Document (Optional)

After you have retrieved the results, you can make a DELETE request to the Delete Document endpoint. This action is irreversible.
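
A hedged sketch of preparing the DELETE request with the standard library follows. `build_delete_request` is a hypothetical helper, and the client-side UUID check rests on an assumption: the 400 Bad Request description on this page mentions "invalid UUID format", which suggests document IDs are UUIDs. The request is built but not sent, and the ID shown is a placeholder.

```python
import uuid
import urllib.request

BASE_URL = "https://api.suparse.com/api/v1"

def build_delete_request(document_id, api_key):
    """Return an unsent DELETE request for one document."""
    # Assumption: IDs are UUIDs, so reject malformed ones before calling the API.
    doc_uuid = uuid.UUID(str(document_id))  # raises ValueError if malformed
    return urllib.request.Request(
        f"{BASE_URL}/documents/{doc_uuid}",
        headers={"X-API-Key": api_key},
        method="DELETE",
    )

req = build_delete_request("123e4567-e89b-12d3-a456-426614174000", "pk_example")
print(req.get_method(), req.full_url)
# To actually delete, pass the request to urllib.request.urlopen(req);
# this page lists 204 No Content as the success response for DELETE.
```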

Handling Responses & Errors

Understanding the HTTP status codes our API returns will help you build a robust integration.

  • 200 OK: The request was successful, and the response body contains the requested data.
  • 202 Accepted: Your document was successfully uploaded and is queued for processing. Poll for the result.
  • 204 No Content: Your DELETE request was successful.
  • 400 Bad Request: The request was malformed (e.g., invalid JSON, wrong file type, invalid UUID format).
  • 401 Unauthorized: Your X-API-Key is missing, invalid, or expired.
  • 403 Forbidden: You do not have permission to perform this action (e.g., insufficient credits).
  • 404 Not Found: The requested resource (like a document or its result) does not exist.
  • 429 Too Many Requests: You have exceeded the rate limit for an endpoint.
  • 500 Internal Server Error: An unexpected error occurred on our end. Please try again later.
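
One way to act on these codes in client code is a small dispatch function, sketched below. The `next_action` helper and its return labels are hypothetical, and treating 429 and 500 as retryable is a common client-side convention rather than something this page specifies.

```python
def next_action(status_code):
    """Map an HTTP status code from the list above to a client-side action."""
    if status_code in (200, 204):
        return "done"          # success, nothing more to do
    if status_code == 202:
        return "poll"          # accepted, keep polling for the result
    if status_code in (429, 500):
        return "retry_later"   # assumed to be transient; back off and retry
    if status_code in (400, 401, 403, 404):
        return "fix_request"   # retrying unchanged will not help
    return "unknown"           # code not covered by the list above

for code in (200, 202, 401, 429):
    print(code, next_action(code))
```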

Code Examples

import requests
import time
import os
import json

# --- Configuration ---
API_KEY = "pk_abcd1234_secretsecretsecretsecretsecret" # Replace with your actual API key
BASE_URL = "https://api.suparse.com/api/v1"
FILE_PATH = "/path/to/your/invoice.pdf" # Replace with your file path
DOC_TYPE = "invoice"

def upload_document(file_path, doc_type):
    """Uploads a document and returns its ID."""
    print(f"Uploading {file_path}...")
    url = f"{BASE_URL}/documents/{doc_type}"
    headers = {"X-API-Key": API_KEY}
    
    with open(file_path, "rb") as f:
        files = {"file": (os.path.basename(file_path), f)}
        response = requests.post(url, headers=headers, files=files)

    if response.status_code == 202:
        data = response.json()
        print(f"Upload successful. Document ID: {data['document_id']}")
        return data['document_id']
    else:
        print(f"Error during upload: {response.status_code} {response.text}")
        return None

def poll_for_result(document_id):
    """Polls for the processing result of a document."""
    url = f"{BASE_URL}/documents/{document_id}/result"
    headers = {"X-API-Key": API_KEY}
    
    max_attempts = 20  # Upper bound on polls (about a minute at 3-second intervals)
    initial_delay = 5  # Wait 5 seconds before the first poll
    poll_interval = 3  # Then poll every 3 seconds

    time.sleep(initial_delay)
    for attempt in range(max_attempts):
        print(f"Polling for result (Attempt {attempt + 1}/{max_attempts})...")
        response = requests.get(url, headers=headers, timeout=30)

        if response.status_code == 200:
            print("Processing complete!")
            return response.json()
        elif response.status_code == 202:
            print(f"Status is 'processing', waiting for {poll_interval} seconds...")
            time.sleep(poll_interval)
        else:
            print(f"Polling failed with status {response.status_code}: {response.text}")
            return None
    
    print("Polling timed out after maximum attempts.")
    return None

if __name__ == "__main__":
    doc_id = upload_document(FILE_PATH, DOC_TYPE)
    if doc_id:
        result = poll_for_result(doc_id)
        if result:
            print("\n--- Extracted Data ---")
            print(json.dumps(result, indent=2))