Core Concepts

ParseSphere provides three main capabilities:

  • Document Parsing — Extract text, tables, and metadata from documents (one-off processing)
  • Tabular Data Queries — Upload CSV/Excel files and ask questions in plain English
  • Document Search — Upload PDFs and documents, then search or chat with their contents

Understanding these concepts will help you pick the right approach for your use case.


Parse Jobs

Document parsing runs asynchronously because processing time varies based on file size, content, and whether OCR is needed. When you submit a document, you get back a parse_id to track progress.

Job Lifecycle

Queued → Processing → Completed (or Failed)

Queued — Waiting for an available worker. Usually just a few seconds.

Processing — Actively extracting content. You'll see progress updates (0-100%) and status messages like "Extracting text" or "Running OCR".

Completed — Results ready at /v1/parses/{parse_id}. Includes extracted text, tables, and metadata.

Failed — Something went wrong. Common causes: corrupted files, password-protected documents, or unsupported formats.

Tracking Progress

GET /v1/parses/{parse_id}

Check processing status

bash
curl https://api.parsesphere.com/v1/parses/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_your_api_key"
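If you prefer polling to webhooks, the status check above can be wrapped in a loop. The Python sketch below shows one way to do that. It assumes the response is JSON with a status field whose values are the lifecycle names in lowercase (queued, processing, completed, failed); that response shape is an assumption, not something this page confirms. The helper names fetch_status and wait_for_parse are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://api.parsesphere.com/v1"


def fetch_status(parse_id, api_key):
    """GET /v1/parses/{parse_id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/parses/{parse_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_parse(parse_id, api_key, fetch=fetch_status,
                   interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the timeout elapses.

    Assumes (not confirmed by the docs) that the body has a "status" field
    with the values "queued", "processing", "completed", or "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(parse_id, api_key)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"parse {parse_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"parse {parse_id} still {status!r} after {timeout}s")
        sleep(interval)
```

The fetch and sleep parameters are injectable so the loop can be exercised without network access. In normal use you would call wait_for_parse(parse_id, api_key) and read the results from the returned dictionary.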

Skip the polling

Pass a webhook_url when creating a parse to get notified automatically when processing finishes.

Result Caching

Parse results are cached for the duration set by the session_ttl parameter (default: 24 hours, minimum: 60 seconds). After they expire, you'll need to re-submit the document.

Set session_ttl (in seconds) to match how long you need the results. The example below keeps them for 2 hours:

curl -X POST https://api.parsesphere.com/v1/parses \
  -H "Authorization: Bearer sk_your_api_key" \
  -F "file=@contract.pdf" \
  -F "session_ttl=7200"  # 2 hours
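If you compute the TTL in code, a small helper can keep the value within the documented bounds. This is a minimal sketch, assuming session_ttl is an integer number of seconds (consistent with the 7200 example above). The function name session_ttl_seconds is hypothetical, and no upper limit is enforced because this page does not state one.

```python
from datetime import timedelta

MIN_TTL_SECONDS = 60  # documented minimum
DEFAULT_TTL_SECONDS = int(timedelta(hours=24).total_seconds())  # documented default


def session_ttl_seconds(duration=None):
    """Convert a timedelta into the integer seconds sent as session_ttl."""
    if duration is None:
        return DEFAULT_TTL_SECONDS
    seconds = int(duration.total_seconds())
    if seconds < MIN_TTL_SECONDS:
        raise ValueError(
            f"session_ttl must be at least {MIN_TTL_SECONDS} seconds")
    return seconds
```

For example, session_ttl_seconds(timedelta(hours=2)) gives the 7200 used in the curl call above.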

Workspaces

Workspaces are containers for your files. Upload data, then chat with it using natural language.

A single workspace can hold:

  • Tabular files (CSV, XLSX, XLS, Parquet) — queried via SQL behind the scenes
  • Documents (PDF, DOCX, PPTX, TXT) — searched using AI-powered semantic search

This means you can combine structured data and unstructured documents in the same workspace and ask questions across both.

When to Use Workspaces

Multi-file analysis — Query across multiple related files at once. Upload regional sales CSVs and ask "What's the total revenue across all regions?"

Document Q&A — Upload reports, contracts, or manuals and ask questions. "What are the payment terms in this contract?"

Ongoing analysis — Unlike parse jobs (which expire), workspace files stick around for as long as you need them.

Team collaboration — Share workspaces with your organization so others can query the same data.

Creating a Workspace

POST /v1/workspaces

Create a new workspace

bash
curl -X POST https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "name": "Q4 Sales Analysis",
  "description": "Sales data and quarterly reports"
}'

Workspace Roles

Access to workspaces is controlled by roles:

Role | Can view & chat | Can upload/delete files | Can manage workspace
Owner
Editor
Viewer

Viewers have implicit access to shared workspaces within the same organization.


Files

Files are what you upload to workspaces. ParseSphere automatically detects the file type and processes it accordingly.

File Categories

Category | File Types | What Happens
Tabular | CSV, XLSX, XLS, Parquet | Converted to an optimized format for fast SQL queries
Document | PDF, DOCX, PPTX, TXT | Split into chunks, embedded for semantic search

The category field in API responses tells you which type a file is.
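ParseSphere detects the category for you, but a client-side check before uploading can catch unsupported files early. The sketch below maps extensions to the two categories listed above. The lowercase labels "tabular" and "document" are assumed to mirror the category field, and guess_category is a hypothetical helper, not part of any ParseSphere SDK.

```python
from pathlib import Path

# Extensions come from the file-type lists above. The labels "tabular" and
# "document" are assumptions about how the API spells its category values.
TABULAR_EXTENSIONS = {".csv", ".xlsx", ".xls", ".parquet"}
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}


def guess_category(filename):
    """Return "tabular", "document", or None for an unsupported extension."""
    ext = Path(filename).suffix.lower()
    if ext in TABULAR_EXTENSIONS:
        return "tabular"
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    return None
```

Because this only looks at the extension, treat it as a quick pre-check; the category in the API response remains the authoritative answer.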

Uploading Files

POST /v1/workspaces/{workspace_id}/files

Upload a file to your workspace

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/a1b2c3d4-e5f6-7890-abcd-ef1234567890/files \
-H "Authorization: Bearer sk_your_api_key" \
-F "file=@sales_q4.csv"

File Processing

Like parse jobs, file processing is asynchronous:

  • Queued: waiting for processing
  • Processing: analyzing and indexing
  • Completed: ready for queries
  • Failed: processing error

For tabular files:

  1. Analyzes column structure and data types
  2. Extracts sample values to help the AI understand your data
  3. Converts to an optimized query format

For documents:

  1. Extracts text content from all pages
  2. Splits into semantic chunks
  3. Generates embeddings for search
  4. Extracts and indexes images (if present)
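To give a feel for the chunk, embed, and search steps, here is a toy Python sketch. It is not ParseSphere's implementation: fixed-size word windows stand in for semantic chunking, and bag-of-words counts stand in for learned embeddings. All function names are illustrative.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words
    (a naive stand-in for semantic chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def embed(text):
    """Toy 'embedding': a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, query, top_k=1):
    """Return the top_k chunks ranked by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec),
                    reverse=True)
    return ranked[:top_k]
```

A real embedding model matches on meaning rather than shared words, so it can find a relevant passage even when the question uses different vocabulary. The ranking step works the same way in both cases.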

Note: Small files (under 5 MB) typically process in seconds. Larger files or complex PDFs may take a minute or two.


Chatting with Your Data

Once files are processed, you can start asking questions. The chat understands both your tabular data and document contents.

How It Works

For tabular data, ParseSphere translates your question into SQL and runs it against your files.

For documents, it searches for relevant passages using semantic similarity, then synthesizes an answer.

For mixed workspaces, it automatically figures out the best approach based on your question.
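To make the tabular path concrete, the sketch below shows the kind of SQL a question such as "What are the top 5 products by revenue?" might translate to. It uses Python's built-in sqlite3 module with made-up rows. The table name sales and the columns product and revenue are hypothetical, and the SQL that ParseSphere actually generates, as well as the engine it runs on, may differ.

```python
import sqlite3

# Made-up rows standing in for an uploaded CSV.
rows = [
    ("Widget", 1200.0),
    ("Gadget", 800.0),
    ("Widget", 300.0),
    ("Gizmo", 950.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# One plausible translation of "What are the top 5 products by revenue?"
TOP_PRODUCTS_SQL = """
    SELECT product, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
    LIMIT 5
"""
top_products = conn.execute(TOP_PRODUCTS_SQL).fetchall()
```

Here top_products holds one (product, total_revenue) tuple per product, highest revenue first.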

Starting a Conversation

POST /v1/workspaces/{workspace_id}/chat

Ask a question

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/a1b2c3d4-e5f6-7890-abcd-ef1234567890/chat \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "message": "What are the top 5 products by revenue?",
  "stream": false
}'

Follow-up Questions

Pass the conversation_id from the previous response to continue the conversation:

curl -X POST https://api.parsesphere.com/v1/workspaces/.../chat \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Break that down by region",
    "conversation_id": "conv-87654321-wxyz-abcd-efgh-ijklmnopqrst"
  }'

The AI remembers context from earlier in the conversation, so "that" refers to the top 5 products you just asked about.
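If you build chat requests in code, a small helper keeps the first message and its follow-ups consistent. This Python sketch only assembles the JSON body, using the message, stream, conversation_id, and include_execution_details fields named on this page. The helper name build_chat_payload is hypothetical, and the conversation ID shown is a placeholder.

```python
import json


def build_chat_payload(message, conversation_id=None, stream=False,
                       include_execution_details=False):
    """Build the JSON body for POST /v1/workspaces/{workspace_id}/chat."""
    payload = {"message": message, "stream": stream}
    if conversation_id is not None:
        # Reuse the ID from the previous response to keep context.
        payload["conversation_id"] = conversation_id
    if include_execution_details:
        payload["include_execution_details"] = True
    return payload


first = build_chat_payload("What are the top 5 products by revenue?")
# In real use the ID comes from the first response; this is a placeholder.
follow_up = build_chat_payload("Break that down by region",
                               conversation_id="conv-placeholder")
body = json.dumps(follow_up).encode("utf-8")  # bytes ready to send as the request body
```

Leaving conversation_id out of the first request and adding it to every later one mirrors the two curl calls above.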

Tips for Better Results

Be specific — "Show Q4 revenue by product category" beats "show sales"

Reference column names — If you know your CSV has a column called product_category, use that term

Start simple — Ask a straightforward question first, then drill down with follow-ups

Check the SQL — Add "include_execution_details": true to see the generated queries


What's Next?