Core Concepts
ParseSphere provides three main capabilities:
- Document Parsing — Extract text, tables, and metadata from documents (one-off processing)
- Tabular Data Queries — Upload CSV/Excel files and ask questions in plain English
- Document Search — Upload PDFs and documents, then search or chat with their contents
Understanding these concepts will help you pick the right approach for your use case.
Parse Jobs
Document parsing runs asynchronously because processing time varies based on file size, content, and whether OCR is needed. When you submit a document, you get back a parse_id to track progress.
Job Lifecycle
Queued — Waiting for an available worker. Usually just a few seconds.
Processing — Actively extracting content. You'll see progress updates (0-100%) and status messages like "Extracting text" or "Running OCR".
Completed — Results ready at /v1/parses/{parse_id}. Includes extracted text, tables, and metadata.
Failed — Something went wrong. Common causes: corrupted files, password-protected documents, or unsupported formats.
Tracking Progress
GET /v1/parses/{parse_id}
Check processing status
curl https://api.parsesphere.com/v1/parses/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_your_api_key"
Skip the polling
Pass a webhook_url when creating a parse to get notified automatically when processing finishes.
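If you do poll, a loop along these lines is one way to wait for a terminal state. This is a minimal sketch, not an official client: it assumes the response JSON carries a status field with lowercase values such as "completed" and "failed" (the field name and casing are assumptions), and fetch_parse and wait_for_parse are hypothetical helper names.

```python
import json
import time
import urllib.request

API_BASE = "https://api.parsesphere.com/v1"
# Assumption: the job status is exposed as a "status" field with these lowercase values.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_parse(parse_id, api_key):
    """GET /v1/parses/{parse_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/parses/{parse_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_parse(parse_id, fetch, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch(parse_id) until the job is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(parse_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"parse {parse_id} not finished after {timeout}s")
        sleep(interval)


# Usage (performs real HTTP calls, so it is left commented out):
# job = wait_for_parse(
#     "550e8400-e29b-41d4-a716-446655440000",
#     lambda parse_id: fetch_parse(parse_id, "sk_your_api_key"),
# )
```

Passing fetch in as a callable keeps the loop independent of the HTTP layer, which makes it easy to exercise with canned responses.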
Result Caching
Parse results are cached for the duration set by the session_ttl parameter, given in seconds (default: 24 hours, minimum: 60 seconds). Once the TTL expires, you'll need to re-submit the document.
Set session_ttl explicitly to control how long results stay available. This example keeps them for 2 hours:
curl -X POST https://api.parsesphere.com/v1/parses \
-H "Authorization: Bearer sk_your_api_key" \
-F "file=@contract.pdf" \
-F "session_ttl=7200" # 2 hours
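Because the example value 7200 corresponds to 2 hours, session_ttl appears to be expressed in seconds. Under that assumption, a small helper can convert a duration and enforce the documented 60-second minimum before you submit. The name session_ttl_seconds is hypothetical.

```python
from datetime import timedelta

MIN_SESSION_TTL = 60  # seconds, the documented minimum


def session_ttl_seconds(duration: timedelta) -> int:
    """Convert a duration to a whole-second session_ttl, enforcing the minimum."""
    seconds = int(duration.total_seconds())
    if seconds < MIN_SESSION_TTL:
        raise ValueError(
            f"session_ttl must be at least {MIN_SESSION_TTL} seconds, got {seconds}"
        )
    return seconds


print(session_ttl_seconds(timedelta(hours=2)))   # 7200, the value in the curl example
print(session_ttl_seconds(timedelta(hours=24)))  # 86400, the documented default
```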
Workspaces
Workspaces are containers for your files. Upload data, then chat with it using natural language.
A single workspace can hold:
- Tabular files (CSV, XLSX, XLS, Parquet) — queried via SQL behind the scenes
- Documents (PDF, DOCX, PPTX, TXT) — searched using AI-powered semantic search
This means you can combine structured data and unstructured documents in the same workspace and ask questions across both.
When to Use Workspaces
Multi-file analysis — Query across multiple related files at once. Upload regional sales CSVs and ask "What's the total revenue across all regions?"
Document Q&A — Upload reports, contracts, or manuals and ask questions. "What are the payment terms in this contract?"
Ongoing analysis — Unlike parse jobs (which expire), workspace files stick around for as long as you need them.
Team collaboration — Share workspaces with your organization so others can query the same data.
Creating a Workspace
POST /v1/workspaces
Create a new workspace
curl -X POST https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Q4 Sales Analysis",
"description": "Sales data and quarterly reports"
}'
Workspace Roles
Access to workspaces is controlled by roles:
| Role | Can view & chat | Can upload/delete files | Can manage workspace |
|---|---|---|---|
| Owner | ✓ | ✓ | ✓ |
| Editor | ✓ | ✓ | — |
| Viewer | ✓ | — | — |
Members of an organization have implicit Viewer access to workspaces shared within that organization.
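For client-side checks, such as hiding an upload button from Viewers, the roles table can be mirrored as a lookup. This is only a sketch: the action names below are illustrative labels for the three table columns, not identifiers from the API, and the server remains the authority on access.

```python
# Permissions copied from the roles table; action names are illustrative labels.
ROLE_PERMISSIONS = {
    "Owner": {"view_and_chat", "manage_files", "manage_workspace"},
    "Editor": {"view_and_chat", "manage_files"},
    "Viewer": {"view_and_chat"},
}


def can(role: str, action: str) -> bool:
    """Return True if the role grants the action; unknown roles grant nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(can("Editor", "manage_files"))      # True
print(can("Viewer", "manage_files"))      # False
print(can("Editor", "manage_workspace"))  # False
```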
Files
Files are what you upload to workspaces. ParseSphere automatically detects the file type and processes it accordingly.
File Categories
| Category | File Types | What Happens |
|---|---|---|
| Tabular | CSV, XLSX, XLS, Parquet | Converted to an optimized format for fast SQL queries |
| Document | PDF, DOCX, PPTX, TXT | Split into chunks, embedded for semantic search |
The category field in API responses tells you which type a file is.
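Before uploading, you may want to predict which category a file will land in. The sketch below maps the extensions from the table to a category. ParseSphere does its own detection on upload, so treat this as a local pre-check only; the lowercase strings "tabular" and "document" are an assumption about what the category field contains, and guess_category is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the File Categories table. The lowercase category
# strings are an assumption about the values of the API's category field.
CATEGORY_BY_EXTENSION = {
    ".csv": "tabular",
    ".xlsx": "tabular",
    ".xls": "tabular",
    ".parquet": "tabular",
    ".pdf": "document",
    ".docx": "document",
    ".pptx": "document",
    ".txt": "document",
}


def guess_category(filename: str) -> Optional[str]:
    """Return the expected category for a filename, or None if it is not listed."""
    return CATEGORY_BY_EXTENSION.get(Path(filename).suffix.lower())


print(guess_category("sales_q4.csv"))  # tabular
print(guess_category("Contract.PDF"))  # document
print(guess_category("notes.md"))      # None
```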
Uploading Files
POST /v1/workspaces/{workspace_id}/files
Upload a file to your workspace
curl -X POST https://api.parsesphere.com/v1/workspaces/a1b2c3d4-e5f6-7890-abcd-ef1234567890/files \
-H "Authorization: Bearer sk_your_api_key" \
-F "file=@sales_q4.csv"
File Processing
Like parse jobs, file processing is asynchronous:
- Waiting for processing
- Analyzing and indexing
- Ready for queries
- Processing error
For tabular files:
- Analyzes column structure and data types
- Extracts sample values to help the AI understand your data
- Converts to an optimized query format
For documents:
- Extracts text content from all pages
- Splits into semantic chunks
- Generates embeddings for search
- Extracts and indexes images (if present)
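To make the document steps concrete, here is a toy version of chunk, embed, and search. It is not ParseSphere's pipeline, whose chunking and embedding models are not described here: paragraphs stand in for semantic chunks, word counts stand in for learned embeddings, and the contract text is made up for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list:
    """Split text into paragraph chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks: list, query: str) -> str:
    """Return the chunk most similar to the query."""
    query_vec = embed(query)
    return max(chunks, key=lambda c: cosine(embed(c), query_vec))


# Made-up sample text, two paragraphs separated by a blank line.
doc = (
    "Payment is due within 30 days of invoice.\n\n"
    "Either party may terminate with 60 days notice."
)
print(search(chunk(doc), "When is payment due?"))
# Payment is due within 30 days of invoice.
```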
Note: Small files (under 5 MB) typically process in seconds. Larger files or complex PDFs may take a minute or two.
Chatting with Your Data
Once files are processed, you can start asking questions. The chat understands both your tabular data and document contents.
How It Works
For tabular data, ParseSphere translates your question into SQL and runs it against your files.
For documents, it searches for relevant passages using semantic similarity, then synthesizes an answer.
For mixed workspaces, it automatically figures out the best approach based on your question.
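As an illustration of the tabular path, the snippet below shows the kind of SQL a question like "What are the top 5 products by revenue?" might translate to. The query engine ParseSphere uses is not specified here, so SQLite is a stand-in, and the table, column names, and rows are made up for the example.

```python
import sqlite3

# Made-up rows standing in for an uploaded sales CSV; column names are hypothetical.
rows = [("Widget", 1200.0), ("Gadget", 800.0), ("Widget", 300.0), ("Gizmo", 950.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# One plausible translation of "What are the top 5 products by revenue?"
sql = """
    SELECT product, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
    LIMIT 5
"""
top_products = conn.execute(sql).fetchall()
print(top_products)
# [('Widget', 1500.0), ('Gizmo', 950.0), ('Gadget', 800.0)]
```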
Starting a Conversation
POST /v1/workspaces/{workspace_id}/chat
Ask a question
curl -X POST https://api.parsesphere.com/v1/workspaces/a1b2c3d4-e5f6-7890-abcd-ef1234567890/chat \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"message": "What are the top 5 products by revenue?",
"stream": false
}'
Follow-up Questions
Pass the conversation_id from the previous response to continue the conversation:
curl -X POST https://api.parsesphere.com/v1/workspaces/.../chat \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"message": "Break that down by region",
"conversation_id": "conv-87654321-wxyz-abcd-efgh-ijklmnopqrst"
}'
The AI remembers context from earlier in the conversation, so "that" refers to the top 5 products you just asked about.
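The same two-step exchange can be scripted. The sketch below builds the chat request with the standard library and threads conversation_id through when one is supplied. It assumes the response JSON exposes conversation_id as a top-level field, which is not confirmed above, and build_chat_request and ask are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.parsesphere.com/v1"


def build_chat_request(workspace_id, api_key, message, conversation_id=None):
    """Build (but do not send) a POST request for /v1/workspaces/{workspace_id}/chat."""
    payload = {"message": message, "stream": False}
    if conversation_id is not None:
        # Continue an existing conversation instead of starting a new one.
        payload["conversation_id"] = conversation_id
    return urllib.request.Request(
        f"{API_BASE}/workspaces/{workspace_id}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(workspace_id, api_key, message, conversation_id=None):
    """Send one chat message and return the decoded JSON response."""
    request = build_chat_request(workspace_id, api_key, message, conversation_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (performs real HTTP calls, so it is left commented out):
# first = ask(workspace_id, api_key, "What are the top 5 products by revenue?")
# # Assumption: conversation_id is a top-level field of the response.
# follow_up = ask(workspace_id, api_key, "Break that down by region",
#                 conversation_id=first["conversation_id"])
```

Separating request construction from sending makes the payload easy to inspect without touching the network.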
Tips for Better Results
Be specific — "Show Q4 revenue by product category" beats "show sales"
Reference column names — If you know your CSV has a column called product_category, use that term
Start simple — Ask a straightforward question first, then drill down with follow-ups
Check the SQL — Add "include_execution_details": true to see the generated queries
What's Next?
- Document Parsing — Extraction options and supported formats
- Tabula — Deep dive into natural language data queries
- Rate Limits — API quotas and throttling
- Error Handling — Handling common error scenarios