Tabular Data & Querying

Query CSV, Excel, and Parquet files using natural language. Upload your data to a workspace, then ask questions in plain English.

Natural Language SQL

No SQL knowledge required. ParseSphere interprets your questions, generates optimized queries, and returns results with natural language explanations.

Workspaces Also Support Documents

In addition to tabular data, workspaces can hold PDF, DOCX, PPTX, TXT, and MD files for semantic search and RAG. See Core Concepts for details on document capabilities.


Creating a Workspace

Workspaces organize related files that you want to query together. Create a workspace before uploading files:

POST /v1/workspaces

Create a container for organizing related files

bash
curl -X POST https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "name": "Sales Analysis",
  "description": "Q4 2024 sales data and customer metrics"
}'
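
The same request can be assembled from Python. The sketch below uses only the standard library and stops short of sending anything, so the request can be inspected first. The helper name `build_create_workspace_request` is an illustrative assumption, not part of any ParseSphere SDK.

```python
import json
import urllib.request

API_BASE = "https://api.parsesphere.com/v1"


def build_create_workspace_request(api_key: str, name: str, description: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/workspaces request from the curl example."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/workspaces",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_create_workspace_request(
    "sk_your_api_key", "Sales Analysis", "Q4 2024 sales data and customer metrics"
)
# To send it, pass `request` to urllib.request.urlopen() and decode the JSON body.
```

Keeping construction separate from sending makes it easy to log or unit-test the request before it touches the network.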

Managing Workspaces

GET /v1/workspaces

List all workspaces you have access to

bash
curl https://api.parsesphere.com/v1/workspaces \
-H "Authorization: Bearer sk_your_api_key"

DELETE /v1/workspaces/{workspace_id}

Delete a workspace and all its files

bash
curl -X DELETE https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer sk_your_api_key"

Important

Deleting a workspace permanently removes all files and conversation history. This action cannot be undone.


Uploading Tabular Files

Upload CSV, Excel, or Parquet files to your workspace for natural language querying:

POST /v1/workspaces/{workspace_id}/files

Upload a tabular file for processing

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/files \
-H "Authorization: Bearer sk_your_api_key" \
-F "file=@sales_data.csv"

Supported File Types

Upload CSV (.csv), Excel (.xlsx, .xls), or Parquet (.parquet) files up to 200 MB. ParseSphere automatically infers column types and optimizes data for fast queries.
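
A quick local check before uploading can catch files that would likely be rejected. This is a minimal sketch based only on the extensions and size limit listed above. The helper name `upload_problems` is hypothetical, and reading "200 MB" as 200 × 1024 × 1024 bytes is an assumption, since the service may count megabytes differently.

```python
from pathlib import Path

# Tabular extensions listed in this section.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".parquet"}
# Assumption: "200 MB" is interpreted as 200 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return reasons a tabular upload would likely be rejected (empty if none)."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit")
    return problems
```

For a file on disk, the size can be read with `Path("sales_data.csv").stat().st_size` and passed as `size_bytes`.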


File Processing

File uploads are processed asynchronously. ParseSphere analyzes your data structure, infers types, and optimizes for analytical queries.

Processing Lifecycle

  • Queued: Waiting for processing
  • Processing: Analyzing and optimizing
  • Completed: Ready for queries
  • Failed: Processing error

Check Processing Status

GET /v1/workspaces/{workspace_id}/files/{file_id}/status

Monitor file processing progress

bash
curl https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/files/880e8400-e29b-41d4-a716-446655440000/status \
-H "Authorization: Bearer sk_your_api_key"

Tip

Poll the status endpoint every 5 seconds until status is completed or failed. Processing time varies by file size and complexity.
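
The polling advice above can be sketched as a small helper that also gives up after an upper time bound. `fetch_status` is any callable that returns the decoded JSON from the status endpoint. The helper name and the 600-second default timeout are illustrative assumptions.

```python
import time
from typing import Callable


def wait_until_processed(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is completed or failed, or raise once the timeout is reached."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        # Stop if another sleep would carry us past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"file still {status.get('status')!r} after {timeout} seconds")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and lets it be exercised without waiting in real time.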

Managing Files

GET /v1/workspaces/{workspace_id}/files

List all files in a workspace

bash
curl https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/files \
-H "Authorization: Bearer sk_your_api_key"

Chatting with Your Data

Once your files are processed, start a conversation and ask questions in natural language:

POST /v1/workspaces/{workspace_id}/chat

Send a conversational message to query your data

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400-e29b-41d4-a716-446655440000/chat \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "message": "What are the top 5 products by revenue?",
  "stream": false
}'

Chat Parameters

message
REQUIRED String

Your question about the data in natural language (e.g., 'What are the top 5 products by revenue?')

conversation_id
OPTIONAL UUID

Continue an existing conversation with full context. Omit to start a new conversation

stream
OPTIONAL Boolean
Default: true

Enable Server-Sent Events (SSE) streaming for real-time responses

dataset_ids
OPTIONAL Array

Limit query to specific tabular files. Omit to query all files in workspace

max_iterations
OPTIONAL Integer
Default: 15

Maximum agent iterations for complex queries (1-20). Higher values enable deeper analysis; lower values return faster

model
OPTIONAL String

Override default model (e.g., 'claude-sonnet-4', 'gpt-4o')

include_execution_details
OPTIONAL Boolean
Default: false

Include detailed SQL execution metadata in response for transparency and debugging
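
The parameters above map directly onto a JSON request body. The following sketch builds that body, leaves out any optional field that was not set so the server-side defaults apply, and enforces the documented 1-20 range for `max_iterations`. The function name `build_chat_payload` is an illustrative assumption.

```python
from typing import Optional, Sequence


def build_chat_payload(
    message: str,
    conversation_id: Optional[str] = None,
    stream: Optional[bool] = None,
    dataset_ids: Optional[Sequence[str]] = None,
    max_iterations: Optional[int] = None,
    model: Optional[str] = None,
    include_execution_details: Optional[bool] = None,
) -> dict:
    """Assemble a JSON-serializable chat body, omitting anything not set explicitly."""
    if not message.strip():
        raise ValueError("message is required")
    if max_iterations is not None and not 1 <= max_iterations <= 20:
        raise ValueError("max_iterations must be between 1 and 20")
    optional = {
        "conversation_id": conversation_id,
        "stream": stream,
        "dataset_ids": list(dataset_ids) if dataset_ids is not None else None,
        "max_iterations": max_iterations,
        "model": model,
        "include_execution_details": include_execution_details,
    }
    payload = {"message": message}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, `build_chat_payload("What are the top 5 products by revenue?", stream=False)` produces the same body as the curl request shown earlier.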

Example Questions

Be Specific

More specific questions produce better results. Reference actual column names when possible.

Simple Aggregations:

  • "What are the top 5 products by revenue?"
  • "How many customers made purchases last month?"
  • "What's the average order value?"

Comparisons:

  • "Compare sales between Q1 and Q2"
  • "Which products have above-average revenue?"
  • "Show revenue growth month over month"

Filtering:

  • "Show me all customers who made more than 10 purchases"
  • "What products were sold in December?"
  • "List orders over $1000"

Follow-up Questions:

  • "Now show me just the Electronics category"
  • "What about for Q4 instead?"
  • "Break that down by region"

Multi-File Queries:

  • "What's the profit margin on our top-selling products?" (requires sales + products files)
  • "Which customers bought Product A and Product B?" (requires orders + customers files)

Conversational Context

The chat endpoint maintains conversation history, allowing natural follow-up questions:

bash
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400/chat \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "message": "What are the top products by revenue?"
}'

# Response includes conversation_id for follow-ups
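
A follow-up reuses the `conversation_id` from the earlier response. The sketch below shows only the body construction. `follow_up_payload` is a hypothetical helper, and `first_response` is a hand-written stand-in for a decoded response rather than real API output.

```python
def follow_up_payload(previous_response: dict, message: str) -> dict:
    """Build the body for a follow-up that continues the earlier conversation."""
    return {
        "message": message,
        "conversation_id": previous_response["conversation_id"],
        "stream": False,
    }


# Stand-in for the decoded JSON of the first chat response.
first_response = {"conversation_id": "880e8400-e29b-41d4-a716-446655440000"}

payload = follow_up_payload(first_response, "Now show me just the Electronics category")
```

Sending `payload` to the same chat endpoint lets the follow-up question rely on the earlier query without restating it.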

Conversation benefits:

  • Natural follow-up questions without repeating context
  • Agent remembers previous queries and results
  • Build complex analyses iteratively
  • All conversation history saved automatically

Response Structure

Chat responses include natural language content and optional execution details:


json
{
"message_id": "990e8400-e29b-41d4-a716-446655440000",
"conversation_id": "880e8400-e29b-41d4-a716-446655440000",
"role": "assistant",
"content": "The top 3 products are Widget A ($125K), Widget B ($98K), and Widget C ($87K).",
"created_at": "2025-01-03T12:00:00Z"
}

Response fields:

  • content: Natural language answer to your question
  • conversation_id: Use this to continue the conversation
  • execution_details: Optional SQL execution metadata (when include_execution_details: true)
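
For client code, the response can be mapped onto a small typed structure. This is a sketch under the assumption that the fields shown above are always present and that `execution_details` appears only when requested. Its internal shape is not documented here, so it is kept as an opaque value.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChatResponse:
    message_id: str
    conversation_id: str
    role: str
    content: str
    created_at: str
    execution_details: Optional[Any] = None  # only set when include_execution_details is true


def parse_chat_response(raw: str) -> ChatResponse:
    """Decode a chat response body into a ChatResponse."""
    data = json.loads(raw)
    return ChatResponse(
        message_id=data["message_id"],
        conversation_id=data["conversation_id"],
        role=data["role"],
        content=data["content"],
        created_at=data["created_at"],
        execution_details=data.get("execution_details"),
    )


sample = """{
  "message_id": "990e8400-e29b-41d4-a716-446655440000",
  "conversation_id": "880e8400-e29b-41d4-a716-446655440000",
  "role": "assistant",
  "content": "The top 3 products are Widget A ($125K), Widget B ($98K), and Widget C ($87K).",
  "created_at": "2025-01-03T12:00:00Z"
}"""
parsed = parse_chat_response(sample)
```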

File Scoping

By default, chat searches all tabular files in a workspace. Use dataset_ids to limit scope:

bash
# Query all tabular files (default)
curl -X POST https://api.parsesphere.com/v1/workspaces/550e8400/chat \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
  "message": "What are total sales across all products?",
  "stream": false
}'

When to specify files:

  • Improve query speed by limiting scope
  • Prevent unintended joins when files share column names
  • Query specific subsets of your data

When to use all files:

  • Enable cross-file joins and correlations
  • Let the AI discover relationships automatically
  • Ask questions that span multiple data sources
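
To scope a request, list the files to include in `dataset_ids`. The sketch below assumes that `dataset_ids` takes file IDs and that a decoded file listing is a list of dicts with `id` and `filename` keys. Both are assumptions, since this section does not show those response bodies, and the IDs in the example are made up.

```python
def scoped_chat_payload(message: str, files: list[dict], filenames: set[str]) -> dict:
    """Build a chat body restricted to the named files.

    `files` stands in for a decoded file listing; the "id" and "filename"
    keys are hypothetical.
    """
    missing = filenames - {entry["filename"] for entry in files}
    if missing:
        raise KeyError(f"files not found in workspace: {sorted(missing)}")
    selected = [entry["id"] for entry in files if entry["filename"] in filenames]
    return {"message": message, "dataset_ids": selected, "stream": False}


# Hypothetical listing with made-up IDs, for illustration only.
listing = [
    {"id": "file-sales", "filename": "sales.csv"},
    {"id": "file-products", "filename": "products.csv"},
    {"id": "file-customers", "filename": "customers.csv"},
]
payload = scoped_chat_payload(
    "What's the profit margin on our top-selling products?",
    listing,
    {"sales.csv", "products.csv"},
)
```

Here the customers file is left out, so the query can only join sales and products.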

Conversation History

View and manage your chat conversations:

GET /v1/workspaces/{workspace_id}/conversations

List all conversations in a workspace

bash
curl "https://api.parsesphere.com/v1/workspaces/550e8400/conversations?limit=20&offset=0" \
-H "Authorization: Bearer sk_your_api_key"
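
The `limit` and `offset` query parameters suggest offset-based pagination. The sketch below builds the URL for each page. The helper names are illustrative, and `total` is a count the caller is assumed to know already, since this section does not say whether the listing reports one.

```python
from urllib.parse import urlencode

API_BASE = "https://api.parsesphere.com/v1"


def conversations_url(workspace_id: str, limit: int = 20, offset: int = 0) -> str:
    """Return the conversation listing URL for one page."""
    query = urlencode({"limit": limit, "offset": offset})
    return f"{API_BASE}/workspaces/{workspace_id}/conversations?{query}"


def page_urls(workspace_id: str, total: int, limit: int = 20):
    """Yield one URL per page needed to cover `total` conversations."""
    for offset in range(0, total, limit):
        yield conversations_url(workspace_id, limit, offset)
```

Building the query string with `urlencode` avoids the shell quoting issue that an unquoted `&` causes on the command line.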

Get Conversation Messages

Retrieve full message history with optional execution details:

GET /v1/workspaces/{workspace_id}/conversations/{conversation_id}

Get conversation details with message history

bash
curl "https://api.parsesphere.com/v1/workspaces/550e8400/conversations/880e8400?include_tool_details=true" \
-H "Authorization: Bearer sk_your_api_key"

Conversation history includes:

  • Full message thread (user and assistant messages)
  • SQL execution details when requested
  • Token usage and performance metrics
  • Conversation metadata and status

Best Practices

Chat Like a Pro

Follow these tips to get the most accurate results from conversational queries.

1. Organize Workspaces Logically

Group related files that you'll query together:

  • ✓ "Q4 Sales Analysis" → sales.csv, products.csv, customers.csv
  • ✓ "Financial Reporting" → revenue.csv, expenses.csv, budgets.csv
  • ✗ "All Company Data" → too broad, unrelated files

2. Use Descriptive Column Names

The AI relies on column names to understand your data:

  • ✓ customer_name, order_date, total_revenue
  • ✗ col1, col2, value

3. Wait for Processing

Always verify file status is completed before chatting:

python
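import time

import requests

# Placeholders: replace with your own values
api_key = "sk_your_api_key"
workspace_id = "550e8400-e29b-41d4-a716-446655440000"
file_id = "880e8400-e29b-41d4-a716-446655440000"

# Wait for file to be ready
while True:
    response = requests.get(
        f"https://api.parsesphere.com/v1/workspaces/{workspace_id}/files/{file_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    response.raise_for_status()  # fail on HTTP errors before parsing the body
    status = response.json()

    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        # error_message may be missing, so fall back to a generic message
        raise RuntimeError(f"Processing failed: {status.get('error_message', 'unknown error')}")

    time.sleep(5)

# Now chat
response = requests.post(
    f"https://api.parsesphere.com/v1/workspaces/{workspace_id}/chat",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={"message": "What are the top products?", "stream": False},
)
response.raise_for_status()
print(response.json()["content"])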

4. Start Simple, Build Complexity

Start with straightforward questions, then ask follow-ups:

  1. "How many rows are in this file?"
  2. "What columns are available?"
  3. "Show me the first 5 rows"
  4. "What are the top products by revenue?" (builds on context)
  5. "Now show me just Electronics" (uses conversation history)

5. Use Conversation Context

Ask follow-up questions naturally:

  • "What about Q4?" (after asking about Q3)
  • "Break that down by region" (after seeing totals)
  • "Show me the bottom 5 instead" (refining previous query)

6. Review SQL Execution

Set include_execution_details: true to:

  • Understand how your question was interpreted
  • Debug unexpected results
  • Learn which columns and tables were used
  • Copy SQL for use in other tools
  • Monitor token usage and performance

7. Control Iterations

Adjust max_iterations based on query complexity:

  • Simple queries: 5-10 iterations (faster)
  • Complex analysis: 15 iterations (default, thorough)
