ShadAI Framework

File Ingestion

Transform your documents into queryable knowledge with Shadai's intelligent ingestion system.

Quick Start#

Supported Formats#

Format   Extension      Max Size   Notes
PDF      .pdf           35 MB      Text extraction + OCR
Images   .jpg, .jpeg    35 MB      OCR with vision models
Images   .png, .webp    35 MB      OCR with vision models

How Ingestion Works#

Your Files → Parse → Chunk → Embed → Vector Store
                ↓        ↓       ↓          ↓
           Extract   Smart    Create    PostgreSQL
            Text     Split  Embeddings   +pgvector

Step 1: Parsing#

Extracts text from PDFs
OCR for images
Preserves structure and metadata

Step 2: Chunking#

Intelligent text splitting (4000 chars/chunk)
Maintains semantic coherence
Overlapping chunks for context
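
The overlapping-window idea can be pictured with a minimal sketch. It assumes plain fixed-size character windows of 4000 characters with a 200-character overlap; the overlap value and the name `chunk_text` are illustrative assumptions, not documented settings, and the sketch makes no attempt at the semantic splitting described above.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    chunk_size mirrors the 4000 chars/chunk figure; overlap is an
    assumed value chosen only for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.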

Step 3: Embedding#

Converts chunks to vector embeddings
Uses configured embedding model
Optimized for semantic search

Step 4: Storage#

Stores in PostgreSQL with pgvector
Indexed for fast retrieval
Linked to session

Basic Ingestion#

Single Folder#

Nested Folders#

Example structure:
all-docs/
├── reports/
│   ├── q1-2024.pdf
│   └── q2-2024.pdf
├── research/
│   ├── paper1.pdf
│   └── paper2.pdf
└── images/
    ├── chart1.png
    └── diagram.jpg
All supported files in the folder tree, including those in subfolders, are ingested automatically.
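
To preview which files in a nested tree are candidates for ingestion, a small standard-library sketch can walk the folder. The function name `find_supported_files` is hypothetical and is not part of the Shadai client; the extension list is taken from the Supported Formats table.

```python
from pathlib import Path

# Extensions listed in the Supported Formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}


def find_supported_files(root: str) -> list[Path]:
    """Return every supported file under root, searching subfolders too."""
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Running this against the example structure above would list the four PDFs and the two images, which is a quick way to confirm what a folder contains before ingesting it.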

Ingestion Results#

Handling Results#

Incremental Ingestion#

Add documents to existing sessions:
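
When adding documents over time, it can help to know which local files are new since the last run. The sketch below is one possible client-side approach, assuming you keep your own set of content hashes; `file_digest` and `select_new_files` are hypothetical helpers, and whether the service deduplicates on its own is not stated here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in blocks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            hasher.update(block)
    return hasher.hexdigest()


def select_new_files(paths: list[Path], seen_digests: set[str]) -> list[Path]:
    """Return files whose content has not been seen, updating seen_digests."""
    new_files = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen_digests:
            seen_digests.add(digest)
            new_files.append(path)
    return new_files
```

Persisting `seen_digests` between runs (for example as a JSON list) lets each later run pick out only files whose content has changed or been added.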

Best Practices#

✅ Do This#

❌ Don't Do This#

Performance#

Ingestion Speed#

File Size   Processing Time   Notes
< 1 MB      5-10 seconds      Fast
1-5 MB      10-30 seconds     Typical
5-20 MB     30-120 seconds    Slower
20-35 MB    2-5 minutes       Maximum size

Factors affecting speed:
File size
Document complexity
Number of pages
Image content
Server load
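
The table above can be turned into a rough lookup for planning purposes. This is only a sketch: `estimate_seconds` is a hypothetical helper, 1 MB is assumed to mean 1024 * 1024 bytes, band edges are treated as inclusive on the upper side, and real timings vary with the factors just listed.

```python
MB = 1024 * 1024  # assumption: the table's "MB" means mebibytes


def estimate_seconds(size_bytes: int) -> tuple[int, int]:
    """Return the (low, high) processing-time band in seconds for a file size."""
    size_mb = size_bytes / MB
    if size_mb < 1:
        return (5, 10)
    if size_mb <= 5:
        return (10, 30)
    if size_mb <= 20:
        return (30, 120)
    if size_mb <= 35:
        return (120, 300)  # 2-5 minutes
    raise ValueError("file exceeds the 35 MB maximum size")
```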

Parallel Ingestion#

Troubleshooting#

File Not Ingested#

Check file size: each file must be 35 MB or smaller.
Check file format: only .pdf, .jpg, .jpeg, .png, and .webp files are supported.
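
Both checks can be run locally before ingesting. The sketch below uses only the standard library; `ingestion_problems` is a hypothetical helper, the limits come from the Supported Formats table, and 35 MB is assumed to mean 35 * 1024 * 1024 bytes.

```python
from pathlib import Path

MAX_SIZE_BYTES = 35 * 1024 * 1024  # assumption: "35 MB" in mebibytes
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}


def ingestion_problems(path: str, max_size_bytes: int = MAX_SIZE_BYTES) -> list[str]:
    """Return human-readable reasons a file is likely to be skipped."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path} does not exist or is not a file"]

    problems = []
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")
    size = file_path.stat().st_size
    if size > max_size_bytes:
        problems.append(f"file is {size} bytes, limit is {max_size_bytes}")
    return problems
```

An empty list means the file passes both checks; anything else names the reason to fix before retrying.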

Parse Errors#

Corrupted PDFs: a damaged or truncated PDF can fail during parsing.
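
A quick local heuristic can catch obviously damaged PDFs before they are sent for ingestion. The sketch below checks for the `%PDF-` header near the start of the file and the `%%EOF` marker near the end; `looks_like_pdf` is a hypothetical helper, and passing this check does not guarantee that the file will parse.

```python
def looks_like_pdf(path: str, window: int = 1024) -> bool:
    """Heuristically check that a file has a PDF header and an EOF marker."""
    with open(path, "rb") as handle:
        head = handle.read(window)
        handle.seek(0, 2)  # jump to the end to learn the file size
        size = handle.tell()
        handle.seek(max(0, size - window))
        tail = handle.read()
    return b"%PDF-" in head and b"%%EOF" in tail
```

A file that fails this check was most likely truncated during download or is not a PDF at all, so re-exporting or re-downloading it is usually the first thing to try.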

Slow Ingestion#

Optimize:
Reduce file sizes
Compress PDFs
Remove unnecessary pages
Ingest in batches
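
Batching can be as simple as grouping files so that no batch exceeds a size budget. The sketch below is a greedy, in-order grouping; `make_batches` and the 100 MB default budget are illustrative assumptions, not documented limits.

```python
def make_batches(
    files: list[tuple[str, int]],  # (file name, size in bytes)
    max_batch_bytes: int = 100 * 1024 * 1024,  # assumed budget per batch
) -> list[list[str]]:
    """Group files in order so each batch stays within max_batch_bytes.

    A single file larger than the budget is placed in a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each batch can then be placed in its own folder and ingested one after another, which keeps individual runs short and makes failures easier to isolate.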

Advanced Usage#

Progress Tracking#

Selective Ingestion#
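
One way to ingest only part of a folder tree is to copy the files you want into a staging folder and ingest that folder instead. The sketch below assumes this staging approach; `stage_matching_files` is a hypothetical helper, patterns are shell-style globs matched against each file's path relative to the source folder, and the staging folder should live outside the source folder.

```python
import fnmatch
import shutil
from pathlib import Path


def stage_matching_files(source: str, patterns: list[str], staging_dir: str) -> list[Path]:
    """Copy files under source that match any pattern into staging_dir.

    The relative folder layout is preserved. Note that "*" in fnmatch
    also matches "/" so "reports/*.pdf" includes deeper subfolders.
    """
    source_path = Path(source)
    staging_path = Path(staging_dir)
    staged = []
    for path in sorted(source_path.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source_path)
        if any(fnmatch.fnmatchcase(relative.as_posix(), p) for p in patterns):
            target = staging_path / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            staged.append(target)
    return staged
```

For the example structure shown earlier, a pattern list of `["reports/*.pdf"]` would stage only the two quarterly reports.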

Next Steps#

Streaming Responses - Handle query results
Session Management - Organize documents
Performance Optimization - Scale ingestion

Ready to query? → Your First Query
Modified at 2025-10-17 17:47:30