Transform your documents into queryable knowledge with Shadai's intelligent ingestion system.

## Quick Start
## Supported Formats

| Format | Extension | Max Size | Notes |
|---|---|---|---|
| PDF | .pdf | 35 MB | Text extraction + OCR |
| Images | .jpg, .jpeg | 35 MB | OCR with vision models |
| Images | .png, .webp | 35 MB | OCR with vision models |
## How Ingestion Works
Your Files → Parse (extract text) → Chunk (smart split) → Embed (create embeddings) → Vector Store (PostgreSQL + pgvector)
### Step 1: Parsing

- Extracts text from PDFs and images, using OCR where needed
- Preserves structure and metadata
### Step 2: Chunking

- Splits text intelligently into chunks of about 4,000 characters
- Keeps each chunk semantically coherent
- Overlaps adjacent chunks so context carries across boundaries
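The chunking step can be pictured as a sliding character window. The sketch below is a simplified illustration, not Shadai's actual splitter: it takes the 4,000-character size from the list above, assumes a hypothetical 200-character overlap, and approximates semantic coherence only by ending each chunk at the last space in the window so words are not cut in half.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters (hypothetical default).
    Where possible, a chunk ends at the last space inside its window.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Back off to the last space, but keep the chunk longer than the
            # overlap so the next start always moves forward.
            cut = text.rfind(" ", start + overlap + 1, end)
            if cut != -1:
                end = cut
        chunks.append(text[start:end])
        if end >= len(text):
            break
        # Step forward, re-reading the last `overlap` characters.
        start = end - overlap
    return chunks
```

For example, `chunk_text("word " * 2000)` (10,000 characters) returns three chunks, each sharing 200 characters with its neighbor.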
### Step 3: Embedding

- Converts each chunk into a vector embedding
- Uses the configured embedding model
- Produces vectors optimized for semantic search
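To make "converts chunks to vector embeddings" concrete, the toy function below turns text into a fixed-length, unit-length vector using the hashing trick. It is only a stand-in for the configured embedding model: a real model places texts with similar meaning close together, while this sketch reflects nothing more than shared words. The `embed` name and the 64-dimension size are hypothetical.

```python
from __future__ import annotations

import hashlib
import math
import re

DIM = 64  # hypothetical; production embedding models use far more dimensions


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy embedding: hashed bag of words, scaled to unit length."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # sha256 gives a stable bucket across runs (built-in hash() is salted).
        digest = hashlib.sha256(token.encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Unit length lets cosine similarity reduce to a plain dot product.
    return [x / norm for x in vec] if norm else vec


print(len(embed("Quarterly revenue grew")))  # 64
```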
### Step 4: Storage

- Stores the chunks and their embeddings in PostgreSQL with the pgvector extension
- Indexes them for fast retrieval
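pgvector ranks rows with distance operators; `<=>` is cosine distance, so a retrieval query has the shape `ORDER BY embedding <=> query_vector LIMIT k`. The in-memory sketch below computes the same ranking by brute force to show what such a query returns. It is an illustration under that assumption, not Shadai's storage layer: the class and method names are hypothetical, and it omits the indexes (for example HNSW or IVFFlat) that pgvector uses to keep retrieval fast on large tables.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a pgvector table of (chunk text, vector) rows."""

    rows: list = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.rows.append((text, vector))

    def search(self, query: list[float], k: int = 3) -> list:
        """Return the k (text, distance) pairs closest to `query` by cosine distance."""

        def cosine_distance(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            if na == 0 or nb == 0:
                return 1.0  # treat zero vectors as unrelated
            return 1.0 - dot / (na * nb)

        scored = [(text, cosine_distance(query, vec)) for text, vec in self.rows]
        scored.sort(key=lambda item: item[1])  # smallest distance first
        return scored[:k]


store = InMemoryVectorStore()
store.add("revenue grew", [1.0, 0.0])
store.add("new hires", [0.0, 1.0])
store.add("profit rose", [0.9, 0.1])
print([text for text, _ in store.search([1.0, 0.0], k=2)])  # ['revenue grew', 'profit rose']
```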
## Basic Ingestion

### Single Folder
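Before ingesting a folder, it can help to preview which files will qualify. The sketch below is a hypothetical pre-flight helper, not part of the Shadai SDK: it applies the extension and 35 MB limits from the format table above, and its `recursive` flag extends the scan to subfolders, as in the nested layout that follows.

```python
from __future__ import annotations

from pathlib import Path

# Limits from the supported-formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 35 * 1024 * 1024  # 35 MB; assumes binary megabytes


def find_ingestible(folder: str | Path, recursive: bool = False) -> list[Path]:
    """List files in `folder` whose extension and size are within the limits.

    With recursive=True, subfolders are searched too (rglob walks the tree).
    """
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        p
        for p in candidates
        if p.is_file()
        and p.suffix.lower() in SUPPORTED_EXTENSIONS
        and p.stat().st_size <= MAX_BYTES
    )
```

For the `all-docs/` tree below, `find_ingestible("all-docs", recursive=True)` would list all six files, provided each is within the size limit.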
### Nested Folders

- all-docs/
  - reports/
    - q1-2024.pdf
    - q2-2024.pdf
  - research/
    - paper1.pdf
    - paper2.pdf
  - images/
    - chart1.png
    - diagram.jpg
All files are ingested automatically, including those in subfolders.

## Ingestion Results
### Handling Results
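The exact shape of Shadai's ingestion results is not described here, so the sketch below assumes a hypothetical per-file record with a path, a success flag, and an optional error message. It shows one way to split outcomes so that failures can be logged or retried while successful files are left alone.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileResult:
    """Hypothetical per-file outcome; the real SDK's result type may differ."""

    path: str
    ok: bool
    error: str | None = None


def summarize(results: list[FileResult]) -> dict:
    """Group results into successes and failures and compute a success rate."""
    succeeded = [r.path for r in results if r.ok]
    failed = {r.path: r.error for r in results if not r.ok}
    rate = len(succeeded) / len(results) if results else 0.0
    return {"succeeded": succeeded, "failed": failed, "success_rate": rate}


report = summarize([
    FileResult("reports/q1-2024.pdf", ok=True),
    FileResult("images/scan.png", ok=False, error="OCR failed"),
])
print(report["failed"])  # {'images/scan.png': 'OCR failed'}
```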
## Incremental Ingestion
Add documents to existing sessions.

## Best Practices
### ✅ Do This

### ❌ Don't Do This
## Ingestion Speed

| File Size | Processing Time | Notes |
|---|---|---|
| < 1 MB | 5-10 seconds | Fast |
| 1-5 MB | 10-30 seconds | Typical |
| 5-20 MB | 30-120 seconds | Slower |
| 20-35 MB | 2-5 minutes | Maximum size |
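For planning a large batch, the table can be turned into a rough estimate. The helper below is hypothetical: it maps each file size to its row and sums the ranges, which assumes files are processed one after another (parallel ingestion would shorten wall-clock time) and counts sizes of exactly 5 or 20 MB toward the faster row.

```python
from __future__ import annotations


def time_range_seconds(size_mb: float) -> tuple[int, int]:
    """Return (min, max) processing seconds for one file, per the table."""
    if size_mb < 1:
        return (5, 10)
    if size_mb <= 5:
        return (10, 30)
    if size_mb <= 20:
        return (30, 120)
    if size_mb <= 35:
        return (120, 300)  # 2-5 minutes
    raise ValueError(f"{size_mb} MB exceeds the 35 MB limit")


def estimate_batch(sizes_mb: list[float]) -> tuple[int, int]:
    """Sum per-file ranges, assuming sequential processing."""
    ranges = [time_range_seconds(s) for s in sizes_mb]
    return (sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges))


print(estimate_batch([0.5, 3, 25]))  # (135, 340)
```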
### Parallel Ingestion
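One common way to ingest many files in parallel without overwhelming the service is to cap the number of concurrent requests. The asyncio sketch below does that with a semaphore. `ingest_file` is a hypothetical placeholder that only sleeps, so substitute the real per-file ingestion call; the default limit of 4 is an arbitrary example, not a documented Shadai limit.

```python
from __future__ import annotations

import asyncio


async def ingest_file(path: str) -> str:
    """Hypothetical placeholder for a single-file ingestion call."""
    await asyncio.sleep(0.01)  # simulate network and processing time
    return path


async def ingest_all(paths: list[str], max_concurrency: int = 4) -> list[str]:
    """Ingest files concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(path: str) -> str:
        async with semaphore:  # waits while the limit is reached
            return await ingest_file(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))


print(asyncio.run(ingest_all(["a.pdf", "b.pdf", "c.pdf"])))  # ['a.pdf', 'b.pdf', 'c.pdf']
```

By default `asyncio.gather` propagates the first exception; passing `return_exceptions=True` instead collects failures alongside successes.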
## Troubleshooting

### File Not Ingested
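When a file is silently skipped, the first things to check are the limits in the format table: the extension and the 35 MB cap. The hypothetical helper below reports the first such problem it finds; the empty-file check is an extra assumption, not a documented rule.

```python
from __future__ import annotations

from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 35 * 1024 * 1024  # 35 MB; assumes binary megabytes


def why_skipped(path: str | Path) -> str | None:
    """Return a likely reason the file would be skipped, or None if it looks fine."""
    p = Path(path)
    if not p.is_file():
        return "not found or not a regular file"
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return f"unsupported extension: {p.suffix or '(none)'}"
    size = p.stat().st_size
    if size == 0:
        return "file is empty"  # assumption: empty files cannot be parsed
    if size > MAX_BYTES:
        return f"{size / (1024 * 1024):.1f} MB exceeds the 35 MB limit"
    return None
```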
### Parse Errors
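One possible cause of parse failures is a file whose contents do not match its extension, for example an HTML page saved as `.pdf`. The sketch below compares a file's leading bytes against the standard signatures for each supported format. The helper name is hypothetical, the PDF check is strict (some readers tolerate bytes before the header), and passing it does not guarantee the file will parse: an encrypted or damaged PDF still starts with `%PDF-`.

```python
from __future__ import annotations

from pathlib import Path


def content_matches_extension(path: str | Path) -> bool:
    """Check a file's leading bytes against the signature its extension implies."""
    p = Path(path)
    with p.open("rb") as f:
        head = f.read(12)
    ext = p.suffix.lower()
    if ext == ".pdf":
        return head.startswith(b"%PDF-")
    if ext in (".jpg", ".jpeg"):
        return head.startswith(b"\xff\xd8\xff")  # JPEG start-of-image marker
    if ext == ".png":
        return head.startswith(b"\x89PNG\r\n\x1a\n")  # 8-byte PNG signature
    if ext == ".webp":
        # RIFF container: "RIFF", a 4-byte size, then "WEBP".
        return head[:4] == b"RIFF" and head[8:12] == b"WEBP"
    return False  # not one of the supported formats
```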
### Slow Ingestion

## Advanced Usage

### Progress Tracking
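The SDK's own progress reporting is not described here, so the sketch below shows a generic pattern instead: a loop that calls a per-file ingestion function, then a progress callback with the number of files done, the total, and the current file. Both callables are hypothetical parameters supplied by the caller.

```python
from __future__ import annotations

from typing import Callable


def ingest_with_progress(
    paths: list[str],
    ingest_one: Callable[[str], None],
    on_progress: Callable[[int, int, str], None],
) -> None:
    """Ingest files one by one, reporting (done, total, path) after each."""
    total = len(paths)
    for done, path in enumerate(paths, start=1):
        ingest_one(path)
        on_progress(done, total, path)


ingest_with_progress(
    ["reports/q1-2024.pdf", "reports/q2-2024.pdf"],
    ingest_one=lambda path: None,  # replace with a real ingestion call
    on_progress=lambda done, total, path: print(f"[{done}/{total}] {path}"),
)
# [1/2] reports/q1-2024.pdf
# [2/2] reports/q2-2024.pdf
```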
### Selective Ingestion
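To ingest only part of a folder, one approach is to filter paths with include and exclude glob patterns before handing them to ingestion. The `select_paths` helper below is hypothetical; it uses `fnmatch.fnmatchcase`, in which `*` also matches `/`, so a pattern like `*.pdf` matches PDFs at any depth.

```python
from fnmatch import fnmatchcase


def select_paths(paths, include=("*",), exclude=()):
    """Keep paths that match any include pattern and no exclude pattern."""
    return [
        p
        for p in paths
        if any(fnmatchcase(p, pat) for pat in include)
        and not any(fnmatchcase(p, pat) for pat in exclude)
    ]


files = [
    "reports/q1-2024.pdf", "reports/q2-2024.pdf",
    "research/paper1.pdf", "research/paper2.pdf",
    "images/chart1.png", "images/diagram.jpg",
]
print(select_paths(files, include=["*.pdf"], exclude=["research/*"]))
# ['reports/q1-2024.pdf', 'reports/q2-2024.pdf']
```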
## Next Steps
Modified at 2025-10-17 17:47:30