File Ingestion

Transform your documents into queryable knowledge with Shadai's intelligent ingestion system.

Quick Start

Supported Formats

Format	Extensions	Max Size	Notes
PDF	`.pdf`	35 MB	Text extraction + OCR
Images	`.jpg`, `.jpeg`, `.png`, `.webp`	35 MB	OCR with vision models

How Ingestion Works

Your Files → Parse → Chunk → Embed → Vector Store
                ↓        ↓       ↓          ↓
           Extract   Smart    Create    PostgreSQL
            Text     Split  Embeddings   +pgvector

Step 1: Parsing

Extracts text from PDFs

OCR for images

Preserves structure and metadata

Step 2: Chunking

Intelligent text splitting (4,000 characters per chunk)

Maintains semantic coherence

Overlapping chunks for context
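
To make the idea of overlapping chunks concrete, here is a minimal sketch of a fixed-size character splitter. It is not Shadai's actual chunker: the function name `chunk_text` and the 200-character overlap are assumptions chosen for illustration, and a plain character split does not attempt the semantic splitting described above. Only the 4,000-character chunk size comes from this page.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that context at a
    chunk boundary appears in both neighbours. The overlap value is an
    illustrative assumption, not a documented Shadai setting.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks
```

With these defaults, the last 200 characters of each chunk are repeated at the start of the next one.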

Step 3: Embedding

Converts chunks to vector embeddings

Uses configured embedding model

Optimized for semantic search
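
Semantic search compares the embedding of a query with the embeddings of stored chunks. This page does not say which similarity measure Shadai uses; cosine similarity is a common choice and one of the measures pgvector supports, so the sketch below uses it as an assumption. The vectors in the usage note are toy values, not real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_chunks(query: list[float], chunks: dict[str, list[float]]) -> list[str]:
    """Return chunk ids ordered from most to least similar to the query."""
    return sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]), reverse=True)
```

For example, `rank_chunks([1.0, 0.0], {"a": [0.9, 0.1], "b": [0.0, 1.0]})` places `"a"` first because its vector points in nearly the same direction as the query.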

Step 4: Storage

Stores in PostgreSQL with pgvector

Indexed for fast retrieval

Linked to session

Basic Ingestion

Single Folder

Nested Folders

Example structure:

all-docs/
├── reports/
│   ├── q1-2024.pdf
│   └── q2-2024.pdf
├── research/
│   ├── paper1.pdf
│   └── paper2.pdf
└── images/
    ├── chart1.png
    └── diagram.jpg

Every supported file in the folder and its subfolders is ingested automatically.
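
Before ingesting a nested folder, it can help to preview which files would be picked up. The sketch below walks a directory tree with the standard library only. `find_supported_files` is a hypothetical helper, not part of Shadai, and treating extensions case-insensitively is an assumption; the extension list comes from the Supported Formats table.

```python
from pathlib import Path

# Extensions listed in the Supported Formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}


def find_supported_files(root: str) -> list[Path]:
    """Return every supported file under `root`, including subfolders."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Run against the `all-docs/` structure above, this would list the four PDFs and the two images.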

Ingestion Results

Handling Results

Incremental Ingestion

Add documents to existing sessions:

Best Practices

✅ Do This

❌ Don't Do This

Performance

Ingestion Speed

File Size	Processing Time	Notes
< 1 MB	5-10 seconds	Fast
1-5 MB	10-30 seconds	Typical
5-20 MB	30-120 seconds	Slower
20-35 MB	2-5 minutes	Maximum size

Factors affecting speed:

File size

Document complexity

Number of pages

Image content

Server load

Parallel Ingestion
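
One way to ingest several folders concurrently is a thread pool. The sketch below is generic: `ingest_folder` stands in for whatever call performs the ingestion in your code (it is a placeholder, not a Shadai function), and `max_workers=4` is an arbitrary assumption. This page does not state whether the service limits concurrent uploads, so start with a small worker count.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def ingest_in_parallel(
    folders: list[str],
    ingest_folder: Callable[[str], object],  # placeholder for your ingestion call
    max_workers: int = 4,
) -> tuple[dict[str, object], dict[str, Exception]]:
    """Run `ingest_folder` on each folder concurrently.

    Returns (results, errors), both keyed by folder, so that one failing
    folder does not hide the outcome of the others.
    """
    results: dict[str, object] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_folder, folder): folder for folder in folders}
        for future in as_completed(futures):
            folder = futures[future]
            try:
                results[folder] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[folder] = exc
    return results, errors
```

Threads suit this case because ingestion time is dominated by waiting on network and server-side processing rather than local CPU work.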

Troubleshooting

File Not Ingested

Check file size:
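
A quick local check against the 35 MB limit, using only the standard library. The helper name is hypothetical, and interpreting 1 MB as 1,048,576 bytes is an assumption; this page does not say whether the limit is measured in decimal or binary megabytes.

```python
import os

# 35 MB limit from the Supported Formats table (binary megabytes assumed).
MAX_FILE_SIZE_BYTES = 35 * 1024 * 1024


def is_within_size_limit(path: str, limit: int = MAX_FILE_SIZE_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```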

Check file format:
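
A minimal sketch that tests a file name against the extensions in the Supported Formats table. The helper name is hypothetical, and the case-insensitive comparison is an assumption about how extensions are matched.

```python
from pathlib import Path

# Extensions listed in the Supported Formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}


def has_supported_extension(path: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Note that this looks only at the name; a file renamed to `.pdf` still passes even if its contents are something else.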

Parse Errors

Corrupted PDFs:
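
A lightweight way to spot obviously damaged PDFs before uploading is to look for the `%PDF-` signature at the start of the file and the `%%EOF` marker near the end. The sketch below is a heuristic with a hypothetical helper name: it catches wrong-format and truncated files, but a file that passes can still be corrupt internally.

```python
import os


def pdf_sanity_problems(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    with open(path, "rb") as handle:
        head = handle.read(5)
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - 1024))  # inspect only the last 1 KB
        tail = handle.read()

    problems = []
    if head != b"%PDF-":
        problems.append("missing %PDF- header")
    if b"%%EOF" not in tail:
        problems.append("missing %%EOF marker (file may be truncated)")
    return problems
```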

Slow Ingestion

Optimize:

Reduce file sizes

Compress PDFs

Remove unnecessary pages

Ingest in batches
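
Batching can be as simple as slicing the file list into fixed-size groups and ingesting one group at a time. The helper below is a generic sketch; the name `in_batches` is hypothetical and the right batch size depends on your files, so none is suggested here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def in_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each yielded batch can then be copied or linked into its own folder and ingested separately.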

Advanced Usage

Progress Tracking
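
If you ingest files one at a time from your own code, progress can be reported with a simple callback. This is a client-side sketch under stated assumptions: `ingest_file` is a placeholder for your ingestion call, and `on_progress` is a hypothetical callback signature. Neither is a Shadai API.

```python
from typing import Callable


def ingest_with_progress(
    files: list[str],
    ingest_file: Callable[[str], object],          # placeholder for your ingestion call
    on_progress: Callable[[int, int, str], None],  # called as (done, total, path)
) -> list[object]:
    """Ingest files sequentially, invoking `on_progress` after each one."""
    results = []
    total = len(files)
    for done, path in enumerate(files, start=1):
        results.append(ingest_file(path))
        on_progress(done, total, path)
    return results


def print_progress(done: int, total: int, path: str) -> None:
    """Example callback that prints a one-line status per file."""
    print(f"[{done}/{total}] {path}")
```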

Selective Ingestion
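
To ingest only part of a folder, one approach is to filter the file list with glob-style patterns and place the selected files in a folder of their own. The sketch below uses the standard library's `fnmatch`; `select_files` and its `include`/`exclude` parameters are hypothetical names, not Shadai options.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence


def select_files(
    paths: Iterable[str],
    include: Sequence[str] = ("*",),
    exclude: Sequence[str] = (),
) -> list[str]:
    """Keep paths that match any `include` pattern and no `exclude` pattern.

    Patterns are matched case-sensitively against the whole relative path,
    and `*` also matches path separators.
    """
    selected = []
    for path in paths:
        normalized = path.replace("\\", "/")  # compare with forward slashes
        if not any(fnmatchcase(normalized, pattern) for pattern in include):
            continue
        if any(fnmatchcase(normalized, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected
```

For example, `select_files(paths, include=["*.pdf"], exclude=["research/*"])` keeps every PDF except those under `research/`.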

Next Steps

Streaming Responses - Handle query results

Session Management - Organize documents

Performance Optimization - Scale ingestion

Ready to query? → Your First Query
