Architecture

Understanding Shadai's architecture helps you build better applications and troubleshoot effectively.

High-Level Overview

Your Application
     ↓
Shadai Python Client (SDK)
     ↓
REST API (Shadai Server)
     ↓
┌──────────────────────────────────┐
│  RAG Engine                      │
│  ├─ Document Processing          │
│  ├─ Vector Search                │
│  ├─ LLM Integration              │
│  └─ Memory Management            │
└──────────────────────────────────┘

Client-Server Architecture

Python Client (Your Code)

The Shadai client is the part of the system your code interacts with directly.

Responsibilities:

Session management

API communication

Response streaming

Error handling

Type safety

Shadai Server (Backend)

The server handles all heavy lifting:

Document Processing:

PDF parsing

Text extraction

Intelligent chunking

Embedding generation

RAG Pipeline:

Vector similarity search

Context retrieval

LLM prompting

Response generation

State Management:

Session persistence

Chat history

Document storage

Memory management

Core Components

1. Sessions

Sessions are isolated workspaces.

Key Features:

Persistent by default

Isolated from other sessions

Shareable across runs

Queryable independently

2. Tools

Tools are specialized capabilities:

Tool         Purpose                    Use Case
Query        Search documents           "What does the contract say?"
Summarize    Overview of all docs       "Give me the executive summary"
Web Search   Current information        "Latest industry news"
Engine       Multi-tool orchestration   "Compare docs with trends"
Agent        Custom tool execution      "Analyze with my tools"

3. Memory System

Memory enables context-aware conversations.

How it works:

Stores user messages

Stores AI responses

Maintains conversation thread

Provides context to LLM
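
To make the four steps above concrete, here is a minimal sketch of a conversation memory. The `ConversationMemory` class and its methods are hypothetical names invented for illustration; they are not part of the Shadai API, and the real server-side memory may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationMemory:
    """Hypothetical in-memory chat history (illustration only, not Shadai's code)."""

    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # Store each user message and AI response in arrival order.
        self.messages.append({"role": role, "content": content})

    def build_prompt(self, question: str) -> str:
        # Prepend the conversation thread so the model sees earlier turns.
        history = "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)
        return f"{history}\nuser: {question}" if history else f"user: {question}"


memory = ConversationMemory()
memory.add("user", "What does the contract say about renewal?")
memory.add("assistant", "It renews annually unless cancelled.")
prompt = memory.build_prompt("And what is the notice period?")
print(prompt)
```

Because the earlier turns travel with the new question, a follow-up such as "And what is the notice period?" can be resolved against the contract discussed before.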

4. Vector Store

Documents are converted to searchable vectors:

Your PDF → Text Chunks → Embeddings → Vector Database

When you query:

Your Question → Embedding → Vector Search → Relevant Chunks → LLM → Answer
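
The query path above can be illustrated with a toy example. This sketch assumes a bag-of-words count vector as a stand-in for a real embedding model and a plain Python list as a stand-in for the vector database; `embed`, `cosine`, and `top_chunks` are hypothetical helpers, not Shadai functions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: word counts instead of dense vectors.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Embed the question, score every chunk, and keep the k best matches.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "the contract renews every year",
    "payment is due within thirty days",
    "either party may cancel the contract with notice",
]
best = top_chunks("when does the contract renew", chunks, k=1)
print(best)
```

In a real pipeline the selected chunks would then be placed into the LLM prompt as context.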

Data Flow

Document Ingestion

1. Upload File
   ↓
2. Server Processes
   - Extracts text
   - Splits into chunks
   - Creates embeddings
   ↓
3. Stores Vectors
   ↓
4. Returns Success
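
The "splits into chunks" step can be sketched as follows. This is a deliberately simple fixed-size splitter with overlap, used here as a stand-in; the server's own chunking strategy is not documented on this page, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


print(chunk_text("abcdefghij", size=4, overlap=1))
```

The overlap means text that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.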

Your code:

Query Execution

1. Send Question
   ↓
2. Server Processes
   - Creates question embedding
   - Searches vectors
   - Retrieves top matches
   - Prompts LLM with context
   ↓
3. Streams Response
   ↓
4. Saves to Memory

Your code:
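
As a rough illustration of consuming a streamed response, the sketch below uses a stub in place of the server. `fake_stream` and `collect` are hypothetical names; the actual client call and its signature are not shown on this page, so treat this as the general pattern only.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream(answer: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streamed server response.
    for word in answer.split():
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield word + " "


async def collect(question: str) -> str:
    parts = []
    async for chunk in fake_stream(f"Answer to: {question}"):
        print(chunk, end="", flush=True)  # show text as soon as it arrives
        parts.append(chunk)
    print()
    return "".join(parts).strip()


result = asyncio.run(collect("What is the notice period?"))
```

Printing each chunk on arrival is what gives the user immediate feedback instead of a wait for the full answer.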

Communication Protocol

Request/Response Flow

Benefits of Streaming:

Immediate feedback

Lower perceived latency

Interruptible

Better UX

Authentication

Every request includes your API key. The server validates the key before processing the request.
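
One common way to attach a key to each request is sketched below. The environment variable name `SHADAI_API_KEY` and the `Authorization: Bearer` header scheme are both assumptions made for illustration; check the client reference for the names Shadai actually uses.

```python
import os


def auth_headers(env_var: str = "SHADAI_API_KEY") -> dict[str, str]:
    # Assumption: the key lives in an environment variable, never in source code.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    # Assumption: the key is sent as a bearer token in a header, not in the URL.
    return {"Authorization": f"Bearer {api_key}"}
```

Reading the key from the environment keeps it out of version control and makes rotation a configuration change rather than a code change.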

Scalability & Performance

Client-Side

The client is built to be fast:

Minimal dependencies

Async/await for concurrency

Efficient streaming

Type-safe operations

Example - Parallel Queries:
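
A minimal sketch of the pattern, assuming the client exposes an awaitable query call. `fake_query` is a hypothetical stub that stands in for that call so the example runs on its own.

```python
import asyncio


async def fake_query(question: str) -> str:
    # Hypothetical stand-in for an awaitable client query.
    await asyncio.sleep(0.01)  # simulates network latency
    return f"answer:{question}"


async def run_parallel(questions: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_query(q) for q in questions))


answers = asyncio.run(run_parallel(["revenue?", "risks?", "timeline?"]))
print(answers)
```

With three queries in flight at once, total wall time is close to the slowest single query rather than the sum of all three.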

Server-Side

The server is optimized for:

Fast vector search

Efficient embedding generation

LLM response streaming

Concurrent request handling

You don't need to worry about:

Database optimization

Vector indexing

LLM provider management

Load balancing

Security Model

Authentication

API key-based authentication

Keys are account-specific

Can be rotated at any time

Revocable instantly

Data Isolation

Each session is isolated

Documents are not shared between accounts

Chat history is private

Embeddings are account-specific

Transport Security

HTTPS/TLS encryption

Secure API communication

No credentials in URLs

Token-based auth

Limitations & Constraints

File Size

Maximum: 35 MB per file

Files over the limit are skipped

Compress files that exceed the limit
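
Since oversized files are skipped, it can help to check sizes before uploading. The sketch below is a client-side pre-check; `split_by_size` is a hypothetical helper, and it assumes "35MB" means 35 × 1024 × 1024 bytes, which this page does not specify.

```python
from pathlib import Path

# Assumption: the documented 35MB limit is interpreted as 35 MiB.
MAX_BYTES = 35 * 1024 * 1024


def split_by_size(paths, limit: int = MAX_BYTES):
    """Return (ok, too_large) lists of Path objects based on file size."""
    ok, too_large = [], []
    for path in map(Path, paths):
        (ok if path.stat().st_size <= limit else too_large).append(path)
    return ok, too_large
```

Anything in the second list can be compressed or reported to the user before ingestion, so a skipped file never goes unnoticed.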

Session Limits

No hard limit on sessions

Organize sessions logically

Use temporal sessions for one-off queries

Rate Limiting

Fair use policy applies

Concurrent requests allowed

Contact support for enterprise needs

Memory

Conversation history accumulates

Clear periodically if needed

Affects token usage

Best Practices

✅ Do This

❌ Don't Do This

Troubleshooting

Slow Responses

Causes:

Large documents (more context to process)

Complex queries

High server load

Solutions:

Reduce context size

Simplify queries

Use temporal sessions for quick queries

Memory Issues

Causes:

Long conversation history

Many messages accumulated

Solutions:

Clear conversation history periodically
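
One possible approach, if you manage a history list on your side, is to keep only the most recent messages that fit a size budget. `trim_history` is a hypothetical helper, and the character budget is a crude stand-in for token counting; how Shadai clears server-side memory is not covered here.

```python
def trim_history(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """Keep the newest messages whose combined content fits within max_chars."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        used += len(msg["content"])
        if used > max_chars:
            break
        kept.append(msg)
    return list(reversed(kept))  # restore chronological order
```

Dropping the oldest turns first keeps the most relevant recent context while limiting token usage.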

Connection Errors

Causes:

Network issues

Invalid API key

Server maintenance

Solutions:

Verify internet connection

Check API key validity

Implement retry logic

Check status page
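
Retry logic can be sketched as exponential backoff with jitter. The `retry` helper below is a generic illustration, not part of the Shadai client, and the choice of `ConnectionError` and `TimeoutError` as retryable errors is an assumption; the client may raise its own exception types.

```python
import random
import time


def retry(func, attempts=4, base_delay=0.5,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt
            # Jitter spreads out retries from many clients hitting the server at once.
            sleep(delay + random.uniform(0, delay / 2))
```

Only transient failures are worth retrying. An invalid API key will fail the same way every time, so it should be fixed rather than retried.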

Next Steps

RAG System - Deep dive into RAG

Tools Overview - All available tools

Intelligent Agent - Agent architecture

Remember: The client is your interface. The server handles all complexity. Focus on building great applications!
