ShadAI Framework
  1. core-concepts
ShadAI Framework
  • Index
  • README
  • Pricing
    • Pricing Plan
  • advanced
    • Best Practices
    • Tool Orchestration
    • Performance Optimization
    • Custom Tools
  • examples
    • Advanced Patterns
    • Custom Agent Examples
    • Market Research Examples
    • Multi-Document Analysis
    • Basic Query Examples
  • api-reference
    • Exceptions Reference
    • Engine Tool API
    • Shadai Client API Reference
    • Query Tool API
    • Summarize Tool API
    • Agent Tool API
    • Web Search Tool API
  • use-cases
    • Knowledge Synthesis
    • Research Assistant
    • Custom Workflows
    • Document Q&A
  • core-concepts
    • Architecture
    • Intelligent Agent
    • RAG System
    • Tools Overview
  • guides
    • Memory & Context
    • Streaming Responses
    • File Ingestion
    • Error Handling
    • Session Management
  • getting-started
    • Authentication
    • Your First Query
    • Quick Start
    • Installation
  1. core-concepts

Architecture

Understanding Shadai's architecture helps you build better applications and troubleshoot effectively.

High-Level Overview#

Your Application
     ↓
Shadai Python Client (SDK)
     ↓
REST API (Shadai Server)
     ↓
┌──────────────────────────────────┐
│  RAG Engine                      │
│  ├─ Document Processing          │
│  ├─ Vector Search                │
│  ├─ LLM Integration              │
│  └─ Memory Management            │
└──────────────────────────────────┘

Client-Server Architecture#

Python Client (Your Code)#

The Shadai client is what you interact with:
Responsibilities:
Session management
API communication
Response streaming
Error handling
Type safety

Shadai Server (Backend)#

The server handles all heavy lifting:
Document Processing:
PDF parsing
Text extraction
Intelligent chunking
Embedding generation
RAG Pipeline:
Vector similarity search
Context retrieval
LLM prompting
Response generation
State Management:
Session persistence
Chat history
Document storage
Memory management

Core Components#

1. Sessions#

Sessions are isolated workspaces:
Key Features:
Persistent by default
Isolated from other sessions
Shareable across runs
Queryable independently

2. Tools#

Tools are specialized capabilities:
ToolPurposeUse Case
QuerySearch documents"What does the contract say?"
SummarizeOverview of all docs"Give me the executive summary"
Web SearchCurrent information"Latest industry news"
EngineMulti-tool orchestration"Compare docs with trends"
AgentCustom tool execution"Analyze with my tools"

3. Memory System#

Memory enables context-aware conversations:
How it works:
Stores user messages
Stores AI responses
Maintains conversation thread
Provides context to LLM

4. Vector Store#

Documents are converted to searchable vectors:
Your PDF → Text Chunks → Embeddings → Vector Database
When you query:
Your Question → Embedding → Vector Search → Relevant Chunks → LLM → Answer

Data Flow#

Document Ingestion#

1. Upload File
   ↓
2. Server Processes
   - Extracts text
   - Splits into chunks
   - Creates embeddings
   ↓
3. Stores Vectors
   ↓
4. Returns Success
Your code:

Query Execution#

1. Send Question
   ↓
2. Server Processes
   - Creates question embedding
   - Searches vectors
   - Retrieves top matches
   - Prompts LLM with context
   ↓
3. Streams Response
   ↓
4. Saves to Memory
Your code:

Communication Protocol#

Request/Response Flow#

Benefits of Streaming:
Immediate feedback
Lower perceived latency
Interruptible
Better UX

Authentication#

Every request includes your API key:
Server validates and processes request.

Scalability & Performance#

Client-Side#

Fast:
Minimal dependencies
Async/await for concurrency
Efficient streaming
Type-safe operations
Example - Parallel Queries:

Server-Side#

The server is optimized for:
Fast vector search
Efficient embedding generation
LLM response streaming
Concurrent request handling
You don't need to worry about:
Database optimization
Vector indexing
LLM provider management
Load balancing

Security Model#

Authentication#

API key-based authentication
Keys are account-specific
Can be rotated anytime
Revokable instantly

Data Isolation#

Each session is isolated
Documents not shared between accounts
Chat history is private
Embeddings are account-specific

Transport Security#

HTTPS/TLS encryption
Secure API communication
No credentials in URLs
Token-based auth

Limitations & Constraints#

File Size#

Maximum: 35MB per file
Larger files are skipped
Compress if needed

Session Limits#

No hard limit on sessions
Organize sessions logically
Use temporal sessions for one-off queries

Rate Limiting#

Fair use policy applies
Concurrent requests allowed
Contact support for enterprise needs

Memory#

Conversation history accumulates
Clear periodically if needed
Affects token usage

Best Practices#

✅ Do This#

❌ Don't Do This#

Troubleshooting#

Slow Responses#

Causes:
Large documents (more context to process)
Complex queries
High server load
Solutions:
Reduce context size
Simplify queries
Use temporal sessions for quick queries

Memory Issues#

Causes:
Long conversation history
Many messages accumulated
Solutions:

Connection Errors#

Causes:
Network issues
Invalid API key
Server maintenance
Solutions:
Verify internet connection
Check API key validity
Implement retry logic
Check status page

Next Steps#

RAG System - Deep dive into RAG
Tools Overview - All available tools
Intelligent Agent - Agent architecture

Remember: The client is your interface. The server handles all complexity. Focus on building great applications!
Modified at 2025-10-17 17:47:10
Previous
Document Q&A
Next
Intelligent Agent
Built with