RAG System Architecture for Electron Applications
Overview
This document describes a complete Retrieval-Augmented Generation (RAG) system implementation for Electron applications. The system provides local, privacy-first chat functionality with document content using embedded vector databases and local language models.
Architecture Components
┌──────────────────────────────────────────────┐
│             Electron Application             │
├──────────────────────────────────────────────┤
│  React Frontend                              │
│  ├── Chat Interface Components               │
│  ├── Settings Configuration                  │
│  └── Progress Indicators                     │
├──────────────────────────────────────────────┤
│  Service Layer                               │
│  ├── ChatService (Orchestration)             │
│  ├── EmbeddingService (Local Model)          │
│  ├── ChunkingService (Content Processing)    │
│  └── VectorStoreService (LanceDB)            │
├──────────────────────────────────────────────┤
│  Storage Layer                               │
│  ├── SQLite (Metadata & Chat History)        │
│  ├── LanceDB (Vector Embeddings)             │
│  └── Local Files (Documents)                 │
├──────────────────────────────────────────────┤
│  External Services                           │
│  └── Ollama (LLM for Response Generation)    │
└──────────────────────────────────────────────┘
Core Services
1. EmbeddingService
Purpose: Generates vector embeddings using local transformer models.
Key Features:
- Uses @xenova/transformers for browser-compatible model execution
- Default model: all-MiniLM-L6-v2 (384 dimensions, ~25MB)
- Automatic model downloading with progress tracking
- Singleton pattern for memory efficiency
- Batch processing support
interface EmbeddingConfig {
model: string; // Default: 'Xenova/all-MiniLM-L6-v2'
maxLength: number; // Default: 512 tokens
normalize: boolean; // Default: true
}
interface EmbeddingResult {
embedding: number[]; // 384-dimensional vector
text: string; // Source text
metadata?: Record<string, any>;
}
Implementation Details:
// Initialize with progress callback
await embeddingService.initialize((progress) => {
console.log(`${progress.status}: ${progress.loaded}/${progress.total}`);
});
// Generate embeddings
const result = await embeddingService.embedText("Sample text", metadata);
const batchResults = await embeddingService.embedBatch(textArray, metadataArray);
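The snippet above shows the public API; the embedding call itself can be a thin wrapper around @xenova/transformers. A minimal sketch follows, in which the pooling and normalize options mirror the EmbeddingConfig defaults; the class shape is illustrative, and only the pipeline API is the library's own.
import { pipeline } from '@xenova/transformers';

class EmbeddingServiceSketch {
  private extractor: any = null;

  async initialize(model = 'Xenova/all-MiniLM-L6-v2'): Promise<void> {
    // Downloads and caches the model on first use
    this.extractor = await pipeline('feature-extraction', model);
  }

  async embedText(text: string, metadata?: Record<string, any>): Promise<EmbeddingResult> {
    if (!this.extractor) throw new Error('EmbeddingService not initialized');
    // Mean-pool token embeddings and L2-normalize, yielding a 384-dimensional vector for MiniLM
    const output = await this.extractor(text, { pooling: 'mean', normalize: true });
    return { embedding: Array.from(output.data as Float32Array), text, metadata };
  }
}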
2. ChunkingService
Purpose: Splits large documents into semantically meaningful chunks for embedding.
Chunking Strategies:
- Speaker-based (Recommended for transcripts)
  - Splits on speaker turn changes
  - Maintains conversational context
  - Fallback to time-based if chunks too large
- Time-based
  - Fixed duration chunks with configurable overlap
  - Good for consistent processing
  - Overlap prevents context loss
- Hybrid (see the sketch after the interfaces below)
  - Speaker-based with time constraints
  - Best of both approaches
  - Automatically splits large speaker segments
interface ChunkingConfig {
method: 'speaker' | 'time' | 'hybrid';
maxChunkSize: number; // seconds (default: 60)
chunkOverlap: number; // seconds (default: 10)
minChunkSize: number; // seconds (default: 5)
}
interface TextChunk {
id: string;
transcriptId: string;
text: string;
startTime: number;
endTime: number;
speaker?: string;
metadata: {
chunkIndex: number;
wordCount: number;
speakers: string[];
method: string;
};
}
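As an illustration of the hybrid strategy, the sketch below groups consecutive segments by speaker and starts a new chunk when the turn changes or the current chunk exceeds maxChunkSize. It is a sketch against the interfaces above, not the actual ChunkingService implementation.
// Minimal sketch of hybrid chunking: speaker turns bounded by maxChunkSize (seconds)
interface Segment { speaker?: string; text: string; startTime: number; endTime: number; }

function chunkHybrid(transcriptId: string, segments: Segment[], config: ChunkingConfig): TextChunk[] {
  const chunks: TextChunk[] = [];
  let current: Segment[] = [];

  const flush = () => {
    if (current.length === 0) return;
    const text = current.map(s => s.text).join(' ');
    chunks.push({
      id: `${transcriptId}-${chunks.length}`,
      transcriptId,
      text,
      startTime: current[0].startTime,
      endTime: current[current.length - 1].endTime,
      speaker: current[0].speaker,
      metadata: {
        chunkIndex: chunks.length,
        wordCount: text.split(/\s+/).length,
        speakers: [...new Set(current.map(s => s.speaker ?? 'unknown'))],
        method: 'hybrid'
      }
    });
    current = [];
  };

  for (const seg of segments) {
    const speakerChanged = current.length > 0 && seg.speaker !== current[0].speaker;
    const tooLong = current.length > 0 && seg.endTime - current[0].startTime > config.maxChunkSize;
    if (speakerChanged || tooLong) flush(); // New chunk on a speaker turn or size limit
    current.push(seg);
  }
  flush();
  return chunks;
}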
3. VectorStoreService
Purpose: Stores and retrieves vector embeddings using LanceDB.
Key Features:
- Embedded LanceDB (no external database server)
- Vector similarity search with filtering
- Metadata-based queries (time range, speaker, etc.)
- Automatic schema management
- Bulk operations for efficiency
interface SearchOptions {
limit?: number; // Default: 5
minScore?: number; // Default: 0.0
transcriptId?: string; // Filter by source
speaker?: string; // Filter by speaker
timeRange?: { start: number; end: number };
}
interface SearchResult {
chunk: VectorChunk;
score: number; // Similarity score (0-1)
rank: number; // Result ranking
}
Vector Schema:
interface VectorChunk {
id: string;
transcriptId: string;
text: string;
vector: number[]; // 384-dimensional embedding
startTime: number;
endTime: number;
speaker?: string;
chunkIndex: number;
wordCount: number;
speakers: string[];
method: string;
createdAt: string;
}
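A minimal sketch of how storeChunks and searchSimilar might map onto the embedded LanceDB Node.js client (the legacy vectordb package here) is shown below; method names differ between LanceDB client versions, so treat this as illustrative rather than the service's actual code.
import * as lancedb from 'vectordb';

// Illustrative store/search wrappers around an embedded LanceDB table
async function storeChunks(chunks: TextChunk[], embeddings: EmbeddingResult[]): Promise<void> {
  const db = await lancedb.connect('app-data/database/vectors');
  const rows = chunks.map((chunk, i) => ({
    id: chunk.id,
    transcriptId: chunk.transcriptId,
    text: chunk.text,
    vector: embeddings[i].embedding,
    startTime: chunk.startTime,
    endTime: chunk.endTime,
    speaker: chunk.speaker ?? '',
    chunkIndex: chunk.metadata.chunkIndex
  }));
  if ((await db.tableNames()).includes('transcript_chunks')) {
    const table = await db.openTable('transcript_chunks');
    await table.add(rows);
  } else {
    await db.createTable('transcript_chunks', rows);
  }
}

async function searchSimilar(queryVector: number[], options: SearchOptions): Promise<SearchResult[]> {
  const db = await lancedb.connect('app-data/database/vectors');
  const table = await db.openTable('transcript_chunks');
  let query = table.search(queryVector).limit(options.limit ?? 5);
  if (options.transcriptId) query = query.where(`transcriptId = '${options.transcriptId}'`);
  const rows = await query.execute();
  // _distance is the raw vector distance; converting it to a 0-1 score is approximate
  return rows.map((row: any, rank: number) => ({
    chunk: row as VectorChunk,
    score: 1 - row._distance,
    rank: rank + 1
  }));
}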
4. ChatService
Purpose: Orchestrates the complete RAG pipeline and manages conversations.
Core Workflow:
- Document Processing → Chunking → Embedding → Vector Storage
- Query Processing → Embed Query → Vector Search → Context Building
- Response Generation → LLM Call → Response Processing → History Storage
interface ChatConfig {
contextChunks: number; // Default: 4
conversationMemoryLimit: number; // Default: 20
autoReembed: boolean; // Default: true
precomputeEmbeddings: boolean; // Default: true
chunkingMethod: 'speaker' | 'time' | 'hybrid';
maxChunkSize: number; // Default: 60s
chunkOverlap: number; // Default: 10s
}
RAG Pipeline Implementation
Document Processing Pipeline
async function processDocumentForChat(
document: Document,
segments: DocumentSegment[],
onProgress?: (progress: ProcessingProgress) => void
): Promise<void> {
// 1. Chunking Phase
onProgress?.({ stage: 'chunking', progress: 0, message: 'Analyzing structure...' });
const chunks = chunkingService.chunkDocument(document.id, segments, document.fullText);
// 2. Embedding Phase
const embeddings = [];
for (let i = 0; i < chunks.length; i++) {
const embedding = await embeddingService.embedText(chunks[i].text);
embeddings.push(embedding);
onProgress?.({
stage: 'embedding',
progress: i + 1,
total: chunks.length,
message: `Embedded chunk ${i + 1}/${chunks.length}`
});
}
// 3. Storage Phase
onProgress?.({ stage: 'storing', progress: 0, message: 'Storing vectors...' });
await vectorStoreService.storeChunks(chunks, embeddings);
onProgress?.({ stage: 'complete', progress: 100, message: 'Ready for chat!' });
}
Query Processing Pipeline
async function chatWithDocument(
documentId: string,
conversationId: string,
userMessage: string,
conversationHistory: ChatMessage[]
): Promise<ChatMessage> {
const startTime = Date.now(); // Used below for the processingTime metadata
// 1. Generate query embedding
const questionEmbedding = await embeddingService.embedText(userMessage);
// 2. Retrieve relevant chunks
const searchResults = await vectorStoreService.searchSimilar(
questionEmbedding.embedding,
{
limit: config.contextChunks,
transcriptId: documentId,
minScore: 0.1
}
);
// 3. Build context from chunks and conversation history
const context = buildContext(searchResults, conversationHistory);
// 4. Generate response
const response = await generateResponse(context, userMessage, documentId);
// 5. Store and return message
const assistantMessage = {
id: generateId(),
role: 'assistant',
content: response,
timestamp: new Date().toISOString(),
metadata: { chunks: searchResults, processingTime: Date.now() - startTime }
};
await storeChatMessage(conversationId, assistantMessage);
return assistantMessage;
}
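The generateResponse call above is where Ollama enters the pipeline. A minimal sketch against Ollama's /api/generate HTTP endpoint follows; the model name and prompt wording are illustrative assumptions.
// Illustrative Ollama call; assumes Ollama is running locally on its default port (11434)
async function generateResponse(context: string, userMessage: string, documentId: string): Promise<string> {
  // documentId is kept for parity with the call sites above
  const prompt = [
    'You are an assistant answering questions about a document.',
    'Use only the context below; say so if the answer is not in it.',
    '',
    context,
    `USER QUESTION: ${userMessage}`
  ].join('\n');

  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'llama3', prompt, stream: false })
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return data.response; // With stream: false, /api/generate returns the completion in "response"
}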
Context Building Strategy
function buildContext(
searchResults: SearchResult[],
conversationHistory: ChatMessage[]
): string {
let context = '';
// Add relevant document chunks
if (searchResults.length > 0) {
context += 'RELEVANT DOCUMENT CONTENT:\n\n';
searchResults.forEach((result, index) => {
const timeStamp = formatTime(result.chunk.startTime);
const speaker = result.chunk.speaker ? `[${result.chunk.speaker}]` : '';
context += `[${timeStamp}] ${speaker} ${result.chunk.text}\n\n`;
});
}
// Add recent conversation history
if (conversationHistory.length > 0) {
context += '\nRECENT CONVERSATION:\n\n';
const recentMessages = conversationHistory.slice(-6); // Last 3 exchanges
recentMessages.forEach(msg => {
context += `${msg.role.toUpperCase()}: ${msg.content}\n\n`;
});
}
return context;
}
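The formatTime helper referenced above is not shown elsewhere; a small sketch, assuming start times are stored in seconds:
// Convert a second offset into an M:SS or H:MM:SS label for context snippets
function formatTime(seconds: number): string {
  const s = Math.floor(seconds % 60).toString().padStart(2, '0');
  const m = Math.floor((seconds / 60) % 60);
  const h = Math.floor(seconds / 3600);
  return h > 0 ? `${h}:${m.toString().padStart(2, '0')}:${s}` : `${m}:${s}`;
}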
Conversation Memory Management
Memory Compaction Strategy
interface ConversationMemory {
activeMessages: ChatMessage[]; // Recent messages (keep as-is)
compactedSummary: string; // Summarized older conversation
totalExchanges: number; // Track conversation length
}
async function manageConversationMemory(
conversationId: string,
messages: ChatMessage[],
memoryLimit: number
): Promise<ConversationMemory> {
if (messages.length <= memoryLimit) {
return {
activeMessages: messages,
compactedSummary: '',
totalExchanges: messages.length
};
}
// Keep recent messages
const keepCount = Math.floor(memoryLimit * 0.4); // Keep last 40%
const activeMessages = messages.slice(-keepCount);
// Compact older messages
const messagesToCompact = messages.slice(0, -keepCount);
const compactedSummary = await compactMessages(messagesToCompact);
return {
activeMessages,
compactedSummary,
totalExchanges: messages.length
};
}
async function compactMessages(messages: ChatMessage[]): Promise<string> {
const conversationText = messages
.map(msg => `${msg.role}: ${msg.content}`)
.join('\n');
const compactionPrompt = `
Summarize this conversation concisely, preserving key topics and conclusions:
${conversationText}
Summary (2-3 bullets max):
`;
const summary = await generateResponse(compactionPrompt, '', '');
return summary;
}
Enhanced Context Building with Memory
function buildContextWithMemory(
searchResults: SearchResult[],
memory: ConversationMemory
): string {
let context = '';
// Add document chunks
if (searchResults.length > 0) {
context += 'RELEVANT DOCUMENT CONTENT:\n\n';
searchResults.forEach(result => {
context += `${result.chunk.text}\n\n`;
});
}
// Add compacted conversation summary
if (memory.compactedSummary) {
context += 'CONVERSATION SUMMARY:\n\n';
context += `${memory.compactedSummary}\n\n`;
}
// Add recent messages
if (memory.activeMessages.length > 0) {
context += 'RECENT CONVERSATION:\n\n';
memory.activeMessages.forEach(msg => {
context += `${msg.role.toUpperCase()}: ${msg.content}\n\n`;
});
}
return context;
}
Configuration Management
Settings Integration
interface RAGSettings {
// Embedding Settings
embeddingModel: string;
// Chunking Settings
chunkingMethod: 'speaker' | 'time' | 'hybrid';
maxChunkSize: number; // 30-180 seconds
chunkOverlap: number; // 0-30 seconds
// Chat Settings
contextChunks: number; // 1-10 chunks
conversationMemoryLimit: number; // 5-50 messages
// Performance Settings
precomputeEmbeddings: boolean;
autoReembed: boolean;
}
// Database schema for settings
const defaultSettings = {
'ragEmbeddingModel': 'Xenova/all-MiniLM-L6-v2',
'ragChunkingMethod': 'speaker',
'ragMaxChunkSize': '60',
'ragChunkOverlap': '10',
'ragContextChunks': '4',
'ragMemoryLimit': '20',
'ragPrecomputeEmbeddings': 'true',
'ragAutoReembed': 'true'
};
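A minimal sketch of reading and writing these keys against the settings table defined under Database Schema below; better-sqlite3 is an assumed client choice here, not a requirement of the design.
import Database from 'better-sqlite3';

const db = new Database('app-data/database/app.db');

// Upsert a single setting key
function saveSetting(key: string, value: string): void {
  db.prepare(
    `INSERT INTO settings (key, value, updated_at)
     VALUES (?, ?, CURRENT_TIMESTAMP)
     ON CONFLICT(key) DO UPDATE SET value = excluded.value, updated_at = CURRENT_TIMESTAMP`
  ).run(key, value);
}

// Read a setting, falling back to the defaults above
function getSetting(key: string): string | undefined {
  const row = db.prepare('SELECT value FROM settings WHERE key = ?').get(key) as { value: string } | undefined;
  return row?.value ?? defaultSettings[key as keyof typeof defaultSettings];
}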
Settings UI Components
// Example settings component (useState from 'react'; saveSetting persists to the settings table)
function RAGSettingsPanel() {
const [contextChunks, setContextChunks] = useState(4);
const [chunkingMethod, setChunkingMethod] = useState('speaker');
const [memoryLimit, setMemoryLimit] = useState(20);
const handleMethodChange = (e) => {
setChunkingMethod(e.target.value);
saveSetting('ragChunkingMethod', e.target.value);
};
const handleMemoryLimitChange = (e) => {
setMemoryLimit(parseInt(e.target.value));
saveSetting('ragMemoryLimit', e.target.value);
};
return (
<div className="space-y-6">
<h3>Chat & AI Settings</h3>
{/* Context Chunks Slider */}
<div>
<label>Context Chunks: {contextChunks}</label>
<input
type="range"
min="1"
max="10"
value={contextChunks}
onChange={(e) => {
setContextChunks(parseInt(e.target.value));
saveSetting('ragContextChunks', e.target.value);
}}
/>
<p className="help-text">
Number of relevant chunks to include in chat context
</p>
</div>
{/* Chunking Method Select */}
<div>
<label>Chunking Method</label>
<select value={chunkingMethod} onChange={handleMethodChange}>
<option value="speaker">Speaker-based</option>
<option value="time">Time-based</option>
<option value="hybrid">Hybrid</option>
</select>
</div>
{/* Memory Limit Slider */}
<div>
<label>Memory Limit: {memoryLimit} messages</label>
<input
type="range"
min="5"
max="50"
step="5"
value={memoryLimit}
onChange={handleMemoryLimitChange}
/>
</div>
</div>
);
}
Database Schema
SQLite Tables for Metadata
-- Chat conversations
CREATE TABLE IF NOT EXISTS chat_conversations (
id TEXT PRIMARY KEY,
document_id TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
summary TEXT,
total_exchanges INTEGER DEFAULT 0,
FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE
);
-- Chat messages
CREATE TABLE IF NOT EXISTS chat_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
conversation_id TEXT NOT NULL,
role TEXT CHECK(role IN ('user', 'assistant')) NOT NULL,
content TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
metadata TEXT, -- JSON metadata
FOREIGN KEY (conversation_id) REFERENCES chat_conversations(id) ON DELETE CASCADE
);
-- Conversation memory (for compacted summaries)
CREATE TABLE IF NOT EXISTS conversation_memory (
conversation_id TEXT PRIMARY KEY,
compacted_summary TEXT,
active_message_count INTEGER DEFAULT 0,
last_compaction_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (conversation_id) REFERENCES chat_conversations(id) ON DELETE CASCADE
);
-- Settings
CREATE TABLE IF NOT EXISTS settings (
key TEXT PRIMARY KEY,
value TEXT NOT NULL,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
LanceDB Schema
// Vector table schema (handled automatically by LanceDB)
interface VectorTableSchema {
id: string; // Unique chunk identifier
document_id: string; // Source document reference
text: string; // Chunk text content
vector: number[]; // 384-dimensional embedding
start_time: number; // Time offset (for media)
end_time: number; // End time offset
speaker?: string; // Speaker identifier
chunk_index: number; // Chunk sequence number
word_count: number; // Text statistics
speakers: string[]; // All speakers in chunk
method: string; // Chunking method used
created_at: string; // Timestamp
}
Performance Optimization
Embedding Model Management
import { pipeline, Pipeline } from '@xenova/transformers';

class EmbeddingModelManager {
private modelCache = new Map<string, Pipeline>();
async getModel(modelName: string): Promise<Pipeline> {
if (this.modelCache.has(modelName)) {
return this.modelCache.get(modelName)!;
}
const model = await pipeline('feature-extraction', modelName);
this.modelCache.set(modelName, model);
return model;
}
// Preload model on app startup
async warmup(modelName: string): Promise<void> {
await this.getModel(modelName);
}
// Memory cleanup
clearCache(): void {
this.modelCache.clear();
}
}
Batch Processing
async function batchProcessDocuments(
documents: Document[],
batchSize: number = 5
): Promise<void> {
for (let i = 0; i < documents.length; i += batchSize) {
const batch = documents.slice(i, i + batchSize);
await Promise.all(
// Segments are assumed to be loaded with each document (see the processDocumentForChat signature above)
batch.map(doc => processDocumentForChat(doc, doc.segments ?? []))
);
// Progress update
const progress = Math.min(i + batchSize, documents.length);
console.log(`Processed ${progress}/${documents.length} documents`);
}
}
Vector Search Optimization
// Implement search result caching
// Note: in practice the cache key should combine the query text with the search options,
// since different filters (transcriptId, speaker, time range) yield different results
class SearchCache {
private cache = new Map<string, { results: SearchResult[]; timestamp: number }>();
private maxAge = 5 * 60 * 1000; // 5 minutes
get(query: string): SearchResult[] | null {
const cached = this.cache.get(query);
if (cached && Date.now() - cached.timestamp < this.maxAge) {
return cached.results;
}
return null;
}
set(query: string, results: SearchResult[]): void {
this.cache.set(query, { results, timestamp: Date.now() });
}
}
Error Handling & Recovery
Robust Service Initialization
async function initializeRAGSystem(retries: number = 3): Promise<boolean> {
for (let attempt = 1; attempt <= retries; attempt++) {
try {
// Initialize embedding service
await embeddingService.initialize();
// Initialize vector store
await vectorStoreService.initialize();
// Test functionality
await embeddingService.embedText("test");
return true;
} catch (error) {
console.error(`Initialization attempt ${attempt} failed:`, error);
if (attempt === retries) {
throw new Error(`Failed to initialize RAG system after ${retries} attempts`);
}
// Wait before retry
await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
}
}
return false;
}
Graceful Degradation
async function chatWithFallback(
documentId: string,
conversationId: string,
userMessage: string,
conversationHistory: ChatMessage[]
): Promise<ChatMessage> {
try {
// Try full RAG pipeline
return await chatWithDocument(documentId, conversationId, userMessage, conversationHistory);
} catch (error) {
console.error('RAG pipeline failed, falling back to simple chat:', error);
try {
// Fallback: use document text directly (no vector search)
const document = await getDocument(documentId);
const simpleContext = document.fullText.slice(0, 2000); // First 2000 chars
return await generateSimpleResponse(simpleContext, userMessage);
} catch (fallbackError) {
console.error('Fallback also failed:', fallbackError);
// Final fallback: error message
return {
id: generateId(),
role: 'assistant',
content: 'I apologize, but I encountered an error processing your question. Please try again.',
timestamp: new Date().toISOString(),
metadata: { error: true }
};
}
}
}
Deployment Considerations
Electron Integration
// In the main process (electron.js)
const { ipcMain } = require('electron');

ipcMain.handle('rag-initialize', async () => {
try {
await ragSystem.initialize();
return { success: true };
} catch (error) {
return { success: false, error: error.message };
}
});
ipcMain.handle('rag-chat', async (event, { documentId, message, conversationId }) => {
try {
const response = await ragSystem.chat(documentId, message, conversationId);
return { success: true, response };
} catch (error) {
return { success: false, error: error.message };
}
});
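On the renderer side, the React frontend reaches these handlers through a preload bridge. A minimal sketch follows; the channel names match the handlers above, while the exposed ragAPI name is an assumption.
// In the preload script (with contextIsolation enabled)
import { contextBridge, ipcRenderer } from 'electron';

contextBridge.exposeInMainWorld('ragAPI', {
  initialize: () => ipcRenderer.invoke('rag-initialize'),
  chat: (documentId: string, message: string, conversationId: string) =>
    ipcRenderer.invoke('rag-chat', { documentId, message, conversationId })
});

// In a React component:
// const result = await window.ragAPI.chat(documentId, userMessage, conversationId);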
Resource Management
// Monitor memory usage
function monitorResourceUsage(): void {
setInterval(() => {
const memoryUsage = process.memoryUsage();
const vectorStats = vectorStoreService.getStats();
console.log({
heapUsed: Math.round(memoryUsage.heapUsed / 1024 / 1024) + 'MB',
totalVectors: vectorStats.totalChunks,
cacheSize: embeddingService.getCacheSize()
});
// Cleanup if memory usage too high
if (memoryUsage.heapUsed > 500 * 1024 * 1024) { // 500MB
embeddingService.clearCache();
}
}, 30000); // Every 30 seconds
}
Data Directory Structure
app-data/
├── database/
│ ├── app.db # SQLite database
│ └── vectors/ # LanceDB files
│ ├── transcript_chunks.lance
│ └── .lance/
├── models/
│ └── embeddings/ # Downloaded models
│ └── Xenova--all-MiniLM-L6-v2/
├── temp/ # Temporary processing files
└── backups/ # Database backups
Security Considerations
Data Privacy
- All embeddings and vectors stored locally
- No data sent to external services; LLM queries go to the locally running Ollama instance
- User documents never leave the device
- Configurable data retention policies
Input Sanitization
function sanitizeInput(text: string): string {
// Remove potentially harmful content
return text
.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '')
.replace(/javascript:/gi, '')
.trim()
.slice(0, 10000); // Limit length
}
Testing Strategy
Unit Tests
describe('EmbeddingService', () => {
test('should generate consistent embeddings', async () => {
const text = "Test document content";
const result1 = await embeddingService.embedText(text);
const result2 = await embeddingService.embedText(text);
expect(result1.embedding).toEqual(result2.embedding);
expect(result1.embedding).toHaveLength(384);
});
});
describe('ChunkingService', () => {
test('should create appropriate chunks', () => {
const segments = mockTranscriptSegments;
const chunks = chunkingService.chunkTranscript('test-id', segments);
expect(chunks.length).toBeGreaterThan(0);
expect(chunks[0].metadata.method).toBe('speaker');
});
});
Integration Tests
describe('RAG Pipeline', () => {
test('should process document and enable chat', async () => {
const document = mockDocument;
// Process document
await chatService.processDocumentForChat(document, []);
// Verify chunks stored
const stats = await vectorStoreService.getStats();
expect(stats.transcripts).toContain(document.id);
// Test chat
const response = await chatService.chatWithDocument(
document.id,
'conversation-1',
'What is this document about?',
[]
);
expect(response.content).toBeTruthy();
expect(response.role).toBe('assistant');
});
});
Migration and Upgrades
Schema Migrations
const migrations = [
{
version: 1,
up: async (db: Database) => {
await db.exec(`
CREATE TABLE chat_conversations (
id TEXT PRIMARY KEY,
document_id TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
`);
}
},
{
version: 2,
up: async (db: Database) => {
await db.exec(`
ALTER TABLE chat_conversations
ADD COLUMN summary TEXT;
`);
}
}
];
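A sketch of applying these migrations in order, using SQLite's user_version pragma to record the current schema version; tracking versions this way is an assumption, and a dedicated migrations table works just as well.
// Run pending migrations in order, recording progress in PRAGMA user_version
async function runMigrations(db: Database): Promise<void> {
  const row = await db.get('PRAGMA user_version');
  const currentVersion = (row?.user_version ?? 0) as number;
  for (const migration of migrations) {
    if (migration.version <= currentVersion) continue;
    await db.exec('BEGIN');
    try {
      await migration.up(db);
      await db.exec(`PRAGMA user_version = ${migration.version}`);
      await db.exec('COMMIT');
    } catch (error) {
      await db.exec('ROLLBACK');
      throw error;
    }
  }
}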
Model Updates
async function updateEmbeddingModel(
oldModel: string,
newModel: string
): Promise<void> {
// 1. Download new model
await embeddingService.downloadModel(newModel);
// 2. Re-embed existing documents
const documents = await getAllDocuments();
for (const doc of documents) {
await reprocessDocument(doc.id, newModel);
}
// 3. Update configuration
await updateSetting('embeddingModel', newModel);
// 4. Cleanup old model
await embeddingService.removeModel(oldModel);
}
This RAG system provides a complete, production-ready solution for adding AI chat capabilities to any Electron application while maintaining privacy and performance through local processing.