# Performance Optimization Strategy for Large Vault Startup
## Executive Summary
When ObsiViewer is deployed with a large vault (1000+ Markdown files), initial startup is slow because the application loads **all notes with full content** before rendering the UI. This document outlines a strategy for improving startup through metadata-first loading, lazy loading, and server-side optimizations.

**Expected Improvement**: From 10-30 seconds startup → 2-5 seconds to interactive UI

---
## Problem Analysis
### Current Architecture Issues
#### 1. **Full Vault Load on Startup** ⚠️ CRITICAL
- **Location**: `server/index.mjs` - `/api/vault` endpoint
- **Issue**: Loads ALL notes with FULL content synchronously
- **Impact**:
  - 1000 files × 5KB average = 5MB payload
  - Blocks UI rendering until complete
  - Network transfer time dominates

```typescript
// Current flow:
app.get('/api/vault', async (req, res) => {
  const notes = await loadVaultNotes(vaultDir); // ← Loads ALL notes with content
  res.json({ notes });
});
```
#### 2. **Front-matter Enrichment on Every File** ⚠️ HIGH IMPACT
- **Location**: `server/index.mjs` - `loadVaultNotes()` function
- **Issue**: Calls `enrichFrontmatterOnOpen()` for every file during initial load
- **Impact**:
  - Expensive YAML parsing for each file
  - File I/O for each enrichment
  - Multiplies load time by 2-3x

```typescript
// Current code (lines 138-141):
const enrichResult = await enrichFrontmatterOnOpen(absPath);
const content = enrichResult.content;
// This happens for EVERY file during loadVaultNotes()
```
#### 3. **No Lazy Loading Strategy**
- **Client**: `VaultService.allNotes()` stores all notes in memory
- **UI**: `NotesListComponent` renders all notes (with virtual scrolling, but still loaded)
- **Issue**: No on-demand content loading when note is selected
#### 4. **Meilisearch Indexing Overhead**
- **Issue**: Initial indexing happens during server startup
- **Impact**: Blocks vault watcher initialization
- **Current**: Fallback to filesystem if Meilisearch unavailable
#### 5. **Large JSON Payload**
- **Issue**: Full markdown content sent for every file
- **Impact**: Network bandwidth, parsing time, memory usage
- **Example**: 1000 files × 5KB = 5MB+ payload
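
The payload arithmetic above can be checked with a few lines of TypeScript. This is only a back-of-the-envelope sketch: the 5 KB average comes from the figures above, while the 200-byte metadata size per note and the function names are assumptions made for illustration, not measurements.

```typescript
// Rough payload estimate: full notes vs. metadata-only.
// Decimal units (1 KB = 1000 bytes) are used to match the "5MB" figure above.
function estimatePayloadBytes(fileCount: number, bytesPerFile: number): number {
  return fileCount * bytesPerFile;
}

function formatMB(bytes: number): string {
  return (bytes / 1_000_000).toFixed(2) + ' MB';
}

// 1000 files x 5 KB of markdown each (figure from this document).
const fullPayloadBytes = estimatePayloadBytes(1000, 5_000);

// 1000 files x ~200 bytes of metadata each (ASSUMED size for id, title,
// path and two timestamps; measure on a real vault before relying on it).
const metadataPayloadBytes = estimatePayloadBytes(1000, 200);

console.log('full content:', formatMB(fullPayloadBytes));
console.log('metadata only:', formatMB(metadataPayloadBytes));
```

Under these assumptions the metadata payload is about 25 times smaller than the full-content payload, which is the motivation for the metadata-first approach described below.
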
---
## Current Data Flow
```
┌─────────────────────────────────────────────────────────────┐
│ Browser requests /api/vault                                 │
└──────────────────────┬──────────────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Server: loadVaultNotes(vaultDir)                            │
│ - Walk filesystem recursively                               │
│ - For EACH file:                                            │
│   - Read file content                                       │
│   - enrichFrontmatterOnOpen() ← EXPENSIVE                   │
│   - Extract title, tags                                     │
│   - Calculate stats                                         │
└──────────────────────┬──────────────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Send large JSON payload (5MB+)                              │
└──────────────────────┬──────────────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Client: Parse JSON, store in VaultService.allNotes()        │
│ - Blocks UI rendering                                       │
│ - High memory usage                                         │
└──────────────────────┬──────────────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Render UI with all notes                                    │
│ - NotesListComponent renders all items                      │
│ - AppShellNimbusLayoutComponent initializes                 │
└─────────────────────────────────────────────────────────────┘
```
---
## Recommended Optimization Strategy
### Phase 1: Metadata-First Loading (QUICK WIN - 1-2 days)
**Goal**: Load UI in 2-3 seconds instead of 10-30 seconds
#### 1.1 Split Endpoints
Create two endpoints:
- **`/api/files/metadata`** - Fast, lightweight metadata only
- **`/api/vault`** - Full content (keep for backward compatibility)
```typescript
// NEW: Fast metadata endpoint
app.get('/api/files/metadata', async (req, res) => {
  try {
    // Try Meilisearch first (already implemented)
    const client = meiliClient();
    const indexUid = vaultIndexName(vaultDir);
    const index = await ensureIndexSettings(client, indexUid);
    // Note: Meilisearch caps search results at the index's
    // pagination.maxTotalHits setting (default 1000), so that setting
    // must be raised for limit: 10000 to return more than 1000 notes.
    const result = await index.search('', {
      limit: 10000,
      attributesToRetrieve: ['id', 'title', 'path', 'createdAt', 'updatedAt']
    });
    const items = Array.isArray(result.hits) ? result.hits : [];
    res.json(items);
  } catch (error) {
    // Fallback to fast filesystem scan (no enrichment)
    const notes = await loadVaultMetadataOnly(vaultDir);
    res.json(buildFileMetadata(notes));
  }
});

// NEW: Fast metadata-only loader (no enrichment)
const loadVaultMetadataOnly = async (vaultPath) => {
  const notes = [];
  const walk = async (currentDir) => {
    // Same as loadVaultNotes but WITHOUT enrichFrontmatterOnOpen()
    // Just read file stats and extract title from first heading
  };
  await walk(vaultPath);
  return notes;
};
```
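The body of `walk()` is left as a stub above. A minimal sketch of what it could look like follows, assuming that the note `id` is the vault-relative path, that hidden directories such as `.obsidian` are skipped, and that the title is the first ATX heading with the file name as fallback. The `NoteMetadata` shape and the `titleFromFirstHeading` helper are hypothetical names introduced for this example; the real `loadVaultNotes()` may derive ids and titles differently.

```typescript
import { promises as fs } from 'node:fs';
import * as path from 'node:path';

// Hypothetical shape, mirroring the fields the metadata endpoint returns.
interface NoteMetadata {
  id: string;
  title: string;
  path: string;
  createdAt: string;
  updatedAt: string;
}

// Title = text of the first ATX heading ("# Title"), else the fallback.
function titleFromFirstHeading(content: string, fallback: string): string {
  const match = /^#{1,6}[ \t]+(.+)$/m.exec(content);
  return match ? match[1].trim() : fallback;
}

async function loadVaultMetadataOnly(vaultPath: string): Promise<NoteMetadata[]> {
  const notes: NoteMetadata[] = [];

  const walk = async (currentDir: string): Promise<void> => {
    const entries = await fs.readdir(currentDir, { withFileTypes: true });
    for (const entry of entries) {
      const absPath = path.join(currentDir, entry.name);
      if (entry.isDirectory()) {
        // ASSUMPTION: hidden directories (.obsidian, .git, ...) hold no notes.
        if (!entry.name.startsWith('.')) await walk(absPath);
        continue;
      }
      if (!entry.isFile() || !entry.name.toLowerCase().endsWith('.md')) continue;

      // No front-matter enrichment and no YAML parsing: stats plus one read.
      const [stats, content] = await Promise.all([
        fs.stat(absPath),
        fs.readFile(absPath, 'utf-8'),
      ]);
      const relPath = path.relative(vaultPath, absPath).split(path.sep).join('/');
      notes.push({
        id: relPath, // ASSUMPTION: id is the vault-relative path
        title: titleFromFirstHeading(content, path.basename(entry.name, '.md')),
        path: relPath,
        createdAt: stats.birthtime.toISOString(),
        updatedAt: stats.mtime.toISOString(),
      });
    }
  };

  await walk(vaultPath);
  return notes;
}
```

The heading regex is deliberately naive: it would also match a `#` comment inside YAML front matter or a fenced code block, so a production version would probably skip those regions first.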
#### 1.2 Modify Client Initialization
Update `VaultService` to load metadata first:
```typescript
// In VaultService (pseudo-code)
import { firstValueFrom } from 'rxjs';

async initializeVault() {
  // Step 1: Load metadata immediately (fast)
  // firstValueFrom() replaces toPromise(), which is deprecated since RxJS 7
  const metadata = await firstValueFrom(this.http.get<any[]>('/api/files/metadata'));
  this.allNotes.set(metadata.map(m => ({
    id: m.id,
    title: m.title,
    filePath: m.path,
    createdAt: m.createdAt,
    updatedAt: m.updatedAt,
    content: '', // Empty initially
    tags: [],
    frontmatter: {}
  })));

  // Step 2: Load full content on-demand when note is selected
  // (already implemented via /api/files endpoint)
}
```
#### 1.3 Defer Front-matter Enrichment
- **Current**: Enrichment happens during `loadVaultNotes()` for ALL files
- **Proposed**: Only enrich when a file is opened

```typescript
// In server/index.mjs - GET /api/files endpoint (already exists)
app.get('/api/files', async (req, res) => {
  try {
    const pathParam = req.query.path;
    // ... validation ...

    // For markdown files, enrich ONLY when explicitly requested
    if (!isExcalidraw && ext === '.md') {
      const enrichResult = await enrichFrontmatterOnOpen(abs);
      // ← This is fine here (on-demand), but remove from loadVaultNotes()
    }
  } catch (err) {
    // ... error handling ...
  }
});

// In loadVaultNotes() - REMOVE enrichment
const loadVaultNotes = async (vaultPath) => {
  const notes = [];
  const walk = async (currentDir) => {
    // ... directory walk ...
    for (const entry of entries) {
      if (!isMarkdownFile(entry)) continue;
      try {
        // REMOVE: const enrichResult = await enrichFrontmatterOnOpen(absPath);
        // Just read the file as-is, without blocking the event loop
        const content = await fs.promises.readFile(entryPath, 'utf-8');

        // Extract basic metadata without enrichment
        const stats = await fs.promises.stat(entryPath);
        const title = extractTitle(content, fallback);
        const tags = extractTags(content);
        notes.push({
          id: finalId,
          title,
          content,
          tags,
          mtime: stats.mtimeMs,
          // ... other fields ...
        });
      } catch (err) {
        console.error(`Failed to read note at ${entryPath}:`, err);
      }
    }
  };
  await walk(vaultPath);
  return notes;
};
```
#### 1.4 Update VaultService to Load Content On-Demand
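```typescript
// In src/app/services/vault.service.ts
import { firstValueFrom } from 'rxjs';

export class VaultService {
  private allNotesMetadata = signal<Note[]>([]);
  private contentCache = new Map<string, string>();

  // Lazy-load content when note is selected
  async ensureNoteContent(noteId: string): Promise<Note | null> {
    const note = this.allNotesMetadata().find(n => n.id === noteId);
    if (!note) return null;

    // If content already loaded, return (also covers notes that are empty)
    if (this.contentCache.has(noteId)) return note;

    // Load content from server
    try {
      const response = await firstValueFrom(
        this.http.get<{ content: string; frontmatter: Note['frontmatter'] }>('/api/files', {
          params: { path: note.filePath }
        })
      );

      // Replace the note immutably so the signal notifies its consumers;
      // mutating the object in place would not trigger an update.
      const loaded: Note = { ...note, content: response.content, frontmatter: response.frontmatter };
      this.contentCache.set(noteId, response.content);
      this.allNotesMetadata.update(notes => notes.map(n => (n.id === noteId ? loaded : n)));
      return loaded;
    } catch (error) {
      console.error('Failed to load note content:', error);
      return note;
    }
  }
}
```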
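One possible refinement of on-demand loading is to share in-flight requests, so that selecting the same note twice in quick succession (or preloading it while it is being opened) results in a single HTTP call. The sketch below is framework-free and is not part of the existing `VaultService`; `NoteContentLoader` and `ContentFetcher` are hypothetical names, and the fetcher stands in for the `/api/files` request.

```typescript
// Hypothetical helper: caches the *promise* for each path, so concurrent
// callers asking for the same note share one request.
type ContentFetcher = (filePath: string) => Promise<string>;

class NoteContentLoader {
  private readonly cache = new Map<string, Promise<string>>();
  private readonly fetchContent: ContentFetcher;

  constructor(fetchContent: ContentFetcher) {
    this.fetchContent = fetchContent;
  }

  load(filePath: string): Promise<string> {
    const cached = this.cache.get(filePath);
    if (cached) return cached;

    const pending: Promise<string> = this.fetchContent(filePath).catch((error: unknown) => {
      // Drop failed requests so a later call can retry, but only if the
      // entry has not been replaced in the meantime.
      if (this.cache.get(filePath) === pending) this.cache.delete(filePath);
      throw error;
    });
    this.cache.set(filePath, pending);
    return pending;
  }

  // Call when the file changes on disk or after a save.
  invalidate(filePath: string): void {
    this.cache.delete(filePath);
  }
}

// Usage with a stub fetcher that counts how often it is called.
let requestCount = 0;
const loader = new NoteContentLoader(async (filePath) => {
  requestCount++;
  return `content of ${filePath}`;
});

const firstCall = loader.load('notes/a.md');
const secondCall = loader.load('notes/a.md'); // reuses the pending request
console.log(firstCall === secondCall, requestCount);
```

In a real service the stub would be replaced by the HTTP call, and `invalidate()` would be wired to whatever mechanism notifies the client of file changes.
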
---
### Phase 2: Pagination & Streaming (2-3 days)
**Goal**: Support vaults with 10,000+ files
#### 2.1 Implement Cursor-Based Pagination
```typescript
// Server endpoint with pagination
// Note: the cursor is a stringified offset, so this is offset pagination
// exposed through a cursor-shaped API.
app.get('/api/files/metadata/paginated', async (req, res) => {
  const limit = Math.min(Math.max(parseInt(req.query.limit, 10) || 100, 1), 500);
  const offset = Math.max(parseInt(req.query.cursor, 10) || 0, 0);
  try {
    const client = meiliClient();
    const indexUid = vaultIndexName(vaultDir);
    const index = await ensureIndexSettings(client, indexUid);
    // Meilisearch never returns hits beyond the index's
    // pagination.maxTotalHits setting (default 1000), so that setting
    // must be raised for vaults larger than that.
    const result = await index.search('', {
      limit: limit + 1, // Fetch one extra to determine if more exist
      offset,
      attributesToRetrieve: ['id', 'title', 'path', 'createdAt', 'updatedAt']
    });
    const hasMore = result.hits.length > limit;
    const items = result.hits.slice(0, limit);
    const nextCursor = hasMore ? String(offset + limit) : null;
    res.json({ items, nextCursor, hasMore });
  } catch (error) {
    res.status(500).json({ error: 'Pagination failed' });
  }
});
```
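The endpoint above treats the cursor as a numeric offset. For comparison, here is a sketch of keyset pagination, where the cursor is the sort key of the last item already delivered. It operates on an in-memory array sorted by `path` (for example the filesystem fallback) rather than on Meilisearch; `FileMeta`, `pageAfter`, and `firstIndexAfter` are hypothetical names, and unique, ascending paths are assumed.

```typescript
interface FileMeta {
  path: string;
  title: string;
}

interface Page<T> {
  items: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

// Binary search for the first index whose path sorts strictly after the cursor.
function firstIndexAfter(sorted: FileMeta[], cursor: string): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].path <= cursor) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Keyset pagination: `cursor` is the path of the last item of the previous page.
// `sorted` must be ordered by path using plain string comparison.
function pageAfter(sorted: FileMeta[], cursor: string | null, limit: number): Page<FileMeta> {
  const safeLimit = Math.max(1, Math.min(Math.floor(limit) || 100, 500));
  const start = cursor === null ? 0 : firstIndexAfter(sorted, cursor);
  // Take one extra item to find out whether another page exists.
  const slice = sorted.slice(start, start + safeLimit + 1);
  const hasMore = slice.length > safeLimit;
  const items = slice.slice(0, safeLimit);
  const nextCursor = hasMore ? items[items.length - 1].path : null;
  return { items, nextCursor, hasMore };
}

// Usage: walk five files two at a time.
const files: FileMeta[] = ['a.md', 'b.md', 'c.md', 'd.md', 'e.md'].map(p => ({ path: p, title: p }));
const page1 = pageAfter(files, null, 2);
const page2 = pageAfter(files, page1.nextCursor, 2);
const page3 = pageAfter(files, page2.nextCursor, 2);
console.log(page1.items.map(f => f.path), page2.items.map(f => f.path), page3.items.map(f => f.path));
```

The practical difference from an offset is stability: if a file is added or removed between two requests, a path-based cursor still resumes after the last item the client saw, whereas an offset can skip or repeat entries.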
#### 2.2 Implement Virtual Scrolling in NotesListComponent
```typescript
// In src/app/features/list/notes-list.component.ts
import { ScrollingModule } from '@angular/cdk/scrolling';

@Component({
  // ...
  imports: [CommonModule, ScrollableOverlayDirective, ScrollingModule],
  template: `
    <cdk-virtual-scroll-viewport itemSize="60" class="h-full">
      <ul>
        <li *cdkVirtualFor="let n of filtered()" class="p-3 hover:bg-surface1">
          {{ n.title }}
        </li>
      </ul>
    </cdk-virtual-scroll-viewport>
  `
})
export class NotesListComponent {
  // Virtual scrolling will only render visible items.
  // itemSize must match the rendered row height in pixels (60 here),
  // because the fixed-size strategy derives scroll positions from it.
}
```
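To see why virtual scrolling keeps the DOM small regardless of vault size, the index arithmetic behind a fixed-row-height list can be sketched as follows. This is an illustration of the general technique, not the CDK's actual implementation; `visibleRange` and the `buffer` parameter are names chosen for this example.

```typescript
interface VisibleRange {
  start: number; // first index to render (inclusive)
  end: number;   // last index to render (exclusive)
}

// Computes which rows need DOM nodes for a list with fixed row height.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  itemSize: number,
  totalItems: number,
  buffer = 3, // extra rows above and below to avoid flicker while scrolling
): VisibleRange {
  if (itemSize <= 0) throw new RangeError('itemSize must be positive');
  const firstVisible = Math.floor(scrollTop / itemSize);
  // Exclusive index of the last row that is at least partially visible.
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / itemSize);
  const end = Math.min(totalItems, lastVisible + buffer);
  const start = Math.min(end, Math.max(0, firstVisible - buffer));
  return { start, end };
}

// 10,000 notes, 60px rows, 600px viewport, scrolled to row 100.
const range = visibleRange(6000, 600, 60, 10_000);
console.log(range, 'rows rendered:', range.end - range.start);
```

With these numbers only 16 rows are rendered out of 10,000, and that count depends on the viewport height and buffer, not on the number of notes.
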
---
### Phase 3: Server-Side Caching (1-2 days)
**Goal**: Avoid re-scanning filesystem on every request
#### 3.1 Implement In-Memory Metadata Cache
```typescript
// In server/index.mjs
let cachedMetadata = null;
let metadataCacheTime = 0;
let metadataCacheVersion = 0; // bumped on every file change
const METADATA_CACHE_TTL = 5 * 60 * 1000; // 5 minutes

const getMetadataFromCache = async () => {
  const now = Date.now();
  if (cachedMetadata && (now - metadataCacheTime) < METADATA_CACHE_TTL) {
    return cachedMetadata;
  }

  // Rebuild cache
  const versionAtStart = metadataCacheVersion;
  const metadata = await loadVaultMetadataOnly(vaultDir);
  // Store the result only if no file changed while the scan was running;
  // otherwise an invalidation would be overwritten and the next request
  // would be served stale data.
  if (versionAtStart === metadataCacheVersion) {
    cachedMetadata = metadata;
    metadataCacheTime = Date.now();
  }
  return metadata;
};

// Use in endpoints
app.get('/api/files/metadata', async (req, res) => {
  try {
    const metadata = await getMetadataFromCache();
    res.json(buildFileMetadata(metadata));
  } catch (error) {
    res.status(500).json({ error: 'Failed to load metadata' });
  }
});

// Invalidate cache on file changes
const invalidateMetadataCache = () => {
  metadataCacheVersion++;
  metadataCacheTime = 0;
};
vaultWatcher.on('add', invalidateMetadataCache);
vaultWatcher.on('change', invalidateMetadataCache);
vaultWatcher.on('unlink', invalidateMetadataCache);
```
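A variation on the same idea is to cache the in-flight promise instead of the resolved value, so that concurrent requests arriving during a rebuild share one filesystem scan. The sketch below is a generic, self-contained version with an injectable clock for testing; `TtlCache` is a hypothetical helper, not something that exists in `server/index.mjs`.

```typescript
// Hypothetical single-value TTL cache that stores the pending promise.
class TtlCache<T> {
  private value: Promise<T> | null = null;
  private loadedAt = 0;
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<T>, ttlMs: number, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    if (this.value !== null && this.now() - this.loadedAt < this.ttlMs) {
      return this.value; // fresh, or still loading: share the same promise
    }
    const pending: Promise<T> = this.load().catch((error: unknown) => {
      if (this.value === pending) this.value = null; // do not cache failures
      throw error;
    });
    this.value = pending;
    this.loadedAt = this.now();
    return pending;
  }

  // Call from the file watcher; the next get() starts a fresh load even if
  // an older load is still running.
  invalidate(): void {
    this.value = null;
  }
}

// Usage with a fake clock and a stub loader that counts its invocations.
let fakeNow = 0;
let scanCount = 0;
const metadataCache = new TtlCache<string[]>(
  async () => { scanCount++; return ['a.md', 'b.md']; },
  5 * 60 * 1000,
  () => fakeNow,
);

metadataCache.get();
metadataCache.get();          // shares the first scan
fakeNow = 5 * 60 * 1000;      // TTL elapsed
metadataCache.get();          // triggers a second scan
metadataCache.invalidate();
metadataCache.get();          // triggers a third scan
console.log('scans:', scanCount);
```

In the server this would wrap `loadVaultMetadataOnly(vaultDir)`, with `invalidate()` called from the watcher's `add`, `change`, and `unlink` handlers.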
#### 3.2 Defer Meilisearch Indexing
```typescript
// In server/index.mjs - defer initial indexing
let indexingInProgress = false;

const scheduleIndexing = async () => {
  if (indexingInProgress) return;
  indexingInProgress = true;

  // Schedule indexing for later (don't block startup)
  setImmediate(async () => {
    try {
      await fullReindex(vaultDir);
      console.log('[Meili] Initial indexing complete');
    } catch (error) {
      console.warn('[Meili] Initial indexing failed:', error);
    } finally {
      indexingInProgress = false;
    }
  });
};

// Call during server startup instead of blocking
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
  scheduleIndexing(); // Non-blocking
});
```
---
### Phase 4: Client-Side Optimization (1 day)
**Goal**: Smooth UI interactions even with large datasets
#### 4.1 Implement Signal-Based Lazy Loading
```typescript
// In VaultService
export class VaultService {
  private allNotesMetadata = signal<Note[]>([]);
  private loadedNoteIds = new Set<string>();

  // Load content in background
  preloadNearbyNotes(currentNoteId: string, range = 5) {
    const notes = this.allNotesMetadata();
    const idx = notes.findIndex(n => n.id === currentNoteId);
    if (idx === -1) return;

    // Preload nearby notes
    for (let i = Math.max(0, idx - range); i <= Math.min(notes.length - 1, idx + range); i++) {
      const noteId = notes[i].id;
      if (!this.loadedNoteIds.has(noteId)) {
        this.ensureNoteContent(noteId).then(() => {
          this.loadedNoteIds.add(noteId);
        });
      }
    }
  }
}
```
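The loop above walks the window from the lowest index to the highest. If requests are issued in that order, the notes closest to the current one are not necessarily fetched first. One option, sketched here as a pure helper with the hypothetical name `preloadOrder`, is to compute the indices ordered by distance from the current note and skip the current note itself, which is already being loaded.

```typescript
// Returns neighbour indices ordered nearest-first: next, previous, next+1, ...
function preloadOrder(currentIndex: number, total: number, range = 5): number[] {
  const order: number[] = [];
  if (currentIndex < 0 || currentIndex >= total) return order;

  for (let distance = 1; distance <= range; distance++) {
    const next = currentIndex + distance;
    const prev = currentIndex - distance;
    if (next < total) order.push(next); // the following note is tried first
    if (prev >= 0) order.push(prev);
  }
  return order;
}

// Note at index 2 in a list of 10, preloading up to 3 neighbours each side.
const order = preloadOrder(2, 10, 3);
console.log(order);
```

The resulting indices could then be mapped to note ids and passed to `ensureNoteContent()` one after another, optionally with a small concurrency limit so preloading does not compete with the note the user actually opened.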
#### 4.2 Optimize Change Detection
```typescript
// Already implemented in AppComponent
@Component({
  // ...
  changeDetection: ChangeDetectionStrategy.OnPush, // ✓ Already done
})
export class AppComponent {
  // Use signals instead of observables
  // Avoid unnecessary change detection cycles
}
```
---
## Implementation Roadmap
### Week 1: Phase 1 (Metadata-First Loading)
- [ ] Create `/api/files/metadata` endpoint
- [ ] Implement `loadVaultMetadataOnly()` function
- [ ] Remove enrichment from `loadVaultNotes()`
- [ ] Update `VaultService` to load metadata first
- [ ] Test with 1000+ file vault
- **Expected Result**: 10-30s → 3-5s startup time
### Week 2: Phase 2 (Pagination)
- [ ] Implement cursor-based pagination
- [ ] Add virtual scrolling to NotesListComponent
- [ ] Test with 10,000+ files
- **Expected Result**: Support vaults with 10,000+ files
### Week 3: Phase 3 (Server Caching)
- [ ] Implement in-memory metadata cache
- [ ] Defer Meilisearch indexing
- [ ] Add cache invalidation on file changes
- **Expected Result**: Reduced server load
### Week 4: Phase 4 (Client Optimization)
- [ ] Implement preloading strategy
- [ ] Profile and optimize hot paths
- [ ] Performance testing
- **Expected Result**: Smooth interactions
---
## Performance Metrics
### Before Optimization
```
Startup Time (1000 files):
- Server processing: 15-20s
- Network transfer: 5-10s
- Client parsing: 2-3s
- Total: 22-33s
Memory Usage:
- Server: 200-300MB
- Client: 150-200MB
```
### After Phase 1 (Metadata-First)
```
Startup Time (1000 files):
- Server processing: 1-2s (metadata only)
- Network transfer: 0.5-1s (small payload)
- Client parsing: 0.5-1s
- Total: 2-4s ✓
Memory Usage:
- Server: 50-100MB
- Client: 20-30MB (metadata only)
```
### After Phase 2 (Pagination)
```
Startup Time (10,000 files):
- Server processing: 0.5s (first page)
- Network transfer: 0.2-0.5s
- Client parsing: 0.2-0.5s
- Total: 1-1.5s ✓
Memory Usage:
- Server: 50-100MB (cache)
- Client: 5-10MB (first page only)
```
---
## Quick Wins (Can Implement Immediately)
1. **Remove enrichment from startup** (5 minutes)
   - Comment out `enrichFrontmatterOnOpen()` in `loadVaultNotes()`
   - Defer to on-demand loading
2. **Add metadata-only endpoint** (30 minutes)
   - Create `/api/files/metadata` using existing Meilisearch integration
   - Use fallback to fast filesystem scan
3. **Implement server-side caching** (1 hour)
   - Cache metadata for 5 minutes
   - Invalidate on file changes
4. **Defer Meilisearch indexing** (30 minutes)
   - Use `setImmediate()` instead of blocking startup

---
## Testing Recommendations
### Load Testing
```bash
# Generate test vault with 1000+ files
node scripts/generate-test-vault.mjs --files 1000
# Measure startup time
time curl http://localhost:3000/api/files/metadata > /dev/null
# Monitor memory usage
node --inspect server/index.mjs
```
### Performance Profiling
```typescript
// Add timing logs
console.time('loadVaultMetadata');
const metadata = await loadVaultMetadataOnly(vaultDir);
console.timeEnd('loadVaultMetadata');
// Monitor in browser DevTools
// Performance tab → Network → Measure /api/files/metadata
```
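A single `time curl` or `console.time` run can be noisy, so it may help to repeat the measurement and summarize the samples. The helper below is a small sketch of that idea using nearest-rank percentiles; `summarize` and `percentile` are hypothetical names, and the sample durations are made-up placeholders, not measurements from ObsiViewer.

```typescript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedMs: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(0, rank - 1)];
}

interface TimingSummary {
  min: number;
  median: number;
  p95: number;
  max: number;
}

function summarize(samplesMs: number[]): TimingSummary {
  if (samplesMs.length === 0) throw new Error('no samples to summarize');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return {
    min: sorted[0],
    median: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max: sorted[sorted.length - 1],
  };
}

// PLACEHOLDER durations in milliseconds, for illustration only.
const exampleSamples = [120, 95, 110, 400, 105];
const summary = summarize(exampleSamples);
console.log(summary);
```

Reporting the median alongside the p95 makes it easier to tell a genuine improvement from a one-off outlier such as a cold filesystem cache.
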
---
## Conclusion
Implemented in phases, this strategy is estimated to reduce startup time for a 1000-file vault from **22-33 seconds to 2-4 seconds** after Phase 1, and to bring time-to-first-page down to roughly **1-1.5 seconds** once pagination is in place, while supporting vaults with 10,000+ files. The metadata-first approach is the key quick win and delivers most of the benefit.

**Recommended Next Steps**:
1. Implement Phase 1 (Metadata-First) immediately
2. Measure performance improvements
3. Proceed with Phase 2-4 based on user feedback