Performance Optimization Strategy for Large Vault Startup

Executive Summary

When deploying ObsiViewer with a large vault (1000+ markdown files), the initial startup is slow because the application loads all notes with full content before rendering the UI. This document outlines a comprehensive strategy to improve the user experience through metadata-first loading, lazy loading, and server-side optimizations.

Expected Improvement: From 10-30 seconds startup → 2-5 seconds to interactive UI


Problem Analysis

Current Architecture Issues

1. Full Vault Load on Startup ⚠️ CRITICAL

  • Location: server/index.mjs - /api/vault endpoint
  • Issue: Loads ALL notes with FULL content synchronously
  • Impact:
    • 1000 files × 5KB average = 5MB payload
    • Blocks UI rendering until complete
    • Network transfer adds an estimated 5-10 seconds on top of server processing
// Current flow:
app.get('/api/vault', async (req, res) => {
  const notes = await loadVaultNotes(vaultDir);  // ← Loads ALL notes with content
  res.json({ notes });
});

2. Front-matter Enrichment on Every File ⚠️ HIGH IMPACT

  • Location: server/index.mjs - loadVaultNotes() function
  • Issue: Calls enrichFrontmatterOnOpen() for every file during initial load
  • Impact:
    • Expensive YAML parsing for each file
    • File I/O for each enrichment
    • Multiplies load time by 2-3x
// Current code (lines 138-141):
const enrichResult = await enrichFrontmatterOnOpen(absPath);
const content = enrichResult.content;
// This happens for EVERY file during loadVaultNotes()

3. No Lazy Loading Strategy

  • Client: VaultService.allNotes() stores all notes in memory
  • UI: NotesListComponent renders all notes (with virtual scrolling, but still loaded)
  • Issue: No on-demand content loading when note is selected

4. Meilisearch Indexing Overhead

  • Issue: Initial indexing happens during server startup
  • Impact: Blocks vault watcher initialization
  • Current: Fallback to filesystem if Meilisearch unavailable

5. Large JSON Payload

  • Issue: Full markdown content sent for every file
  • Impact: Network bandwidth, parsing time, memory usage
  • Example: 1000 files × 5KB = 5MB+ payload
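
The payload arithmetic above can be made concrete with a small estimate. This is a sketch, not a measurement: the 5 KB average comes from the figures in this section, while the roughly 200 bytes per metadata record and the helper name `estimatePayloadBytes` are assumptions chosen for illustration.

```javascript
// Rough payload estimate for a vault (illustrative numbers, not measurements).
const KB = 1000; // decimal kilobytes, matching the 1000 × 5KB = 5MB arithmetic

// Hypothetical helper: total bytes for `fileCount` records of `bytesPerNote` each.
function estimatePayloadBytes(fileCount, bytesPerNote) {
  return fileCount * bytesPerNote;
}

const fullPayload = estimatePayloadBytes(1000, 5 * KB);  // full markdown content
const metadataPayload = estimatePayloadBytes(1000, 200); // ASSUMED ~200 B per metadata record

console.log(`full content: ${fullPayload / (KB * KB)} MB`);
console.log(`metadata only: ${metadataPayload / KB} KB`);
```

Under these assumptions the metadata-only response is about 25 times smaller than the full-content response, which is the gap the phases below try to exploit.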

Current Data Flow

┌─────────────────────────────────────────────────────────────┐
│ Browser requests /api/vault                                 │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Server: loadVaultNotes(vaultDir)                            │
│ - Walk filesystem recursively                               │
│ - For EACH file:                                            │
│   - Read file content                                       │
│   - enrichFrontmatterOnOpen() ← EXPENSIVE                   │
│   - Extract title, tags                                     │
│   - Calculate stats                                         │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Send large JSON payload (5MB+)                              │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Client: Parse JSON, store in VaultService.allNotes()        │
│ - Blocks UI rendering                                       │
│ - High memory usage                                         │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ Render UI with all notes                                    │
│ - NotesListComponent renders all items                      │
│ - AppShellNimbusLayoutComponent initializes                 │
└─────────────────────────────────────────────────────────────┘

Phase 1: Metadata-First Loading (QUICK WIN - 1-2 days)

Goal: Load UI in 2-3 seconds instead of 10-30 seconds

1.1 Split Endpoints

Create two endpoints:

  • /api/files/metadata - Fast, lightweight metadata only
  • /api/vault - Full content (keep for backward compatibility)
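// NEW: Fast metadata endpoint
app.get('/api/files/metadata', async (req, res) => {
  try {
    // Try Meilisearch first (already implemented)
    const client = meiliClient();
    const indexUid = vaultIndexName(vaultDir);
    const index = await ensureIndexSettings(client, indexUid);
    // Note: Meilisearch caps search results at the index setting
    // pagination.maxTotalHits (default 1000), so that setting must be raised
    // for limit: 10000 to return more than 1000 hits.
    const result = await index.search('', {
      limit: 10000,
      attributesToRetrieve: ['id', 'title', 'path', 'createdAt', 'updatedAt']
    });
    
    const items = Array.isArray(result.hits) ? result.hits : [];
    res.json(items);
  } catch (error) {
    // Fallback to fast filesystem scan (no enrichment)
    console.warn('[Meili] Metadata query failed, falling back to filesystem scan:', error);
    try {
      const notes = await loadVaultMetadataOnly(vaultDir);
      res.json(buildFileMetadata(notes));
    } catch (fallbackError) {
      // Without this, a failing fallback would leave the request hanging
      res.status(500).json({ error: 'Failed to load metadata' });
    }
  }
});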

// NEW: Fast metadata-only loader (no enrichment)
const loadVaultMetadataOnly = async (vaultPath) => {
  const notes = [];
  const walk = async (currentDir) => {
    // Same as loadVaultNotes but WITHOUT enrichFrontmatterOnOpen()
    // Just read file stats and extract title from first heading
  };
  await walk(vaultPath);
  return notes;
};
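
The "extract title from first heading" step can be done without any YAML parsing. The following is a minimal sketch under stated assumptions: the helper name `extractTitleFromMarkdown` is hypothetical (the existing `extractTitle()` may behave differently), and it only recognizes a level-1 ATX heading (`# Title`).

```javascript
// Hypothetical helper: derive a title without parsing YAML front matter.
function extractTitleFromMarkdown(content, fallback) {
  // Skip a leading front-matter block delimited by '---' lines, if present.
  const body = content.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, '');
  // Match the first level-1 ATX heading ("# Title") at the start of a line.
  const match = body.match(/^#[ \t]+(.+?)[ \t]*$/m);
  return match ? match[1] : fallback;
}

const sample = '---\ntitle: ignored\n---\n# My Note\n\nBody text';
console.log(extractTitleFromMarkdown(sample, 'my-note'));          // uses the heading
console.log(extractTitleFromMarkdown('no heading here', 'my-note')); // uses the fallback
```

A regex like this can be fooled by a `# comment` line inside a fenced code block, so it trades a little accuracy for speed. Note that it still needs the file content; a scan that reads only file stats would have to fall back to the file name.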

1.2 Modify Client Initialization

Update VaultService to load metadata first:

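// In VaultService (pseudo-code)
// Requires: import { firstValueFrom } from 'rxjs';
async initializeVault() {
  // Step 1: Load metadata immediately (fast)
  // firstValueFrom() replaces toPromise(), which is deprecated as of RxJS 7
  const metadata = await firstValueFrom(this.http.get<any[]>('/api/files/metadata'));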
  this.allNotes.set(metadata.map(m => ({
    id: m.id,
    title: m.title,
    filePath: m.path,
    createdAt: m.createdAt,
    updatedAt: m.updatedAt,
    content: '', // Empty initially
    tags: [],
    frontmatter: {}
  })));
  
  // Step 2: Load full content on-demand when note is selected
  // (already implemented via /api/files endpoint)
}

1.3 Defer Front-matter Enrichment

Current: Enrichment happens during loadVaultNotes() for ALL files

Proposed: Only enrich when file is opened

// In server/index.mjs - GET /api/files endpoint (already exists)
app.get('/api/files', async (req, res) => {
  try {
    const pathParam = req.query.path;
    // ... validation ...
    
    // For markdown files, enrich ONLY when explicitly requested
    if (!isExcalidraw && ext === '.md') {
      const enrichResult = await enrichFrontmatterOnOpen(abs);
      // ← This is fine here (on-demand), but remove from loadVaultNotes()
    }
    // ... send response ...
  } catch (error) {
    // ... error handling ...
  }
});

// In loadVaultNotes() - REMOVE enrichment
const loadVaultNotes = async (vaultPath) => {
  const notes = [];
  const walk = async (currentDir) => {
    // ... directory walk ...
    for (const entry of entries) {
      if (!isMarkdownFile(entry)) continue;
      
      try {
        // REMOVE: const enrichResult = await enrichFrontmatterOnOpen(absPath);
        // Just read the file as-is (async I/O, so the event loop is not blocked)
        const content = await fs.promises.readFile(entryPath, 'utf-8');
        
        // Extract basic metadata without enrichment
        const stats = await fs.promises.stat(entryPath);
        const title = extractTitle(content, fallback);
        const tags = extractTags(content);
        
        notes.push({
          id: finalId,
          title,
          content,
          tags,
          mtime: stats.mtimeMs,
          // ... other fields ...
        });
      } catch (err) {
        console.error(`Failed to read note at ${entryPath}:`, err);
      }
    }
  };
  await walk(vaultPath);
  return notes;
};

1.4 Update VaultService to Load Content On-Demand

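// In src/app/services/vault.service.ts
// Requires: import { firstValueFrom } from 'rxjs';
export class VaultService {
  private allNotesMetadata = signal<Note[]>([]);
  private contentCache = new Map<string, string>();
  
  // Lazy-load content when note is selected
  async ensureNoteContent(noteId: string): Promise<Note | null> {
    const note = this.allNotesMetadata().find(n => n.id === noteId);
    if (!note) return null;
    
    // If content was already fetched, return (has() also covers empty notes)
    if (this.contentCache.has(noteId)) return note;
    
    // Load content from server
    try {
      // firstValueFrom() replaces toPromise(), which is deprecated as of RxJS 7
      const response = await firstValueFrom(
        this.http.get<{ content: string; frontmatter: Record<string, unknown> }>('/api/files', {
          params: { path: note.filePath }
        })
      );
      
      // Replace the note immutably so signal consumers are notified of the change
      const updated = { ...note, content: response.content, frontmatter: response.frontmatter };
      this.contentCache.set(noteId, response.content);
      this.allNotesMetadata.update(notes => notes.map(n => (n.id === noteId ? updated : n)));
      
      return updated;
    } catch (error) {
      console.error('Failed to load note content:', error);
      return note;
    }
  }
}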
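
Because content is now fetched on demand, the client-side content cache can also be bounded so that memory does not grow back to the full-vault level during a long session. The class below is a sketch, not part of the existing service: `LruContentCache` and its `maxEntries` parameter are hypothetical names, and the eviction policy (least recently used) is an assumption.

```javascript
// Hypothetical bounded cache. A Map iterates keys in insertion order,
// so the first key is always the least recently used entry.
class LruContentCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(noteId) {
    if (!this.entries.has(noteId)) return undefined;
    const content = this.entries.get(noteId);
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(noteId);
    this.entries.set(noteId, content);
    return content;
  }

  set(noteId, content) {
    this.entries.delete(noteId);
    this.entries.set(noteId, content);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

const cache = new LruContentCache(2);
cache.set('a', 'content A');
cache.set('b', 'content B');
cache.get('a');              // 'a' becomes most recently used
cache.set('c', 'content C'); // evicts 'b'
```

An evicted note is simply fetched again through `/api/files` the next time it is opened.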

Phase 2: Pagination & Streaming (2-3 days)

Goal: Support vaults with 10,000+ files

2.1 Implement Cursor-Based Pagination

// Server endpoint with pagination
// Note: the cursor is an opaque numeric offset, so this is offset pagination
// behind a cursor-style API. Meilisearch also caps offset + limit at
// pagination.maxTotalHits (default 1000) unless that index setting is raised.
app.get('/api/files/metadata/paginated', async (req, res) => {
  // Clamp limit to 1..500 and treat a missing or invalid cursor as offset 0
  const requested = Number.parseInt(req.query.limit, 10);
  const limit = Math.min(Math.max(requested || 100, 1), 500);
  const offset = Math.max(Number.parseInt(req.query.cursor, 10) || 0, 0);
  
  try {
    const client = meiliClient();
    const indexUid = vaultIndexName(vaultDir);
    const index = await ensureIndexSettings(client, indexUid);
    
    const result = await index.search('', {
      limit: limit + 1, // Fetch one extra to determine if more exist
      offset,
      attributesToRetrieve: ['id', 'title', 'path', 'createdAt', 'updatedAt']
    });
    
    const hasMore = result.hits.length > limit;
    const items = result.hits.slice(0, limit);
    const nextCursor = hasMore ? String(offset + limit) : null;
    
    res.json({ items, nextCursor, hasMore });
  } catch (error) {
    res.status(500).json({ error: 'Pagination failed' });
  }
});
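
The "fetch one extra row" trick used by the endpoint above can be checked in isolation. The sketch below replaces the search index with a plain array; `paginate` and `collectAll` are hypothetical names, and the loop shows how a client might follow `nextCursor` until the last page.

```javascript
// In-memory stand-in for the search index: read limit + 1 rows, return at
// most `limit`, and use the extra row only to detect whether a next page exists.
function paginate(allItems, cursor, limit) {
  const offset = Math.max(Number.parseInt(cursor, 10) || 0, 0);
  const rows = allItems.slice(offset, offset + limit + 1);
  const hasMore = rows.length > limit;
  return {
    items: rows.slice(0, limit),
    nextCursor: hasMore ? String(offset + limit) : null,
    hasMore,
  };
}

// Client-style loop: follow nextCursor until the server reports no more pages.
function collectAll(allItems, limit) {
  const collected = [];
  let cursor = '';
  do {
    const page = paginate(allItems, cursor, limit);
    collected.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return collected;
}

const demoItems = ['n1', 'n2', 'n3', 'n4', 'n5'];
const firstPage = paginate(demoItems, '', 2);
const lastPage = paginate(demoItems, '4', 2);
```

Because the cursor is an offset, files added or removed between requests can shift items across page boundaries; a cursor derived from a stable sort key would avoid that at the cost of extra complexity.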

2.2 Implement Virtual Scrolling in NotesListComponent

// In src/app/features/list/notes-list.component.ts
import { ScrollingModule } from '@angular/cdk/scrolling';

@Component({
  // ...
  imports: [CommonModule, ScrollableOverlayDirective, ScrollingModule],
  template: `
    <!-- itemSize is the fixed row height in px; each row must actually render at that height -->
    <cdk-virtual-scroll-viewport itemSize="60" class="h-full">
      <ul>
        <li *cdkVirtualFor="let n of filtered(); trackBy: trackById" class="p-3 hover:bg-surface1">
          {{ n.title }}
        </li>
      </ul>
    </cdk-virtual-scroll-viewport>
  `
})
export class NotesListComponent {
  // Virtual scrolling will only render visible items
  // trackBy lets the viewport reuse DOM nodes when filtered() returns a new array
  trackById = (_index: number, note: Note) => note.id;
}
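
To see why virtual scrolling keeps the DOM small regardless of vault size, it helps to look at the underlying calculation. The function below is an illustrative sketch of the idea, not the CDK's actual implementation; `visibleRange` and its `buffer` parameter are hypothetical names.

```javascript
// Which rows need DOM nodes for a given scroll position?
// `buffer` extra rows are rendered on each side to reduce flicker while scrolling.
function visibleRange(scrollTop, viewportHeight, itemSize, totalItems, buffer = 3) {
  const firstVisible = Math.floor(scrollTop / itemSize);
  // Exclusive index of the last row that is at least partially visible.
  const lastVisibleExclusive = Math.ceil((scrollTop + viewportHeight) / itemSize);
  return {
    start: Math.max(0, firstVisible - buffer),
    end: Math.min(totalItems, lastVisibleExclusive + buffer), // exclusive
  };
}

// A 600px viewport with 60px rows over a 10,000-note list:
const atTop = visibleRange(0, 600, 60, 10000);
const midList = visibleRange(6000, 600, 60, 10000);
console.log(atTop, midList);
```

With these numbers at most 16 rows are rendered at any scroll position, whether the list holds 1,000 or 10,000 notes. The calculation only works if every row really is `itemSize` pixels tall.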

Phase 3: Server-Side Caching (1-2 days)

Goal: Avoid re-scanning filesystem on every request

3.1 Implement In-Memory Metadata Cache

// In server/index.mjs
let cachedMetadata = null;
let metadataCacheTime = 0;
const METADATA_CACHE_TTL = 5 * 60 * 1000; // 5 minutes

const getMetadataFromCache = async () => {
  const now = Date.now();
  if (cachedMetadata && (now - metadataCacheTime) < METADATA_CACHE_TTL) {
    return cachedMetadata;
  }
  
  // Rebuild cache
  cachedMetadata = await loadVaultMetadataOnly(vaultDir);
  metadataCacheTime = now;
  return cachedMetadata;
};

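// Use in endpoints. Express only runs the first handler registered for a route,
// so this replaces the Phase 1 handler (or its filesystem fallback) rather than
// being registered alongside it.
app.get('/api/files/metadata', async (req, res) => {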
  try {
    const metadata = await getMetadataFromCache();
    res.json(buildFileMetadata(metadata));
  } catch (error) {
    res.status(500).json({ error: 'Failed to load metadata' });
  }
});

// Invalidate cache on file changes
vaultWatcher.on('add', () => { metadataCacheTime = 0; });
vaultWatcher.on('change', () => { metadataCacheTime = 0; });
vaultWatcher.on('unlink', () => { metadataCacheTime = 0; });
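
One gap in a simple TTL cache is that several requests arriving while the cache is cold each trigger their own filesystem scan. A possible refinement, sketched here with hypothetical names (`createTtlCache`, `loader`, and the injectable `now` clock are all assumptions), shares a single in-flight rebuild among concurrent callers and ignores results from rebuilds that were invalidated while running.

```javascript
// Hypothetical factory: wraps an async loader with a TTL and shares one
// in-flight rebuild among concurrent callers.
function createTtlCache(loader, ttlMs, now = Date.now) {
  let value = null;
  let hasValue = false;
  let loadedAt = 0;
  let pending = null;
  let generation = 0; // bumped on every invalidation

  async function get() {
    if (hasValue && now() - loadedAt < ttlMs) return value;
    if (!pending) {
      const startedAt = generation;
      pending = loader()
        .then((result) => {
          // Only store the result if no invalidation happened during the load.
          if (startedAt === generation) {
            value = result;
            loadedAt = now();
            hasValue = true;
          }
          return result;
        })
        .finally(() => {
          pending = null;
        });
    }
    return pending;
  }

  function invalidate() {
    generation += 1;
    hasValue = false;
  }

  return { get, invalidate };
}
```

In the server this could wrap the metadata loader, for example `createTtlCache(() => loadVaultMetadataOnly(vaultDir), METADATA_CACHE_TTL)`, with the watcher callbacks calling `invalidate()` instead of resetting a timestamp.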

3.2 Defer Meilisearch Indexing

// In server/index.mjs - defer initial indexing
let indexingInProgress = false;

const scheduleIndexing = async () => {
  if (indexingInProgress) return;
  indexingInProgress = true;
  
  // Schedule indexing for later (don't block startup)
  setImmediate(async () => {
    try {
      await fullReindex(vaultDir);
      console.log('[Meili] Initial indexing complete');
    } catch (error) {
      console.warn('[Meili] Initial indexing failed:', error);
    } finally {
      indexingInProgress = false;
    }
  });
};

// Call during server startup instead of blocking
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
  scheduleIndexing(); // Non-blocking
});

Phase 4: Client-Side Optimization (1 day)

Goal: Smooth UI interactions even with large datasets

4.1 Implement Signal-Based Lazy Loading

// In VaultService
export class VaultService {
  private allNotesMetadata = signal<Note[]>([]);
  private loadedNoteIds = new Set<string>();
  
  // Load content in background
  preloadNearbyNotes(currentNoteId: string, range = 5) {
    const notes = this.allNotesMetadata();
    const idx = notes.findIndex(n => n.id === currentNoteId);
    if (idx === -1) return;
    
    // Preload nearby notes
    for (let i = Math.max(0, idx - range); i <= Math.min(notes.length - 1, idx + range); i++) {
      const noteId = notes[i].id;
      if (!this.loadedNoteIds.has(noteId)) {
        this.ensureNoteContent(noteId).then(() => {
          this.loadedNoteIds.add(noteId);
        });
      }
    }
  }
}
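
The preload loop above walks the window from its lowest index upward, so the notes furthest before the current one are requested first. If request order matters, the window can instead be ordered by distance from the current note. This is a sketch under that assumption; `nearbyIndices` is a hypothetical helper and, unlike the loop above, it skips the current note itself.

```javascript
// Indices to preload around `currentIndex`, nearest neighbours first,
// clamped to the list bounds and excluding the current note itself.
function nearbyIndices(currentIndex, range, length) {
  const indices = [];
  for (let distance = 1; distance <= range; distance++) {
    const after = currentIndex + distance;
    const before = currentIndex - distance;
    if (after < length) indices.push(after);
    if (before >= 0) indices.push(before);
  }
  return indices;
}

console.log(nearbyIndices(5, 2, 10)); // middle of the list
console.log(nearbyIndices(0, 2, 10)); // start of the list
```

The returned indices can be mapped to note IDs and passed to `ensureNoteContent()` one after another, which also avoids firing up to ten requests at once.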

4.2 Optimize Change Detection

// Already implemented in AppComponent
@Component({
  // ...
  changeDetection: ChangeDetectionStrategy.OnPush, // ✓ Already done
})
export class AppComponent {
  // Use signals instead of observables
  // Avoid unnecessary change detection cycles
}

Implementation Roadmap

Week 1: Phase 1 (Metadata-First Loading)

  • Create /api/files/metadata endpoint
  • Implement loadVaultMetadataOnly() function
  • Remove enrichment from loadVaultNotes()
  • Update VaultService to load metadata first
  • Test with 1000+ file vault
  • Expected Result: 22-33s → 2-4s startup time (matching the Performance Metrics estimates)

Week 2: Phase 2 (Pagination)

  • Implement cursor-based pagination
  • Add virtual scrolling to NotesListComponent
  • Test with 10,000+ files
  • Expected Result: Support vaults with 10,000+ files without loading the full list up front

Week 3: Phase 3 (Server Caching)

  • Implement in-memory metadata cache
  • Defer Meilisearch indexing
  • Add cache invalidation on file changes
  • Expected Result: Reduced server load

Week 4: Phase 4 (Client Optimization)

  • Implement preloading strategy
  • Profile and optimize hot paths
  • Performance testing
  • Expected Result: Smooth interactions

Performance Metrics

Before Optimization

Startup Time (1000 files):
- Server processing: 15-20s
- Network transfer: 5-10s
- Client parsing: 2-3s
- Total: 22-33s

Memory Usage:
- Server: 200-300MB
- Client: 150-200MB

After Phase 1 (Metadata-First)

Startup Time (1000 files):
- Server processing: 1-2s (metadata only)
- Network transfer: 0.5-1s (small payload)
- Client parsing: 0.5-1s
- Total: 2-4s ✓

Memory Usage:
- Server: 50-100MB
- Client: 20-30MB (metadata only)

After Phase 2 (Pagination)

Startup Time (10,000 files):
- Server processing: 0.5s (first page)
- Network transfer: 0.2-0.5s
- Client parsing: 0.2-0.5s
- Total: 1-1.5s ✓

Memory Usage:
- Server: 50-100MB (cache)
- Client: 5-10MB (first page only)

Quick Wins (Can Implement Immediately)

  1. Remove enrichment from startup (5 minutes)

    • Comment out enrichFrontmatterOnOpen() in loadVaultNotes()
    • Defer to on-demand loading
  2. Add metadata-only endpoint (30 minutes)

    • Create /api/files/metadata using existing Meilisearch integration
    • Use fallback to fast filesystem scan
  3. Implement server-side caching (1 hour)

    • Cache metadata for 5 minutes
    • Invalidate on file changes
  4. Defer Meilisearch indexing (30 minutes)

    • Use setImmediate() instead of blocking startup

Testing Recommendations

Load Testing

# Generate test vault with 1000+ files
node scripts/generate-test-vault.mjs --files 1000

# Measure startup time
time curl http://localhost:3000/api/files/metadata > /dev/null

# Monitor memory usage
node --inspect server/index.mjs

Performance Profiling

// Add timing logs
console.time('loadVaultMetadata');
const metadata = await loadVaultMetadataOnly(vaultDir);
console.timeEnd('loadVaultMetadata');

// Monitor in browser DevTools:
// Performance tab → Network → Measure /api/files/metadata
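
A single timing run is noisy, so it is worth repeating each measurement and comparing summary statistics before and after a change. The helper below is a minimal sketch; `summarize` is a hypothetical name, the sample values are made up for illustration, and it assumes a non-empty array of durations in milliseconds.

```javascript
// Summarize repeated timing samples (in ms) using nearest-rank percentiles.
function summarize(samplesMs) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Nearest rank: the smallest sample with at least p% of samples at or below it.
  const percentile = (p) => sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    min: sorted[0],
    median: percentile(50),
    p95: percentile(95),
    max: sorted[sorted.length - 1],
  };
}

// Made-up durations for five requests to /api/files/metadata:
const stats = summarize([120, 100, 110, 400, 105]);
console.log(stats);
```

The median is less sensitive than the mean to a single slow run such as a cold filesystem cache, while p95 and max show how bad the slow runs get.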

Conclusion

By implementing this optimization strategy in phases, you can reduce the estimated startup time for a 1000-file vault from 22-33 seconds to 2-4 seconds after Phase 1, and bring the first page of a 10,000-file vault to roughly 1-1.5 seconds after Phase 2. The metadata-first approach is the key quick win that provides immediate benefits.

Recommended Next Steps:

  1. Implement Phase 1 (Metadata-First) immediately
  2. Measure performance improvements
  3. Proceed with Phase 2-4 based on user feedback