From b40fcae62f43a3c5b8febaf6b31c652eb9817a90 Mon Sep 17 00:00:00 2001 From: Bruno Charest Date: Mon, 23 Mar 2026 13:21:20 -0400 Subject: [PATCH] Add advanced search engine with inverted index, thread pool execution, configuration API, and comprehensive diagnostics --- .gitignore | 1 + README.md | 63 +++++++++-- backend/indexer.py | 6 + backend/main.py | 183 +++++++++++++++++++++++++++++- backend/search.py | 266 ++++++++++++++++++++++++++++++-------------- frontend/app.js | 248 +++++++++++++++++++++++++++++++++++++++-- frontend/index.html | 79 +++++++++++++ frontend/style.css | 145 ++++++++++++++++++++++++ 8 files changed, 885 insertions(+), 106 deletions(-) diff --git a/.gitignore b/.gitignore index 31106d3..ca46560 100644 --- a/.gitignore +++ b/.gitignore @@ -8,3 +8,4 @@ venv/ *.egg-info/ dist/ build/ +config.json diff --git a/README.md b/README.md index adcf54d..38c0054 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ **Porte d'entrée web ultra-léger pour vos vaults Obsidian** — Accédez, naviguez et recherchez dans toutes vos notes Obsidian depuis n'importe quel appareil via une interface web moderne et responsive. 
-[![Version](https://img.shields.io/badge/Version-1.1.0-blue.svg)]() +[![Version](https://img.shields.io/badge/Version-1.2.0-blue.svg)]() [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Docker](https://img.shields.io/badge/Docker-Ready-blue.svg)](https://www.docker.com/) [![Python](https://img.shields.io/badge/Python-3.11+-green.svg)](https://www.python.org/) @@ -359,6 +359,9 @@ ObsiGate expose une API REST complète : | `/api/image/{vault}?path=` | Servir une image avec MIME type approprié | GET | | `/api/attachments/rescan/{vault}` | Rescanner les images d'un vault | POST | | `/api/attachments/stats?vault=` | Statistiques d'images indexées | GET | +| `/api/config` | Lire la configuration | GET | +| `/api/config` | Mettre à jour la configuration | POST | +| `/api/diagnostics` | Statistiques index, mémoire, moteur de recherche | GET | > Tous les endpoints exposent des schémas Pydantic documentés. La doc interactive est disponible sur `/docs` (Swagger UI). 
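En marge du patch : la sémantique de `POST /api/config` (détaillée plus loin dans `backend/main.py`) peut se résumer ainsi — clés inconnues ignorées, clés connues validées par type contre les valeurs par défaut, configuration fusionnée renvoyée. L'esquisse ci-dessous est purement illustrative (hors du code livré) ; `DEFAULTS` et `apply_config_update` sont des noms hypothétiques.

```python
# Esquisse autonome (hors ObsiGate) de la validation de POST /api/config :
# clés inconnues ignorées, types vérifiés contre les défauts, fusion renvoyée.
DEFAULTS = {"search_workers": 2, "search_timeout_ms": 30000, "title_boost": 3.0}

def apply_config_update(current: dict, body: dict) -> dict:
    merged = dict(current)
    for key, value in body.items():
        if key not in DEFAULTS:
            continue  # clé inconnue : ignorée silencieusement
        expected = type(DEFAULTS[key])
        # Un float accepte aussi un int (ex. title_boost: 3)
        if isinstance(value, expected) or (expected is float and isinstance(value, (int, float))):
            merged[key] = value
        else:
            raise ValueError(f"Invalid type for {key!r}")
    return merged

print(apply_config_update(dict(DEFAULTS), {"search_workers": 4, "unknown": True}))
```

Le endpoint réel renvoie de plus une erreur HTTP 400 en cas de type invalide, là où l'esquisse lève simplement `ValueError`.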
@@ -464,19 +467,39 @@ docker-compose logs --tail=100 obsigate ``` --- - ## ⚡ Performance | Métrique | Estimation | |----------|------------| -| **Indexation** | ~1–2s pour 1 000 fichiers markdown | -| **Recherche fulltext** | < 50ms (index en mémoire, zéro I/O disque) | +| **Indexation** | ~1–2s pour 1 000 fichiers markdown | +| **Recherche avancée** | < 10ms pour la plupart des requêtes (index inversé + TF-IDF) | | **Résolution wikilinks** | O(1) via table de lookup | -| **Mémoire** | ~80–150MB par 1 000 fichiers (contenu capé à 100 KB/fichier) | +| **Mémoire** | ~80–150MB par 1 000 fichiers (contenu capé à 100 KB/fichier) | | **Image Docker** | ~180MB (multi-stage, sans outils de build) | -| **CPU** | Minimal ; pas de polling, pas de watchers | +| **CPU** | Non-bloquant ; recherche offloadée sur thread pool dédié | -### Optimisations clés (v1.1.0) +### Paramètres recommandés par taille de vault + +| Taille | Fichiers | `search_workers` | `prefix_max_expansions` | `max_content_size` | +|--------|----------|-------------------|--------------------------|---------------------| +| Petit | < 500 | 1 | 50 | 100 000 | +| Moyen | 500–5 000 | 2 | 50 | 100 000 | +| Grand | 5 000+ | 4 | 30 | 50 000 | + +Ces paramètres sont configurables via l'interface (Settings) ou l'API `/api/config`. 
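Pour fixer les idées, le tableau ci-dessus peut se traduire par un petit helper qui choisit les paramètres recommandés selon le nombre de fichiers indexés. Ce helper est purement illustratif (il ne fait pas partie du code d'ObsiGate) ; seuls les seuils et les valeurs viennent du tableau.

```python
# Helper hypothétique reprenant les seuils du tableau ci-dessus
# (petit < 500 fichiers, moyen de 500 à 5 000, grand au-delà).
def recommended_settings(file_count: int) -> dict:
    if file_count < 500:    # Petit vault
        return {"search_workers": 1, "prefix_max_expansions": 50, "max_content_size": 100_000}
    if file_count < 5_000:  # Moyen
        return {"search_workers": 2, "prefix_max_expansions": 50, "max_content_size": 100_000}
    # Grand : moins d'expansions de préfixe et contenu capé plus bas
    return {"search_workers": 4, "prefix_max_expansions": 30, "max_content_size": 50_000}

print(recommended_settings(1_200))
```

Les valeurs renvoyées peuvent ensuite être envoyées telles quelles à `POST /api/config`.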
+
+### Optimisations clés (v1.2.0)
+
+- **Index inversé avec set-intersection** : la recherche utilise les posting lists pour un retrieval en O(k × postings) au lieu d'un scan complet en O(N)
+- **Prefix matching par recherche binaire** : O(log V + k) au lieu d'un scan linéaire du vocabulaire en O(V)
+- **ThreadPoolExecutor** : les fonctions de recherche CPU-bound sont offloadées hors de l'event loop asyncio
+- **Race condition guard** : `currentSearchId` + `AbortController` empêchent le rendu de résultats obsolètes
+- **Progress bar** : barre de progression animée pendant la recherche
+- **Search timeout** : abandon automatique après 30s (configurable)
+- **Query time display** : temps serveur affiché dans les résultats (`query_time_ms`)
+- **Staleness detection fix** : utilisation d'un compteur de génération au lieu de `id(index)` pour détecter les changements d'index
+
+### Optimisations v1.1.0
 
 - **Recherche sans I/O** : le contenu des fichiers est mis en cache dans l'index mémoire
 - **Scoring multi-facteurs** : titre exact (+20), titre partiel (+10), chemin (+5), tag (+3), fréquence contenu (x1 par occurrence, capé à 10)
@@ -582,6 +605,30 @@ Ce projet est sous licence **MIT** - voir le fichier [LICENSE](LICENSE) pour les
 
 ## 📝 Changelog
 
+### v1.2.0 (2025)
+
+**Performance (critique)**
+- Réécriture du moteur `advanced_search()` : retrieval par set-intersection sur l'index inversé (O(k × postings) au lieu de O(N))
+- Prefix matching par recherche binaire sur liste triée de tokens (O(log V + k) au lieu de O(V))
+- Offload des fonctions de recherche CPU-bound via `ThreadPoolExecutor` (2 workers par défaut)
+- Pré-calcul des expansions de préfixe pour éviter les recherches binaires répétées
+- Fix du bug de staleness : `is_stale()` utilise un compteur de génération au lieu de `id(index)`
+
+**Frontend**
+- Guard contre les race conditions : `currentSearchId` vérifié après chaque `fetch` avant rendu
+- Barre de progression animée pendant la recherche
+- Timeout de recherche configurable (30s par défaut)
+- Longueur minimale de requête configurable (2 caractères par défaut)
+- Affichage du temps de requête serveur (`query_time_ms`) dans les résultats
+- Pagination ajoutée sur l'endpoint legacy `/api/search` (params `limit`/`offset`)
+
+**Configuration & Diagnostics**
+- Nouveaux endpoints `GET/POST /api/config` pour la configuration persistante (`config.json`)
+- Nouvel endpoint `GET /api/diagnostics` (stats index, mémoire, moteur de recherche)
+- Page de configuration étendue : paramètres frontend (debounce, résultats/page, timeout) et backend (workers, boosts, expansions)
+- Panel de diagnostics intégré dans la modal de configuration
+- Boutons « Forcer réindexation » et « Réinitialiser » dans les paramètres
+
 ### v1.1.0 (2025)
 
 **Sécurité**
@@ -621,4 +668,4 @@ Ce projet est sous licence **MIT** - voir le fichier [LICENSE](LICENSE) pour les
 
 ---
 
-*Projet : ObsiGate | Version : 1.1.0 | Dernière mise à jour : 2025*
+*Projet : ObsiGate | Version : 1.2.0 | Dernière mise à jour : 2025*
diff --git a/backend/indexer.py b/backend/indexer.py
index 8d1252f..6437921 100644
--- a/backend/indexer.py
+++ b/backend/indexer.py
@@ -22,6 +22,10 @@ vault_config: Dict[str, Dict[str, Any]] = {}
 # Thread-safe lock for index updates
 _index_lock = threading.Lock()
 
+# Generation counter — incremented on each index rebuild so consumers
+# (e.g. the inverted index in search.py) can detect staleness.
+_index_generation: int = 0 + # O(1) lookup table for wikilink resolution: {filename_lower: [{vault, path}, ...]} _file_lookup: Dict[str, List[Dict[str, str]]] = {} @@ -318,6 +322,7 @@ async def build_index() -> None: new_path_index[vname] = vdata.get("paths", []) # Atomic swap under lock for thread safety during concurrent reads + global _index_generation with _index_lock: index.clear() index.update(new_index) @@ -325,6 +330,7 @@ async def build_index() -> None: _file_lookup.update(new_lookup) path_index.clear() path_index.update(new_path_index) + _index_generation += 1 total_files = sum(len(v["files"]) for v in index.values()) logger.info(f"Index built: {len(index)} vaults, {total_files} total files") diff --git a/backend/main.py b/backend/main.py index 301e01a..48ae1f3 100644 --- a/backend/main.py +++ b/backend/main.py @@ -1,8 +1,12 @@ +import asyncio +import json as _json import re import html as html_mod import logging import mimetypes +from concurrent.futures import ThreadPoolExecutor from contextlib import asynccontextmanager +from functools import partial from pathlib import Path from typing import Optional, List, Dict, Any @@ -111,11 +115,14 @@ class SearchResultItem(BaseModel): class SearchResponse(BaseModel): - """Full-text search response.""" + """Full-text search response with optional pagination.""" query: str vault_filter: str tag_filter: Optional[str] count: int + total: int = Field(0, description="Total results before pagination") + offset: int = Field(0, description="Current pagination offset") + limit: int = Field(200, description="Page size") results: List[SearchResultItem] @@ -165,6 +172,7 @@ class AdvancedSearchResponse(BaseModel): offset: int limit: int facets: SearchFacets + query_time_ms: float = Field(0, description="Server-side query time in milliseconds") class TitleSuggestion(BaseModel): @@ -210,16 +218,25 @@ class HealthResponse(BaseModel): # Application lifespan (replaces deprecated on_event) # 
--------------------------------------------------------------------------- +# Thread pool for offloading CPU-bound search from the event loop. +# Sized to 2 workers so concurrent searches don't starve other requests. +_search_executor: Optional[ThreadPoolExecutor] = None + + @asynccontextmanager async def lifespan(app: FastAPI): """Application lifespan: build index on startup, cleanup on shutdown.""" + global _search_executor + _search_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="search") logger.info("ObsiGate starting \u2014 building index...") await build_index() logger.info("ObsiGate ready.") yield + _search_executor.shutdown(wait=False) + _search_executor = None -app = FastAPI(title="ObsiGate", version="1.1.0", lifespan=lifespan) +app = FastAPI(title="ObsiGate", version="1.2.0", lifespan=lifespan) # Resolve frontend path relative to this file FRONTEND_DIR = Path(__file__).resolve().parent.parent / "frontend" @@ -687,22 +704,38 @@ async def api_search( q: str = Query("", description="Search query"), vault: str = Query("all", description="Vault filter"), tag: Optional[str] = Query(None, description="Tag filter"), + limit: int = Query(50, ge=1, le=200, description="Results per page"), + offset: int = Query(0, ge=0, description="Pagination offset"), ): """Full-text search across vaults with relevance scoring. Supports combining free-text queries with tag filters. Results are ranked by a multi-factor scoring algorithm. + Pagination via ``limit`` and ``offset`` (defaults preserve backward compat). Args: q: Free-text search string. vault: Vault name or ``"all"`` to search everywhere. tag: Comma-separated tag names to require. + limit: Max results per page (1–200). + offset: Pagination offset. Returns: ``SearchResponse`` with ranked results and snippets. 
""" - results = search(q, vault_filter=vault, tag_filter=tag) - return {"query": q, "vault_filter": vault, "tag_filter": tag, "count": len(results), "results": results} + loop = asyncio.get_event_loop() + # Fetch full result set (capped at DEFAULT_SEARCH_LIMIT internally) + all_results = await loop.run_in_executor( + _search_executor, + partial(search, q, vault_filter=vault, tag_filter=tag), + ) + total = len(all_results) + page = all_results[offset: offset + limit] + return { + "query": q, "vault_filter": vault, "tag_filter": tag, + "count": len(page), "total": total, "offset": offset, "limit": limit, + "results": page, + } @app.get("/api/tags", response_model=TagsResponse) @@ -793,7 +826,12 @@ async def api_advanced_search( Returns: ``AdvancedSearchResponse`` with scored results, facets, and pagination info. """ - return advanced_search(q, vault_filter=vault, tag_filter=tag, limit=limit, offset=offset, sort_by=sort) + loop = asyncio.get_event_loop() + return await loop.run_in_executor( + _search_executor, + partial(advanced_search, q, vault_filter=vault, tag_filter=tag, + limit=limit, offset=offset, sort_by=sort), + ) @app.get("/api/suggest", response_model=SuggestResponse) @@ -924,6 +962,141 @@ async def api_attachment_stats(vault: Optional[str] = Query(None, description="V return {"vaults": stats} +# --------------------------------------------------------------------------- +# Configuration API +# --------------------------------------------------------------------------- + +_CONFIG_PATH = Path(__file__).resolve().parent.parent / "config.json" + +_DEFAULT_CONFIG = { + "search_workers": 2, + "debounce_ms": 300, + "results_per_page": 50, + "min_query_length": 2, + "search_timeout_ms": 30000, + "max_content_size": 100000, + "snippet_context_chars": 120, + "max_snippet_highlights": 5, + "title_boost": 3.0, + "path_boost": 1.5, + "tag_boost": 2.0, + "prefix_max_expansions": 50, +} + + +def _load_config() -> dict: + """Load config from disk, merging with 
defaults.""" + config = dict(_DEFAULT_CONFIG) + if _CONFIG_PATH.exists(): + try: + stored = _json.loads(_CONFIG_PATH.read_text(encoding="utf-8")) + config.update(stored) + except Exception as e: + logger.warning(f"Failed to read config.json: {e}") + return config + + +def _save_config(config: dict) -> None: + """Persist config to disk.""" + try: + _CONFIG_PATH.write_text( + _json.dumps(config, indent=2, ensure_ascii=False), + encoding="utf-8", + ) + except Exception as e: + logger.error(f"Failed to write config.json: {e}") + raise HTTPException(status_code=500, detail=f"Failed to save config: {e}") + + +@app.get("/api/config") +async def api_get_config(): + """Return current configuration with defaults for missing keys.""" + return _load_config() + + +@app.post("/api/config") +async def api_set_config(body: dict = Body(...)): + """Update configuration. Only known keys are accepted. + + Keys matching ``_DEFAULT_CONFIG`` are validated and persisted. + Unknown keys are silently ignored. + Returns the full merged config after update. + """ + current = _load_config() + updated_keys = [] + for key, value in body.items(): + if key in _DEFAULT_CONFIG: + expected_type = type(_DEFAULT_CONFIG[key]) + if isinstance(value, expected_type) or (expected_type is float and isinstance(value, (int, float))): + current[key] = value + updated_keys.append(key) + else: + raise HTTPException( + status_code=400, + detail=f"Invalid type for '{key}': expected {expected_type.__name__}, got {type(value).__name__}", + ) + _save_config(current) + logger.info(f"Config updated: {updated_keys}") + return current + + +# --------------------------------------------------------------------------- +# Diagnostics API +# --------------------------------------------------------------------------- + +@app.get("/api/diagnostics") +async def api_diagnostics(): + """Return index statistics and system diagnostics. + + Includes document counts, token counts, memory estimates, + and inverted index status. 
+ """ + from backend.search import get_inverted_index + import sys + + inv = get_inverted_index() + + # Per-vault stats + vault_stats = {} + total_files = 0 + total_tags = 0 + for vname, vdata in index.items(): + file_count = len(vdata.get("files", [])) + tag_count = len(vdata.get("tags", {})) + vault_stats[vname] = {"file_count": file_count, "tag_count": tag_count} + total_files += file_count + total_tags += tag_count + + # Memory estimate for inverted index + word_index_entries = sum(len(docs) for docs in inv.word_index.values()) + mem_estimate_mb = round( + (sys.getsizeof(inv.word_index) + word_index_entries * 80 + + len(inv.doc_info) * 200 + + len(inv._sorted_tokens) * 60) / (1024 * 1024), 2 + ) + + return { + "index": { + "total_files": total_files, + "total_tags": total_tags, + "vaults": vault_stats, + }, + "inverted_index": { + "unique_tokens": len(inv.word_index), + "total_postings": word_index_entries, + "documents": inv.doc_count, + "sorted_tokens": len(inv._sorted_tokens), + "is_stale": inv.is_stale(), + "memory_estimate_mb": mem_estimate_mb, + }, + "config": _load_config(), + "search_executor": { + "active": _search_executor is not None, + "max_workers": _search_executor._max_workers if _search_executor else 0, + }, + } + + # --------------------------------------------------------------------------- # Static files & SPA fallback # --------------------------------------------------------------------------- diff --git a/backend/search.py b/backend/search.py index 29a7956..b23a9a4 100644 --- a/backend/search.py +++ b/backend/search.py @@ -1,10 +1,13 @@ +import bisect import logging import math import re +import time import unicodedata from collections import defaultdict from typing import List, Dict, Any, Optional, Tuple +from backend import indexer as _indexer from backend.indexer import index logger = logging.getLogger("obsigate.search") @@ -256,12 +259,21 @@ class InvertedIndex: self.tag_prefix_index: Dict[str, List[str]] = defaultdict(list) 
self.title_norm_map: Dict[str, List[Dict[str, str]]] = defaultdict(list) self.doc_count: int = 0 - self._source_id: Optional[int] = None + self.doc_info: Dict[str, Dict[str, Any]] = {} + self.doc_vault: Dict[str, str] = {} + self.vault_docs: Dict[str, set] = defaultdict(set) + self.tag_docs: Dict[str, set] = defaultdict(set) + self._sorted_tokens: List[str] = [] + self._source_generation: int = -1 def is_stale(self) -> bool: - """Check if the inverted index needs rebuilding.""" - current_id = id(index) - return current_id != self._source_id + """Check if the inverted index needs rebuilding. + + Uses the indexer's generation counter which increments on every + rebuild, instead of ``id(index)`` which never changes since the + global dict is mutated in-place. + """ + return _indexer._index_generation != self._source_generation def rebuild(self) -> None: """Rebuild inverted index from the global ``index`` dict. @@ -276,12 +288,25 @@ class InvertedIndex: self.tag_prefix_index = defaultdict(list) self.title_norm_map = defaultdict(list) self.doc_count = 0 + self.doc_info = {} + self.doc_vault = {} + self.vault_docs = defaultdict(set) + self.tag_docs = defaultdict(set) for vault_name, vault_data in index.items(): for file_info in vault_data.get("files", []): doc_key = f"{vault_name}::{file_info['path']}" self.doc_count += 1 + # --- Document metadata for O(1) lookup --- + self.doc_info[doc_key] = file_info + self.doc_vault[doc_key] = vault_name + self.vault_docs[vault_name].add(doc_key) + + # --- Per-document tag index --- + for tag in file_info.get("tags", []): + self.tag_docs[tag.lower()].add(doc_key) + # --- Title tokens --- title_tokens = tokenize(file_info.get("title", "")) for token in set(title_tokens): @@ -316,7 +341,8 @@ class InvertedIndex: if tag not in self.tag_prefix_index[prefix]: self.tag_prefix_index[prefix].append(tag) - self._source_id = id(index) + self._sorted_tokens = sorted(self.word_index.keys()) + self._source_generation = _indexer._index_generation 
logger.info( "Inverted index built: %d documents, %d unique tokens, %d tags", self.doc_count, @@ -358,6 +384,32 @@ class InvertedIndex: return 0.0 return tf * self.idf(term) + def get_prefix_tokens(self, prefix: str, max_expansions: int = 50) -> List[str]: + """Get all tokens starting with *prefix* using binary search. + + Uses a pre-sorted token list for O(log V + k) lookup instead + of O(V) linear scan over the vocabulary. + + Args: + prefix: Normalized prefix string. + max_expansions: Cap on returned tokens to bound work. + + Returns: + List of matching tokens (including exact match if present). + """ + if not prefix or not self._sorted_tokens: + return [] + lo = bisect.bisect_left(self._sorted_tokens, prefix) + results: List[str] = [] + for i in range(lo, len(self._sorted_tokens)): + if self._sorted_tokens[i].startswith(prefix): + results.append(self._sorted_tokens[i]) + if len(results) >= max_expansions: + break + else: + break + return results + # Singleton inverted index _inverted_index = InvertedIndex() @@ -582,6 +634,10 @@ def advanced_search( ) -> Dict[str, Any]: """Advanced full-text search with TF-IDF scoring, facets, and pagination. + Uses the inverted index for O(k × postings) candidate retrieval instead + of O(N) full document scan. Prefix matching uses binary search on a + sorted token list for O(log V + k) instead of O(V) linear scan. + Parses the query for operators (``tag:``, ``vault:``, ``title:``, ``path:``), falls back remaining tokens to TF-IDF scored free-text search using the inverted index. Results include highlighted snippets @@ -596,8 +652,10 @@ def advanced_search( sort_by: ``"relevance"`` or ``"modified"``. Returns: - Dict with ``results``, ``total``, ``offset``, ``limit``, ``facets``. + Dict with ``results``, ``total``, ``offset``, ``limit``, ``facets``, + ``query_time_ms``. 
""" + t0 = time.monotonic() query = query.strip() if query else "" parsed = _parse_advanced_query(query) @@ -616,98 +674,132 @@ def advanced_search( has_terms = len(query_terms) > 0 if not has_terms and not all_tags and not parsed["title"] and not parsed["path"]: - return {"results": [], "total": 0, "offset": offset, "limit": limit, "facets": {"tags": {}, "vaults": {}}} + return {"results": [], "total": 0, "offset": offset, "limit": limit, + "facets": {"tags": {}, "vaults": {}}, "query_time_ms": 0} inv = get_inverted_index() + + # ------------------------------------------------------------------ + # Step 1: Candidate retrieval via inverted index (replaces O(N) scan) + # ------------------------------------------------------------------ + if has_terms: + # Union of posting lists for all terms + prefix expansions + candidates: set = set() + for term in query_terms: + # Exact term matches + candidates.update(inv.word_index.get(term, {}).keys()) + # Prefix matches — O(log V + k) via binary search + if len(term) >= MIN_PREFIX_LENGTH: + for expanded in inv.get_prefix_tokens(term): + if expanded != term: + candidates.update(inv.word_index.get(expanded, {}).keys()) + else: + # Filter-only search: start with tag-filtered subset or all docs + if all_tags: + tag_sets = [inv.tag_docs.get(t.lower(), set()) for t in all_tags] + candidates = set.intersection(*tag_sets) if tag_sets else set() + else: + candidates = set(inv.doc_info.keys()) + + # ------------------------------------------------------------------ + # Step 2: Apply filters on candidate set + # ------------------------------------------------------------------ + if effective_vault != "all": + candidates &= inv.vault_docs.get(effective_vault, set()) + + if all_tags and has_terms: + for t in all_tags: + candidates &= inv.tag_docs.get(t.lower(), set()) + + if parsed["title"]: + norm_title_filter = normalize_text(parsed["title"]) + candidates = { + dk for dk in candidates + if norm_title_filter in 
normalize_text(inv.doc_info[dk].get("title", "")) + } + + if parsed["path"]: + norm_path_filter = normalize_text(parsed["path"]) + candidates = { + dk for dk in candidates + if norm_path_filter in normalize_text(inv.doc_info[dk].get("path", "")) + } + + # ------------------------------------------------------------------ + # Step 3: Score only the candidates (not all N documents) + # ------------------------------------------------------------------ scored_results: List[Tuple[float, Dict[str, Any]]] = [] facet_tags: Dict[str, int] = defaultdict(int) facet_vaults: Dict[str, int] = defaultdict(int) - for vault_name, vault_data in index.items(): - if effective_vault != "all" and vault_name != effective_vault: + # Pre-compute prefix expansions once per term (avoid repeated binary search) + prefix_expansions: Dict[str, List[str]] = {} + if has_terms: + for term in query_terms: + if len(term) >= MIN_PREFIX_LENGTH: + prefix_expansions[term] = [ + t for t in inv.get_prefix_tokens(term) if t != term + ] + + for doc_key in candidates: + file_info = inv.doc_info.get(doc_key) + if file_info is None: continue + vault_name = inv.doc_vault[doc_key] - for file_info in vault_data.get("files", []): - doc_key = f"{vault_name}::{file_info['path']}" + score = 0.0 + if has_terms: + for term in query_terms: + tfidf = inv.tf_idf(term, doc_key) + score += tfidf - # --- Tag filter --- - if all_tags: - file_tags_lower = [t.lower() for t in file_info.get("tags", [])] - if not all(t.lower() in file_tags_lower for t in all_tags): - continue + # Title boost — check if term appears in title tokens + norm_title = normalize_text(file_info.get("title", "")) + if term in norm_title: + score += tfidf * TITLE_BOOST - # --- Title filter --- - if parsed["title"]: - norm_title_filter = normalize_text(parsed["title"]) - norm_file_title = normalize_text(file_info.get("title", "")) - if norm_title_filter not in norm_file_title: - continue + # Path boost + norm_path = normalize_text(file_info.get("path", "")) 
+ if term in norm_path: + score += tfidf * PATH_BOOST - # --- Path filter --- - if parsed["path"]: - norm_path_filter = normalize_text(parsed["path"]) - norm_file_path = normalize_text(file_info.get("path", "")) - if norm_path_filter not in norm_file_path: - continue - - # --- Scoring --- - score = 0.0 - if has_terms: - # TF-IDF scoring for each term - for term in query_terms: - tfidf = inv.tf_idf(term, doc_key) - score += tfidf - - # Title boost — check if term appears in title tokens - norm_title = normalize_text(file_info.get("title", "")) - if term in norm_title: - score += tfidf * TITLE_BOOST - - # Path boost - norm_path = normalize_text(file_info.get("path", "")) - if term in norm_path: - score += tfidf * PATH_BOOST - - # Tag boost - for tag in file_info.get("tags", []): - if term in normalize_text(tag): - score += tfidf * TAG_BOOST - break - - # Also add prefix matching bonus for partial words - for term in query_terms: - if len(term) >= MIN_PREFIX_LENGTH: - for indexed_term, docs in inv.word_index.items(): - if indexed_term.startswith(term) and indexed_term != term: - if doc_key in docs: - score += inv.tf_idf(indexed_term, doc_key) * 0.5 - else: - # Filter-only search (tag/title/path): score = 1 - score = 1.0 - - if score > 0: - # Build highlighted snippet - content = file_info.get("content", "") - if has_terms: - snippet = _extract_highlighted_snippet(content, query_terms) - else: - snippet = _escape_html(content[:200].strip()) if content else "" - - result = { - "vault": vault_name, - "path": file_info["path"], - "title": file_info["title"], - "tags": file_info.get("tags", []), - "score": round(score, 4), - "snippet": snippet, - "modified": file_info.get("modified", ""), - } - scored_results.append((score, result)) - - # Facets - facet_vaults[vault_name] = facet_vaults.get(vault_name, 0) + 1 + # Tag boost for tag in file_info.get("tags", []): - facet_tags[tag] = facet_tags.get(tag, 0) + 1 + if term in normalize_text(tag): + score += tfidf * TAG_BOOST + 
break + + # Prefix matching bonus (bounded by pre-computed expansions) + for term, expansions in prefix_expansions.items(): + for expanded_term in expansions: + score += inv.tf_idf(expanded_term, doc_key) * 0.5 + else: + # Filter-only search (tag/title/path): score = 1 + score = 1.0 + + if score > 0: + # Build highlighted snippet + content = file_info.get("content", "") + if has_terms: + snippet = _extract_highlighted_snippet(content, query_terms) + else: + snippet = _escape_html(content[:200].strip()) if content else "" + + result = { + "vault": vault_name, + "path": file_info["path"], + "title": file_info["title"], + "tags": file_info.get("tags", []), + "score": round(score, 4), + "snippet": snippet, + "modified": file_info.get("modified", ""), + } + scored_results.append((score, result)) + + # Facets + facet_vaults[vault_name] = facet_vaults.get(vault_name, 0) + 1 + for tag in file_info.get("tags", []): + facet_tags[tag] = facet_tags.get(tag, 0) + 1 # Sort if sort_by == "modified": @@ -717,6 +809,7 @@ def advanced_search( total = len(scored_results) page = scored_results[offset: offset + limit] + elapsed_ms = round((time.monotonic() - t0) * 1000, 1) return { "results": [r for _, r in page], @@ -727,6 +820,7 @@ def advanced_search( "tags": dict(sorted(facet_tags.items(), key=lambda x: -x[1])[:20]), "vaults": dict(sorted(facet_vaults.items(), key=lambda x: -x[1])), }, + "query_time_ms": elapsed_ms, } diff --git a/frontend/app.js b/frontend/app.js index dfb5f98..c23d738 100644 --- a/frontend/app.js +++ b/frontend/app.js @@ -33,12 +33,15 @@ let suggestAbortController = null; let dropdownActiveIndex = -1; let dropdownItems = []; + let currentSearchId = 0; // Advanced search constants const SEARCH_HISTORY_KEY = "obsigate_search_history"; const MAX_HISTORY_ENTRIES = 50; const SUGGEST_DEBOUNCE_MS = 150; const ADVANCED_SEARCH_LIMIT = 50; + const MIN_SEARCH_LENGTH = 2; + const SEARCH_TIMEOUT_MS = 30000; // 
--------------------------------------------------------------------------- // File extension → Lucide icon mapping @@ -1825,10 +1828,12 @@ if (!openBtn || !closeBtn || !modal) return; - openBtn.addEventListener("click", () => { + openBtn.addEventListener("click", async () => { modal.classList.add("active"); closeHeaderMenu(); renderConfigFilters(); + loadConfigFields(); + loadDiagnostics(); safeCreateIcons(); }); @@ -1848,11 +1853,36 @@ patternInput.addEventListener("input", updateRegexPreview); + // Frontend config fields — save to localStorage on change + ["cfg-debounce", "cfg-results-per-page", "cfg-min-query", "cfg-timeout"].forEach((id) => { + const input = document.getElementById(id); + if (input) input.addEventListener("change", saveFrontendConfig); + }); + + // Backend save button + const saveBtn = document.getElementById("cfg-save-backend"); + if (saveBtn) saveBtn.addEventListener("click", saveBackendConfig); + + // Force reindex + const reindexBtn = document.getElementById("cfg-reindex"); + if (reindexBtn) reindexBtn.addEventListener("click", forceReindex); + + // Reset defaults + const resetBtn = document.getElementById("cfg-reset-defaults"); + if (resetBtn) resetBtn.addEventListener("click", resetConfigDefaults); + + // Refresh diagnostics + const diagBtn = document.getElementById("cfg-refresh-diag"); + if (diagBtn) diagBtn.addEventListener("click", loadDiagnostics); + document.addEventListener("keydown", (e) => { if (e.key === "Escape" && modal.classList.contains("active")) { closeConfigModal(); } }); + + // Load saved frontend config on startup + applyFrontendConfig(); } function closeConfigModal() { @@ -1860,6 +1890,177 @@ if (modal) modal.classList.remove("active"); } + // --- Config field helpers --- + const _FRONTEND_CONFIG_KEY = "obsigate-perf-config"; + + function _getFrontendConfig() { + try { return JSON.parse(localStorage.getItem(_FRONTEND_CONFIG_KEY) || "{}"); } + catch { return {}; } + } + + function applyFrontendConfig() { + const cfg = 
_getFrontendConfig(); + if (cfg.debounce_ms) { /* applied dynamically in debounce setTimeout */ } + if (cfg.results_per_page) { /* used as ADVANCED_SEARCH_LIMIT override */ } + if (cfg.min_query_length) { /* used as MIN_SEARCH_LENGTH override */ } + if (cfg.search_timeout_ms) { /* used as SEARCH_TIMEOUT_MS override */ } + } + + function _getEffective(key, fallback) { + const cfg = _getFrontendConfig(); + return cfg[key] !== undefined ? cfg[key] : fallback; + } + + async function loadConfigFields() { + // Frontend fields from localStorage + const cfg = _getFrontendConfig(); + _setField("cfg-debounce", cfg.debounce_ms || 300); + _setField("cfg-results-per-page", cfg.results_per_page || 50); + _setField("cfg-min-query", cfg.min_query_length || 2); + _setField("cfg-timeout", cfg.search_timeout_ms || 30000); + + // Backend fields from API + try { + const data = await api("/api/config"); + _setField("cfg-workers", data.search_workers); + _setField("cfg-max-content", data.max_content_size); + _setField("cfg-title-boost", data.title_boost); + _setField("cfg-tag-boost", data.tag_boost); + _setField("cfg-prefix-exp", data.prefix_max_expansions); + } catch (err) { + console.error("Failed to load backend config:", err); + } + } + + function _setField(id, value) { + const el = document.getElementById(id); + if (el && value !== undefined) el.value = value; + } + + function _getFieldNum(id, fallback) { + const el = document.getElementById(id); + if (!el) return fallback; + const v = parseFloat(el.value); + return isNaN(v) ? 
fallback : v; + } + + function saveFrontendConfig() { + const cfg = { + debounce_ms: _getFieldNum("cfg-debounce", 300), + results_per_page: _getFieldNum("cfg-results-per-page", 50), + min_query_length: _getFieldNum("cfg-min-query", 2), + search_timeout_ms: _getFieldNum("cfg-timeout", 30000), + }; + localStorage.setItem(_FRONTEND_CONFIG_KEY, JSON.stringify(cfg)); + showToast("Paramètres client sauvegardés"); + } + + async function saveBackendConfig() { + const body = { + search_workers: _getFieldNum("cfg-workers", 2), + max_content_size: _getFieldNum("cfg-max-content", 100000), + title_boost: _getFieldNum("cfg-title-boost", 3.0), + tag_boost: _getFieldNum("cfg-tag-boost", 2.0), + prefix_max_expansions: _getFieldNum("cfg-prefix-exp", 50), + }; + try { + await fetch("/api/config", { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify(body), + }); + showToast("Configuration backend sauvegardée"); + } catch (err) { + console.error("Failed to save backend config:", err); + showToast("Erreur de sauvegarde"); + } + } + + async function forceReindex() { + const btn = document.getElementById("cfg-reindex"); + if (btn) { btn.disabled = true; btn.textContent = "Réindexation..."; } + try { + await api("/api/index/reload"); + showToast("Réindexation terminée"); + loadDiagnostics(); + await Promise.all([loadVaults(), loadTags()]); + } catch (err) { + console.error("Reindex error:", err); + showToast("Erreur de réindexation"); + } finally { + if (btn) { btn.disabled = false; btn.textContent = "Forcer réindexation"; } + } + } + + async function resetConfigDefaults() { + // Reset frontend + localStorage.removeItem(_FRONTEND_CONFIG_KEY); + // Reset backend + try { + await fetch("/api/config", { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ + search_workers: 2, debounce_ms: 300, results_per_page: 50, + min_query_length: 2, search_timeout_ms: 30000, max_content_size: 100000, + title_boost: 3.0, 
path_boost: 1.5, tag_boost: 2.0, prefix_max_expansions: 50,
+          snippet_context_chars: 120, max_snippet_highlights: 5,
+        }),
+      });
+    } catch (err) { console.error("Reset config error:", err); }
+    loadConfigFields();
+    showToast("Configuration réinitialisée");
+  }
+
+  async function loadDiagnostics() {
+    const container = document.getElementById("config-diagnostics");
+    if (!container) return;
+    container.innerHTML = '<div class="config-diag-loading">Chargement...</div>';
+    try {
+      const data = await api("/api/diagnostics");
+      renderDiagnostics(container, data);
+    } catch (err) {
+      container.innerHTML = '<div class="config-diag-loading">Erreur de chargement</div>';
+    }
+  }
+
+  function renderDiagnostics(container, data) {
+    container.innerHTML = "";
+    const sections = [
+      { title: "Index", rows: [
+        ["Fichiers indexés", data.index.total_files],
+        ["Tags uniques", data.index.total_tags],
+        ["Vaults", Object.keys(data.index.vaults).join(", ")],
+      ]},
+      { title: "Index inversé", rows: [
+        ["Tokens uniques", data.inverted_index.unique_tokens.toLocaleString()],
+        ["Postings total", data.inverted_index.total_postings.toLocaleString()],
+        ["Documents", data.inverted_index.documents],
+        ["Mémoire estimée", data.inverted_index.memory_estimate_mb + " MB"],
+        ["Stale", data.inverted_index.is_stale ? "Oui" : "Non"],
+      ]},
+      { title: "Moteur de recherche", rows: [
+        ["Executor actif", data.search_executor.active ? "Oui" : "Non"],
+        ["Workers max", data.search_executor.max_workers],
+      ]},
+    ];
+    sections.forEach((section) => {
+      const div = document.createElement("div");
+      div.className = "config-diag-section";
+      const title = document.createElement("div");
+      title.className = "config-diag-section-title";
+      title.textContent = section.title;
+      div.appendChild(title);
+      section.rows.forEach(([label, value]) => {
+        const row = document.createElement("div");
+        row.className = "config-diag-row";
+        row.innerHTML = `<span class="diag-label">${label}</span><span class="diag-value">${value}</span>`;
+        div.appendChild(row);
+      });
+      container.appendChild(div);
+    });
+  }
+
   function renderConfigFilters() {
     const config = TagFilterService.getConfig();
     const filters = config.tagFilters || TagFilterService.defaultFilters;
@@ -1987,13 +2188,13 @@
       const vault = document.getElementById("vault-filter").value;
       const tagFilter = selectedTags.length > 0 ?
selectedTags.join(",") : null;
       advancedSearchOffset = 0;
-      if (q.length > 0 || tagFilter) {
+      if ((q.length >= _getEffective("min_query_length", MIN_SEARCH_LENGTH)) || tagFilter) {
         performAdvancedSearch(q, vault, tagFilter);
-      } else {
+      } else if (q.length === 0) {
         SearchChips.clear();
         showWelcome();
       }
-    }, 300);
+    }, _getEffective("debounce_ms", 300));
   });

   // --- Focus handler: show history dropdown ---
@@ -2097,17 +2298,21 @@
   async function performSearch(query, vaultFilter, tagFilter) {
     if (searchAbortController) searchAbortController.abort();
     searchAbortController = new AbortController();
+    const searchId = ++currentSearchId;
     showLoading();
     let url = `/api/search?q=${encodeURIComponent(query)}&vault=${encodeURIComponent(vaultFilter)}`;
     if (tagFilter) url += `&tag=${encodeURIComponent(tagFilter)}`;
     try {
       const data = await api(url, { signal: searchAbortController.signal });
+      if (searchId !== currentSearchId) return;
       renderSearchResults(data, query, tagFilter);
     } catch (err) {
       if (err.name === "AbortError") return;
+      if (searchId !== currentSearchId) return;
       showWelcome();
     } finally {
-      searchAbortController = null;
+      hideProgressBar();
+      if (searchId === currentSearchId) searchAbortController = null;
     }
   }

@@ -2115,6 +2320,7 @@
   async function performAdvancedSearch(query, vaultFilter, tagFilter, offset, sort) {
     if (searchAbortController) searchAbortController.abort();
     searchAbortController = new AbortController();
+    const searchId = ++currentSearchId;
     showLoading();
     const ofs = offset !== undefined ? offset : advancedSearchOffset;
@@ -2125,19 +2331,30 @@
     const parsed = QueryParser.parse(query);
     SearchChips.update(parsed);
-    let url = `/api/search/advanced?q=${encodeURIComponent(query)}&vault=${encodeURIComponent(vaultFilter)}&limit=${ADVANCED_SEARCH_LIMIT}&offset=${ofs}&sort=${sortBy}`;
+    const effectiveLimit = _getEffective("results_per_page", ADVANCED_SEARCH_LIMIT);
+    let url = `/api/search/advanced?q=${encodeURIComponent(query)}&vault=${encodeURIComponent(vaultFilter)}&limit=${effectiveLimit}&offset=${ofs}&sort=${sortBy}`;
     if (tagFilter) url += `&tag=${encodeURIComponent(tagFilter)}`;

+    // Search timeout — abort if server takes too long
+    const timeoutId = setTimeout(() => {
+      if (searchAbortController) searchAbortController.abort();
+    }, _getEffective("search_timeout_ms", SEARCH_TIMEOUT_MS));
+
     try {
       const data = await api(url, { signal: searchAbortController.signal });
+      clearTimeout(timeoutId);
+      if (searchId !== currentSearchId) return;
       advancedSearchTotal = data.total;
       advancedSearchOffset = ofs;
       renderAdvancedSearchResults(data, query, tagFilter);
     } catch (err) {
+      clearTimeout(timeoutId);
       if (err.name === "AbortError") return;
+      if (searchId !== currentSearchId) return;
       showWelcome();
     } finally {
-      searchAbortController = null;
+      hideProgressBar();
+      if (searchId === currentSearchId) searchAbortController = null;
     }
   }

@@ -2209,6 +2426,11 @@
     } else {
       summaryText.textContent = `${data.total} résultat(s)`;
     }
+    if (data.query_time_ms !== undefined && data.query_time_ms > 0) {
+      const timeBadge = el("span", { class: "search-time-badge" });
+      timeBadge.textContent = `(${data.query_time_ms} ms)`;
+      summaryText.appendChild(timeBadge);
+    }
     header.appendChild(summaryText);

     // Sort controls
@@ -2581,6 +2803,7 @@
   }

   function showWelcome() {
+    hideProgressBar();
     const area = document.getElementById("content-area");
     area.innerHTML = `
@@ -2598,6 +2821,17 @@
Recherche en cours...
`;
+    showProgressBar();
+  }
+
+  function showProgressBar() {
+    const bar = document.getElementById("search-progress-bar");
+    if (bar) bar.classList.add("active");
+  }
+
+  function hideProgressBar() {
+    const bar = document.getElementById("search-progress-bar");
+    if (bar) bar.classList.remove("active");
+  }

   function goHome() {
diff --git a/frontend/index.html b/frontend/index.html
index cfccd7a..b82fd6e 100644
--- a/frontend/index.html
+++ b/frontend/index.html
@@ -71,6 +71,7 @@
+    <div id="search-progress-bar" class="search-progress-bar"><div class="search-progress-bar__fill"></div></div>
@@ -299,6 +300,73 @@
+          <!-- Paramètres de recherche (côté client) -->
+          <h3>Paramètres de recherche</h3>
+          <p class="config-hint">Ces paramètres s'appliquent immédiatement côté client.</p>
+          <div class="config-row">
+            <label class="config-label" for="cfg-debounce">Debounce (ms)</label>
+            <input type="number" id="cfg-debounce" class="config-input--num" min="100" max="2000">
+            <span class="config-hint">Délai avant exécution de la recherche (100–2000)</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-results-per-page">Résultats par page</label>
+            <input type="number" id="cfg-results-per-page" class="config-input--num" min="10" max="200">
+            <span class="config-hint">Nombre de résultats affichés par page (10–200)</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-min-query">Longueur minimale de requête</label>
+            <input type="number" id="cfg-min-query" class="config-input--num" min="1" max="5">
+            <span class="config-hint">Nombre minimum de caractères avant recherche (1–5)</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-timeout">Timeout de recherche (ms)</label>
+            <input type="number" id="cfg-timeout" class="config-input--num" min="5000" max="120000">
+            <span class="config-hint">Annuler la recherche après ce délai (5000–120000)</span>
+          </div>
+          <div class="config-actions-row">
+            <button class="config-btn-save" onclick="saveFrontendConfig()">Sauvegarder</button>
+          </div>
+
+          <!-- Paramètres backend (serveur) -->
+          <h3>Paramètres backend <span class="config-badge-restart">Redémarrage requis</span></h3>
+          <p class="config-hint">Ces paramètres sont sauvegardés sur le serveur. Certains nécessitent un redémarrage ou une réindexation.</p>
+          <div class="config-row">
+            <label class="config-label" for="cfg-workers">Workers de recherche</label>
+            <input type="number" id="cfg-workers" class="config-input--num" min="1" max="8">
+            <span class="config-hint">Threads dédiés à la recherche (1–8)</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-max-content">Taille max du contenu</label>
+            <input type="number" id="cfg-max-content" class="config-input--num" min="10000" max="1000000">
+            <span class="config-hint">Contenu indexé par fichier (10K–1M). Réindexation requise.</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-title-boost">Boost titre</label>
+            <input type="number" id="cfg-title-boost" class="config-input--num" step="0.1">
+            <span class="config-hint">Multiplicateur de pertinence pour les correspondances dans le titre</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-tag-boost">Boost tags</label>
+            <input type="number" id="cfg-tag-boost" class="config-input--num" step="0.1">
+            <span class="config-hint">Multiplicateur de pertinence pour les correspondances dans les tags</span>
+          </div>
+          <div class="config-row">
+            <label class="config-label" for="cfg-prefix-exp">Expansions de préfixe</label>
+            <input type="number" id="cfg-prefix-exp" class="config-input--num" min="10" max="200">
+            <span class="config-hint">Nombre max de tokens élargis par préfixe (10–200)</span>
+          </div>
+          <div class="config-actions-row">
+            <button class="config-btn-save" onclick="saveBackendConfig()">Sauvegarder</button>
+            <button id="cfg-reindex" class="config-btn-secondary" onclick="forceReindex()">Forcer réindexation</button>
+            <button class="config-btn-secondary" onclick="resetConfigDefaults()">Réinitialiser</button>
+          </div>
+
           <h3>Filtrage de tags</h3>
           <p>Définissez les patterns de tags à masquer dans la sidebar. Vous pouvez utiliser des wildcards pour cibler les tags de template.</p>
@@ -314,6 +382,17 @@ Regex :
+
+          <!-- Diagnostics -->
+          <h3>Diagnostics</h3>
+          <p class="config-hint">Statistiques de l'index et du moteur de recherche.</p>
+          <div id="config-diagnostics" class="config-diagnostics">
+            <div class="config-diag-loading">Chargement...</div>
+          </div>
+          <div class="config-actions-row">
+            <button class="config-btn-secondary" onclick="loadDiagnostics()">Rafraîchir</button>
+          </div>
+
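Between the frontend and CSS portions of the patch, it is worth noting how the `_getEffective` helper in app.js layers user settings over compile-time defaults. The pattern can be sketched in isolation as follows (a minimal sketch: `store` stands in for `localStorage` so it runs under Node, and the storage key `"obsigate_frontend_config"` is an assumed value, since the real `_FRONTEND_CONFIG_KEY` constant is defined outside this hunk):

```javascript
// Minimal sketch of the _getEffective override pattern: a value the user
// saved wins over the hard-coded default, and a missing or corrupt store
// falls back cleanly. `store` is a stand-in for window.localStorage;
// the key name is assumed, not taken from the patch.
const store = {};

function getFrontendConfig() {
  try {
    return JSON.parse(store["obsigate_frontend_config"] || "{}");
  } catch (e) {
    return {}; // corrupt JSON: behave as if nothing was ever saved
  }
}

function getEffective(key, fallback) {
  const cfg = getFrontendConfig();
  return cfg[key] !== undefined ? cfg[key] : fallback;
}

console.log(getEffective("debounce_ms", 300)); // → 300 (nothing saved yet)
store["obsigate_frontend_config"] = JSON.stringify({ debounce_ms: 150 });
console.log(getEffective("debounce_ms", 300)); // → 150 (saved value wins)
```

Because every read goes through `getEffective`, a saved setting takes effect on the next keystroke without reloading the page, which is why the patch can claim the client parameters "s'appliquent immédiatement".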
diff --git a/frontend/style.css b/frontend/style.css index 1b2c52d..1e1104b 100644 --- a/frontend/style.css +++ b/frontend/style.css @@ -1347,6 +1347,39 @@ select { to { transform: rotate(360deg); } } +/* --- Search progress bar --- */ +.search-progress-bar { + position: fixed; + top: 0; + left: 0; + right: 0; + height: 3px; + z-index: 9999; + background: transparent; + pointer-events: none; + opacity: 0; + transition: opacity 0.15s; +} +.search-progress-bar.active { + opacity: 1; +} +.search-progress-bar .search-progress-bar__fill { + height: 100%; + background: var(--accent); + width: 0%; + animation: progress-indeterminate 1.5s ease-in-out infinite; +} +@keyframes progress-indeterminate { + 0% { width: 0%; margin-left: 0%; } + 50% { width: 40%; margin-left: 30%; } + 100% { width: 0%; margin-left: 100%; } +} +.search-time-badge { + font-size: 0.7rem; + color: var(--text-muted); + margin-left: 8px; +} + /* --- Editor Modal --- */ .editor-modal { display: none; @@ -2111,6 +2144,118 @@ body.resizing-v { padding: 0; } +/* --- Config rows & controls --- */ +.config-row { + display: grid; + grid-template-columns: 200px 120px 1fr; + align-items: center; + gap: 12px; + margin-bottom: 10px; +} +.config-label { + font-size: 0.82rem; + color: var(--text-primary); + font-weight: 500; +} +.config-input--num { + flex: none; + width: 120px; + text-align: right; +} +.config-hint { + font-size: 0.72rem; + color: var(--text-muted); +} +.config-badge-restart { + display: inline-block; + font-size: 0.65rem; + font-weight: 500; + padding: 2px 8px; + border-radius: 4px; + background: var(--danger-bg, #3d1a18); + color: var(--danger, #ff7b72); + vertical-align: middle; + margin-left: 6px; +} +.config-actions-row { + display: flex; + gap: 8px; + margin-top: 16px; + flex-wrap: wrap; +} +.config-btn-save { + padding: 8px 20px; + border: 1px solid var(--accent); + border-radius: 6px; + background: var(--accent); + color: #fff; + font-family: 'JetBrains Mono', monospace; + font-size: 
0.8rem; + font-weight: 600; + cursor: pointer; + transition: opacity 150ms; +} +.config-btn-save:hover { opacity: 0.9; } +.config-btn-secondary { + padding: 8px 16px; + border: 1px solid var(--border); + border-radius: 6px; + background: var(--bg-secondary); + color: var(--text-primary); + font-family: 'JetBrains Mono', monospace; + font-size: 0.8rem; + cursor: pointer; + transition: background 150ms; +} +.config-btn-secondary:hover { background: var(--bg-hover); } + +/* --- Config diagnostics panel --- */ +.config-diagnostics { + background: var(--code-bg); + border: 1px solid var(--border); + border-radius: 6px; + padding: 14px 16px; + font-family: 'JetBrains Mono', monospace; + font-size: 0.78rem; + line-height: 1.7; + color: var(--text-secondary); +} +.config-diag-loading { + color: var(--text-muted); +} +.config-diag-row { + display: flex; + justify-content: space-between; +} +.config-diag-row .diag-label { color: var(--text-secondary); } +.config-diag-row .diag-value { color: var(--text-primary); font-weight: 500; } +.config-diag-section { + margin-bottom: 8px; + padding-bottom: 8px; + border-bottom: 1px solid var(--border); +} +.config-diag-section:last-child { + margin-bottom: 0; + padding-bottom: 0; + border-bottom: none; +} +.config-diag-section-title { + font-weight: 600; + color: var(--accent); + margin-bottom: 4px; + font-size: 0.75rem; + text-transform: uppercase; + letter-spacing: 0.5px; +} + +@media (max-width: 768px) { + .config-row { + grid-template-columns: 1fr; + gap: 4px; + } + .config-input--num { width: 100%; } +} + /* --- Toast notifications --- */ .toast-container { position: fixed;
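A note on the `currentSearchId` counter introduced in `performSearch` / `performAdvancedSearch` above: together with the `AbortController`, it guards against a slow, stale response overwriting the results of a newer query. The guard can be demonstrated standalone (a minimal sketch: `runSearch`, `fakeApi`, and `demo` are illustrative names, and the network call is simulated with a timer):

```javascript
// Stale-response guard, as used in the patch: each search takes a ticket
// from a shared counter, and any response whose ticket is no longer the
// latest is silently dropped.
let currentSearchId = 0;
const rendered = [];

// Stand-in for the real fetch-based api() helper; resolves after `delay` ms.
function fakeApi(query, delay) {
  return new Promise((resolve) => setTimeout(() => resolve(`results:${query}`), delay));
}

async function runSearch(query, delay) {
  const searchId = ++currentSearchId;        // take a ticket
  const data = await fakeApi(query, delay);  // simulated network round-trip
  if (searchId !== currentSearchId) return;  // a newer search superseded us
  rendered.push(data);                       // only the freshest response renders
}

// Fire a slow search, then a fast one: only the second should render,
// even though the first resolves later.
async function demo() {
  rendered.length = 0; // reset so the demo is repeatable
  const slow = runSearch("obsi", 50);
  const fast = runSearch("obsigate", 10);
  await Promise.all([slow, fast]);
  return rendered.slice();
}

demo().then((r) => console.log(r)); // logs [ 'results:obsigate' ]
```

The abort alone is not sufficient, because a response can already be in flight (past the point of cancellation) when the next query starts; the ticket check closes that window.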