Add advanced search engine with inverted index, thread pool execution, configuration API, and comprehensive diagnostics

This commit is contained in:
Bruno Charest 2026-03-23 13:21:20 -04:00
parent af7d1c0d2e
commit b40fcae62f
8 changed files with 885 additions and 106 deletions

.gitignore vendored

@@ -8,3 +8,4 @@ venv/
*.egg-info/
dist/
build/
config.json


@@ -2,7 +2,7 @@
**Ultra-lightweight web gateway for your Obsidian vaults**: access, browse, and search all your Obsidian notes from any device through a modern, responsive web interface.
[![Version](https://img.shields.io/badge/Version-1.1.0-blue.svg)]()
[![Version](https://img.shields.io/badge/Version-1.2.0-blue.svg)]()
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue.svg)](https://www.docker.com/)
[![Python](https://img.shields.io/badge/Python-3.11+-green.svg)](https://www.python.org/)
@@ -359,6 +359,9 @@ ObsiGate exposes a complete REST API:
| `/api/image/{vault}?path=` | Serve an image with the appropriate MIME type | GET |
| `/api/attachments/rescan/{vault}` | Rescan a vault's images | POST |
| `/api/attachments/stats?vault=` | Indexed image statistics | GET |
| `/api/config` | Read the configuration | GET |
| `/api/config` | Update the configuration | POST |
| `/api/diagnostics` | Index, memory, and search engine statistics | GET |
> All endpoints expose documented Pydantic schemas. Interactive docs are available at `/docs` (Swagger UI).
@@ -464,19 +467,39 @@ docker-compose logs --tail=100 obsigate
```
---
## ⚡ Performance
| Metric | Estimate |
|----------|------------|
| **Indexing** | ~12s for 1000 markdown files |
| **Full-text search** | < 50ms (in-memory index, zero disk I/O) |
| **Indexing** | ~12s for 1,000 markdown files |
| **Advanced search** | < 10ms for most queries (inverted index + TF-IDF) |
| **Wikilink resolution** | O(1) via lookup table |
| **Memory** | ~80–150MB per 1000 files (content capped at 100KB/file) |
| **Memory** | ~80–150 MB per 1,000 files (content capped at 100 KB/file) |
| **Docker image** | ~180MB (multi-stage, no build tools) |
| **CPU** | Minimal; no polling, no watchers |
| **CPU** | Non-blocking; search offloaded to a dedicated thread pool |
### Key optimizations (v1.1.0)
### Recommended settings by vault size
| Size | Files | `search_workers` | `prefix_max_expansions` | `max_content_size` |
|--------|----------|-------------------|--------------------------|---------------------|
| Small | < 500 | 1 | 50 | 100,000 |
| Medium | 500–5,000 | 2 | 50 | 100,000 |
| Large | 5,000+ | 4 | 30 | 50,000 |
These settings can be configured through the UI (Settings) or the `/api/config` API.
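As an illustration, a persisted `config.json` for a medium-sized vault might contain the following (keys come from the backend defaults; values follow the medium row of the table above):

```json
{
  "search_workers": 2,
  "prefix_max_expansions": 50,
  "max_content_size": 100000,
  "results_per_page": 50,
  "search_timeout_ms": 30000
}
```

Keys missing from the file fall back to the built-in defaults, so a partial file like this is valid.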
### Key optimizations (v1.2.0)
- **Inverted index with set intersection**: search uses posting lists for O(k × postings) retrieval instead of a full O(N) scan
- **Prefix matching via binary search**: O(log V + k) instead of an O(V) linear scan over the vocabulary
- **ThreadPoolExecutor**: CPU-bound search functions are offloaded from the asyncio event loop
- **Race condition guard**: `currentSearchId` + `AbortController` prevent stale results from being rendered
- **Progress bar**: animated progress bar while a search runs
- **Search timeout**: automatic abort after 30s (configurable)
- **Query time display**: server-side time shown in the results (`query_time_ms`)
- **Staleness detection fix**: a generation counter replaces `id(index)` for detecting index changes
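The first two bullets can be sketched in a few lines of Python. This is a toy index, not ObsiGate's actual `InvertedIndex`, showing posting-list set intersection plus `bisect`-based prefix expansion:

```python
import bisect
from collections import defaultdict

# Toy inverted index: token -> set of document ids.
word_index = defaultdict(set)
docs = {
    "d1": "obsidian vault search",
    "d2": "inverted index search engine",
    "d3": "vault backup notes",
}
for doc_id, text in docs.items():
    for token in text.split():
        word_index[token].add(doc_id)
# Pre-sorted vocabulary enables O(log V + k) prefix lookups.
sorted_tokens = sorted(word_index)

def prefix_tokens(prefix, max_expansions=50):
    """All indexed tokens starting with prefix, via binary search."""
    lo = bisect.bisect_left(sorted_tokens, prefix)
    out = []
    for tok in sorted_tokens[lo:]:
        if not tok.startswith(prefix):
            break  # sorted list: no later token can match
        out.append(tok)
        if len(out) >= max_expansions:
            break
    return out

def retrieve(terms):
    """Intersect per-term posting sets: only docs containing every term survive."""
    posting_sets = [
        set().union(*(word_index[t] for t in prefix_tokens(term)))
        for term in terms
    ]
    return set.intersection(*posting_sets) if posting_sets else set()

print(retrieve(["vault", "search"]))  # docs containing both -> {'d1'}
```

Only the documents in the final intersection get scored, which is where the O(k × postings) bound in the bullet comes from.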
### v1.1.0 optimizations
- **Zero-I/O search**: file content is cached in the in-memory index
- **Multi-factor scoring**: exact title (+20), partial title (+10), path (+5), tag (+3), content frequency (x1 per occurrence, capped at 10)
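As a sketch, those v1.1.0 weights translate into a scorer like this (a hypothetical function for illustration, not the code in `backend/search.py`):

```python
def score_note(query, title, path, tags, content):
    """Illustrative multi-factor scorer: exact title +20, partial title +10,
    path +5, tag +3, content frequency x1 per occurrence capped at 10."""
    q = query.lower()
    score = 0
    if q == title.lower():
        score += 20          # exact title match
    elif q in title.lower():
        score += 10          # partial title match
    if q in path.lower():
        score += 5           # path match
    if any(q == t.lower() for t in tags):
        score += 3           # tag match
    score += min(content.lower().count(q), 10)  # capped content frequency
    return score

print(score_note("docker", "Docker", "devops/docker.md", ["docker"],
                 "docker compose and docker swarm"))  # 20+5+3+2 = 30
```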
@@ -582,6 +605,30 @@ This project is licensed under **MIT** - see the [LICENSE](LICENSE) file for the
## 📝 Changelog
### v1.2.0 (2025)
**Performance (critical)**
- Rewrote the `advanced_search()` engine: set-intersection retrieval over the inverted index (O(k × postings) instead of O(N))
- Prefix matching via binary search over a sorted token list (O(log V + k) instead of O(V))
- CPU-bound search functions offloaded via `ThreadPoolExecutor` (2 workers by default)
- Prefix expansions pre-computed to avoid repeated binary searches
- Staleness bug fixed: `is_stale()` uses a generation counter instead of `id(index)`
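The staleness fix in the last bullet works because `id(index)` never changes when a global dict is mutated in place with `clear()`/`update()`, while a monotonically increasing generation counter does. A minimal sketch:

```python
index = {}             # global index, mutated in place on rebuild
index_generation = 0   # bumped on every rebuild

def rebuild_index(new_data):
    global index_generation
    index.clear()
    index.update(new_data)
    index_generation += 1  # id(index) stays the same; the counter does not

class Consumer:
    """Anything that caches derived data from the index (e.g. an inverted index)."""
    def __init__(self):
        self._seen_generation = -1
    def is_stale(self):
        return index_generation != self._seen_generation
    def refresh(self):
        self._seen_generation = index_generation

c = Consumer()
rebuild_index({"vault": []})
print(c.is_stale())  # True: index was rebuilt since c last refreshed
c.refresh()
print(c.is_stale())  # False
```

Comparing `id(index)` would have reported `False` in both cases, which is exactly the bug being fixed.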
**Frontend**
- Race condition guard: `currentSearchId` checked after each `fetch` before rendering
- Animated progress bar during searches
- Configurable search timeout (30s by default)
- Configurable minimum query length (2 characters by default)
- Server-side query time (`query_time_ms`) displayed in the results
- Pagination added to the legacy `/api/search` endpoint (`limit`/`offset` params)
**Configuration & Diagnostics**
- New `GET/POST /api/config` endpoints for persistent configuration (`config.json`)
- New `GET /api/diagnostics` endpoint (index, memory, and search engine stats)
- Extended settings page: frontend options (debounce, results/page, timeout) and backend options (workers, boosts, expansions)
- Diagnostics panel built into the settings modal
- "Force reindex" and "Reset" buttons in the settings
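On read, the persisted `config.json` is merged over built-in defaults, so newly introduced keys always resolve. A sketch of that merge, with a trimmed-down set of defaults for illustration:

```python
import json
from pathlib import Path

DEFAULTS = {"search_workers": 2, "results_per_page": 50, "title_boost": 3.0}

def load_config(path: Path) -> dict:
    """Defaults first, then any stored overrides on top."""
    config = dict(DEFAULTS)
    if path.exists():
        try:
            config.update(json.loads(path.read_text(encoding="utf-8")))
        except (OSError, json.JSONDecodeError):
            pass  # corrupt or unreadable file: fall back to defaults
    return config

cfg = load_config(Path("config.json"))  # works even if the file is absent
print(cfg["search_workers"])
```

Because the merge always starts from the defaults, upgrading the app never requires migrating old config files.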
### v1.1.0 (2025)
**Security**
@@ -621,4 +668,4 @@ This project is licensed under **MIT** - see the [LICENSE](LICENSE) file for the
---
*Project: ObsiGate | Version: 1.1.0 | Last updated: 2025*
*Project: ObsiGate | Version: 1.2.0 | Last updated: 2025*


@@ -22,6 +22,10 @@ vault_config: Dict[str, Dict[str, Any]] = {}
# Thread-safe lock for index updates
_index_lock = threading.Lock()
# Generation counter — incremented on each index rebuild so consumers
# (e.g. the inverted index in search.py) can detect staleness.
_index_generation: int = 0
# O(1) lookup table for wikilink resolution: {filename_lower: [{vault, path}, ...]}
_file_lookup: Dict[str, List[Dict[str, str]]] = {}
@@ -318,6 +322,7 @@ async def build_index() -> None:
new_path_index[vname] = vdata.get("paths", [])
# Atomic swap under lock for thread safety during concurrent reads
global _index_generation
with _index_lock:
index.clear()
index.update(new_index)
@@ -325,6 +330,7 @@
_file_lookup.update(new_lookup)
path_index.clear()
path_index.update(new_path_index)
_index_generation += 1
total_files = sum(len(v["files"]) for v in index.values())
logger.info(f"Index built: {len(index)} vaults, {total_files} total files")


@@ -1,8 +1,12 @@
import asyncio
import json as _json
import re
import html as html_mod
import logging
import mimetypes
from concurrent.futures import ThreadPoolExecutor
from contextlib import asynccontextmanager
from functools import partial
from pathlib import Path
from typing import Optional, List, Dict, Any
@@ -111,11 +115,14 @@ class SearchResultItem(BaseModel):
class SearchResponse(BaseModel):
"""Full-text search response."""
"""Full-text search response with optional pagination."""
query: str
vault_filter: str
tag_filter: Optional[str]
count: int
total: int = Field(0, description="Total results before pagination")
offset: int = Field(0, description="Current pagination offset")
limit: int = Field(200, description="Page size")
results: List[SearchResultItem]
@@ -165,6 +172,7 @@ class AdvancedSearchResponse(BaseModel):
offset: int
limit: int
facets: SearchFacets
query_time_ms: float = Field(0, description="Server-side query time in milliseconds")
class TitleSuggestion(BaseModel):
@@ -210,16 +218,25 @@ class HealthResponse(BaseModel):
# Application lifespan (replaces deprecated on_event)
# ---------------------------------------------------------------------------
# Thread pool for offloading CPU-bound search from the event loop.
# Sized to 2 workers so concurrent searches don't starve other requests.
_search_executor: Optional[ThreadPoolExecutor] = None
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifespan: build index on startup, cleanup on shutdown."""
global _search_executor
_search_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="search")
logger.info("ObsiGate starting \u2014 building index...")
await build_index()
logger.info("ObsiGate ready.")
yield
_search_executor.shutdown(wait=False)
_search_executor = None
app = FastAPI(title="ObsiGate", version="1.1.0", lifespan=lifespan)
app = FastAPI(title="ObsiGate", version="1.2.0", lifespan=lifespan)
# Resolve frontend path relative to this file
FRONTEND_DIR = Path(__file__).resolve().parent.parent / "frontend"
@@ -687,22 +704,38 @@ async def api_search(
q: str = Query("", description="Search query"),
vault: str = Query("all", description="Vault filter"),
tag: Optional[str] = Query(None, description="Tag filter"),
limit: int = Query(50, ge=1, le=200, description="Results per page"),
offset: int = Query(0, ge=0, description="Pagination offset"),
):
"""Full-text search across vaults with relevance scoring.
Supports combining free-text queries with tag filters.
Results are ranked by a multi-factor scoring algorithm.
Pagination via ``limit`` and ``offset`` (defaults preserve backward compat).
Args:
q: Free-text search string.
vault: Vault name or ``"all"`` to search everywhere.
tag: Comma-separated tag names to require.
limit: Max results per page (1200).
offset: Pagination offset.
Returns:
``SearchResponse`` with ranked results and snippets.
"""
results = search(q, vault_filter=vault, tag_filter=tag)
return {"query": q, "vault_filter": vault, "tag_filter": tag, "count": len(results), "results": results}
loop = asyncio.get_running_loop()
# Fetch full result set (capped at DEFAULT_SEARCH_LIMIT internally)
all_results = await loop.run_in_executor(
_search_executor,
partial(search, q, vault_filter=vault, tag_filter=tag),
)
total = len(all_results)
page = all_results[offset: offset + limit]
return {
"query": q, "vault_filter": vault, "tag_filter": tag,
"count": len(page), "total": total, "offset": offset, "limit": limit,
"results": page,
}
@app.get("/api/tags", response_model=TagsResponse)
@@ -793,7 +826,12 @@ async def api_advanced_search(
Returns:
``AdvancedSearchResponse`` with scored results, facets, and pagination info.
"""
return advanced_search(q, vault_filter=vault, tag_filter=tag, limit=limit, offset=offset, sort_by=sort)
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
_search_executor,
partial(advanced_search, q, vault_filter=vault, tag_filter=tag,
limit=limit, offset=offset, sort_by=sort),
)
@app.get("/api/suggest", response_model=SuggestResponse)
@@ -924,6 +962,141 @@ async def api_attachment_stats(vault: Optional[str] = Query(None, description="V
return {"vaults": stats}
# ---------------------------------------------------------------------------
# Configuration API
# ---------------------------------------------------------------------------
_CONFIG_PATH = Path(__file__).resolve().parent.parent / "config.json"
_DEFAULT_CONFIG = {
"search_workers": 2,
"debounce_ms": 300,
"results_per_page": 50,
"min_query_length": 2,
"search_timeout_ms": 30000,
"max_content_size": 100000,
"snippet_context_chars": 120,
"max_snippet_highlights": 5,
"title_boost": 3.0,
"path_boost": 1.5,
"tag_boost": 2.0,
"prefix_max_expansions": 50,
}
def _load_config() -> dict:
"""Load config from disk, merging with defaults."""
config = dict(_DEFAULT_CONFIG)
if _CONFIG_PATH.exists():
try:
stored = _json.loads(_CONFIG_PATH.read_text(encoding="utf-8"))
config.update(stored)
except Exception as e:
logger.warning(f"Failed to read config.json: {e}")
return config
def _save_config(config: dict) -> None:
"""Persist config to disk."""
try:
_CONFIG_PATH.write_text(
_json.dumps(config, indent=2, ensure_ascii=False),
encoding="utf-8",
)
except Exception as e:
logger.error(f"Failed to write config.json: {e}")
raise HTTPException(status_code=500, detail=f"Failed to save config: {e}")
@app.get("/api/config")
async def api_get_config():
"""Return current configuration with defaults for missing keys."""
return _load_config()
@app.post("/api/config")
async def api_set_config(body: dict = Body(...)):
"""Update configuration. Only known keys are accepted.
Keys matching ``_DEFAULT_CONFIG`` are validated and persisted.
Unknown keys are silently ignored.
Returns the full merged config after update.
"""
current = _load_config()
updated_keys = []
for key, value in body.items():
if key in _DEFAULT_CONFIG:
expected_type = type(_DEFAULT_CONFIG[key])
if isinstance(value, expected_type) or (expected_type is float and isinstance(value, (int, float))):
current[key] = value
updated_keys.append(key)
else:
raise HTTPException(
status_code=400,
detail=f"Invalid type for '{key}': expected {expected_type.__name__}, got {type(value).__name__}",
)
_save_config(current)
logger.info(f"Config updated: {updated_keys}")
return current
# ---------------------------------------------------------------------------
# Diagnostics API
# ---------------------------------------------------------------------------
@app.get("/api/diagnostics")
async def api_diagnostics():
"""Return index statistics and system diagnostics.
Includes document counts, token counts, memory estimates,
and inverted index status.
"""
from backend.search import get_inverted_index
import sys
inv = get_inverted_index()
# Per-vault stats
vault_stats = {}
total_files = 0
total_tags = 0
for vname, vdata in index.items():
file_count = len(vdata.get("files", []))
tag_count = len(vdata.get("tags", {}))
vault_stats[vname] = {"file_count": file_count, "tag_count": tag_count}
total_files += file_count
total_tags += tag_count
# Memory estimate for inverted index
word_index_entries = sum(len(docs) for docs in inv.word_index.values())
mem_estimate_mb = round(
(sys.getsizeof(inv.word_index) + word_index_entries * 80
+ len(inv.doc_info) * 200
+ len(inv._sorted_tokens) * 60) / (1024 * 1024), 2
)
return {
"index": {
"total_files": total_files,
"total_tags": total_tags,
"vaults": vault_stats,
},
"inverted_index": {
"unique_tokens": len(inv.word_index),
"total_postings": word_index_entries,
"documents": inv.doc_count,
"sorted_tokens": len(inv._sorted_tokens),
"is_stale": inv.is_stale(),
"memory_estimate_mb": mem_estimate_mb,
},
"config": _load_config(),
"search_executor": {
"active": _search_executor is not None,
"max_workers": _search_executor._max_workers if _search_executor else 0,
},
}
# ---------------------------------------------------------------------------
# Static files & SPA fallback
# ---------------------------------------------------------------------------


@@ -1,10 +1,13 @@
import bisect
import logging
import math
import re
import time
import unicodedata
from collections import defaultdict
from typing import List, Dict, Any, Optional, Tuple
from backend import indexer as _indexer
from backend.indexer import index
logger = logging.getLogger("obsigate.search")
@@ -256,12 +259,21 @@ class InvertedIndex:
self.tag_prefix_index: Dict[str, List[str]] = defaultdict(list)
self.title_norm_map: Dict[str, List[Dict[str, str]]] = defaultdict(list)
self.doc_count: int = 0
self._source_id: Optional[int] = None
self.doc_info: Dict[str, Dict[str, Any]] = {}
self.doc_vault: Dict[str, str] = {}
self.vault_docs: Dict[str, set] = defaultdict(set)
self.tag_docs: Dict[str, set] = defaultdict(set)
self._sorted_tokens: List[str] = []
self._source_generation: int = -1
def is_stale(self) -> bool:
"""Check if the inverted index needs rebuilding."""
current_id = id(index)
return current_id != self._source_id
"""Check if the inverted index needs rebuilding.
Uses the indexer's generation counter which increments on every
rebuild, instead of ``id(index)`` which never changes since the
global dict is mutated in-place.
"""
return _indexer._index_generation != self._source_generation
def rebuild(self) -> None:
"""Rebuild inverted index from the global ``index`` dict.
@@ -276,12 +288,25 @@
self.tag_prefix_index = defaultdict(list)
self.title_norm_map = defaultdict(list)
self.doc_count = 0
self.doc_info = {}
self.doc_vault = {}
self.vault_docs = defaultdict(set)
self.tag_docs = defaultdict(set)
for vault_name, vault_data in index.items():
for file_info in vault_data.get("files", []):
doc_key = f"{vault_name}::{file_info['path']}"
self.doc_count += 1
# --- Document metadata for O(1) lookup ---
self.doc_info[doc_key] = file_info
self.doc_vault[doc_key] = vault_name
self.vault_docs[vault_name].add(doc_key)
# --- Per-document tag index ---
for tag in file_info.get("tags", []):
self.tag_docs[tag.lower()].add(doc_key)
# --- Title tokens ---
title_tokens = tokenize(file_info.get("title", ""))
for token in set(title_tokens):
@@ -316,7 +341,8 @@
if tag not in self.tag_prefix_index[prefix]:
self.tag_prefix_index[prefix].append(tag)
self._source_id = id(index)
self._sorted_tokens = sorted(self.word_index.keys())
self._source_generation = _indexer._index_generation
logger.info(
"Inverted index built: %d documents, %d unique tokens, %d tags",
self.doc_count,
@@ -358,6 +384,32 @@
return 0.0
return tf * self.idf(term)
def get_prefix_tokens(self, prefix: str, max_expansions: int = 50) -> List[str]:
"""Get all tokens starting with *prefix* using binary search.
Uses a pre-sorted token list for O(log V + k) lookup instead
of O(V) linear scan over the vocabulary.
Args:
prefix: Normalized prefix string.
max_expansions: Cap on returned tokens to bound work.
Returns:
List of matching tokens (including exact match if present).
"""
if not prefix or not self._sorted_tokens:
return []
lo = bisect.bisect_left(self._sorted_tokens, prefix)
results: List[str] = []
for i in range(lo, len(self._sorted_tokens)):
if self._sorted_tokens[i].startswith(prefix):
results.append(self._sorted_tokens[i])
if len(results) >= max_expansions:
break
else:
break
return results
# Singleton inverted index
_inverted_index = InvertedIndex()
@@ -582,6 +634,10 @@
) -> Dict[str, Any]:
"""Advanced full-text search with TF-IDF scoring, facets, and pagination.
Uses the inverted index for O(k × postings) candidate retrieval instead
of O(N) full document scan. Prefix matching uses binary search on a
sorted token list for O(log V + k) instead of O(V) linear scan.
Parses the query for operators (``tag:``, ``vault:``, ``title:``,
``path:``), falls back remaining tokens to TF-IDF scored free-text
search using the inverted index. Results include highlighted snippets
@@ -596,8 +652,10 @@
sort_by: ``"relevance"`` or ``"modified"``.
Returns:
Dict with ``results``, ``total``, ``offset``, ``limit``, ``facets``.
Dict with ``results``, ``total``, ``offset``, ``limit``, ``facets``,
``query_time_ms``.
"""
t0 = time.monotonic()
query = query.strip() if query else ""
parsed = _parse_advanced_query(query)
@@ -616,44 +674,81 @@
has_terms = len(query_terms) > 0
if not has_terms and not all_tags and not parsed["title"] and not parsed["path"]:
return {"results": [], "total": 0, "offset": offset, "limit": limit, "facets": {"tags": {}, "vaults": {}}}
return {"results": [], "total": 0, "offset": offset, "limit": limit,
"facets": {"tags": {}, "vaults": {}}, "query_time_ms": 0}
inv = get_inverted_index()
# ------------------------------------------------------------------
# Step 1: Candidate retrieval via inverted index (replaces O(N) scan)
# ------------------------------------------------------------------
if has_terms:
# Union of posting lists for all terms + prefix expansions
candidates: set = set()
for term in query_terms:
# Exact term matches
candidates.update(inv.word_index.get(term, {}).keys())
# Prefix matches — O(log V + k) via binary search
if len(term) >= MIN_PREFIX_LENGTH:
for expanded in inv.get_prefix_tokens(term):
if expanded != term:
candidates.update(inv.word_index.get(expanded, {}).keys())
else:
# Filter-only search: start with tag-filtered subset or all docs
if all_tags:
tag_sets = [inv.tag_docs.get(t.lower(), set()) for t in all_tags]
candidates = set.intersection(*tag_sets) if tag_sets else set()
else:
candidates = set(inv.doc_info.keys())
# ------------------------------------------------------------------
# Step 2: Apply filters on candidate set
# ------------------------------------------------------------------
if effective_vault != "all":
candidates &= inv.vault_docs.get(effective_vault, set())
if all_tags and has_terms:
for t in all_tags:
candidates &= inv.tag_docs.get(t.lower(), set())
if parsed["title"]:
norm_title_filter = normalize_text(parsed["title"])
candidates = {
dk for dk in candidates
if norm_title_filter in normalize_text(inv.doc_info[dk].get("title", ""))
}
if parsed["path"]:
norm_path_filter = normalize_text(parsed["path"])
candidates = {
dk for dk in candidates
if norm_path_filter in normalize_text(inv.doc_info[dk].get("path", ""))
}
# ------------------------------------------------------------------
# Step 3: Score only the candidates (not all N documents)
# ------------------------------------------------------------------
scored_results: List[Tuple[float, Dict[str, Any]]] = []
facet_tags: Dict[str, int] = defaultdict(int)
facet_vaults: Dict[str, int] = defaultdict(int)
for vault_name, vault_data in index.items():
if effective_vault != "all" and vault_name != effective_vault:
# Pre-compute prefix expansions once per term (avoid repeated binary search)
prefix_expansions: Dict[str, List[str]] = {}
if has_terms:
for term in query_terms:
if len(term) >= MIN_PREFIX_LENGTH:
prefix_expansions[term] = [
t for t in inv.get_prefix_tokens(term) if t != term
]
for doc_key in candidates:
file_info = inv.doc_info.get(doc_key)
if file_info is None:
continue
vault_name = inv.doc_vault[doc_key]
for file_info in vault_data.get("files", []):
doc_key = f"{vault_name}::{file_info['path']}"
# --- Tag filter ---
if all_tags:
file_tags_lower = [t.lower() for t in file_info.get("tags", [])]
if not all(t.lower() in file_tags_lower for t in all_tags):
continue
# --- Title filter ---
if parsed["title"]:
norm_title_filter = normalize_text(parsed["title"])
norm_file_title = normalize_text(file_info.get("title", ""))
if norm_title_filter not in norm_file_title:
continue
# --- Path filter ---
if parsed["path"]:
norm_path_filter = normalize_text(parsed["path"])
norm_file_path = normalize_text(file_info.get("path", ""))
if norm_path_filter not in norm_file_path:
continue
# --- Scoring ---
score = 0.0
if has_terms:
# TF-IDF scoring for each term
for term in query_terms:
tfidf = inv.tf_idf(term, doc_key)
score += tfidf
@@ -674,13 +769,10 @@
score += tfidf * TAG_BOOST
break
# Also add prefix matching bonus for partial words
for term in query_terms:
if len(term) >= MIN_PREFIX_LENGTH:
for indexed_term, docs in inv.word_index.items():
if indexed_term.startswith(term) and indexed_term != term:
if doc_key in docs:
score += inv.tf_idf(indexed_term, doc_key) * 0.5
# Prefix matching bonus (bounded by pre-computed expansions)
for term, expansions in prefix_expansions.items():
for expanded_term in expansions:
score += inv.tf_idf(expanded_term, doc_key) * 0.5
else:
# Filter-only search (tag/title/path): score = 1
score = 1.0
@@ -717,6 +809,7 @@
total = len(scored_results)
page = scored_results[offset: offset + limit]
elapsed_ms = round((time.monotonic() - t0) * 1000, 1)
return {
"results": [r for _, r in page],
@@ -727,6 +820,7 @@
"tags": dict(sorted(facet_tags.items(), key=lambda x: -x[1])[:20]),
"vaults": dict(sorted(facet_vaults.items(), key=lambda x: -x[1])),
},
"query_time_ms": elapsed_ms,
}


@@ -33,12 +33,15 @@
let suggestAbortController = null;
let dropdownActiveIndex = -1;
let dropdownItems = [];
let currentSearchId = 0;
// Advanced search constants
const SEARCH_HISTORY_KEY = "obsigate_search_history";
const MAX_HISTORY_ENTRIES = 50;
const SUGGEST_DEBOUNCE_MS = 150;
const ADVANCED_SEARCH_LIMIT = 50;
const MIN_SEARCH_LENGTH = 2;
const SEARCH_TIMEOUT_MS = 30000;
// ---------------------------------------------------------------------------
// File extension → Lucide icon mapping
@@ -1825,10 +1828,12 @@
if (!openBtn || !closeBtn || !modal) return;
openBtn.addEventListener("click", () => {
openBtn.addEventListener("click", async () => {
modal.classList.add("active");
closeHeaderMenu();
renderConfigFilters();
loadConfigFields();
loadDiagnostics();
safeCreateIcons();
});
@@ -1848,11 +1853,36 @@
patternInput.addEventListener("input", updateRegexPreview);
// Frontend config fields — save to localStorage on change
["cfg-debounce", "cfg-results-per-page", "cfg-min-query", "cfg-timeout"].forEach((id) => {
const input = document.getElementById(id);
if (input) input.addEventListener("change", saveFrontendConfig);
});
// Backend save button
const saveBtn = document.getElementById("cfg-save-backend");
if (saveBtn) saveBtn.addEventListener("click", saveBackendConfig);
// Force reindex
const reindexBtn = document.getElementById("cfg-reindex");
if (reindexBtn) reindexBtn.addEventListener("click", forceReindex);
// Reset defaults
const resetBtn = document.getElementById("cfg-reset-defaults");
if (resetBtn) resetBtn.addEventListener("click", resetConfigDefaults);
// Refresh diagnostics
const diagBtn = document.getElementById("cfg-refresh-diag");
if (diagBtn) diagBtn.addEventListener("click", loadDiagnostics);
document.addEventListener("keydown", (e) => {
if (e.key === "Escape" && modal.classList.contains("active")) {
closeConfigModal();
}
});
// Load saved frontend config on startup
applyFrontendConfig();
}
function closeConfigModal() {
@@ -1860,6 +1890,177 @@
if (modal) modal.classList.remove("active");
}
// --- Config field helpers ---
const _FRONTEND_CONFIG_KEY = "obsigate-perf-config";
function _getFrontendConfig() {
try { return JSON.parse(localStorage.getItem(_FRONTEND_CONFIG_KEY) || "{}"); }
catch { return {}; }
}
function applyFrontendConfig() {
const cfg = _getFrontendConfig();
if (cfg.debounce_ms) { /* applied dynamically in debounce setTimeout */ }
if (cfg.results_per_page) { /* used as ADVANCED_SEARCH_LIMIT override */ }
if (cfg.min_query_length) { /* used as MIN_SEARCH_LENGTH override */ }
if (cfg.search_timeout_ms) { /* used as SEARCH_TIMEOUT_MS override */ }
}
function _getEffective(key, fallback) {
const cfg = _getFrontendConfig();
return cfg[key] !== undefined ? cfg[key] : fallback;
}
async function loadConfigFields() {
// Frontend fields from localStorage
const cfg = _getFrontendConfig();
_setField("cfg-debounce", cfg.debounce_ms || 300);
_setField("cfg-results-per-page", cfg.results_per_page || 50);
_setField("cfg-min-query", cfg.min_query_length || 2);
_setField("cfg-timeout", cfg.search_timeout_ms || 30000);
// Backend fields from API
try {
const data = await api("/api/config");
_setField("cfg-workers", data.search_workers);
_setField("cfg-max-content", data.max_content_size);
_setField("cfg-title-boost", data.title_boost);
_setField("cfg-tag-boost", data.tag_boost);
_setField("cfg-prefix-exp", data.prefix_max_expansions);
} catch (err) {
console.error("Failed to load backend config:", err);
}
}
function _setField(id, value) {
const el = document.getElementById(id);
if (el && value !== undefined) el.value = value;
}
function _getFieldNum(id, fallback) {
const el = document.getElementById(id);
if (!el) return fallback;
const v = parseFloat(el.value);
return isNaN(v) ? fallback : v;
}
function saveFrontendConfig() {
const cfg = {
debounce_ms: _getFieldNum("cfg-debounce", 300),
results_per_page: _getFieldNum("cfg-results-per-page", 50),
min_query_length: _getFieldNum("cfg-min-query", 2),
search_timeout_ms: _getFieldNum("cfg-timeout", 30000),
};
localStorage.setItem(_FRONTEND_CONFIG_KEY, JSON.stringify(cfg));
showToast("Paramètres client sauvegardés");
}
async function saveBackendConfig() {
const body = {
search_workers: _getFieldNum("cfg-workers", 2),
max_content_size: _getFieldNum("cfg-max-content", 100000),
title_boost: _getFieldNum("cfg-title-boost", 3.0),
tag_boost: _getFieldNum("cfg-tag-boost", 2.0),
prefix_max_expansions: _getFieldNum("cfg-prefix-exp", 50),
};
try {
await fetch("/api/config", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
});
showToast("Configuration backend sauvegardée");
} catch (err) {
console.error("Failed to save backend config:", err);
showToast("Erreur de sauvegarde");
}
}
async function forceReindex() {
const btn = document.getElementById("cfg-reindex");
if (btn) { btn.disabled = true; btn.textContent = "Réindexation..."; }
try {
await api("/api/index/reload");
showToast("Réindexation terminée");
loadDiagnostics();
await Promise.all([loadVaults(), loadTags()]);
} catch (err) {
console.error("Reindex error:", err);
showToast("Erreur de réindexation");
} finally {
if (btn) { btn.disabled = false; btn.textContent = "Forcer réindexation"; }
}
}
async function resetConfigDefaults() {
// Reset frontend
localStorage.removeItem(_FRONTEND_CONFIG_KEY);
// Reset backend
try {
await fetch("/api/config", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
search_workers: 2, debounce_ms: 300, results_per_page: 50,
min_query_length: 2, search_timeout_ms: 30000, max_content_size: 100000,
title_boost: 3.0, path_boost: 1.5, tag_boost: 2.0, prefix_max_expansions: 50,
snippet_context_chars: 120, max_snippet_highlights: 5,
}),
});
} catch (err) { console.error("Reset config error:", err); }
loadConfigFields();
showToast("Configuration réinitialisée");
}
async function loadDiagnostics() {
const container = document.getElementById("config-diagnostics");
if (!container) return;
container.innerHTML = '<div class="config-diag-loading">Chargement...</div>';
try {
const data = await api("/api/diagnostics");
renderDiagnostics(container, data);
} catch (err) {
container.innerHTML = '<div class="config-diag-loading">Erreur de chargement</div>';
}
}
function renderDiagnostics(container, data) {
container.innerHTML = "";
const sections = [
{ title: "Index", rows: [
["Fichiers indexés", data.index.total_files],
["Tags uniques", data.index.total_tags],
["Vaults", Object.keys(data.index.vaults).join(", ")],
]},
{ title: "Index inversé", rows: [
["Tokens uniques", data.inverted_index.unique_tokens.toLocaleString()],
["Postings total", data.inverted_index.total_postings.toLocaleString()],
["Documents", data.inverted_index.documents],
["Mémoire estimée", data.inverted_index.memory_estimate_mb + " MB"],
["Stale", data.inverted_index.is_stale ? "Oui" : "Non"],
]},
{ title: "Moteur de recherche", rows: [
["Executor actif", data.search_executor.active ? "Oui" : "Non"],
["Workers max", data.search_executor.max_workers],
]},
];
sections.forEach((section) => {
const div = document.createElement("div");
div.className = "config-diag-section";
const title = document.createElement("div");
title.className = "config-diag-section-title";
title.textContent = section.title;
div.appendChild(title);
section.rows.forEach(([label, value]) => {
const row = document.createElement("div");
row.className = "config-diag-row";
row.innerHTML = `<span class="diag-label">${label}</span><span class="diag-value">${value}</span>`;
div.appendChild(row);
});
container.appendChild(div);
});
}
function renderConfigFilters() {
const config = TagFilterService.getConfig();
const filters = config.tagFilters || TagFilterService.defaultFilters;
@@ -1987,13 +2188,13 @@
const vault = document.getElementById("vault-filter").value;
const tagFilter = selectedTags.length > 0 ? selectedTags.join(",") : null;
advancedSearchOffset = 0;
if (q.length > 0 || tagFilter) {
if ((q.length >= _getEffective("min_query_length", MIN_SEARCH_LENGTH)) || tagFilter) {
performAdvancedSearch(q, vault, tagFilter);
} else {
} else if (q.length === 0) {
SearchChips.clear();
showWelcome();
}
}, 300);
}, _getEffective("debounce_ms", 300));
});
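The `_getEffective` helper referenced above is not shown in this hunk; a minimal sketch, assuming it reads a client-side settings object and falls back to the hard-coded default when the key is unset or invalid:

```javascript
// Sketch of _getEffective(key, fallback): assumed to read user-saved
// client settings (here a plain object; the real store may differ).
const clientConfig = { debounce_ms: 500 }; // illustrative saved settings

function _getEffective(key, fallback) {
  const v = clientConfig[key];
  // Only accept finite numbers; anything else falls back to the default.
  return typeof v === "number" && Number.isFinite(v) ? v : fallback;
}
```

With the settings above, `_getEffective("debounce_ms", 300)` returns the saved 500, while `_getEffective("min_query_length", 2)` falls back to the default 2.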
// --- Focus handler: show history dropdown ---
@@ -2097,17 +2298,21 @@
async function performSearch(query, vaultFilter, tagFilter) {
if (searchAbortController) searchAbortController.abort();
searchAbortController = new AbortController();
const searchId = ++currentSearchId;
showLoading();
let url = `/api/search?q=${encodeURIComponent(query)}&vault=${encodeURIComponent(vaultFilter)}`;
if (tagFilter) url += `&tag=${encodeURIComponent(tagFilter)}`;
try {
const data = await api(url, { signal: searchAbortController.signal });
if (searchId !== currentSearchId) return;
renderSearchResults(data, query, tagFilter);
} catch (err) {
if (err.name === "AbortError") return;
if (searchId !== currentSearchId) return;
showWelcome();
} finally {
searchAbortController = null;
hideProgressBar();
if (searchId === currentSearchId) searchAbortController = null;
}
}
@@ -2115,6 +2320,7 @@
async function performAdvancedSearch(query, vaultFilter, tagFilter, offset, sort) {
if (searchAbortController) searchAbortController.abort();
searchAbortController = new AbortController();
const searchId = ++currentSearchId;
showLoading();
const ofs = offset !== undefined ? offset : advancedSearchOffset;
@@ -2125,19 +2331,30 @@
const parsed = QueryParser.parse(query);
SearchChips.update(parsed);
let url = `/api/search/advanced?q=${encodeURIComponent(query)}&vault=${encodeURIComponent(vaultFilter)}&limit=${ADVANCED_SEARCH_LIMIT}&offset=${ofs}&sort=${sortBy}`;
const effectiveLimit = _getEffective("results_per_page", ADVANCED_SEARCH_LIMIT);
let url = `/api/search/advanced?q=${encodeURIComponent(query)}&vault=${encodeURIComponent(vaultFilter)}&limit=${effectiveLimit}&offset=${ofs}&sort=${sortBy}`;
if (tagFilter) url += `&tag=${encodeURIComponent(tagFilter)}`;
// Search timeout — abort if server takes too long
const timeoutId = setTimeout(() => {
if (searchAbortController) searchAbortController.abort();
}, _getEffective("search_timeout_ms", SEARCH_TIMEOUT_MS));
try {
const data = await api(url, { signal: searchAbortController.signal });
clearTimeout(timeoutId);
if (searchId !== currentSearchId) return;
advancedSearchTotal = data.total;
advancedSearchOffset = ofs;
renderAdvancedSearchResults(data, query, tagFilter);
} catch (err) {
clearTimeout(timeoutId);
if (err.name === "AbortError") return;
if (searchId !== currentSearchId) return;
showWelcome();
} finally {
searchAbortController = null;
hideProgressBar();
if (searchId === currentSearchId) searchAbortController = null;
}
}
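The `searchId` bookkeeping above implements a latest-request-wins guard: responses from superseded searches are discarded even when they arrive after the newest one. A standalone sketch of the pattern (function names hypothetical):

```javascript
// Latest-request-wins guard: every call takes a fresh id, and a response
// is applied only if no newer call started while it was in flight.
let currentSearchId = 0;

async function guardedSearch(fetcher, apply) {
  const searchId = ++currentSearchId;
  const data = await fetcher();
  if (searchId !== currentSearchId) return false; // stale response, discard
  apply(data);
  return true;
}
```

In the code above this guard is combined with an `AbortController` (which cancels the network request) and a timeout; the id check covers the remaining race where an old response resolves after a newer request has already been issued.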
@@ -2209,6 +2426,11 @@
} else {
summaryText.textContent = `${data.total} résultat(s)`;
}
if (data.query_time_ms !== undefined && data.query_time_ms > 0) {
const timeBadge = el("span", { class: "search-time-badge" });
timeBadge.textContent = `(${data.query_time_ms} ms)`;
summaryText.appendChild(timeBadge);
}
header.appendChild(summaryText);
// Sort controls
@@ -2581,6 +2803,7 @@
}
function showWelcome() {
hideProgressBar();
const area = document.getElementById("content-area");
area.innerHTML = `
<div class="welcome">
@@ -2598,6 +2821,17 @@
<div class="loading-spinner"></div>
<div>Recherche en cours...</div>
</div>`;
showProgressBar();
}
function showProgressBar() {
const bar = document.getElementById("search-progress-bar");
if (bar) bar.classList.add("active");
}
function hideProgressBar() {
const bar = document.getElementById("search-progress-bar");
if (bar) bar.classList.remove("active");
}
function goHome() {


@@ -71,6 +71,7 @@
</script>
</head>
<body>
<div class="search-progress-bar" id="search-progress-bar"><div class="search-progress-bar__fill"></div></div>
<div class="app-container">
<!-- Header -->
@@ -299,6 +300,73 @@
</div>
<div class="editor-body" id="config-body">
<div class="config-content">
<!-- Performance Settings — Frontend -->
<section class="config-section">
<h2>Paramètres de recherche</h2>
<p class="config-description">Ces paramètres s'appliquent immédiatement côté client.</p>
<div class="config-row">
<label class="config-label" for="cfg-debounce">Délai debounce (ms)</label>
<input type="number" id="cfg-debounce" class="config-input config-input--num" min="100" max="2000" step="50" value="300">
<span class="config-hint">Délai avant exécution de la recherche (100–2000)</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-results-per-page">Résultats par page</label>
<input type="number" id="cfg-results-per-page" class="config-input config-input--num" min="10" max="200" step="10" value="50">
<span class="config-hint">Nombre de résultats affichés par page (10–200)</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-min-query">Longueur min. requête</label>
<input type="number" id="cfg-min-query" class="config-input config-input--num" min="1" max="5" step="1" value="2">
<span class="config-hint">Nombre minimum de caractères avant recherche (1–5)</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-timeout">Timeout recherche (ms)</label>
<input type="number" id="cfg-timeout" class="config-input config-input--num" min="5000" max="120000" step="5000" value="30000">
<span class="config-hint">Annuler la recherche après ce délai (5000–120000)</span>
</div>
</section>
<!-- Performance Settings — Backend -->
<section class="config-section">
<h2>Paramètres backend <span class="config-badge-restart">Redémarrage requis</span></h2>
<p class="config-description">Ces paramètres sont sauvegardés sur le serveur. Certains nécessitent un redémarrage ou une réindexation.</p>
<div class="config-row">
<label class="config-label" for="cfg-workers">Workers de recherche</label>
<input type="number" id="cfg-workers" class="config-input config-input--num" min="1" max="8" step="1" value="2">
<span class="config-hint">Threads dédiés à la recherche (1–8)</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-max-content">Taille max contenu (octets)</label>
<input type="number" id="cfg-max-content" class="config-input config-input--num" min="10000" max="1000000" step="10000" value="100000">
<span class="config-hint">Contenu indexé par fichier (10K–1M). Réindexation requise.</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-title-boost">Boost titre</label>
<input type="number" id="cfg-title-boost" class="config-input config-input--num" min="0" max="10" step="0.5" value="3.0">
<span class="config-hint">Multiplicateur de pertinence pour les correspondances dans le titre</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-tag-boost">Boost tags</label>
<input type="number" id="cfg-tag-boost" class="config-input config-input--num" min="0" max="10" step="0.5" value="2.0">
<span class="config-hint">Multiplicateur de pertinence pour les correspondances dans les tags</span>
</div>
<div class="config-row">
<label class="config-label" for="cfg-prefix-exp">Expansions préfixe max</label>
<input type="number" id="cfg-prefix-exp" class="config-input config-input--num" min="10" max="200" step="10" value="50">
<span class="config-hint">Nombre max de tokens élargis par préfixe (10–200)</span>
</div>
<div class="config-actions-row">
<button class="config-btn-save" id="cfg-save-backend">Sauvegarder</button>
<button class="config-btn-secondary" id="cfg-reindex">Forcer réindexation</button>
<button class="config-btn-secondary" id="cfg-reset-defaults">Réinitialiser</button>
</div>
</section>
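The save button above presumably posts these values to `/api/config`; a hedged sketch of the request body, with every field name inferred from the `cfg-*` input ids (the real schema is defined by the server's Pydantic models and may differ):

```javascript
// Hypothetical POST /api/config body; each field name here is an
// assumption inferred from the cfg-* input ids, not a documented schema.
const backendSettings = {
  search_workers: 2,         // cfg-workers (1-8 threads)
  max_content_bytes: 100000, // cfg-max-content; reindex required
  title_boost: 3.0,          // cfg-title-boost
  tag_boost: 2.0,            // cfg-tag-boost
  max_prefix_expansions: 50, // cfg-prefix-exp
};

const body = JSON.stringify(backendSettings);
// Would then be sent with something like:
// fetch("/api/config", { method: "POST",
//   headers: { "Content-Type": "application/json" }, body });
```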
<!-- Tag Filtering (existing) -->
<section class="config-section">
<h2>Filtrage de tags</h2>
<p class="config-description">Définissez les patterns de tags à masquer dans la sidebar. Vous pouvez utiliser des wildcards pour cibler les tags de template.</p>
@@ -314,6 +382,17 @@
<small>Regex : <code id="config-regex-code"></code></small>
</div>
</section>
<!-- Diagnostics -->
<section class="config-section">
<h2>Diagnostics</h2>
<p class="config-description">Statistiques de l'index et du moteur de recherche.</p>
<div id="config-diagnostics" class="config-diagnostics">
<div class="config-diag-loading">Chargement...</div>
</div>
<button class="config-btn-secondary" id="cfg-refresh-diag" style="margin-top:8px">Rafraîchir</button>
</section>
</div>
</div>
</div>


@@ -1347,6 +1347,39 @@ select {
to { transform: rotate(360deg); }
}
/* --- Search progress bar --- */
.search-progress-bar {
position: fixed;
top: 0;
left: 0;
right: 0;
height: 3px;
z-index: 9999;
background: transparent;
pointer-events: none;
opacity: 0;
transition: opacity 0.15s;
}
.search-progress-bar.active {
opacity: 1;
}
.search-progress-bar .search-progress-bar__fill {
height: 100%;
background: var(--accent);
width: 0%;
animation: progress-indeterminate 1.5s ease-in-out infinite;
}
@keyframes progress-indeterminate {
0% { width: 0%; margin-left: 0%; }
50% { width: 40%; margin-left: 30%; }
100% { width: 0%; margin-left: 100%; }
}
.search-time-badge {
font-size: 0.7rem;
color: var(--text-muted);
margin-left: 8px;
}
/* --- Editor Modal --- */
.editor-modal {
display: none;
@@ -2111,6 +2144,118 @@ body.resizing-v {
padding: 0;
}
/* --- Config rows & controls --- */
.config-row {
display: grid;
grid-template-columns: 200px 120px 1fr;
align-items: center;
gap: 12px;
margin-bottom: 10px;
}
.config-label {
font-size: 0.82rem;
color: var(--text-primary);
font-weight: 500;
}
.config-input--num {
flex: none;
width: 120px;
text-align: right;
}
.config-hint {
font-size: 0.72rem;
color: var(--text-muted);
}
.config-badge-restart {
display: inline-block;
font-size: 0.65rem;
font-weight: 500;
padding: 2px 8px;
border-radius: 4px;
background: var(--danger-bg, #3d1a18);
color: var(--danger, #ff7b72);
vertical-align: middle;
margin-left: 6px;
}
.config-actions-row {
display: flex;
gap: 8px;
margin-top: 16px;
flex-wrap: wrap;
}
.config-btn-save {
padding: 8px 20px;
border: 1px solid var(--accent);
border-radius: 6px;
background: var(--accent);
color: #fff;
font-family: 'JetBrains Mono', monospace;
font-size: 0.8rem;
font-weight: 600;
cursor: pointer;
transition: opacity 150ms;
}
.config-btn-save:hover { opacity: 0.9; }
.config-btn-secondary {
padding: 8px 16px;
border: 1px solid var(--border);
border-radius: 6px;
background: var(--bg-secondary);
color: var(--text-primary);
font-family: 'JetBrains Mono', monospace;
font-size: 0.8rem;
cursor: pointer;
transition: background 150ms;
}
.config-btn-secondary:hover { background: var(--bg-hover); }
/* --- Config diagnostics panel --- */
.config-diagnostics {
background: var(--code-bg);
border: 1px solid var(--border);
border-radius: 6px;
padding: 14px 16px;
font-family: 'JetBrains Mono', monospace;
font-size: 0.78rem;
line-height: 1.7;
color: var(--text-secondary);
}
.config-diag-loading {
color: var(--text-muted);
}
.config-diag-row {
display: flex;
justify-content: space-between;
}
.config-diag-row .diag-label { color: var(--text-secondary); }
.config-diag-row .diag-value { color: var(--text-primary); font-weight: 500; }
.config-diag-section {
margin-bottom: 8px;
padding-bottom: 8px;
border-bottom: 1px solid var(--border);
}
.config-diag-section:last-child {
margin-bottom: 0;
padding-bottom: 0;
border-bottom: none;
}
.config-diag-section-title {
font-weight: 600;
color: var(--accent);
margin-bottom: 4px;
font-size: 0.75rem;
text-transform: uppercase;
letter-spacing: 0.5px;
}
@media (max-width: 768px) {
.config-row {
grid-template-columns: 1fr;
gap: 4px;
}
.config-input--num { width: 100%; }
}
/* --- Toast notifications --- */
.toast-container {
position: fixed;