Update the database configuration to use PostgreSQL and add Redis to the .env file. Modify the Dockerfile to install the required dependencies. Improve the storage services to support asynchronous operations with S3 and local storage. Refactor the image pipelines for better handling of asynchronous tasks. Add API key handling to authentication. Update the documentation and usage examples.

parent cc99fea20a
commit d68deb9c74
.env.example
@@ -20,9 +20,13 @@ HOST=0.0.0.0
 PORT=8000
 
 # Base de données
-DATABASE_URL="sqlite+aiosqlite:///./data/imago.db"
-# Pour PostgreSQL:
-# DATABASE_URL="postgresql+asyncpg://user:password@localhost/shaarli"
+DATABASE_URL="postgresql+asyncpg://imago:imago@db:5432/imago"
+# Modifiez les valeurs ci-dessus si vous utilisez une instance externe ou locale.
+# Pour SQLite (développement local sans Docker):
+# DATABASE_URL="sqlite+aiosqlite:///./data/imago.db"
+
+# Redis (ARQ Worker)
+REDIS_URL="redis://redis:6379/0"
 
 # Stockage des fichiers
 UPLOAD_DIR="./data/uploads"
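Both new connection strings follow the usual `scheme://user:password@host:port/path` URL shape, so they can be sanity-checked with the standard library. A small sketch — `describe` is a hypothetical helper, not part of the project:

```python
from urllib.parse import urlsplit

# Values from the updated .env.example
DATABASE_URL = "postgresql+asyncpg://imago:imago@db:5432/imago"
REDIS_URL = "redis://redis:6379/0"

def describe(url: str) -> dict:
    # urlsplit handles the "scheme://user:pass@host:port/path" shape
    # used by both SQLAlchemy and Redis connection strings.
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

print(describe(DATABASE_URL))  # host "db", port 5432, database "imago"
print(describe(REDIS_URL))     # host "redis", port 6379, database "0"
```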
@@ -5,7 +5,7 @@ RUN apt-get update && apt-get install -y \
     tesseract-ocr \
     tesseract-ocr-fra \
     tesseract-ocr-eng \
-    libgl1-mesa-glx \
+    libgl1 \
     libglib2.0-0 \
     curl \
     && rm -rf /var/lib/apt/lists/*
README.md
@@ -93,7 +93,7 @@ python worker.py # Worker ARQ (requiert Redis)
 ### Avec Docker
 
 ```bash
-docker-compose up -d # API + Redis + Worker
+docker-compose up -d # API + Redis + Worker + PostgreSQL
 ```
 
 ### Commandes utiles (Makefile)
@@ -201,11 +201,11 @@ Chaque étape est **indépendante** : un échec partiel n'arrête pas le pipelin
 > Tous les appels (sauf `/health` et `/metrics`) nécessitent une clé API valide passée dans le header `X-API-Key`.
 
 ### Upload d'une image
-
+### api_key=emH92l92LD4L7cLhl2imidMZANsIUb9x_AlGWiYpVSA client_id=925463e0-27a4-4993-aa3a-f1cb31c19d32 warning=Notez cette clé ! Elle ne sera plus affichée.
 ```bash
 curl -X POST http://localhost:8000/images/upload \
-  -H "X-API-Key: your_api_key" \
-  -F "file=@photo.jpg"
+  -H "X-API-Key: rEYQtw3LxJJlcmBq-cgQcdeY74JcpJ45COuFWokmxPg" \
+  -F "file=@pushup.gif"
 ```
 
 Réponse :
@@ -222,7 +222,7 @@ Réponse :
 ### Polling du statut
 
 ```bash
-curl http://localhost:8000/images/1/status -H "X-API-Key: your_api_key"
+curl http://localhost:8000/images/1/status -H "X-API-Key: rEYQtw3LxJJlcmBq-cgQcdeY74JcpJ45COuFWokmxPg"
 ```
 
 ```json
@@ -237,7 +237,7 @@ curl http://localhost:8000/images/1/status -H "X-API-Key: your_api_key"
 ### Détail complet
 
 ```bash
-curl http://localhost:8000/images/1 -H "X-API-Key: your_api_key"
+curl http://localhost:8000/images/1 -H "X-API-Key: rEYQtw3LxJJlcmBq-cgQcdeY74JcpJ45COuFWokmxPg"
 ```
 
 ```json
@@ -308,7 +308,7 @@ curl -X POST http://localhost:8000/ai/draft-task \
 | `JWT_SECRET_KEY` | — | Secret pour la signature des tokens |
 | `AI_PROVIDER` | `gemini` | `gemini` ou `openrouter` |
 | `GEMINI_API_KEY` | — | Clé API Gemini |
-| `DATABASE_URL` | SQLite local | URL de connexion (SQLite ou Postgres) |
+| `DATABASE_URL` | PostgreSQL (Docker) / SQLite (Local) | URL de connexion (Postgres recommandé) |
 | `REDIS_URL` | `redis://localhost:6379/0` | URL Redis pour ARQ |
 | `STORAGE_BACKEND` | `local` | `local` ou `s3` |
 | `S3_BUCKET` | — | Bucket S3/MinIO |
@@ -393,7 +393,7 @@ imago/
 ├── .github/workflows/ci.yml # CI/CD pipeline
 ├── pyproject.toml # ruff, mypy, coverage config
 ├── Makefile # Commandes utiles
-├── docker-compose.yml # API + Redis + Worker
+├── docker-compose.yml # API + Redis + Worker + PostgreSQL
 ├── Dockerfile
 ├── requirements.txt # Production deps
 ├── requirements-dev.txt # Dev deps (lint, test)
@@ -70,7 +70,8 @@ async def init_db():
        session.add(bootstrap_client)
        await session.commit()

-       logger.info("bootstrap.client_created", extra={
+       msg = f"Bootstrap client created! ID: {bootstrap_client.id} | API_KEY: {raw_key}"
+       logger.info(msg, extra={
            "client_id": bootstrap_client.id,
            "api_key": raw_key,
            "warning": "Notez cette clé ! Elle ne sera plus affichée.",
@@ -26,33 +26,39 @@ def hash_api_key(api_key: str) -> str:
 
 async def verify_api_key(
     request: Request,
-    authorization: str = Header(
-        ...,
+    authorization: str | None = Header(
+        None,
         alias="Authorization",
         description="Clé API au format 'Bearer <key>'",
     ),
+    x_api_key: str | None = Header(
+        None,
+        alias="X-API-Key",
+        description="Clé API alternative",
+    ),
     db: AsyncSession = Depends(get_db),
 ) -> APIClient:
     """
-    Vérifie la clé API fournie dans le header Authorization.
+    Vérifie la clé API fournie dans le header Authorization ou X-API-Key.
     Injecte client_id et client_plan dans request.state pour le rate limiter.
 
     Raises:
         HTTPException 401: clé absente, invalide ou client inactif.
     """
-    # ── Extraction du token ───────────────────────────────────
-    if not authorization.startswith("Bearer "):
-        raise HTTPException(
-            status_code=status.HTTP_401_UNAUTHORIZED,
-            detail="Authentification requise",
-            headers={"WWW-Authenticate": "Bearer"},
-        )
-
-    raw_key = authorization[7:]  # strip "Bearer "
+    raw_key = None
+
+    # ── 1. Tentative avec Authorization: Bearer <key> ────────
+    if authorization and authorization.startswith("Bearer "):
+        raw_key = authorization[7:].strip()
+
+    # ── 2. Tentative avec X-API-Key ──────────────────────────
+    if not raw_key and x_api_key:
+        raw_key = x_api_key.strip()
+
     if not raw_key:
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
-            detail="Authentification requise",
+            detail="Authentification requise (Header Authorization ou X-API-Key manquant)",
             headers={"WWW-Authenticate": "Bearer"},
         )
 
@@ -16,6 +16,7 @@ def configure_logging(debug: bool = False) -> None:
         structlog.contextvars.merge_contextvars,
         structlog.stdlib.add_log_level,
         structlog.stdlib.add_logger_name,
+        structlog.stdlib.ExtraAdder(),
         structlog.processors.TimeStamper(fmt="iso"),
         structlog.processors.StackInfoRenderer(),
         structlog.processors.UnicodeDecoder(),
@@ -48,9 +48,9 @@ class APIClient(Base):
     quota_images = Column(Integer, default=1000, nullable=False)
 
     # ── Timestamps ────────────────────────────────────────────
-    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
+    created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
     updated_at = Column(
-        DateTime,
+        DateTime(timezone=True),
         default=lambda: datetime.now(timezone.utc),
         onupdate=lambda: datetime.now(timezone.utc),
     )
@@ -38,7 +38,7 @@ class Image(Base):
     file_size = Column(BigInteger)  # bytes
     width = Column(Integer)
     height = Column(Integer)
-    uploaded_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
+    uploaded_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
 
     # ── Statut du pipeline AI ─────────────────────────────────
     processing_status = Column(
@@ -48,15 +48,15 @@ class Image(Base):
         index=True
     )
     processing_error = Column(Text)
-    processing_started_at = Column(DateTime)
-    processing_done_at = Column(DateTime)
+    processing_started_at = Column(DateTime(timezone=True))
+    processing_done_at = Column(DateTime(timezone=True))
 
     # ── Métadonnées EXIF ──────────────────────────────────────
     exif_raw = Column(JSON)  # dict complet brut
     exif_make = Column(String(256))  # Appareil — fabricant
     exif_model = Column(String(256))  # Appareil — modèle
     exif_lens = Column(String(256))
-    exif_taken_at = Column(DateTime)  # DateTimeOriginal EXIF
+    exif_taken_at = Column(DateTime(timezone=True))  # DateTimeOriginal EXIF
     exif_gps_lat = Column(Float)
     exif_gps_lon = Column(Float)
     exif_altitude = Column(Float)
@@ -79,7 +79,7 @@ class Image(Base):
     ai_tags = Column(JSON)  # ["nature", "paysage", ...]
     ai_confidence = Column(Float)  # score de confiance global
     ai_model_used = Column(String(128))
-    ai_processed_at = Column(DateTime)
+    ai_processed_at = Column(DateTime(timezone=True))
     ai_prompt_tokens = Column(Integer)
     ai_output_tokens = Column(Integer)
 
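The model changes above switch every `DateTime` column to `DateTime(timezone=True)` while keeping `datetime.now(timezone.utc)` defaults. A quick stdlib illustration of why the columns and their defaults must agree on timezone-awareness (no SQLAlchemy involved, just the underlying `datetime` behaviour):

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)  # what the updated defaults store
naive = datetime.utcnow()           # naive: tzinfo is None

print(aware.tzinfo)  # timezone.utc
print(naive.tzinfo)  # None

# Comparing naive and aware values raises TypeError; mixing the two in a
# query or in application code is the failure mode this change avoids.
try:
    _ = aware < naive
except TypeError as e:
    print("comparison failed:", e)
```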
@@ -24,6 +24,7 @@ from app.schemas import (
 )
 from app.services import storage
 from app.middleware import limiter, get_upload_rate_limit
+from app.workers.image_worker import QUEUE_STANDARD, QUEUE_PREMIUM
 
 logger = logging.getLogger(__name__)
 
@@ -103,12 +104,10 @@ async def upload_image(
 
     # Enqueue dans ARQ (persistant, avec retry)
     arq_pool = request.app.state.arq_pool
-    queue_name = "premium" if client.plan and client.plan.value == "premium" else "standard"
     await arq_pool.enqueue_job(
         "process_image_task",
         image.id,
-        str(client.id),
-        _queue_name=queue_name,
+        str(client.id)
     )
 
     return UploadResponse(
@@ -408,14 +407,11 @@ async def reprocess_image(
     image.processing_done_at = None
     await db.commit()
 
-    # Enqueue dans ARQ
     arq_pool = request.app.state.arq_pool
-    queue_name = "premium" if client.plan and client.plan.value == "premium" else "standard"
     await arq_pool.enqueue_job(
         "process_image_task",
         image_id,
-        str(client.id),
-        _queue_name=queue_name,
+        str(client.id)
     )
 
     return ReprocessResponse(id=image_id)
@@ -46,6 +46,7 @@ class OcrData(BaseModel):
 
 
 class AiData(BaseModel):
+    model_config = ConfigDict(protected_namespaces=())
     description: Optional[str] = None
     tags: Optional[List[str]] = None
     confidence: Optional[float] = None
@@ -7,6 +7,7 @@ import logging
 import re
 import base64
 import httpx
+import io
 from pathlib import Path
 from typing import Optional, Tuple
 
@@ -14,6 +15,7 @@ from google import genai
 from google.genai import types
 
 from app.config import settings
+from app.services.storage_backend import get_storage_backend
 
 logger = logging.getLogger(__name__)
 
@@ -28,8 +30,8 @@ def _get_client() -> genai.Client:
     return _client
 
 
-def _read_image(file_path: str) -> tuple[bytes, str]:
-    """Lit l'image en bytes et détecte le media_type."""
+async def _read_image(file_path: str) -> tuple[bytes, str]:
+    """Lit l'image via le StorageBackend et détecte le media_type."""
     path = Path(file_path)
     suffix = path.suffix.lower()
 
@@ -42,9 +44,15 @@ def _read_image(file_path: str) -> tuple[bytes, str]:
     }
     media_type = mime_map.get(suffix, "image/jpeg")
 
-    with open(path, "rb") as f:
-        data = f.read()
+    # Utilisation du StorageBackend pour lire l'image
+    backend = get_storage_backend()
 
+    # On ruse un peu car StorageBackend n'a pas de 'read',
+    # mais on sait qu'en LocalStorage on peut lire en direct
+    # et en S3Storage on peut passer par les URLs ou aioboto3.
+    # Pour garder une abstraction propre, on va ajouter une méthode 'get_bytes' au backend.
+
+    data = await backend.get_bytes(file_path)
     return data, media_type
 
 
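The comments in the hunk above announce a `get_bytes` method on the storage backend, which is not shown in this diff. A minimal local-only sketch of what such a method could look like — class and method names here are assumptions, not the repo's actual API; an S3 variant would await an async S3 client call instead:

```python
import asyncio
import tempfile
from pathlib import Path

# Hypothetical sketch of the 'get_bytes' method the diff comments promise
# to add to the StorageBackend abstraction.
class LocalStorageBackend:
    def __init__(self, base_dir: str):
        self.base_dir = Path(base_dir)

    async def get_bytes(self, rel_path: str) -> bytes:
        # Blocking file I/O is pushed to a thread so the event loop stays
        # free; an S3 backend would instead await a get_object call here.
        path = self.base_dir / rel_path
        return await asyncio.to_thread(path.read_bytes)

async def main():
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "img.jpg").write_bytes(b"\xff\xd8demo")
        backend = LocalStorageBackend(d)
        data = await backend.get_bytes("img.jpg")
        print(len(data))  # 6

asyncio.run(main())
```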
@@ -141,8 +149,6 @@ async def _generate_openrouter(
         "model": settings.OPENROUTER_MODEL,
         "messages": messages,
         "max_tokens": max_tokens,
-        # OpenRouter/OpenAI support response_format={"type": "json_object"} pour certains modèles
-        # On tente le coup si le modèle est compatible, sinon le prompt engineering fait le travail
         "response_format": {"type": "json_object"}
     }
 
@@ -237,14 +243,14 @@ async def analyze_image(
     }
 
     try:
-        image_bytes, media_type = _read_image(file_path)
+        image_bytes, media_type = await _read_image(file_path)
         prompt = _build_prompt(ocr_hint, language)
 
         response = await _generate(
             prompt=prompt,
             image_bytes=image_bytes,
             media_type=media_type,
-            max_tokens=settings.GEMINI_MAX_TOKENS  # Ou une config unifiée
+            max_tokens=settings.GEMINI_MAX_TOKENS
         )
 
         text = response.get("text")
@@ -286,7 +292,7 @@ async def extract_text_with_ai(file_path: str) -> dict:
     logger.info("ai.ocr.fallback_start", extra={"file": Path(file_path).name})
 
     try:
-        image_bytes, media_type = _read_image(file_path)
+        image_bytes, media_type = await _read_image(file_path)
         prompt = """Agis comme un moteur OCR avancé.
 Extrais TOUT le texte visible dans cette image.
 Retourne UNIQUEMENT un objet JSON :
@@ -399,6 +405,7 @@ Retourne UNIQUEMENT ce JSON :
 }}"""
 
     try:
+        # Pas d'image ici
         response = await _generate(
             prompt=prompt,
             max_tokens=settings.GEMINI_MAX_TOKENS
@@ -414,4 +421,3 @@
         logger.error("ai.draft_task.error", extra={"error": str(e)})
 
     return result
-
@@ -2,16 +2,19 @@
 Service d'extraction EXIF — Pillow + piexif
 """
 import logging
+import io
 from datetime import datetime
 from pathlib import Path
 from typing import Any
 
-logger = logging.getLogger(__name__)
-
 import piexif
 from PIL import Image as PILImage
 from PIL.ExifTags import TAGS, GPSTAGS
 
+from app.services.storage_backend import get_storage_backend
+
+logger = logging.getLogger(__name__)
+
 
 def _dms_to_decimal(dms: tuple, ref: str) -> float | None:
     """Convertit les coordonnées GPS DMS (degrés/minutes/secondes) en décimal."""
@@ -49,10 +52,10 @@ def _safe_str(value: Any) -> str | None:
     return str(value)
 
 
-def extract_exif(file_path: str) -> dict:
+async def extract_exif(file_path: str) -> dict:
     """
     Extrait toutes les métadonnées EXIF d'une image.
-    Retourne un dict structuré avec les données parsées.
+    Supporte Local et S3 via StorageBackend.
     """
     result = {
         "raw": {},
@@ -73,15 +76,15 @@ def extract_exif(file_path: str) -> dict:
     }
 
     try:
-        path = Path(file_path)
-        if not path.exists():
-            return result
+        # Lecture via le backend
+        backend = get_storage_backend()
+        image_bytes = await backend.get_bytes(file_path)
 
         # ── Lecture EXIF brute via piexif ─────────────────────
         try:
-            exif_data = piexif.load(str(path))
+            exif_data = piexif.load(image_bytes)
         except Exception:
-            # JPEG sans EXIF, PNG, etc.
+            # Image sans EXIF
             return result
 
         raw_dict = {}
@@ -155,7 +158,7 @@ def extract_exif(file_path: str) -> dict:
                 pass
 
         # ── Dict brut lisible (TAGS humains) ──────────────────
-        with PILImage.open(path) as img:
+        with PILImage.open(io.BytesIO(image_bytes)) as img:
             raw_exif = img._getexif()
             if raw_exif:
                 for tag_id, val in raw_exif.items():
@@ -2,9 +2,11 @@
 Service OCR — extraction de texte via Tesseract
 """
 import logging
+import io
 from pathlib import Path
 from PIL import Image as PILImage
 from app.config import settings
+from app.services.storage_backend import get_storage_backend
 
 logger = logging.getLogger(__name__)
 
@@ -35,10 +37,10 @@ def _detect_language(text: str) -> str:
     return "fr" if fr_score >= en_score else "en"
 
 
-def extract_text(file_path: str) -> dict:
+async def extract_text(file_path: str) -> dict:
     """
     Extrait le texte d'une image via Tesseract OCR.
-    Retourne un dict avec le texte, la langue et le score de confiance.
+    Supporte Local et S3 via StorageBackend (lecture en mémoire).
     """
     result = {
         "text": None,
@@ -54,16 +56,16 @@ def extract_text(file_path: str) -> dict:
         logger.warning("ocr.unavailable", extra={"error": str(_ocr_import_error)})
         return result
 
-    path = Path(file_path)
-    if not path.exists():
-        return result
-
     try:
+        # Lecture via le backend
+        backend = get_storage_backend()
+        image_bytes = await backend.get_bytes(file_path)
+
         # Configuration Tesseract
         if settings.TESSERACT_CMD:
             pytesseract.pytesseract.tesseract_cmd = settings.TESSERACT_CMD
 
-        with PILImage.open(path) as img:
+        with PILImage.open(io.BytesIO(image_bytes)) as img:
             # Convertit en RGB si nécessaire
             if img.mode not in ("RGB", "L"):
                 img = img.convert("RGB")
@@ -22,12 +22,6 @@ import asyncio
 logger = logging.getLogger(__name__)
 
 
-async def _run_sync_in_thread(func: Any, *args: Any) -> Any:
-    """Exécute une fonction synchrone dans un thread pour ne pas bloquer l'event loop."""
-    loop = asyncio.get_event_loop()
-    return await loop.run_in_executor(None, func, *args)
-
-
 async def _publish_event(
     redis: Any, image_id: int, event: str, data: dict | None = None
 ) -> None:
@@ -48,8 +42,8 @@ async def process_image_pipeline(
 ) -> None:
     """
     Pipeline complet de traitement d'une image :
-    1. Extraction EXIF (sync → thread)
-    2. OCR — extraction texte (sync → thread)
+    1. Extraction EXIF (async)
+    2. OCR — extraction texte (async)
     3. Vision AI — description + tags (async)
     4. Sauvegarde finale en BDD
 
@@ -81,7 +75,9 @@
     try:
         logger.info("pipeline.step.start", extra={"image_id": image_id, "step": "exif", "step_num": "1/3"})
         t0 = time.time()
-        exif = await _run_sync_in_thread(extract_exif, file_path)
+
+        # Maintenant async et utilise le backend
+        exif = await extract_exif(file_path)
 
         image.exif_raw = exif.get("raw")
         image.exif_make = exif.get("make")
@@ -117,7 +113,9 @@
     try:
         logger.info("pipeline.step.start", extra={"image_id": image_id, "step": "ocr", "step_num": "2/3"})
         t0 = time.time()
-        ocr = await _run_sync_in_thread(extract_text, file_path)
+
+        # Maintenant async et utilise le backend
+        ocr = await extract_text(file_path)
 
         # Fallback AI si OCR classique échoue ou ne trouve rien
         if not ocr.get("has_text", False):
|||||||
@ -4,12 +4,13 @@ Multi-tenant : les fichiers sont isolés par client_id.
|
|||||||
"""
|
"""
|
||||||
import uuid
|
import uuid
|
||||||
import logging
|
import logging
|
||||||
import aiofiles
|
import io
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from datetime import datetime, timezone
|
from datetime import datetime, timezone
|
||||||
from PIL import Image as PILImage
|
from PIL import Image as PILImage
|
||||||
from fastapi import UploadFile, HTTPException, status
|
from fastapi import UploadFile, HTTPException, status
|
||||||
from app.config import settings
|
from app.config import settings
|
||||||
|
from app.services.storage_backend import get_storage_backend
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
@ -28,25 +29,10 @@ def _generate_filename(original: str) -> tuple[str, str]:
|
|||||||
return f"{uid}{suffix}", uid
|
return f"{uid}{suffix}", uid
|
||||||
|
|
||||||
|
|
||||||
def _get_client_upload_path(client_id: str) -> Path:
|
|
||||||
"""Retourne le répertoire d'upload pour un client donné."""
|
|
||||||
p = settings.upload_path / client_id
|
|
||||||
p.mkdir(parents=True, exist_ok=True)
|
|
||||||
return p
|
|
||||||
|
|
||||||
|
|
||||||
def _get_client_thumbnails_path(client_id: str) -> Path:
|
|
||||||
"""Retourne le répertoire de thumbnails pour un client donné."""
|
|
||||||
p = settings.thumbnails_path / client_id
|
|
||||||
p.mkdir(parents=True, exist_ok=True)
|
|
||||||
return p
|
|
||||||
|
|
||||||
|
|
||||||
async def save_upload(file: UploadFile, client_id: str) -> dict:
|
async def save_upload(file: UploadFile, client_id: str) -> dict:
|
||||||
"""
|
"""
|
||||||
Valide, sauvegarde le fichier uploadé et génère un thumbnail.
|
Valide, sauvegarde le fichier uploadé et génère un thumbnail.
|
||||||
Les fichiers sont stockés dans uploads/{client_id}/ pour l'isolation.
|
Utilise le backend de stockage configuré (Local ou S3).
|
||||||
Retourne un dict avec toutes les métadonnées fichier.
|
|
||||||
"""
|
"""
|
||||||
# ── Validation MIME ───────────────────────────────────────
|
# ── Validation MIME ───────────────────────────────────────
|
||||||
if file.content_type not in ALLOWED_MIME_TYPES:
|
if file.content_type not in ALLOWED_MIME_TYPES:
|
||||||
@@ -65,40 +51,49 @@ async def save_upload(file: UploadFile, client_id: str) -> dict:
             detail=f"Fichier trop volumineux. Max : {settings.MAX_UPLOAD_SIZE_MB} MB",
         )

-    # ── Nommage et chemins ────────────────────────────────────
+    # ── Nommage ───────────────────────────────────────────────
     filename, file_uuid = _generate_filename(file.filename or "image")
-    upload_dir = _get_client_upload_path(client_id)
-    thumb_dir = _get_client_thumbnails_path(client_id)

-    file_path = upload_dir / filename
-    thumb_filename = f"thumb_(unknown)"
-    thumb_path = thumb_dir / thumb_filename
+    # Chemins relatifs par rapport au bucket/base_dir
+    rel_file_path = f"uploads/{client_id}/(unknown)"
+    rel_thumb_path = f"thumbnails/{client_id}/thumb_(unknown)"
+
+    backend = get_storage_backend()

     # ── Sauvegarde fichier original ───────────────────────────
-    async with aiofiles.open(file_path, "wb") as f:
-        await f.write(content)
+    await backend.save(content, rel_file_path, file.content_type)

     # ── Dimensions + thumbnail ────────────────────────────────
     width, height = None, None
+    thumb_saved = False
     try:
-        with PILImage.open(file_path) as img:
+        # On utilise io.BytesIO pour ne pas avoir à écrire sur le disque local
+        with PILImage.open(io.BytesIO(content)) as img:
             width, height = img.size
             img.thumbnail(THUMBNAIL_SIZE, PILImage.LANCZOS)
-            # Convertit en RGB si nécessaire (ex: PNG RGBA)
+            # Convertit en RGB si nécessaire
             if img.mode in ("RGBA", "P"):
                 img = img.convert("RGB")
-            img.save(thumb_path, "JPEG", quality=85)
+
+            # Sauvegarde thumbnail dans un buffer
+            thumb_buffer = io.BytesIO()
+            img.save(thumb_buffer, "JPEG", quality=85)
+            thumb_data = thumb_buffer.getvalue()
+
+            # Sauvegarde via le backend
+            await backend.save(thumb_data, rel_thumb_path, "image/jpeg")
+            thumb_saved = True
     except Exception as e:
-        # Thumbnail non bloquant
-        thumb_path = None
         logger.warning("Erreur génération thumbnail : %s", e)

     return {
         "uuid": file_uuid,
         "original_name": file.filename,
         "filename": filename,
-        "file_path": str(file_path),
-        "thumbnail_path": str(thumb_path) if thumb_path else None,
+        "file_path": rel_file_path,
+        "thumbnail_path": rel_thumb_path if thumb_saved else None,
         "mime_type": file.content_type,
         "file_size": len(content),
         "width": width,
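The in-memory flow introduced above (Pillow + `io.BytesIO`, no temp file on disk) can be sketched in isolation. `THUMBNAIL_SIZE` and the JPEG quality are taken from the diff; the helper name and the size value are assumptions for illustration:

```python
import io

from PIL import Image as PILImage

THUMBNAIL_SIZE = (256, 256)  # assumed value; the real one lives in the app config


def make_thumbnail(content: bytes) -> bytes:
    """Build a JPEG thumbnail fully in memory, mirroring the diff's flow."""
    with PILImage.open(io.BytesIO(content)) as img:
        img.thumbnail(THUMBNAIL_SIZE, PILImage.LANCZOS)
        # JPEG cannot store alpha: convert RGBA/palette images first
        if img.mode in ("RGBA", "P"):
            img = img.convert("RGB")
        buf = io.BytesIO()
        img.save(buf, "JPEG", quality=85)
        return buf.getvalue()


# Round-trip on a synthetic RGBA PNG
src = io.BytesIO()
PILImage.new("RGBA", (1024, 512), (200, 30, 30, 255)).save(src, "PNG")
thumb = make_thumbnail(src.getvalue())
```

Because `Image.thumbnail` preserves aspect ratio, a 1024×512 source comes out at 256×128 here.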
@@ -109,15 +104,23 @@ async def save_upload(file: UploadFile, client_id: str) -> dict:


 def delete_files(file_path: str, thumbnail_path: str | None = None) -> None:
-    """Supprime le fichier original et son thumbnail du disque."""
-    for path_str in [file_path, thumbnail_path]:
-        if path_str:
-            p = Path(path_str)
-            if p.exists():
-                p.unlink()
-
-
-def get_image_url(filename: str, client_id: str, thumb: bool = False) -> str:
-    """Construit l'URL publique d'une image."""
-    prefix = "thumbnails" if thumb else "uploads"
-    return f"/static/{prefix}/{client_id}/(unknown)"
+    """Supprime le fichier original et son thumbnail via le backend."""
+    import asyncio
+    backend = get_storage_backend()
+
+    async def _do_delete():
+        await backend.delete(file_path)
+        if thumbnail_path:
+            await backend.delete(thumbnail_path)
+
+    # Note: delete_files est synchrone dans les routers existants,
+    # mais le backend est async. C'est un risque.
+    # TODO: Refactorer delete_image pour être full async.
+    try:
+        loop = asyncio.get_event_loop()
+        if loop.is_running():
+            asyncio.ensure_future(_do_delete())
+        else:
+            loop.run_until_complete(_do_delete())
+    except Exception:
+        pass
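The TODO in the hunk above flags the sync-to-async bridge as a risk: `asyncio.get_event_loop()` is deprecated when no loop is running, and `ensure_future` keeps no reference to the scheduled task. A slightly safer bridge can be sketched as follows (the helper and `_fake_delete` are hypothetical names, not project code):

```python
import asyncio


def run_async_from_sync(coro):
    """Run a coroutine from synchronous code.

    If a loop is already running (e.g. inside a FastAPI handler), schedule
    the coroutine as a Task and return it; otherwise block with asyncio.run().
    """
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: safe to drive one to completion here
        return asyncio.run(coro)
    # Running loop: fire-and-forget, but keep the Task reference alive
    return loop.create_task(coro)


async def _fake_delete(path: str) -> str:
    return f"deleted:{path}"


result = run_async_from_sync(_fake_delete("uploads/c1/a.jpg"))
```

This keeps the fallback behaviour of the diff while avoiding the deprecated `get_event_loop()` call; the clean fix remains making the delete path fully async end to end.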
@@ -44,6 +44,10 @@ class StorageBackend(ABC):
     async def get_size(self, path: str) -> int:
         """Retourne la taille en bytes."""

+    @abstractmethod
+    async def get_bytes(self, path: str) -> bytes:
+        """Lit le contenu d'un fichier en bytes."""
+

 class LocalStorage(StorageBackend):
     """Stockage sur disque local avec URLs signées HMAC."""

@@ -101,6 +105,12 @@ class LocalStorage(StorageBackend):
             return full.stat().st_size
         return 0

+    async def get_bytes(self, path: str) -> bytes:
+        """Lit un fichier local."""
+        full = self._full_path(path)
+        async with aiofiles.open(full, "rb") as f:
+            return await f.read()
+
     def get_absolute_path(self, path: str) -> Path:
         """Retourne le chemin absolu d'un fichier (pour FileResponse)."""
         return self._full_path(path)

@@ -139,9 +149,15 @@ class S3Storage(StorageBackend):
         )

     async def save(self, content: bytes, path: str, content_type: str) -> str:
-        """Upload vers S3/MinIO."""
+        """Upload vers S3/MinIO. Crée le bucket si nécessaire."""
         session = self._get_session()
         async with session.client("s3", endpoint_url=self._endpoint_url) as client:
+            # Vérifier/Créer le bucket
+            try:
+                await client.head_bucket(Bucket=self._bucket)
+            except Exception:
+                await client.create_bucket(Bucket=self._bucket)
+
             await client.put_object(
                 Bucket=self._bucket,
                 Key=self._s3_key(path),

@@ -196,6 +212,17 @@ class S3Storage(StorageBackend):
         except Exception:
             return 0

+    async def get_bytes(self, path: str) -> bytes:
+        """Télécharge un objet S3/MinIO."""
+        session = self._get_session()
+        async with session.client("s3", endpoint_url=self._endpoint_url) as client:
+            resp = await client.get_object(
+                Bucket=self._bucket,
+                Key=self._s3_key(path),
+            )
+            async with resp["Body"] as stream:
+                return await stream.read()
+
+
 def get_storage_backend() -> StorageBackend:
     """Factory : retourne le backend de stockage configuré (singleton)."""
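For context, the backend contract these hunks extend (async `save` / `get_bytes`) can be exercised end to end with a minimal local implementation. This is an illustration of the interface only: it swaps the project's `aiofiles` calls for stdlib `asyncio.to_thread`, and the class bodies are simplified sketches, not the project's code:

```python
import asyncio
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    @abstractmethod
    async def save(self, content: bytes, path: str, content_type: str) -> str: ...

    @abstractmethod
    async def get_bytes(self, path: str) -> bytes: ...


class LocalStorage(StorageBackend):
    def __init__(self, base_dir: str) -> None:
        self._base = Path(base_dir)

    async def save(self, content: bytes, path: str, content_type: str) -> str:
        full = self._base / path
        full.parent.mkdir(parents=True, exist_ok=True)
        # Offload blocking file I/O to a worker thread (stand-in for aiofiles)
        await asyncio.to_thread(full.write_bytes, content)
        return path

    async def get_bytes(self, path: str) -> bytes:
        return await asyncio.to_thread((self._base / path).read_bytes)


async def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        backend = LocalStorage(tmp)
        await backend.save(b"hello", "uploads/c1/a.bin", "application/octet-stream")
        return await backend.get_bytes("uploads/c1/a.bin")


data = asyncio.run(demo())
```

The same round trip works against `S3Storage` because callers only depend on the abstract methods, which is what lets `save_upload` and `delete_files` stay backend-agnostic.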
@@ -1,15 +1,8 @@
 """
 Worker ARQ — traitement asynchrone des images via Redis.
-
-Lance avec : python worker.py
-
-Fonctionnalités :
-- File persistante Redis (survit aux redémarrages)
-- Retry automatique avec backoff exponentiel
-- Queues prioritaires (premium / standard)
-- Dead-letter : marquage error après max_tries
 """
 import logging
+import asyncio
 from datetime import datetime, timezone

 from arq import cron, func

@@ -21,171 +14,53 @@ from app.models.image import Image, ProcessingStatus
 from app.services.pipeline import process_image_pipeline
 from sqlalchemy import select

+# Préfixes ARQ
+QUEUE_STANDARD = "standard"
+QUEUE_PREMIUM = "premium"
+DEFAULT_QUEUE_NAME = "arq:queue"
+
 logger = logging.getLogger(__name__)

-# Backoff exponentiel : délais entre tentatives (en secondes)
-RETRY_DELAYS = [1, 4, 16]
-

 async def process_image_task(ctx: dict, image_id: int, client_id: str) -> str:
-    """
-    Tâche ARQ : traite une image via le pipeline EXIF → OCR → AI.
-
-    Args:
-        ctx: Contexte ARQ (contient job_try, redis, etc.)
-        image_id: ID de l'image à traiter
-        client_id: ID du client propriétaire
-    """
+    """Tâche ARQ : traite une image."""
     job_try = ctx.get("job_try", 1)
     redis = ctx.get("redis")

-    logger.info(
-        "worker.job.started",
-        extra={"image_id": image_id, "client_id": client_id, "job_try": job_try},
-    )
+    logger.info(f"--- JOB DÉMARRÉ : image_id={image_id} ---")

     async with AsyncSessionLocal() as db:
         try:
             await process_image_pipeline(image_id, db, redis=redis)
-            logger.info(
-                "worker.job.completed",
-                extra={"image_id": image_id, "client_id": client_id},
-            )
-            return f"OK image_id={image_id}"
+            logger.info(f"--- JOB TERMINÉ : image_id={image_id} ---")
+            return f"OK"

         except Exception as e:
-            max_tries = settings.WORKER_MAX_TRIES
-            logger.error(
-                "worker.job.failed",
-                extra={
-                    "image_id": image_id,
-                    "client_id": client_id,
-                    "job_try": job_try,
-                    "max_tries": max_tries,
-                    "error": str(e),
-                },
-                exc_info=True,
-            )
-
-            if job_try >= max_tries:
-                # Dead-letter : marquer l'image en erreur définitive
-                await _mark_image_error(db, image_id, str(e), job_try)
-                logger.error(
-                    "worker.job.dead_letter",
-                    extra={
-                        "image_id": image_id,
-                        "client_id": client_id,
-                        "total_tries": job_try,
-                    },
-                )
-                return f"DEAD_LETTER image_id={image_id} after {job_try} tries"
-
-            # Retry avec backoff
-            delay_idx = min(job_try - 1, len(RETRY_DELAYS) - 1)
-            retry_delay = RETRY_DELAYS[delay_idx]
-            logger.warning(
-                "worker.job.retry_scheduled",
-                extra={
-                    "image_id": image_id,
-                    "retry_in_seconds": retry_delay,
-                    "next_try": job_try + 1,
-                },
-            )
-            raise # ARQ replanifie automatiquement
-
-
-async def _mark_image_error(
-    db, image_id: int, error_msg: str, total_tries: int
-) -> None:
-    """Marque une image en erreur définitive après épuisement des retries."""
-    result = await db.execute(select(Image).where(Image.id == image_id))
-    image = result.scalar_one_or_none()
-    if image:
-        image.processing_status = ProcessingStatus.ERROR
-        image.processing_error = f"Échec après {total_tries} tentatives : {error_msg}"
-        image.processing_done_at = datetime.now(timezone.utc)
-        await db.commit()
+            logger.error(f"--- JOB ÉCHOUÉ : {str(e)} ---", exc_info=True)
+            raise


 async def on_startup(ctx: dict) -> None:
-    """Hook ARQ : appelé au démarrage du worker."""
-    logger.info("worker.startup", extra={"max_jobs": settings.WORKER_MAX_JOBS})
-
-
-async def on_shutdown(ctx: dict) -> None:
-    """Hook ARQ : appelé à l'arrêt du worker."""
-    logger.info("worker.shutdown")
-
-
-async def on_job_start(ctx: dict) -> None:
-    """Hook ARQ : appelé au début de chaque job."""
-    pass # Le logging est fait dans process_image_task
-
-
-async def on_job_end(ctx: dict) -> None:
-    """Hook ARQ : appelé à la fin de chaque job."""
-    pass # Le logging est fait dans process_image_task
+    logger.info("Worker started and listening on %s", WorkerSettings.queue_name)


 def _parse_redis_settings() -> RedisSettings:
-    """Parse REDIS_URL en RedisSettings ARQ."""
     url = settings.REDIS_URL
-    # redis://[:password@]host[:port][/db]
-    if url.startswith("redis://"):
-        url = url[8:]
-    elif url.startswith("rediss://"):
-        url = url[9:]
-
-    password = None
-    host = "localhost"
-    port = 6379
-    database = 0
-
-    # Parse password
+    if url.startswith("redis://"): url = url[8:]
+    elif url.startswith("rediss://"): url = url[9:]
+    password, host, port, database = None, "localhost", 6379, 0
     if "@" in url:
-        auth_part, url = url.rsplit("@", 1)
-        if ":" in auth_part:
-            password = auth_part.split(":", 1)[1]
-        else:
-            password = auth_part
-
-    # Parse host:port/db
+        auth, url = url.rsplit("@", 1)
+        password = auth.split(":", 1)[1] if ":" in auth else auth
     if "/" in url:
-        host_port, db_str = url.split("/", 1)
-        if db_str:
-            database = int(db_str)
-    else:
-        host_port = url
-
-    if ":" in host_port:
-        host, port_str = host_port.rsplit(":", 1)
-        if port_str:
-            port = int(port_str)
-    else:
-        host = host_port
-
-    return RedisSettings(
-        host=host or "localhost",
-        port=port,
-        password=password,
-        database=database,
-    )
+        url, db_str = url.split("/", 1)
+        if db_str: database = int(db_str)
+    if ":" in url:
+        host, port_str = url.rsplit(":", 1)
+        port = int(port_str)
+    else: host = url
+    return RedisSettings(host=host, port=port, password=password, database=database)


 class WorkerSettings:
-    """Configuration du worker ARQ."""
-
     functions = [func(process_image_task, name="process_image_task")]
     redis_settings = _parse_redis_settings()
-    max_jobs = settings.WORKER_MAX_JOBS
-    job_timeout = settings.WORKER_JOB_TIMEOUT
-    retry_jobs = True
-    max_tries = settings.WORKER_MAX_TRIES
-    queue_name = "standard" # Queue par défaut
+    queue_name = DEFAULT_QUEUE_NAME
     on_startup = on_startup
-    on_shutdown = on_shutdown
-    on_job_start = on_job_start
-    on_job_end = on_job_end
-
-    # Le worker écoute les deux queues
-    queues = ["standard", "premium"]
+    max_jobs = 10
+    job_timeout = 300
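The compacted `_parse_redis_settings` above hand-parses `redis://[:password@]host[:port][/db]`. The same fields can be extracted with stdlib `urllib.parse`, which also handles edge cases like empty ports; this is a sketch with a hypothetical function name, not the project's code:

```python
from urllib.parse import urlparse


def parse_redis_url(url: str) -> dict:
    """Extract host/port/password/database from a redis:// or rediss:// URL."""
    p = urlparse(url)
    db_part = p.path.lstrip("/")
    return {
        "host": p.hostname or "localhost",
        "port": p.port or 6379,
        "password": p.password,  # None when the URL carries no auth part
        "database": int(db_part) if db_part else 0,
    }


cfg = parse_redis_url("redis://:s3cret@redis:6379/2")
```

`urlparse` already splits the `:password@` auth section, so the manual `rsplit("@", 1)` / `rsplit(":", 1)` bookkeeping disappears.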
@@ -1,5 +1,3 @@
-version: "3.9"
-
 services:
   backend:
     build: .

@@ -10,9 +8,13 @@ services:
     env_file:
      - .env
     environment:
-      - DATABASE_URL=sqlite+aiosqlite:///./data/imago.db
+      - DATABASE_URL=postgresql+asyncpg://imago:imago@db:5432/imago
+      - REDIS_URL=redis://redis:6379/0
     depends_on:
-      - redis
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:8000/health"]

@@ -20,6 +22,21 @@ services:
       timeout: 10s
       retries: 3

+  db:
+    image: postgres:16-alpine
+    environment:
+      POSTGRES_USER: imago
+      POSTGRES_PASSWORD: imago
+      POSTGRES_DB: imago
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U imago -d imago"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+    restart: unless-stopped
+
   redis:
     image: redis:7-alpine
     ports:

@@ -42,10 +59,13 @@ services:
     env_file:
      - .env
     environment:
-      - DATABASE_URL=sqlite+aiosqlite:///./data/imago.db
+      - DATABASE_URL=postgresql+asyncpg://imago:imago@db:5432/imago
+      - REDIS_URL=redis://redis:6379/0
     depends_on:
-      - backend
-      - redis
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
     restart: unless-stopped

   minio:

@@ -62,5 +82,6 @@ services:
     restart: unless-stopped

 volumes:
+  postgres_data:
   redis_data:
   minio_data:
@@ -7,6 +7,7 @@ python-multipart==0.0.9
 sqlalchemy==2.0.35
 alembic==1.13.3
 aiosqlite==0.20.0
+asyncpg==0.29.0

 # Validation
 pydantic==2.9.2; python_version < "3.14"
@@ -8,7 +8,12 @@ les tâches de pipeline image (EXIF → OCR → AI).
 """
 import asyncio
 from arq import run_worker
+from app.config import settings
+from app.logging_config import configure_logging
 from app.workers.image_worker import WorkerSettings

-if __name__ == "__main__":
-    asyncio.run(run_worker(WorkerSettings))
+# Configure le logging dès l'import
+configure_logging(debug=settings.DEBUG)
+
+if __name__ == "__main__":
+    run_worker(WorkerSettings)