# [TASK] New "Docker Hosts" section + monitoring + actions + notifications

## 🎯 Main objective

Add full Docker management to the existing Homelab Dashboard, providing:

- Real-time monitoring of multiple Docker hosts
- Container actions (start/stop/restart/redeploy/logs)
- Proactive detection of, and alerting on, down containers
- Clean integration with the existing architecture

---

## 📐 MANDATORY architecture constraints

### Existing tech stack (must be followed strictly)

```yaml
Backend:
  - FastAPI (routes/ + services/ + models/ + schemas/)
  - SQLAlchemy 2.x async (data/homelab.db)
  - Alembic for migrations
  - APScheduler (periodic jobs already configured)
  - Real-time WebSocket (websocket_manager.py)
  - JWT auth (app.auth_utils + OAuth2PasswordBearer)
  - ntfy notifications (services/notifications.py)

Frontend:
  - index.html + main.js (vanilla JS)
  - Tailwind CSS
  - Anime.js for animations
  - Section-based navigation pattern (dashboard, hosts, tasks, schedules, etc.)

Infrastructure:
  - Ansible for automation (existing hosts.yml inventory)
  - SSH already configured (automation user + keys)
  - Existing SSH bootstrap (services/bootstrap.py)
```

### Existing DB models to extend (DO NOT recreate)

```python
# models/host.py - EXISTING TABLE
class Host(Base):
    __tablename__ = "hosts"
    id: int
    name: str
    host: str  # IP/hostname
    os_type: str
    status: str  # online/offline
    bootstrap_status: dict
    last_seen_at: datetime
    # TO EXTEND with: docker_enabled, docker_version, docker_status, docker_last_collect_at

# models/task.py - EXISTING TABLE
class Task(Base):
    __tablename__ = "tasks"
    id: int
    action: str
    status: str  # pending/running/success/failed
    # Reuse for Docker actions

# TO CREATE (new tables only)
# - docker_containers
# - docker_images
# - docker_volumes
# - docker_alerts
```

---

## 🔧 IMPOSED technical decisions (not up for debate)

### 1. Docker collection: **SSH + docker CLI** (reuse the Ansible pattern)

**Rationale**:
- ✅ SSH is already configured on every host (automation user + keys)
- ✅ Docker collection plugs into the metrics collection process that is already in place
- ✅ No extra configuration on the hosts (no TLS-exposed Docker API)
- ✅ Same pattern as Ansible (consistency)
- ✅ JSON parsing: `docker ps --format json`, `docker inspect`, etc.

**Implementation**:

```python
# services/docker_service.py
async def collect_docker_host(host_id: int):
    # get_host, ssh_connect and ssh_exec are assumed helpers (same SSH pattern as the metrics collection)
    host = await get_host(host_id)
    ssh = await ssh_connect(host.host, user="automation")

    # Docker version (a single JSON object)
    version = await ssh_exec(ssh, "docker version --format '{{json .}}'")
    # Containers (one JSON object per output line)
    containers = await ssh_exec(ssh, "docker ps -a --format '{{json .}}' --no-trunc")
    # Images (one JSON object per output line)
    images = await ssh_exec(ssh, "docker images --format '{{json .}}'")
    # Volumes (one JSON object per output line)
    volumes = await ssh_exec(ssh, "docker volume ls --format '{{json .}}'")
    # System df (disk usage)
    df = await ssh_exec(ssh, "docker system df -v --format '{{json .}}'")
```

### 2. Storage: **Extend existing tables + create Docker tables**

```sql
-- Alembic migration to create
ALTER TABLE hosts ADD COLUMN docker_enabled BOOLEAN DEFAULT FALSE;
ALTER TABLE hosts ADD COLUMN docker_version TEXT;
ALTER TABLE hosts ADD COLUMN docker_status TEXT;
ALTER TABLE hosts ADD COLUMN docker_last_collect_at TIMESTAMP;

CREATE TABLE docker_containers (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    container_id TEXT NOT NULL,
    name TEXT NOT NULL,
    image TEXT,
    state TEXT,            -- running/exited/paused
    status TEXT,           -- Up 2 hours, Exited (0) 5 minutes ago
    health TEXT,           -- healthy/unhealthy/starting/none
    created_at TIMESTAMP,
    ports JSON,
    labels JSON,
    compose_project TEXT,  -- com.docker.compose.project
    last_update_at TIMESTAMP,
    UNIQUE(host_id, container_id)
);

CREATE TABLE docker_images (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    image_id TEXT NOT NULL,
    repo_tags JSON,        -- ["nginx:latest", "nginx:1.25"]
    size BIGINT,
    created TIMESTAMP,
    last_update_at TIMESTAMP,
    UNIQUE(host_id, image_id)
);

CREATE TABLE docker_volumes (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    name TEXT NOT NULL,
    driver TEXT,
    mountpoint TEXT,
    scope TEXT,
    last_update_at TIMESTAMP,
    UNIQUE(host_id, name)
);

CREATE TABLE docker_alerts (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    container_name TEXT NOT NULL,
    severity TEXT,         -- warning/error/critical
    state TEXT,            -- open/closed
    message TEXT,
    opened_at TIMESTAMP NOT NULL,
    closed_at TIMESTAMP,
    last_notified_at TIMESTAMP
);

-- SQLite does not accept an inline INDEX clause in CREATE TABLE: create it separately
CREATE INDEX idx_alerts_open ON docker_alerts (state, host_id);
```

### 3. Scheduler: **Extend the existing APScheduler**

```python
# app_optimized.py - ADD to startup
from services.docker_collector import DockerCollector

@app.on_event("startup")
async def start_docker_collector():
    collector = DockerCollector(db_session, ws_manager, ntfy_service)

    # Periodic job: collect every Docker-enabled host
    scheduler.add_job(
        collector.collect_all_hosts,
        trigger="interval",
        seconds=60,  # Every minute
        id="docker_collect",
        name="Docker Metrics Collection"
    )

    # Periodic job: check for down containers
    scheduler.add_job(
        collector.check_alerts,
        trigger="interval",
        seconds=30,
        id="docker_alerts",
        name="Docker Alerts Check"
    )
```

---

## 📊 API routes to create (prefix /api/docker)

```python
# routes/docker.py
from typing import Optional

from fastapi import APIRouter, Depends

# User, get_current_user and require_role: existing auth helpers
router = APIRouter(prefix="/api/docker", tags=["docker"])

@router.get("/hosts")
async def list_docker_hosts(
    current_user: User = Depends(get_current_user)
):
    """List every host with Docker enabled"""

@router.post("/hosts/{host_id}/enable")
async def enable_docker_monitoring(
    host_id: int,
    current_user: User = Depends(require_role("admin"))
):
    """Enable Docker monitoring on a host"""

@router.post("/hosts/{host_id}/collect")
async def collect_docker_now(
    host_id: int,
    current_user: User = Depends(require_role("operator"))
):
    """Force an immediate collection"""

@router.get("/hosts/{host_id}/containers")
async def get_containers(
    host_id: int,
    current_user: User = Depends(get_current_user)
):
    """List a host's containers"""

@router.get("/hosts/{host_id}/images")
async def get_images(host_id: int, current_user: User = Depends(get_current_user)):
    """List a host's images (used by the Images tab)"""

@router.get("/hosts/{host_id}/volumes")
async def get_volumes(host_id: int, current_user: User = Depends(get_current_user)):
    """List a host's volumes (used by the Volumes tab)"""

@router.post("/containers/{host_id}/{container_id}/start")
async def start_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(require_role("operator"))
):
    """Start a container"""

# Each action needs its own handler: stacked decorators would all bind to the next function
@router.post("/containers/{host_id}/{container_id}/stop")
async def stop_container(host_id: int, container_id: str,
                         current_user: User = Depends(require_role("operator"))):
    """Stop a container"""

@router.post("/containers/{host_id}/{container_id}/restart")
async def restart_container(host_id: int, container_id: str,
                            current_user: User = Depends(require_role("operator"))):
    """Restart a container"""

@router.post("/containers/{host_id}/{container_id}/remove")
async def remove_container(host_id: int, container_id: str,
                           current_user: User = Depends(require_role("admin"))):
    """Remove a container (destructive: admin only)"""

@router.post("/containers/{host_id}/{container_id}/redeploy")
async def redeploy_container(host_id: int, container_id: str,
                             current_user: User = Depends(require_role("operator"))):
    """Redeploy a container"""

@router.get("/containers/{host_id}/{container_id}/logs")
async def get_container_logs(
    host_id: int,
    container_id: str,
    tail: int = 200,  # validate/clamp before building the `docker logs --tail` command
    current_user: User = Depends(get_current_user)
):
    """Fetch a container's logs"""

@router.get("/containers/{host_id}/{container_id}/inspect")
async def inspect_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(get_current_user)
):
    """Full JSON details of a container"""

@router.get("/alerts")
async def list_alerts(
    host_id: Optional[int] = None,
    state: Optional[str] = "open",
    current_user: User = Depends(get_current_user)
):
    """List Docker alerts"""

@router.post("/alerts/{alert_id}/ack")
async def acknowledge_alert(
    alert_id: int,
    current_user: User = Depends(require_role("operator"))
):
    """Acknowledge an alert"""
```

---

## 🔔 Alerting logic (detecting down containers)

### Detection rules

```python
# services/docker_alerts.py
from datetime import datetime

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

async def check_container_alerts(session: AsyncSession):
    """
    Check every critical container and open/close alerts
    """
    # Fetch containers labeled homelab.monitor=true
    # (a JSON key lookup works on SQLite; .contains() on a JSON column does not)
    result = await session.execute(
        select(DockerContainer)
        .where(DockerContainer.labels["homelab.monitor"].as_string() == "true")
    )

    for container in result.scalars().all():
        expected_state = container.labels.get("homelab.desired", "running")

        # Case 1: container stopped although it should be running
        if expected_state == "running" and container.state != "running":
            await open_alert(
                session,
                host_id=container.host_id,
                container_name=container.name,
                severity="error",
                message=f"Container {container.name} is {container.state}, expected running"
            )

        # Case 2: container unhealthy
        if container.health == "unhealthy":
            await open_alert(
                session,
                host_id=container.host_id,
                container_name=container.name,
                severity="warning",
                message=f"Container {container.name} health check failing"
            )

        # Case 3: container OK -> close the alert if one is open
        if container.state == "running" and container.health in ("healthy", "none", None):
            await close_alert(session, container.host_id, container.name)

    # Single commit for the whole pass (avoids expiring loaded objects mid-loop)
    await session.commit()


async def open_alert(session: AsyncSession, host_id: int, container_name: str, severity: str, message: str):
    """
    Open an alert and send an ntfy notification
    """
    # Check whether an alert is already open
    existing = await get_open_alert(session, host_id, container_name)
    if existing:
        # Already open and already notified: nothing to do
        return

    # Create a new alert
    now = datetime.utcnow()
    alert = DockerAlert(
        host_id=host_id,
        container_name=container_name,
        severity=severity,
        state="open",
        message=message,
        opened_at=now,
        last_notified_at=now
    )
    session.add(alert)
    await session.flush()  # assigns alert.id; the caller commits

    # ntfy notification (ntfy_service and ws_manager are the existing singletons)
    host = await get_host(host_id)
    await ntfy_service.send_notification(
        topic="homelab-docker",
        title=f"🚨 Docker Alert - {host.name}",
        message=f"{container_name}: {message}",
        priority=4,
        tags=["warning", "docker"]
    )

    # Real-time WebSocket
    await ws_manager.broadcast({
        "type": "docker_alert_opened",
        "alert": alert.to_dict()
    })
```

---

## 🎨 UI/UX frontend (integration into index.html + main.js)

### Navigation (add to index.html)

```html
```

### Docker section (new HTML section)

```html
```

### JavaScript logic (main.js)

```javascript
// Docker section handling
const dockerSection = {
  async init() {
    await this.loadDockerHosts();
    this.setupWebSocket();
    this.setupEventListeners();
  },

  async loadDockerHosts() {
    const response = await fetchAPI('/api/docker/hosts');
    this.renderHostsGrid(response.hosts);
  },

  renderHostsGrid(hosts) {
    const grid = document.getElementById('docker-hosts-grid');
    grid.innerHTML = hosts.map(host => `

      ${host.name}

      ${host.docker_status}
      ${host.containers_running}/${host.containers_total} containers
      ${host.open_alerts} alerts
      Last: ${formatRelativeTime(host.docker_last_collect_at)}
    `).join('');
  },

  async viewDetails(hostId) {
    const [containers, images, volumes, alerts] = await Promise.all([
      fetchAPI(`/api/docker/hosts/${hostId}/containers`),
      fetchAPI(`/api/docker/hosts/${hostId}/images`),
      fetchAPI(`/api/docker/hosts/${hostId}/volumes`),
      fetchAPI(`/api/docker/alerts?host_id=${hostId}`)
    ]);
    this.renderContainersTab(containers);
    // images, volumes and alerts feed the Images / Volumes / Alerts tabs
    showModal('docker-detail-modal');
  },

  renderContainersTab(containers) {
    const tbody = document.querySelector('#containers-table tbody');
    tbody.innerHTML = containers.map(c => this.renderContainerRow(c)).join('');
  },

  // One table row per container (also exercised by the frontend tests)
  renderContainerRow(c) {
    return `
      ${c.name} ${c.compose_project ? `${c.compose_project}` : ''}
      ${c.image}
      ${c.state}
      ${c.health || 'none'}
      ${this.formatPorts(c.ports)}
      ${formatRelativeTime(c.created_at)}
      ${c.state !== 'running' ? `` : ''} ${c.state === 'running' ? `` : ''}
    `;
  },

  async startContainer(hostId, containerId) {
    await fetchAPI(`/api/docker/containers/${hostId}/${containerId}/start`, { method: 'POST' });
    showToast('Container started successfully', 'success');
    await this.viewDetails(hostId); // Refresh
  },

  setupWebSocket() {
    ws.addEventListener('message', (event) => {
      const data = JSON.parse(event.data);
      if (data.type === 'docker_host_updated') {
        this.updateHostCard(data.host);
      }
      if (data.type === 'docker_alert_opened') {
        this.showAlertNotification(data.alert);
        this.updateAlertsBadge();
      }
    });
  }
};
```

---

## 🧪 Mandatory tests (at least 8)

### Backend tests (pytest + pytest-asyncio)

```python
# tests/test_docker_service.py
import pytest
from sqlalchemy import select

@pytest.mark.asyncio
async def test_collect_docker_host(mock_ssh):
    """Successful Docker collection"""
    mock_ssh.exec_command.return_value = '{"Version": "24.0.7"}'
    result = await docker_service.collect_docker_host(host_id=1)
    assert result.docker_version == "24.0.7"

@pytest.mark.asyncio
async def test_detect_container_down(db_session, mock_ntfy):
    """A stopped container is detected"""
    await create_test_container(
        db_session,
        state="exited",
        labels={"homelab.monitor": "true", "homelab.desired": "running"}
    )
    await docker_alerts.check_container_alerts(db_session)
    result = await db_session.execute(select(DockerAlert).where(DockerAlert.state == "open"))
    alerts = result.scalars().all()
    assert len(alerts) == 1
    assert alerts[0].severity == "error"

@pytest.mark.asyncio
async def test_start_container(mock_ssh):
    """Starting a container"""
    mock_ssh.exec_command.return_value = "container_id"
    result = await docker_actions.start_container(host_id=1, container_id="abc123")
    assert result.success is True
    mock_ssh.exec_command.assert_called_with("docker start abc123")

@pytest.mark.asyncio
async def test_alert_notification_sent(db_session, mock_ntfy):
    """An ntfy notification is sent when an alert opens"""
    await docker_alerts.open_alert(
        db_session,
        host_id=1,
        container_name="nginx",
        severity="error",
        message="Container down"
    )
    assert mock_ntfy.send_notification.called
    assert "nginx" in mock_ntfy.send_notification.call_args.kwargs["message"]
```

### Frontend tests (Jest or a vanilla equivalent)

```javascript
// tests/docker_section.test.js
test('renderHostsGrid displays correct number of cards', () => {
  const hosts = [
    {id: 1, name: 'host1', docker_status: 'online'},
    {id: 2, name: 'host2', docker_status: 'offline'}
  ];
  dockerSection.renderHostsGrid(hosts);
  const cards = document.querySelectorAll('.docker-host-card');
  expect(cards.length).toBe(2);
});

test('container action buttons reflect state', () => {
  const runningContainer = {state: 'running'};
  const stoppedContainer = {state: 'exited'};

  const html1 = dockerSection.renderContainerRow(runningContainer);
  expect(html1).toContain('fa-stop');
  expect(html1).not.toContain('fa-play');

  const html2 = dockerSection.renderContainerRow(stoppedContainer);
  expect(html2).toContain('fa-play');
  expect(html2).not.toContain('fa-stop');
});

test('WebSocket updates host card in realtime', async () => {
  const ws = new MockWebSocket();
  dockerSection.setupWebSocket();
  ws.emit({
    type: 'docker_host_updated',
    host: {id: 1, containers_running: 5}
  });
  await nextTick();
  const card = document.querySelector('[data-host-id="1"]');
  expect(card.textContent).toContain('5/');
});
```

---

## 📋 "Definition of Done" checklist

### Backend ✅
- [ ] Alembic migration created and tested (docker_containers, docker_images, etc.)
- [ ] `docker_service.py` service with SSH collection + JSON parsing
- [ ] `docker_actions.py` service with start/stop/restart/remove/redeploy
- [ ] `docker_alerts.py` service with detection logic + ntfy notifications
- [ ] Complete `/api/docker/*` routes with JWT auth
- [ ] APScheduler jobs added (collect + alerts)
- [ ] WebSocket events emitted (docker_host_updated, docker_alert_opened)
- [ ] Robust error handling (SSH timeout, Docker unreachable, parsing errors)
- [ ] 6+ passing backend tests

### Frontend ✅
- [ ] "Docker Hosts" section added to the navigation menu
- [ ] Docker hosts list view (cards with metrics)
- [ ] Host details modal with tabs (Containers / Images / Volumes / Alerts)
- [ ] Working container actions (start/stop/restart/logs/inspect/remove)
- [ ] Confirmation modals on destructive actions (remove, redeploy)
- [ ] Container logs (drawer with tail + auto-refresh)
- [ ] Container inspect (JSON viewer modal)
- [ ] WebSocket live updates (hosts + alerts)
- [ ] Animations consistent with the rest of the dashboard
- [ ] 4+ passing frontend tests

### Security ✅
- [ ] Every Docker action requires JWT auth (operator role minimum)
- [ ] Destructive actions (remove) require the admin role
- [ ] Strict SSH timeouts (5s connect, 15s exec)
- [ ] Pydantic validation on all inputs
- [ ] No arbitrary command execution
- [ ] Structured server logs (no secrets logged)

### Documentation ✅
- [ ] README.md updated (Docker section)
- [ ] curl examples for the Docker endpoints
- [ ] Configuration guide for the `homelab.monitor` and `homelab.desired` labels
- [ ] Alembic migration instructions

---

## 🚀 Execution instructions

### 1. Apply the DB migration

```bash
cd homelab-automation-api-v2
alembic revision --autogenerate -m "Add Docker management tables"
alembic upgrade head
```

### 2. Enable Docker on a host (via UI or API)

```bash
# Via API
curl -X POST -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/docker/hosts/1/enable

# Via UI: "Hosts" section > click a host > "Enable Docker" button
```

### 3. Label critical containers

```yaml
# docker-compose.yml
services:
  nginx:
    image: nginx:latest
    labels:
      homelab.monitor: "true"
      homelab.desired: "running"
```

### 4. Test a manual collection

```bash
curl -X POST -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/docker/hosts/1/collect
```

### 5. Check alerts

```bash
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/docker/alerts
```

---

## 🎁 Bonus features (if time allows)

### Priority 1 (high value, low cost)
- **Compose awareness**: group containers by `com.docker.compose.project`
- **Resource stats**: `docker stats --no-stream` (CPU/memory snapshot)
- **Bulk actions**: restart every container of a compose project

### Priority 2 (good value, medium cost)
- **Event timeline**: log of Docker actions in the host view
- **Auto-remediation**: `homelab.auto_restart=true` flag → automatic restart when down
- **Networks tab**: list of Docker networks + attached containers

### Priority 3 (nice-to-have, high cost)
- **Prune management**: image/volume cleanup (danger zone + admin only)
- **Image scanning**: vulnerabilities via Trivy (if installed on the hosts)
- **Logs streaming**: real-time logs over WebSocket (instead of a static tail)

---

## ⚠️ Risks and mitigation

| Risk | Impact | Mitigation |
|------|--------|------------|
| SSH timeout during collection | Hosts marked offline | Retry logic + adaptive timeout (5s → 10s → 30s) |
| Docker JSON parsing fails | Partial collection | try/except per entity (containers/images/volumes) |
| WebSocket spam with many hosts | UI lag | Throttle broadcasts (max 1/s per event type) |
| Concurrent Docker actions | Race conditions | Lock per container_id (asyncio.Lock) |
| Alert spam from a flapping container | Notification fatigue | 5-minute cooldown between notifications for the same alert |

---

## 📊 Success metrics

- ✅ Docker collection succeeds on 3+ hosts concurrently without timeouts
- ✅ A down container is detected < 60s after it actually stops
- ✅ ntfy notification received within 5s of the alert opening
- ✅ Container actions (start/stop) < 3s (excluding Docker's own latency)
-
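Four of the mitigations from the risks table can be sketched in plain asyncio Python. This is a minimal sketch under stated assumptions, not project code: every name here (`parse_json_lines`, `run_with_escalating_timeouts`, `ContainerLocks`, `NotificationCooldown`, `restart_container_safely`) is hypothetical, the SSH call is abstracted as an injected `run_ssh` coroutine, and the 5-minute cooldown and 5s → 10s → 30s timeouts simply mirror the values in the table.

```python
# Hypothetical helpers sketching the risk-table mitigations (not existing project code).
import asyncio
import json
import time
from collections import defaultdict
from typing import Any, Awaitable, Callable, Hashable, Optional, TypeVar

T = TypeVar("T")


def parse_json_lines(output: str) -> tuple[list[dict[str, Any]], int]:
    """Parse `docker ... --format '{{json .}}'` output (one JSON object per line).

    Each line is parsed on its own, so a malformed line is counted and skipped
    instead of failing the whole entity. Returns (items, error_count).
    """
    items: list[dict[str, Any]] = []
    errors = 0
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            items.append(json.loads(line))
        except json.JSONDecodeError:
            errors += 1
    return items, errors


async def run_with_escalating_timeouts(
    make_call: Callable[[], Awaitable[T]],
    timeouts: tuple[float, ...] = (5.0, 10.0, 30.0),
) -> T:
    """Retry a call with growing timeouts (5s -> 10s -> 30s by default).

    `make_call` is a factory because a coroutine can only be awaited once.
    """
    last_exc: Optional[BaseException] = None
    for timeout in timeouts:
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except asyncio.TimeoutError as exc:
            last_exc = exc
    raise TimeoutError(f"{len(timeouts)} attempts timed out") from last_exc


class ContainerLocks:
    """One asyncio.Lock per (host_id, container_id) so actions never overlap."""

    def __init__(self) -> None:
        self._locks: dict[Hashable, asyncio.Lock] = defaultdict(asyncio.Lock)

    def get(self, host_id: int, container_id: str) -> asyncio.Lock:
        return self._locks[(host_id, container_id)]


class NotificationCooldown:
    """Allow at most one notification per (host_id, container_name) per cooldown."""

    def __init__(self, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._last_sent: dict[Hashable, float] = {}

    def should_notify(self, host_id: int, container_name: str) -> bool:
        key = (host_id, container_name)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down: drop this notification
        self._last_sent[key] = now
        return True


async def restart_container_safely(locks: ContainerLocks, host_id: int, container_id: str,
                                   run_ssh: Callable[[str], Awaitable[str]]) -> str:
    """Usage example: serialize actions on one container, with retries on timeout."""
    # container_id must already be validated upstream (no arbitrary commands)
    async with locks.get(host_id, container_id):
        return await run_with_escalating_timeouts(
            lambda: run_ssh(f"docker restart {container_id}")
        )
```

The cooldown takes an injectable clock so it can be tested without sleeping, and it keys on the same (host, container) pair that identifies an alert. Throttling WebSocket broadcasts could reuse `NotificationCooldown` with the event type as the key and a 1-second window.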