# [TASK] New "Docker Hosts" section + monitoring + actions + notifications

## 🎯 Main objective

Add complete Docker management to the existing Homelab Dashboard, providing:

- Real-time monitoring of multiple Docker hosts
- Container actions (start/stop/restart/redeploy/logs)
- Proactive detection of, and alerting on, down containers
- Seamless integration with the existing architecture

---

## 📐 MANDATORY architecture constraints

### Existing technical stack (to be followed strictly)

```yaml
Backend:
  - FastAPI (routes/ + services/ + models/ + schemas/)
  - SQLAlchemy 2.x async (data/homelab.db)
  - Alembic for migrations
  - APScheduler (periodic jobs already configured)
  - Real-time WebSocket (websocket_manager.py)
  - JWT auth (app.auth_utils + OAuth2PasswordBearer)
  - ntfy notifications (services/notifications.py)

Frontend:
  - index.html + main.js (vanilla JS)
  - Tailwind CSS
  - Anime.js for animations
  - Section-based navigation pattern (dashboard, hosts, tasks, schedules, etc.)

Infrastructure:
  - Ansible for automation (existing hosts.yml inventory)
  - SSH already configured (automation user + keys)
  - Existing SSH bootstrap (services/bootstrap.py)
```

### Existing DB models to extend (do NOT recreate)

```python
# models/host.py - EXISTING TABLE
class Host(Base):
    __tablename__ = "hosts"
    id: int
    name: str
    host: str  # IP/hostname
    os_type: str
    status: str  # online/offline
    bootstrap_status: dict
    last_seen_at: datetime
    # TO EXTEND with: docker_enabled, docker_version, docker_status, docker_last_collect_at

# models/task.py - EXISTING TABLE
class Task(Base):
    __tablename__ = "tasks"
    id: int
    action: str
    status: str  # pending/running/success/failed
    # Reuse for Docker actions

# TO CREATE (new tables only)
# - docker_containers
# - docker_images
# - docker_volumes
# - docker_alerts
```

---

## 🔧 IMPOSED technical decisions (no alternatives)

### 1. Docker collection: **SSH + docker CLI** (reuse the Ansible pattern)

**Rationale**:
- ✅ SSH is already configured for all hosts (automation user + keys)
- ✅ Docker collection plugs into the metrics collection process already in place
- ✅ No extra configuration on the hosts (no Docker API over TLS)
- ✅ Same pattern as Ansible (consistency)
- ✅ JSON parsing: `docker ps --format json`, `docker inspect`, etc.

**Implementation**:
```python
# services/docker_service.py
async def collect_docker_host(host_id: int):
    host = await get_host(host_id)
    ssh = await ssh_connect(host.host, user="automation")

    # Docker version (a single JSON object)
    version = await ssh_exec(ssh, "docker version --format '{{json .}}'")

    # Containers (list commands print one JSON object per line, not a JSON array)
    containers = await ssh_exec(ssh,
        "docker ps -a --format '{{json .}}' --no-trunc")

    # Images
    images = await ssh_exec(ssh,
        "docker images --format '{{json .}}'")

    # Volumes
    volumes = await ssh_exec(ssh,
        "docker volume ls --format '{{json .}}'")

    # System df
    df = await ssh_exec(ssh, "docker system df -v --format '{{json .}}'")
```

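The list commands above (`docker ps`, `docker images`, `docker volume ls`) print one JSON object per line when used with `--format '{{json .}}'`, so their raw output cannot be passed to `json.loads` in one go. A minimal parsing sketch, assuming `ssh_exec` returns the command's stdout as a string. `parse_json_lines` and `parse_labels` are hypothetical helper names, and splitting labels on commas is a simplification that breaks if a label value itself contains a comma:

```python
import json


def parse_json_lines(output: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank or malformed lines."""
    items = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            items.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep the rest of the collection usable if one line is corrupt
            continue
    return items


def parse_labels(raw: str) -> dict[str, str]:
    """Turn the comma-separated "k=v,k2=v2" Labels string into a dict."""
    labels = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep:
            labels[key.strip()] = value
    return labels


# Sample output shaped like `docker ps -a --format '{{json .}}'` (fields reduced)
sample = (
    '{"ID": "abc123", "Names": "nginx", "State": "running", '
    '"Labels": "homelab.monitor=true,homelab.desired=running"}\n'
    "\n"
    "not json\n"
    '{"ID": "def456", "Names": "redis", "State": "exited", "Labels": ""}\n'
)
containers = parse_json_lines(sample)
```

Skipping malformed lines instead of raising matches the "partial collection" mitigation listed in the risks table: one bad line does not discard the whole host.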
### 2. Storage: **Extend existing tables + create Docker tables**

```sql
-- Alembic migration to create
ALTER TABLE hosts ADD COLUMN docker_enabled BOOLEAN DEFAULT FALSE;
ALTER TABLE hosts ADD COLUMN docker_version TEXT;
ALTER TABLE hosts ADD COLUMN docker_status TEXT;
ALTER TABLE hosts ADD COLUMN docker_last_collect_at TIMESTAMP;

CREATE TABLE docker_containers (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    container_id TEXT NOT NULL,
    name TEXT NOT NULL,
    image TEXT,
    state TEXT,  -- running/exited/paused
    status TEXT,  -- Up 2 hours, Exited (0) 5 minutes ago
    health TEXT,  -- healthy/unhealthy/starting/none
    created_at TIMESTAMP,
    ports JSON,
    labels JSON,
    compose_project TEXT,  -- com.docker.compose.project
    last_update_at TIMESTAMP,
    UNIQUE(host_id, container_id)
);

CREATE TABLE docker_images (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    image_id TEXT NOT NULL,
    repo_tags JSON,  -- ["nginx:latest", "nginx:1.25"]
    size BIGINT,
    created TIMESTAMP,
    last_update_at TIMESTAMP,
    UNIQUE(host_id, image_id)
);

CREATE TABLE docker_volumes (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    name TEXT NOT NULL,
    driver TEXT,
    mountpoint TEXT,
    scope TEXT,
    last_update_at TIMESTAMP,
    UNIQUE(host_id, name)
);

CREATE TABLE docker_alerts (
    id INTEGER PRIMARY KEY,
    host_id INTEGER REFERENCES hosts(id),
    container_name TEXT NOT NULL,
    severity TEXT,  -- warning/error/critical
    state TEXT,  -- open/closed
    message TEXT,
    opened_at TIMESTAMP NOT NULL,
    closed_at TIMESTAMP,
    last_notified_at TIMESTAMP
);

-- SQLite does not accept an inline INDEX clause inside CREATE TABLE
CREATE INDEX idx_alerts_open ON docker_alerts (state, host_id);
```

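The `UNIQUE(host_id, container_id)` constraint lets each collection cycle upsert rows instead of deleting and re-inserting them. A minimal sketch using the standard-library `sqlite3` module, an in-memory database, and a reduced set of columns. The real implementation would go through SQLAlchemy async; `upsert_container` is a hypothetical helper name, and the `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE docker_containers (
        id INTEGER PRIMARY KEY,
        host_id INTEGER,
        container_id TEXT NOT NULL,
        name TEXT NOT NULL,
        state TEXT,
        last_update_at TIMESTAMP,
        UNIQUE(host_id, container_id)
    )
    """
)


def upsert_container(host_id, container_id, name, state, now):
    """Insert a container row, or refresh it if (host_id, container_id) exists."""
    conn.execute(
        """
        INSERT INTO docker_containers
            (host_id, container_id, name, state, last_update_at)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(host_id, container_id) DO UPDATE SET
            name = excluded.name,
            state = excluded.state,
            last_update_at = excluded.last_update_at
        """,
        (host_id, container_id, name, state, now),
    )


# Two collection cycles see the same container in different states
upsert_container(1, "abc123", "nginx", "running", "2024-01-01 10:00:00")
upsert_container(1, "abc123", "nginx", "exited", "2024-01-01 10:01:00")
rows = conn.execute(
    "SELECT container_id, state FROM docker_containers"
).fetchall()
```

After both calls the table holds a single row for `abc123` with state `exited`. Rows for containers that no longer exist on the host still need a separate cleanup, for example by deleting rows whose `last_update_at` predates the current cycle.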
### 3. Scheduler: **Extend the existing APScheduler**

```python
# app_optimized.py - ADD at startup
from services.docker_collector import DockerCollector

@app.on_event("startup")
async def start_docker_collector():
    collector = DockerCollector(db_session, ws_manager, ntfy_service)

    # Periodic job: collect from all Docker-enabled hosts
    scheduler.add_job(
        collector.collect_all_hosts,
        trigger="interval",
        seconds=60,  # Every minute
        id="docker_collect",
        name="Docker Metrics Collection"
    )

    # Periodic job: check alerts for down containers
    scheduler.add_job(
        collector.check_alerts,
        trigger="interval",
        seconds=30,
        id="docker_alerts",
        name="Docker Alerts Check"
    )
```

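Alerts are evaluated against the last collected snapshot, so the delay before a stopped container raises an alert depends on both intervals above. A back-of-the-envelope sketch, assuming the two jobs are not synchronised with each other; `worst_case_detection_seconds` and its `collect_duration` parameter are hypothetical names used only for this estimate:

```python
def worst_case_detection_seconds(collect_interval, alert_interval, collect_duration=0):
    """Upper bound on the delay between a container stopping and its alert opening.

    Worst case: the container stops just after a collection sampled it, so the
    change is only stored one full collect interval (plus the collection time)
    later, and the alert job then runs up to one alert interval after that.
    """
    return collect_interval + collect_duration + alert_interval


# Intervals configured above: 60 s collection, 30 s alert check
current = worst_case_detection_seconds(60, 30)  # 90
# Shorter intervals, with 5 s assumed for the SSH collection itself
tighter = worst_case_detection_seconds(30, 15, collect_duration=5)  # 50
```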
---

## 📊 API routes to create (prefix /api/docker)

```python
# routes/docker.py
from typing import Optional

from fastapi import APIRouter, Depends

# User, get_current_user and require_role are assumed to come from the
# existing JWT auth layer (app.auth_utils).

router = APIRouter(prefix="/api/docker", tags=["docker"])

@router.get("/hosts")
async def list_docker_hosts(
    current_user: User = Depends(get_current_user)
):
    """List all hosts with Docker enabled"""

@router.post("/hosts/{host_id}/enable")
async def enable_docker_monitoring(
    host_id: int,
    current_user: User = Depends(require_role("admin"))
):
    """Enable Docker monitoring on a host"""

@router.post("/hosts/{host_id}/collect")
async def collect_docker_now(
    host_id: int,
    current_user: User = Depends(require_role("operator"))
):
    """Force an immediate collection"""

@router.get("/hosts/{host_id}/containers")
async def get_containers(host_id: int, current_user: User = Depends(get_current_user)):
    """List the containers of a host"""

@router.get("/hosts/{host_id}/images")
async def get_images(host_id: int, current_user: User = Depends(get_current_user)):
    """List the images of a host (used by the Images tab)"""

@router.get("/hosts/{host_id}/volumes")
async def get_volumes(host_id: int, current_user: User = Depends(get_current_user)):
    """List the volumes of a host (used by the Volumes tab)"""

@router.post("/containers/{host_id}/{container_id}/start")
async def start_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(require_role("operator"))
):
    """Start a container"""

@router.post("/containers/{host_id}/{container_id}/stop")
async def stop_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(require_role("operator"))
):
    """Stop a container"""

@router.post("/containers/{host_id}/{container_id}/restart")
async def restart_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(require_role("operator"))
):
    """Restart a container"""

@router.post("/containers/{host_id}/{container_id}/remove")
async def remove_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(require_role("admin"))
):
    """Remove a container (destructive, admin only)"""

@router.post("/containers/{host_id}/{container_id}/redeploy")
async def redeploy_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(require_role("operator"))
):
    """Redeploy a container"""

@router.get("/containers/{host_id}/{container_id}/logs")
async def get_container_logs(
    host_id: int,
    container_id: str,
    tail: int = 200,
    current_user: User = Depends(get_current_user)
):
    """Fetch the logs of a container"""

@router.get("/containers/{host_id}/{container_id}/inspect")
async def inspect_container(
    host_id: int,
    container_id: str,
    current_user: User = Depends(get_current_user)
):
    """Full JSON details of a container"""

@router.get("/alerts")
async def list_alerts(
    host_id: Optional[int] = None,
    state: Optional[str] = "open",
    current_user: User = Depends(get_current_user)
):
    """List Docker alerts"""

@router.post("/alerts/{alert_id}/ack")
async def acknowledge_alert(
    alert_id: int,
    current_user: User = Depends(require_role("operator"))
):
    """Acknowledge an alert"""
```

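Several operators can trigger actions on the same container at once, so the action endpoints benefit from serialising work per container. A minimal sketch of a lock registry keyed by `(host_id, container_id)`, assuming a single-process deployment. `run_container_action` and `fake_action` are hypothetical names, and the `asyncio.sleep` stands in for the SSH round trip:

```python
import asyncio
from collections import defaultdict

# One lock per (host_id, container_id); created lazily on first use
_container_locks: dict[tuple[int, str], asyncio.Lock] = defaultdict(asyncio.Lock)


async def run_container_action(host_id: int, container_id: str, action):
    """Run the `action` coroutine function while holding this container's lock."""
    async with _container_locks[(host_id, container_id)]:
        return await action()


async def demo():
    events = []

    async def fake_action(name):
        events.append(f"{name}:start")
        await asyncio.sleep(0.01)  # placeholder for the SSH round trip
        events.append(f"{name}:end")

    # Both actions target the same container, so they must not interleave
    await asyncio.gather(
        run_container_action(1, "abc123", lambda: fake_action("stop")),
        run_container_action(1, "abc123", lambda: fake_action("restart")),
    )
    return events


events = asyncio.run(demo())
```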
---

## 🔔 Alerting logic (detecting down containers)

### Detection rules

```python
# services/docker_alerts.py
from datetime import datetime

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession


async def check_container_alerts(session: AsyncSession):
    """
    Check all critical containers and generate alerts
    """

    # Fetch containers labelled homelab.monitor=true.
    # labels is a JSON column, so the filter runs in Python to stay
    # independent of the database's JSON operators.
    result = await session.execute(select(DockerContainer))
    critical_containers = [
        c for c in result.scalars()
        if (c.labels or {}).get("homelab.monitor") == "true"
    ]

    for container in critical_containers:
        expected_state = container.labels.get("homelab.desired", "running")

        # Case 1: container stopped although it should be running
        if expected_state == "running" and container.state != "running":
            await open_alert(
                session,
                host_id=container.host_id,
                container_name=container.name,
                severity="error",
                message=f"Container {container.name} is {container.state}, expected running"
            )

        # Case 2: container unhealthy
        if container.health == "unhealthy":
            await open_alert(
                session,
                host_id=container.host_id,
                container_name=container.name,
                severity="warning",
                message=f"Container {container.name} health check failing"
            )

        # Case 3: container OK -> close the alert if one is open
        if container.state == "running" and container.health in ["healthy", "none", None]:
            await close_alert(session, container.host_id, container.name)


async def open_alert(session: AsyncSession, host_id: int, container_name: str, severity: str, message: str):
    """
    Open an alert and send an ntfy notification
    """
    # Check whether an alert is already open
    existing = await get_open_alert(session, host_id, container_name)
    if existing:
        # Update the timestamp
        existing.last_notified_at = datetime.utcnow()
        await session.commit()
        return

    # Create a new alert
    alert = DockerAlert(
        host_id=host_id,
        container_name=container_name,
        severity=severity,
        state="open",
        message=message,
        opened_at=datetime.utcnow()
    )
    session.add(alert)
    await session.commit()

    # ntfy notification
    host = await get_host(host_id)
    await ntfy_service.send_notification(
        topic="homelab-docker",
        title=f"🚨 Docker Alert - {host.name}",
        message=f"{container_name}: {message}",
        priority=4,
        tags=["warning", "docker"]
    )

    # Real-time WebSocket
    await ws_manager.broadcast({
        "type": "docker_alert_opened",
        "alert": alert.to_dict()
    })
```

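`open_alert` returns early when an alert is already open, so a container that stays down never triggers a reminder. One way to send reminders without flooding ntfy is a cooldown on `last_notified_at`. A minimal sketch as a pure function, assuming a 5-minute cooldown; `should_notify` and `NOTIFY_COOLDOWN` are hypothetical names:

```python
from datetime import datetime, timedelta
from typing import Optional

NOTIFY_COOLDOWN = timedelta(minutes=5)


def should_notify(last_notified_at: Optional[datetime], now: datetime,
                  cooldown: timedelta = NOTIFY_COOLDOWN) -> bool:
    """Return True if nothing was sent yet or the cooldown has elapsed."""
    if last_notified_at is None:
        return True
    return now - last_notified_at >= cooldown


now = datetime(2024, 1, 1, 12, 0, 0)
first = should_notify(None, now)                         # never notified
too_soon = should_notify(now - timedelta(minutes=2), now)  # inside the cooldown
due = should_notify(now - timedelta(minutes=5), now)       # cooldown elapsed
```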
---

## 🎨 Frontend UI/UX (integration into index.html + main.js)

### Navigation (add to index.html)

```html
<!-- Add to the existing navigation menu -->
<nav class="nav-tabs">
  <!-- Existing: Dashboard, Hosts, Tasks, Schedules, Logs -->

  <button class="nav-tab" data-section="docker">
    <i class="fab fa-docker"></i>
    Docker Hosts
    <span class="badge" id="docker-alerts-badge">0</span>
  </button>
</nav>
```

### Docker section (new HTML section)

```html
<section id="docker-section" class="hidden">
  <div class="section-header">
    <h2><i class="fab fa-docker"></i> Docker Hosts</h2>
    <div class="actions">
      <button id="collect-all-docker" class="btn btn-primary">
        <i class="fas fa-sync"></i> Collect All
      </button>
      <input type="text" id="docker-search" placeholder="Search hosts...">
    </div>
  </div>

  <!-- Docker hosts list -->
  <div id="docker-hosts-grid" class="hosts-grid">
    <!-- Generated dynamically by JS -->
  </div>

  <!-- Docker host details modal -->
  <div id="docker-detail-modal" class="modal hidden">
    <div class="modal-content large">
      <div class="modal-header">
        <h3 id="docker-host-name"></h3>
        <button class="close-modal">×</button>
      </div>

      <!-- Tabs: Containers / Images / Volumes / Alerts -->
      <div class="tabs">
        <button class="tab active" data-tab="containers">Containers</button>
        <button class="tab" data-tab="images">Images</button>
        <button class="tab" data-tab="volumes">Volumes</button>
        <button class="tab" data-tab="alerts">Alerts</button>
      </div>

      <!-- Tab contents -->
      <div id="containers-tab" class="tab-content">
        <table id="containers-table">
          <thead>
            <tr>
              <th>Name</th>
              <th>Image</th>
              <th>State</th>
              <th>Health</th>
              <th>Ports</th>
              <th>Age</th>
              <th>Actions</th>
            </tr>
          </thead>
          <tbody></tbody>
        </table>
      </div>
    </div>
  </div>
</section>
```

### JavaScript logic (main.js)

```javascript
// Docker section management
const dockerSection = {
  async init() {
    await this.loadDockerHosts();
    this.setupWebSocket();
    this.setupEventListeners();
  },

  async loadDockerHosts() {
    const response = await fetchAPI('/api/docker/hosts');
    this.renderHostsGrid(response.hosts);
  },

  renderHostsGrid(hosts) {
    const grid = document.getElementById('docker-hosts-grid');
    grid.innerHTML = hosts.map(host => `
      <div class="docker-host-card" data-host-id="${host.id}">
        <div class="card-header">
          <h3>${host.name}</h3>
          <span class="badge ${host.docker_status}">${host.docker_status}</span>
        </div>
        <div class="card-body">
          <div class="metric">
            <i class="fas fa-box"></i>
            ${host.containers_running}/${host.containers_total} containers
          </div>
          <div class="metric">
            <i class="fas fa-exclamation-triangle"></i>
            ${host.open_alerts} alerts
          </div>
          <div class="metric">
            <i class="fas fa-clock"></i>
            Last: ${formatRelativeTime(host.docker_last_collect_at)}
          </div>
        </div>
        <div class="card-actions">
          <button class="btn btn-sm" onclick="dockerSection.viewDetails(${host.id})">
            <i class="fas fa-eye"></i> Details
          </button>
          <button class="btn btn-sm" onclick="dockerSection.collectNow(${host.id})">
            <i class="fas fa-sync"></i> Collect
          </button>
        </div>
      </div>
    `).join('');
  },

  async viewDetails(hostId) {
    const [containers, images, volumes, alerts] = await Promise.all([
      fetchAPI(`/api/docker/hosts/${hostId}/containers`),
      fetchAPI(`/api/docker/hosts/${hostId}/images`),
      fetchAPI(`/api/docker/hosts/${hostId}/volumes`),
      fetchAPI(`/api/docker/alerts?host_id=${hostId}`)
    ]);

    this.renderContainersTab(containers);
    showModal('docker-detail-modal');
  },

  renderContainersTab(containers) {
    const tbody = document.querySelector('#containers-table tbody');
    tbody.innerHTML = containers.map(c => this.renderContainerRow(c)).join('');
  },

  // Builds the HTML of one table row; kept separate so it can be unit-tested
  renderContainerRow(c) {
    return `
      <tr class="container-row" data-state="${c.state}">
        <td>
          <i class="fab fa-docker"></i> ${c.name}
          ${c.compose_project ? `<span class="badge">${c.compose_project}</span>` : ''}
        </td>
        <td>${c.image}</td>
        <td><span class="badge state-${c.state}">${c.state}</span></td>
        <td><span class="badge health-${c.health || 'none'}">${c.health || 'none'}</span></td>
        <td>${this.formatPorts(c.ports)}</td>
        <td>${formatRelativeTime(c.created_at)}</td>
        <td>
          <div class="action-buttons">
            ${c.state !== 'running' ?
              `<button onclick="dockerSection.startContainer(${c.host_id}, '${c.container_id}')">
                <i class="fas fa-play"></i>
              </button>` : ''}
            ${c.state === 'running' ?
              `<button onclick="dockerSection.stopContainer(${c.host_id}, '${c.container_id}')">
                <i class="fas fa-stop"></i>
              </button>` : ''}
            <button onclick="dockerSection.restartContainer(${c.host_id}, '${c.container_id}')">
              <i class="fas fa-redo"></i>
            </button>
            <button onclick="dockerSection.showLogs(${c.host_id}, '${c.container_id}')">
              <i class="fas fa-file-alt"></i>
            </button>
            <button class="danger" onclick="dockerSection.confirmRemove(${c.host_id}, '${c.container_id}')">
              <i class="fas fa-trash"></i>
            </button>
          </div>
        </td>
      </tr>
    `;
  },

  async startContainer(hostId, containerId) {
    await fetchAPI(`/api/docker/containers/${hostId}/${containerId}/start`, {
      method: 'POST'
    });
    showToast('Container started successfully', 'success');
    await this.viewDetails(hostId); // Refresh
  },

  setupWebSocket() {
    ws.addEventListener('message', (event) => {
      const data = JSON.parse(event.data);

      if (data.type === 'docker_host_updated') {
        this.updateHostCard(data.host);
      }

      if (data.type === 'docker_alert_opened') {
        this.showAlertNotification(data.alert);
        this.updateAlertsBadge();
      }
    });
  }
};
```

---

## 🧪 Required tests (minimum 8 tests)

### Backend tests (pytest + pytest-asyncio)

```python
# tests/test_docker_service.py
import pytest
from sqlalchemy import select

# docker_service, docker_actions, docker_alerts, DockerAlert,
# create_test_container and the fixtures (mock_ssh, mock_ntfy, db_session)
# are assumed to come from the project and its conftest.py.

@pytest.mark.asyncio
async def test_collect_docker_host(mock_ssh):
    """Test successful Docker collection"""
    mock_ssh.exec_command.return_value = '{"Version": "24.0.7"}'
    result = await docker_service.collect_docker_host(host_id=1)
    assert result.docker_version == "24.0.7"

@pytest.mark.asyncio
async def test_detect_container_down(db_session, mock_ntfy):
    """Test detection of a stopped container"""
    await create_test_container(
        db_session,
        state="exited",
        labels={"homelab.monitor": "true", "homelab.desired": "running"}
    )
    await docker_alerts.check_container_alerts(db_session)
    alerts = (await db_session.execute(select(DockerAlert))).scalars().all()
    assert len(alerts) == 1
    assert alerts[0].severity == "error"

@pytest.mark.asyncio
async def test_start_container(mock_ssh):
    """Test starting a container"""
    mock_ssh.exec_command.return_value = "container_id"
    result = await docker_actions.start_container(host_id=1, container_id="abc123")
    assert result.success is True
    mock_ssh.exec_command.assert_called_with("docker start abc123")

@pytest.mark.asyncio
async def test_alert_notification_sent(db_session, mock_ntfy):
    """Test that an ntfy notification is sent when an alert opens"""
    await docker_alerts.open_alert(
        session=db_session,
        host_id=1,
        container_name="nginx",
        severity="error",
        message="Container down"
    )
    assert mock_ntfy.send_notification.called
    assert "nginx" in mock_ntfy.send_notification.call_args.kwargs['message']
```

### Frontend tests (Jest or a vanilla equivalent)

```javascript
// tests/docker_section.test.js

test('renderHostsGrid displays correct number of cards', () => {
  const hosts = [
    {id: 1, name: 'host1', docker_status: 'online'},
    {id: 2, name: 'host2', docker_status: 'offline'}
  ];
  dockerSection.renderHostsGrid(hosts);
  const cards = document.querySelectorAll('.docker-host-card');
  expect(cards.length).toBe(2);
});

test('container action buttons reflect state', () => {
  const runningContainer = {state: 'running'};
  const stoppedContainer = {state: 'exited'};

  const html1 = dockerSection.renderContainerRow(runningContainer);
  expect(html1).toContain('fa-stop');
  expect(html1).not.toContain('fa-play');

  const html2 = dockerSection.renderContainerRow(stoppedContainer);
  expect(html2).toContain('fa-play');
  expect(html2).not.toContain('fa-stop');
});

test('WebSocket updates host card in realtime', async () => {
  // setupWebSocket() listens on the global `ws`, so the mock must be global too
  global.ws = new MockWebSocket();
  dockerSection.renderHostsGrid([
    {id: 1, name: 'host1', docker_status: 'online', containers_running: 0}
  ]);
  dockerSection.setupWebSocket();

  global.ws.emit({
    type: 'docker_host_updated',
    host: {id: 1, containers_running: 5}
  });

  await nextTick();
  const card = document.querySelector('[data-host-id="1"]');
  expect(card.textContent).toContain('5/');
});
```

---

## 📋 "Definition of Done" checklist

### Backend ✅
- [ ] Alembic migration created and tested (docker_containers, docker_images, etc.)
- [ ] `docker_service.py` service with SSH collection + JSON parsing
- [ ] `docker_actions.py` service with start/stop/restart/remove/redeploy
- [ ] `docker_alerts.py` service with detection logic + ntfy notifications
- [ ] Complete `/api/docker/*` routes with JWT auth
- [ ] APScheduler jobs added (collect + alerts)
- [ ] WebSocket events emitted (docker_host_updated, docker_alert_opened)
- [ ] Robust error handling (SSH timeout, docker unreachable, parsing errors)
- [ ] 6+ passing backend tests

### Frontend ✅
- [ ] "Docker Hosts" section added to the navigation menu
- [ ] Docker hosts list view (cards with metrics)
- [ ] Host details modal with tabs (Containers / Images / Volumes / Alerts)
- [ ] Working container actions (start/stop/restart/logs/inspect/remove)
- [ ] Confirmation modals on destructive actions (remove, redeploy)
- [ ] Container logs (drawer with tail + auto-refresh)
- [ ] Container inspect (JSON viewer modal)
- [ ] WebSocket live updates (hosts + alerts)
- [ ] Animations consistent with the rest of the dashboard
- [ ] 4+ passing frontend tests

### Security ✅
- [ ] All Docker actions require JWT auth (operator role minimum)
- [ ] Destructive actions (remove) require the admin role
- [ ] Strict SSH timeouts (5s connect, 15s exec)
- [ ] Pydantic validation on all inputs
- [ ] No arbitrary command execution
- [ ] Structured server logs (no secrets logged)

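The "no arbitrary command execution" item can be enforced by building every SSH command from an allowlist rather than from raw request values. A minimal sketch, assuming container references are either hex IDs or Docker-style names. `build_docker_command`, `ALLOWED_ACTIONS` and `CONTAINER_REF` are hypothetical names, and the 128-character cap is an arbitrary choice for this sketch:

```python
import re
import shlex

ALLOWED_ACTIONS = {"start", "stop", "restart"}
# Alphanumeric first character, then letters, digits, "_", "." or "-"
CONTAINER_REF = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}")


def build_docker_command(action: str, container_ref: str) -> str:
    """Build a `docker <action> <container>` command from validated parts only."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not CONTAINER_REF.fullmatch(container_ref):
        raise ValueError(f"invalid container reference: {container_ref!r}")
    # shlex.quote is a second line of defence; validated refs need no quoting
    return f"docker {action} {shlex.quote(container_ref)}"


command = build_docker_command("start", "abc123")

# A reference carrying shell metacharacters is rejected before any SSH call
try:
    build_docker_command("start", "abc123; rm -rf /")
    rejected = False
except ValueError:
    rejected = True
```

The same check fits naturally into a Pydantic validator on the route parameters, so invalid references fail with a 422 before reaching the service layer.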
### Documentation ✅
- [ ] README.md updated (Docker section)
- [ ] curl examples for the Docker endpoints
- [ ] Configuration guide for the `homelab.monitor` and `homelab.desired` labels
- [ ] Alembic migration instructions

---

## 🚀 Execution instructions

### 1. Apply the DB migration

```bash
cd homelab-automation-api-v2
alembic revision --autogenerate -m "Add Docker management tables"
alembic upgrade head
```

### 2. Enable Docker on a host (via UI or API)

```bash
# Via API
curl -X POST -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/docker/hosts/1/enable

# Via UI: "Hosts" section > click the host > "Enable Docker" button
```

### 3. Label critical containers

```yaml
# docker-compose.yml
services:
  nginx:
    image: nginx:latest
    labels:
      homelab.monitor: "true"
      homelab.desired: "running"
```

### 4. Test a manual collection

```bash
curl -X POST -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/docker/hosts/1/collect
```

### 5. Check alerts

```bash
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/docker/alerts
```

---

## 🎁 Bonus features (if time allows)

### Priority 1 (high value, low cost)
- **Compose awareness**: Group containers by `com.docker.compose.project`
- **Resource stats**: `docker stats --no-stream` (CPU/mem snapshot)
- **Bulk actions**: Restart all containers of a compose project

### Priority 2 (good value, medium cost)
- **Event timeline**: Log of Docker actions in the host view
- **Auto-remediation**: `homelab.auto_restart=true` flag → automatic restart if down
- **Networks tab**: List of Docker networks + attached containers

### Priority 3 (nice-to-have, high cost)
- **Prune management**: Cleanup of images/volumes (danger zone + admin only)
- **Image scanning**: Vulnerabilities via Trivy (if installed on the hosts)
- **Logs streaming**: Real-time logs over WebSocket (instead of a static tail)

---

## ⚠️ Risks and mitigation

| Risk | Impact | Mitigation |
|------|--------|------------|
| SSH timeout during collection | Hosts marked offline | Retry logic + adaptive timeout (5s → 10s → 30s) |
| Docker JSON parsing fails | Partial collection | Try/except per entity (containers/images/volumes) |
| WebSocket spam with many hosts | UI lag | Throttle broadcasts (max 1/sec per type) |
| Concurrent Docker actions | Race conditions | Lock per container_id (asyncio.Lock) |
| Alert spam when a container is flapping | Notification fatigue | 5 min cooldown between notifications for the same alert |

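The adaptive timeout in the first row can be sketched as a retry loop around `asyncio.wait_for`. This is an illustration only: `with_escalating_timeout` is a hypothetical helper, the demo scales the 5s → 10s → 30s steps down to fractions of a second, and `flaky_collect` stands in for the SSH collection:

```python
import asyncio


async def with_escalating_timeout(make_call, timeouts=(5, 10, 30)):
    """Await make_call() with each timeout in turn; re-raise after the last one."""
    last_error = None
    for timeout in timeouts:
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except asyncio.TimeoutError as exc:
            last_error = exc
    raise last_error


async def demo():
    attempts = []

    async def flaky_collect():
        attempts.append(len(attempts) + 1)
        # The first attempt is slower than the first timeout; later ones are fast
        await asyncio.sleep(0.2 if len(attempts) == 1 else 0)
        return "ok"

    result = await with_escalating_timeout(flaky_collect, timeouts=(0.05, 0.1, 0.3))
    return result, attempts


result, attempts = asyncio.run(demo())
```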
---

## 📊 Success metrics

- ✅ Docker collection succeeds on 3+ hosts simultaneously without timeout
- ✅ Container-down detection < 60s after the actual stop
- ✅ ntfy notification received within 5s of the alert opening
- ✅ Container actions (start/stop) < 3s (excluding Docker's own delay)