snapshot orchestration

This commit is contained in:
iceBear67
2026-06-03 15:31:27 +08:00
parent b27a0a564e
commit 0e5c3d6c65
3 changed files with 276 additions and 62 deletions

3
.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
.venv
.idea/*
__pycache__/**

114
USAGE.md Normal file
View File

@@ -0,0 +1,114 @@
# Composer
多租户 Docker Compose 编排工具。扫描 `users/` 下的用户目录,校验每个用户的 `compose.yml`,生成注入网络配置的快照,然后启动所有服务。
## 目录结构
```
<root>/
├── users/
│ ├── 10-alice/ # 优先级-用户名
│ │ ├── compose.yml # 必需
│ │ ├── .env # 可选(明文)
│ │ ├── .env.gpg # 可选(加密,二选一即可)
│ │ ├── app/
│ │ │ └── Dockerfile
│ │ └── ...
│ └── 20-bob/
│ └── ...
└── snapshots/ # 自动生成(每次覆盖)
├── 10-alice/
│ └── compose.yml # 已注入网络配置的版本
└── 20-bob/
```
- 目录名格式:`<数字优先级>-<用户名>`
- 用户名:`[a-z0-9]+`
- 优先级数字越小越先处理
## CLI 参数
| 参数 | 必需 | 默认值 | 说明 |
|------|------|--------|------|
| `--root` | | `.` | 项目根目录 |
| `--network` | | `cloud` | 注入到 services 的外部网络名 |
| `--volume-parent` | ✓ | | 卷挂载路径白名单前缀 |
| `--dry` | | `false` | 只打印操作,不实际执行 |
## compose.yml 约束
### 允许的顶层字段
只有 `services` 被识别。如果 YAML 中有 `networks``volumes``version` 等额外顶层键,会触发 **WARN** —— 它们会被 Pydantic 静默丢弃,可能与网络注入冲突。
### 允许的 service 字段
`build` `image` `volumes` `command` `depends_on` `entrypoint` `env_file` `environment`
`build` 可以是字符串(指向含 Dockerfile 的目录)或对象 `{context, dockerfile, args}`。其他键同样触发 WARN。
### 卷约束
所有 volume 的宿主机路径必须以 `--volume-parent` 为前缀,且只允许 `ro` 模式。
### 环境文件
- 支持明文 `.env` 或 GPG 加密 `.env.gpg`
- 只校验明文文件的内容格式(`KEY=value``export KEY=value`
- 如果只存在 `.env.gpg` 则跳过内容校验
## 运行流程
```
1. 扫描 users/ → 解析优先级和用户名
2. 加载每个 compose.yml → Pydantic 校验
3. 检测注入干扰字段WARN
4. 校验 service 引用、Dockerfile 存在性、卷路径、env_file
5. ── 有错则退出 ──
6. 为每个用户创建 snapshots/<prio>-<name>/
7. 拷贝所有文件compose.yml 注入网络配置,.gpg 文件解密
8. ── 快照失败则退出 ──
9. 在每个快照目录执行 docker compose up --remove-orphans --build --detach
```
## 网络注入
快照中的 `compose.yml` 会被注入:
```yaml
networks:
cloud: # --network 参数值
external: true
alice: # 用户名
name: alice
services:
app:
networks: [cloud, alice]
```
每个 service 的 `networks` 会被**覆盖**为 `[<network>, <username>]`
## GPG 解密
快照拷贝时,`.gpg` 后缀的文件会调用系统 `gpg --decrypt --batch` 解密,输出到去掉 `.gpg` 后缀的路径。原 `.gpg` 文件不进入快照。
要求运行环境已配置好 GPG 密钥,且能无交互解密。
## Dry-run 模式
`--dry` 执行全部校验,但只打印操作摘要:
```
[dry] would snapshot users/10-alice -> snapshots/10-alice
[dry] inject networks: .networks.cloud.external=true, ...
[dry] would run: docker compose -f snapshots/10-alice/compose.yml up ...
```
不创建目录、不写文件、不解密、不启动容器。
## 错误处理
- **校验阶段有错** → 打印错误并 `exit 1`(不创建快照)
- **快照创建失败**(单个用户)→ 标记错误,跳过该用户,继续处理其他用户;全部完成后 `exit 1`
- **compose up 失败**(单个用户)→ 标记错误,继续启动其他用户;全部完成后 `exit 1`

View File

@@ -1,24 +1,21 @@
import argparse import argparse
import subprocess
from collections import defaultdict
from dataclasses import dataclass
import os import os
import re import re
import shutil
import subprocess
import sys
import docker
from pydantic import BaseModel, ValidationError from pydantic import BaseModel, ValidationError
import yaml import yaml
@dataclass
class BuildSpec(BaseModel): class BuildSpec(BaseModel):
context: str context: str
dockerfile: str dockerfile: str
args: dict[str, str] = defaultdict args: dict[str, str] = {}
@dataclass
class Service(BaseModel): class Service(BaseModel):
build: str | BuildSpec | None = None build: str | BuildSpec | None = None
image: str | None = None image: str | None = None
@@ -30,17 +27,15 @@ class Service(BaseModel):
environment: dict[str, str] | None = None environment: dict[str, str] | None = None
@dataclass
class ComposeSpec(BaseModel): class ComposeSpec(BaseModel):
services: dict[str, Service] services: dict[str, Service]
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument("--dry", type=bool, default=False) parser.add_argument("--dry", action="store_true", default=False)
parser.add_argument("--root", default=".", type=str, required=True) parser.add_argument("--root", default=".", type=str)
parser.add_argument("--network", default="cloud", type=str) parser.add_argument("--network", default="cloud", type=str)
parser.add_argument("--volume-parent", type=str, required=True) parser.add_argument("--volume-parent", type=str, required=True)
parser.add_argument("--lock", type=str)
args = parser.parse_args() args = parser.parse_args()
dry_run = args.dry dry_run = args.dry
@@ -61,7 +56,7 @@ userNamePattern = r"^[a-z0-9]+$"
users: list[tuple[int, str]] = [] users: list[tuple[int, str]] = []
for userDir in os.listdir(f"{root}/users"): for userDir in os.listdir(f"{root}/users"):
spl = userDir.split("-") spl = userDir.split("-", 1)
if len(spl) != 2: if len(spl) != 2:
err(f"ERR: Valid priority isn't set for userDir {spl}") err(f"ERR: Valid priority isn't set for userDir {spl}")
continue continue
@@ -77,23 +72,6 @@ users.sort(key=lambda x: x[0])
serviceNamePattern = r"^[a-z0-9]+$" serviceNamePattern = r"^[a-z0-9]+$"
userWorkDir: dict[str, str] = {}
userComposeFiles: dict[str, ComposeSpec] = {}
for prio, name in users:
path = f"{root}/users/{prio}-{name}/compose.yml"
with open(path) as csf:
data = yaml.safe_load(csf)
try:
spec = ComposeSpec(**data)
except ValidationError as e:
err(f"Cannot validate compose spec at {path}")
print(e.errors())
continue
userComposeFiles[name] = spec
userWorkDir[name] = f"{root}/users/{prio}-{name}/"
def validate_compose_spec(spec: ComposeSpec, workdir: str): def validate_compose_spec(spec: ComposeSpec, workdir: str):
invalid_services = [serviceName for serviceName, _ in spec.services.items() if invalid_services = [serviceName for serviceName, _ in spec.services.items() if
not re.match(serviceNamePattern, serviceName)] not re.match(serviceNamePattern, serviceName)]
@@ -101,16 +79,68 @@ def validate_compose_spec(spec: ComposeSpec, workdir: str):
err(f"ERR: Invalid service names: {', '.join([x for x in invalid_services])}") err(f"ERR: Invalid service names: {', '.join([x for x in invalid_services])}")
for invalid_service in invalid_services: for invalid_service in invalid_services:
spec.services.pop(invalid_service) spec.services.pop(invalid_service)
for key, spec in spec.services.items(): for key, svc in spec.services.items():
validate_service_spec(spec, workdir) validate_service_spec(svc, workdir, key)
depended_services = [(spec.depends_on or []) for name, spec in spec.services.items()] depended_services = [(svc.depends_on or []) for _name, svc in spec.services.items()]
depended_services = [item for sublist in depended_services for item in sublist] depended_services = [item for sublist in depended_services for item in sublist]
for serviceName in depended_services: for serviceName in depended_services:
if serviceName not in spec.services: if serviceName not in spec.services:
err(f"ERR: Service {serviceName} is depended on but not defined in the compose spec") err(f"ERR: Service {serviceName} is depended on but not defined in the compose spec")
# Fields recognized by our Pydantic models — anything else in the YAML
# is silently dropped by Pydantic and could interfere with our injection.
_TOP_LEVEL_KNOWN = {"services"}
_SERVICE_KNOWN = {"build", "image", "volumes", "command", "depends_on",
"entrypoint", "env_file", "environment"}
_BUILD_KNOWN = {"context", "dockerfile", "args"}
def detect_injected_fields(path_: str, data: dict):
"""Warn about YAML keys not captured by the Pydantic model — these would
be silently dropped and could interfere with network injection."""
if not isinstance(data, dict):
return
extra_toplevel = set(data.keys()) - _TOP_LEVEL_KNOWN
if extra_toplevel:
err(f"WARN: {path_} has unexpected top-level keys (ignored by model): "
f"{', '.join(sorted(extra_toplevel))}. "
f"These may interfere with injected networks/config.")
services = data.get("services")
if not isinstance(services, dict):
return
for svc_name, svc_data in services.items():
if not isinstance(svc_data, dict):
continue
extra_svc = set(svc_data.keys()) - _SERVICE_KNOWN
if extra_svc:
err(f"WARN: {path_} service '{svc_name}' has unexpected keys "
f"(ignored by model): {', '.join(sorted(extra_svc))}")
build = svc_data.get("build")
if isinstance(build, dict):
extra_build = set(build.keys()) - _BUILD_KNOWN
if extra_build:
err(f"WARN: {path_} service '{svc_name}' build block has "
f"unexpected keys: {', '.join(sorted(extra_build))}")
for prio, name in users:
workdir = f"{root}/users/{prio}-{name}/"
path = f"{workdir}compose.yml"
with open(path) as csf:
data = yaml.safe_load(csf)
detect_injected_fields(path, data)
try:
spec = ComposeSpec(**data)
except ValidationError as e:
err(f"Cannot validate compose spec at {path}")
print(e.errors())
continue
validate_compose_spec(spec, workdir)
def validate_env_file(path_: str) -> bool: def validate_env_file(path_: str) -> bool:
try: try:
with open(path_, "r") as f: with open(path_, "r") as f:
@@ -134,46 +164,113 @@ def validate_env_file(path_: str) -> bool:
return False return False
def validate_service_spec(serv: Service, workdir: str): def validate_service_spec(serv: Service, workdir: str, name: str):
if serv.image is None and serv.build is None: if serv.image is None and serv.build is None:
err(f"ERR: Service {serv.image} doesn't have an image or build spec") err(f"ERR: Service {name} doesn't have an image or build spec")
if isinstance(serv.build, str) and not os.path.exists(f"{workdir}/{serv.build}/Dockerfile"): if isinstance(serv.build, str) and not os.path.exists(f"{workdir}/{serv.build}/Dockerfile"):
err(f"ERR: Dockerfile doesn't exist at {workdir}/{serv.build}/Dockerfile") err(f"ERR: Service {name}: Dockerfile doesn't exist at {workdir}/{serv.build}/Dockerfile")
for volume in serv.volumes: for volume in (serv.volumes or []):
spl = volume.split(":") spl = volume.split(":")
if (len(spl) != 2 and len(spl) != 3) or \ if (len(spl) != 2 and len(spl) != 3) or \
not os.path.normpath(spl[0]).startswith(vol_parent) or \ not os.path.normpath(spl[0]).startswith(vol_parent) or \
(len(spl) == 3 and spl[2] != "ro"): (len(spl) == 3 and spl[2] != "ro"):
err(f"ERR: Invalid volume spec {volume} in service {serv.image}") err(f"ERR: Invalid volume spec {volume} in service {name}")
continue if serv.env_file and not os.path.exists(f"{workdir}/{serv.env_file}") \
if serv.env_file and not os.path.exists(f"{workdir}/{serv.env_file}"): and not os.path.exists(f"{workdir}/{serv.env_file}.gpg"):
err(f"ERR: env_file {serv.env_file} doesn't exist in {workdir}") err(f"ERR: Service {name}: env_file {serv.env_file} (or .gpg) doesn't exist in {workdir}")
elif serv.env_file: elif serv.env_file and os.path.exists(f"{workdir}/{serv.env_file}"):
validate_env_file(f"{workdir}/{serv.env_file}") validate_env_file(f"{workdir}/{serv.env_file}")
if isinstance(serv.build, BuildSpec) and not os.path.exists(f"{workdir}/{serv.build.dockerfile}"): if isinstance(serv.build, BuildSpec) and not os.path.exists(f"{workdir}/{serv.build.dockerfile}"):
err(f"ERR: Dockerfile {serv.build.dockerfile} doesn't exist in {workdir}") err(f"ERR: Service {name}: Dockerfile {serv.build.dockerfile} doesn't exist in {workdir}")
dk = docker.from_env() if has_error:
print("Errors found during validation. Exiting.")
sys.exit(1)
def orchestrate(user: str, spec: ComposeSpec, workdir: str): # ── Snapshot: copy userdir + inject networks into compose.yml ──────────
for name, serv in spec.services.items(): snapshot_root = f"{root}/snapshots"
if isinstance(serv.build, str): print(f"Snapshot target: {snapshot_root}")
serv.build = BuildSpec(context=serv.build, dockerfile=f"{serv.build}/Dockerfile")
if serv.build: snapshot_dirs: list[tuple[str, str]] = [] # (dst_dir, name) for compose-up later
command = [
"docker", "buildx", "build", "-t", f"{user}-{name}:latest" for prio, name in users:
] src_dir = f"{root}/users/{prio}-{name}"
build_args = [["--build-arg", f"{k}={v}"] for k, v in serv.build.args.items()] dst_dir = f"{snapshot_root}/{prio}-{name}"
command += [arg for build_arg in build_args for arg in build_arg]
command += ["-f", serv.build.dockerfile, serv.build.context] if dry_run:
print(f"Building image for {user}:{name} with command: {' '.join(command)}") print(f" [dry] would snapshot {src_dir} -> {dst_dir}")
if not dry_run: print(f" [dry] inject networks: .networks.{network}.external=true, "
try: f".networks.{name}.name={name}, services[].networks=[{network}, {name}]")
subprocess.run(command, check=True) snapshot_dirs.append((dst_dir, name))
except subprocess.CalledProcessError as e: continue
err(f"ERR: Failed to build image for {user}:{name} with error: {e}")
continue print(f" Snapshotting {src_dir} -> {dst_dir}")
try:
if os.path.exists(dst_dir):
shutil.rmtree(dst_dir)
os.makedirs(dst_dir, exist_ok=True)
for item in os.listdir(src_dir):
src_item = os.path.join(src_dir, item)
dst_item = os.path.join(dst_dir, item)
if item == "compose.yml":
# Rewrite with injected network configuration
with open(src_item) as f:
compose_data = yaml.safe_load(f)
compose_data.setdefault("networks", {})
compose_data["networks"][network] = {"external": True}
compose_data["networks"][name] = {"name": name}
for svc_name, svc in (compose_data.get("services") or {}).items():
svc["networks"] = [network, name]
print(f" Injected networks into service '{svc_name}': "
f"[{network}, {name}]")
with open(dst_item, "w") as f:
yaml.dump(compose_data, f, default_flow_style=False, sort_keys=False)
print(f" Wrote {dst_item} (networks injected)")
elif item.endswith(".gpg"):
# Decrypt .gpg file → output without .gpg suffix
plain_name = item[:-4] # strip ".gpg"
dst_plain = os.path.join(dst_dir, plain_name)
print(f" Decrypting {item} -> {plain_name}")
subprocess.run(
["gpg", "--decrypt", "--batch", "--output", dst_plain, src_item],
check=True,
)
elif os.path.isdir(src_item):
shutil.copytree(src_item, dst_item)
else: else:
print("Build skipped due to dry run mode.") shutil.copy2(src_item, dst_item)
snapshot_dirs.append((dst_dir, name))
print(f" Done snapshotting {name}")
except Exception as e:
err(f"ERR: Snapshot failed for {name}: {e}")
# ── Compose up in each snapshot ────────────────────────────────────────
if has_error:
print("Snapshot errors detected — skipping compose up.")
sys.exit(1)
if dry_run:
for dst_dir, name in snapshot_dirs:
print(f" [dry] would run: docker compose -f {dst_dir}/compose.yml "
f"up --remove-orphans --build --detach")
else:
for dst_dir, name in snapshot_dirs:
print(f" Starting services for {name} in {dst_dir}")
try:
subprocess.run(
["docker", "compose", "-f", f"{dst_dir}/compose.yml",
"up", "--remove-orphans", "--build", "--detach"],
cwd=dst_dir, check=True,
)
print(f"{name} started")
except subprocess.CalledProcessError as e:
err(f"ERR: docker compose up failed for {name}: {e}")
if has_error:
print("Errors during compose up. Exiting.")
sys.exit(1)