feat: 视频创作流程全链路优化

- 后端: Vidu Provider、System API、Upload API、素材服务更新 - 前端: 字幕压制、视频生成、配音、本地存储、类型定义优化 - Rust: FFmpeg 命令、视频合成、语音命令、库注册更新 - Store: 项目状态、语音状态管理优化 - 新增: 对口型替换文档、健康检查器、字幕 API 模块、音频对齐工具 - 删除: 废弃的 polish 提示词模板
2026-04-26 21:24:42 +08:00
parent 3766a977e2
commit bc724810a6
28 changed files with 1603 additions and 563 deletions
@@ -0,0 +1,122 @@
+# MVP 实验：对口型视频替换空镜片段
+
+## 目标
+通过音频打轴定位时间戳，用对口型后的人物视频片段替换空镜视频的对应片段。
+
+## 完整流程
+
+### 1. 音频打轴（Whisper）
+对配音音频进行语音识别，生成带时间轴的 SRT 字幕文件。
+
+```bash
+cd /Users/0fun/work/meijiaka-zj/python-api
+source .venv/bin/activate
+whisper "/Users/0fun/Documents/Meijiaka-zj/projects/.../audios/voice_xxx.mp3" \
+  --model base --language zh --output_format srt
+```
+
+**输出**：56 句话，总时长 ~75s，生成 `voice_xxx.srt`。
+
+### 2. 文案定位
+输入目标文案，在 SRT 中匹配对应句子，得到时间范围。
+
+**示例**：
+- 文案：`"新房装修这七个时间,你必须在场盯着"`
+- 匹配结果：时间范围 `0.000s ~ 4.120s`
+- 涉及片段：segment 1~2
+
+### 3. 片段截取
+
+#### 人物视频（静音画面 + 对口型音频）
+```bash
+# 画面：0~4.12s
+ffmpeg -y -ss 0 -t 4.12 -i video.mp4 -c:v libx264 -an clip_video.mp4
+
+# 音频：0~4.12s
+ffmpeg -y -ss 0 -t 4.12 -i video.mp4 -vn -c:a copy clip_audio.mp3
+```
+
+**音频检测**：
+- mean_volume: -20.8 dB
+- max_volume: -3.8 dB
+- 格式：24000 Hz / mono / 69 kb/s
+
+### 4. 画面替换（FFmpeg overlay）
+将人物视频画面覆盖到 composed 视频对应时间段。
+
+```bash
+ffmpeg -y \
+  -i composed.mp4 \
+  -i video.mp4 \
+  -filter_complex \
+    "[1:v]setpts=PTS-STARTPTS[clip];
+     [0:v][clip]overlay=enable='between(t\,0\,4.12)':x=0:y=0[v]" \
+  -map "[v]" -map 0:a \
+  -c:v libx264 -crf 18 -preset fast \
+  -c:a copy \
+  composed_overlay.mp4
+```
+
+### 5. 音频拼接（关键步骤）
+将对口型音频插入到 0~4.12s，原 composed 音频接在后面。
+
+#### ❌ 第一次尝试（失败）
+```bash
+[a_rep]atrim=start=0:end=4.12...;
+[a_tail]atrim=start=4.12...;
+[a_rep][a_tail]concat=n=2:v=0:a=1[a]
+```
+
+**问题**：concat 要求所有输入流的**采样率和声道数一致**。
+- 对口型音频：24000 Hz / mono
+- composed 音频：44100 Hz / stereo
+
+结果：拼接后开头 4 秒无声。
+
+#### ✅ 修复方案
+在 concat 前统一音频格式：
+
+```bash
+[1:a]aresample=44100,pan=stereo|c0=c0|c1=c0,atrim=start=0:end=4.12,asetpts=PTS-STARTPTS[a_rep];
+[0:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,atrim=start=4.12,asetpts=PTS-STARTPTS[a_tail];
+[a_rep][a_tail]concat=n=2:v=0:a=1[a]
+```
+
+### 6. 完整替换命令（最终版）
+
+```bash
+ffmpeg -y \
+  -i "composed.mp4" \
+  -i "video.mp4" \
+  -filter_complex \
+    "[1:v]setpts=PTS-STARTPTS[clip];
+     [0:v][clip]overlay=enable='between(t\,0\,4.12)':x=0:y=0[v];
+     [1:a]aresample=44100,pan=stereo|c0=c0|c1=c0,atrim=start=0:end=4.12,asetpts=PTS-STARTPTS[a_rep];
+     [0:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,atrim=start=4.12,asetpts=PTS-STARTPTS[a_tail];
+     [a_rep][a_tail]concat=n=2:v=0:a=1[a]" \
+  -map "[v]" \
+  -map "[a]" \
+  -c:v libx264 -crf 18 -preset fast \
+  -c:a aac -b:a 128k \
+  "composed_replaced.mp4"
+```
+
+**输出验证**：
+- 视频：H.264 / 1080x1920 / 29.82 fps
+- 音频：AAC / 44100 Hz / stereo / 37 kb/s
+- 前 4.12s 音量：mean -20.9 dB / max -3.7 dB（正常有声）
+- 总时长：74.27 秒
+
+## 踩坑记录
+
+| 问题 | 原因 | 解决方案 |
+|------|------|----------|
+| 音频拼接后开头无声 | concat 输入流采样率/声道不一致 | 用 `aresample` + `pan` 统一为 44100 Hz stereo |
+| 视频画面不同步 | overlay 时间范围写错 | 确认 `between(t,0,4.12)` 与文案定位时间一致 |
+| 音频格式差异 | 对口型音频 24000Hz mono，composed 44100Hz stereo | 拼接前强制格式统一 |
+
+## 后续可优化
+
+1. **音频过渡**：concat 是硬切，可在拼接处加 `acrossfade` 实现淡入淡出
+2. **音量平衡**：对口型音频与 composed 音频音量差异大，可用 `volume` 滤镜统一
+3. **自动化**：将以上步骤封装为 Python/Rust 函数，输入文案自动完成定位→截取→替换
@@ -1,13 +0,0 @@
-你是一位口播短视频专家。请润色以下空镜画面描述，使其更适合AI视频生成：
-
-【原文】
-{content}
-
-【要求】
- 保持原意，优化细节
- 重点强调场景环境、空间氛围、光影效果、材质质感
- 可以描述静态景物、装修细节、空间布局
- 不要有"镜头""特写""机位"等摄影术语
- 控制好字数，字数不能与原文差距超过20个字
-
-直接输出润色后的描述，不要添加任何说明：
@@ -1,13 +0,0 @@
-你是一位【口播短视频】专家。请润色以下分镜画面描述，使其更适合AI视频生成：
-
-【原文】
-{content}
-
-【要求】
- 保持原意，优化细节
- 重点强调人物神态、表情、动作、姿态
- 描述人物与镜头前观众的互动
- 不要有"镜头""特写""机位"等摄影术语
- 控制好字数，字数不能与原文差距超过20个字
-
-直接输出润色后的描述，不要添加任何说明：
@@ -0,0 +1,71 @@
+你是一位专业的【口播类短视频】脚本创作专家，专注于家装/装修领域的抖音/视频号口播内容创作。
+【平台适配要求】
+1. 竖屏拍摄（9:16比例），画面构图以人物为主体
+2. 台词口语化、接地气，像跟朋友聊天，避免"综上所述""研究表明"等书面语
+3. 语速稍快有节奏感，每秒4个纯文字（不含标点），每句15-25字（对应3.75-6.25秒），一口气说完不换气，不拖沓
+4. 避免专业术语堆砌，用业主听得懂的大白话
+5. 符合新媒体用户观看习惯：3秒定生死，节奏紧凑
+【文案要求】
+请严格按照以下固定结构，生成装修现场监工类口播文案，要求语言口语化、有警示性，贴合装修业主视角，结构严格不变，内容围绕“新房装修一定要在场的7个时间”展开，每部分内容完整，总文案包含标点符号不得超过450字：
+开篇总起：明确核心警示——新房装修一定要在场的7个时间，尤其最后一个，直接关系家里是不是甲醛房，提醒认真看完，避免后期返工、踩坑受害，语气直接、有紧迫感。
+分点阐述（7点，严格遵循此顺序和格式）：
+每点均按照“监工场景+必做核查事项+不盯工的核心隐患”撰写，语言接地气，有劝诫感，避免生硬说教：
+第1点：砸墙时必须在场，盯紧师傅封好下水口，避免管道堵塞，后期还要跑楼下疏通
+第2点：封窗时一定要在场，监督师傅做好防水斜坡，防止下雨天雨水往屋里倒灌
+第3点：水电验收必须在场，核对插座点位、检查强弱电包裹情况，记得拍照留存，避免后期返工
+第4点：防水瓷砖验收必在场，做闭水试验检查是否漏水，核对瓷砖型号，防止色差导致重铺
+第5点：贴砖时要在场，检查瓷砖平整度、空鼓率，确保阴阳角方正、缝隙均匀合格
+第6点：木工吊顶必在场，要求拐角用整板、接缝做V型槽，杜绝后期乳胶漆开裂
+第7点：刮腻子一定要在场，严禁师傅往腻子中加胶水，避免甲醛超标，变成毒气房
+结尾引流：补充提示——准备装修的朋友，我整理了避坑手册，评论区回避坑直接领取参考，帮你装修少踩坑，语气亲切贴合业主需求。
+提示：文案整体风格通俗好记，有代入感，符合普通装修业主的认知，避免专业术语过多，每部分内容饱满，不遗漏核心要点，严格匹配上述结构，不新增、不删减板块。
+【素材库标题】
+网红开篇
+铺砖施工
+吊顶施工
+美缝施工
+水电施工
+壁纸施工
+刮腻子
+木工施工
+柜子安装
+乳胶漆
+签合同
+背景墙
+【分镜结构】
+开篇的分镜为：网红开头+人物出镜3秒+空镜补充
+分点阐述全部用空镜
+结尾人物出镜3秒+空镜补充
+每个分镜时长不得少于3秒，且不得高于8秒，可以是一位小数，如3.1秒
+且每个分镜配音文案的文字数量对应每秒4-5个纯文字（不含标点）
+总分镜时长为：文案总字数/4
+"segment"（主播口播出镜）对应"人物出镜"，且时长为3秒（对应12字左右纯文字）
+人物出镜画面的内容，可以不用完整的句子，句子可以延伸到下一个画面
+"empty_shot"（空镜补充）对应"素材库标题"
+配音文案必须要有标点符号断句，避免大长句，如：水电装错毁一生，错一个，返工就要好几万。
+【输出格式要求】
+输出的内容必须包含以下两部分
+一、分镜内容
+- id:1
+- type:"segment"（主播口播出镜）或 "empty_shot"（空镜补充）
+- scene:"人物出镜"或"素材库标题"
+- voiceover: 配音文案（必填，口语化15-25字/句，对应4-6秒）
+- duration: 时长（如 "5s"，根据字数生成，严格按每秒4字、不含标点，可保留2位小数，如12个字3.00s，17个字4.25s，19个字4.75s）
+注意：只输出纯 JSON，不要包含 markdown 代码块或其他说明文字。
+【示例】
+[
+  {
+    "id": 1,
+    "type": "empty_shot",
+    "scene": "网红开篇",
+    "voiceover": "装修签合同别踩坑！固定模板千万别直接签！",
+    "duration": 3
+  },
+  {
+    "id": 2,
+    "type": "segment",
+    "scene": "人物出镜",
+    "voiceover": "这8条内容，必须白纸黑字写进合同里！",
+    "duration": 3
+  }
+]
@@ -74,12 +74,14 @@ class ViduProvider:
        if payload:
            body["payload"] = payload

+        logger.info(f"[Vidu TTS] 请求参数: text_length={len(text)}, body={body}")
+
        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=body, headers=self._get_headers()) as resp:
                data = await resp.json()
                if resp.status != 200 or data.get("state") == "failed":
                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
-                    logger.error(f"[Vidu TTS] 请求失败: url={url}, status={resp.status}, headers={self._get_headers()}, body={body}, response={data}")
+                    logger.error(f"[Vidu TTS] 请求失败: url={url}, status={resp.status}, response={data}")
                    raise Exception(f"Vidu TTS error: {msg}")
                return data

@@ -3,8 +3,10 @@
 ============
 """

-from fastapi import APIRouter
+from fastapi import APIRouter, status
+from fastapi.responses import JSONResponse

+from app.core.health_checker import check_database, check_redis
 from app.schemas.common import ApiResponse, success_response

 router = APIRouter()
@@ -13,16 +15,34 @@ router = APIRouter()
@router.get("/health", response_model=ApiResponse[dict])
 async def system_health():
    """系统健康检查（详细版）"""
-    return success_response(
-        data={
-            "status": "healthy",
-            "services": {
-                "api": "up",
-                "database": "unknown",  # TODO: 检查数据库连接
-                "redis": "unknown",  # TODO: 检查 Redis 连接
+    db_ok, db_msg = await check_database()
+    redis_ok, redis_msg = await check_redis()
+
+    services = {
+        "api": "up",
+        "database": "connected" if db_ok else db_msg,
+        "redis": "connected" if redis_ok else redis_msg,
+    }
+
+    if not db_ok:
+        return JSONResponse(
+            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
+            content={
+                "code": status.HTTP_503_SERVICE_UNAVAILABLE,
+                "message": "数据库连接异常",
+                "data": {"status": "unhealthy", "services": services},
            },
-        },
+        )
+
+    if not redis_ok:
+        return success_response(
+            message="Redis 连接异常，服务降级",
+            data={"status": "degraded", "services": services},
+        )
+
+    return success_response(
        message="系统运行正常",
+        data={"status": "healthy", "services": services},
    )


@@ -187,3 +187,87 @@ async def upload_image(
    except Exception as e:
        logger.error(f"[Upload] 图片上传失败: {e}")
        raise HTTPException(status_code=500, detail=f"上传失败: {e}")
+
+
+@router.post("/audio", response_model=ApiResponse[UploadResponse])
+async def upload_audio(
+    file: UploadFile = File(..., description="音频文件"),
+):
+    """
+    上传音频到七牛云
+
+    支持格式：mp3, wav, aac, m4a, ogg, flac
+    """
+    try:
+        allowed_types = {
+            "audio/mpeg",
+            "audio/mp3",
+            "audio/wav",
+            "audio/x-wav",
+            "audio/aac",
+            "audio/mp4",
+            "audio/ogg",
+            "audio/flac",
+            "audio/x-flac",
+        }
+        content_type = file.content_type or ""
+
+        if not content_type:
+            ext = Path(file.filename or "").suffix.lower()
+            ext_to_mime = {
+                ".mp3": "audio/mpeg",
+                ".wav": "audio/wav",
+                ".aac": "audio/aac",
+                ".m4a": "audio/mp4",
+                ".ogg": "audio/ogg",
+                ".flac": "audio/flac",
+            }
+            content_type = ext_to_mime.get(ext, "")
+
+        if content_type not in allowed_types:
+            raise HTTPException(
+                status_code=400,
+                detail=f"不支持的音频格式: {content_type}，请上传 mp3/wav/aac/m4a/ogg/flac",
+            )
+
+        content = await file.read()
+        if not content:
+            raise HTTPException(status_code=400, detail="文件内容为空")
+
+        ext = Path(file.filename or "audio.mp3").suffix or ".mp3"
+        unique_name = f"{uuid.uuid4().hex[:16]}{ext}"
+
+        qiniu = get_qiniu_service()
+        # 复用视频 bucket（或根据配置使用音频 bucket）
+        bucket, domain = qiniu._get_bucket_and_domain("video")
+        key = qiniu.generate_key("audio", unique_name)
+        stream = io.BytesIO(content)
+        result = qiniu.upload_stream(
+            stream=stream,
+            key=key,
+            mime_type=content_type or "audio/mpeg",
+            bucket=bucket,
+            domain=domain,
+        )
+
+        url = result.get("url")
+        key = result.get("key")
+
+        if not url:
+            raise HTTPException(status_code=500, detail="上传到七牛云失败：未返回 URL")
+
+        logger.info(f"[Upload] 音频上传成功: {url[:80]}..., size={len(content)}")
+
+        return success_response(
+            data=UploadResponse(
+                url=url,
+                key=key or unique_name,
+                size=len(content),
+            )
+        )
+
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"[Upload] 音频上传失败: {e}")
+        raise HTTPException(status_code=500, detail=f"上传失败: {e}")
@@ -0,0 +1,34 @@
+"""
+健康检查
+========
+"""
+
+import asyncio
+
+from sqlalchemy import text
+
+from app.core.redis_client import get_redis_client
+from app.db.session import async_engine
+
+
+async def check_database(timeout: float = 2.0) -> tuple[bool, str]:
+    """检查数据库连接"""
+    try:
+        async with asyncio.timeout(timeout):
+            async with async_engine.connect() as conn:
+                await conn.execute(text("SELECT 1"))
+                await conn.commit()
+        return True, "connected"
+    except Exception as e:
+        return False, str(e)
+
+
+async def check_redis(timeout: float = 2.0) -> tuple[bool, str]:
+    """检查 Redis 连接"""
+    try:
+        async with asyncio.timeout(timeout):
+            redis = get_redis_client()
+            await redis.ping()
+        return True, "connected"
+    except Exception as e:
+        return False, str(e)
@@ -64,9 +64,9 @@ def match_material(scene: str, required_duration: float, exclude_urls: list[str]
    根据场景描述和所需时长匹配空镜素材

    策略：
-    1. 收集所有满足时长要求（duration >= required_duration）的素材
-    2. 收集全局差值最近的 5 个素材
-    3. 合并去重后从候选池中随机选取，优先排除已使用的
+    1. 严格匹配分类（scene 必须完全匹配 keywords 中的关键词）
+    2. 过滤掉时长小于 required_duration 的素材
+    3. 从剩余素材中排除已使用的，随机选取

    Args:
        scene: 分镜场景描述
@@ -78,38 +78,26 @@ def match_material(scene: str, required_duration: float, exclude_urls: list[str]
    """
    exclude_urls = exclude_urls or []
    for keyword, slug in _keywords.items():
-        if keyword in scene:
+        if keyword == scene:
            all_materials = _materials.get(slug, [])
            if not all_materials:
                return None

-            # 1. 满足时长要求的素材
+            # 1. 过滤掉时长小于 required_duration 的素材
            matching = [m for m in all_materials if m["duration"] >= required_duration]
+            if not matching:
+                return None

-            # 2. 差值最近的 5 个素材（全局）
-            sorted_by_diff = sorted(all_materials, key=lambda m: abs(m["duration"] - required_duration))
-            closest_5 = sorted_by_diff[:5]
-
-            # 3. 合并候选池并去重（matching 在前，优先保留满足时长的）
-            candidate_pool = []
-            seen = set()
-            for m in matching + closest_5:
-                if m["url"] not in seen:
-                    candidate_pool.append(m)
-                    seen.add(m["url"])
-
-            # 4. 排除已使用的，从中随机选
-            unused = [m for m in candidate_pool if m["url"] not in exclude_urls]
+            # 2. 排除已使用的，从中随机选
+            unused = [m for m in matching if m["url"] not in exclude_urls]
            if unused:
                return random.choice(unused)

-            # 5. 严格模式下不允许返回已排除的素材
+            # 3. 严格模式下不允许返回已排除的素材
            if strict:
                return None

-            # 6. 非严格模式：全部用完则允许重复
-            if candidate_pool:
-                return random.choice(candidate_pool)
+            # 4. 非严格模式：全部用完则允许重复
+            return random.choice(matching)

-            return None
    return None
@@ -25,13 +25,16 @@
    "灯槽灯带": "ceiling",
    "乳胶漆色卡": "paint",
    "墙面工艺": "paint",
+    "刮腻子": "paint",
    "艺术漆选样": "paint",
    "腻子打磨": "putty",
    "橱柜": "cabinet",
    "木工施工": "cabinet",
    "验收标准": "final",
    "网红开篇": "intro",
-    "合同": "contract"
+    "壁纸施工": "wallpaper",
+    "合同": "contract",
+    "签合同": "contract"
  },
  "materials": {
    "ceiling": [
@@ -0,0 +1,105 @@
+# 美家卡智剪 - 开发服务器配置
+# 自包含：PostgreSQL + Redis + API + Scheduler
+# usage: docker compose -f docker-compose.dev.yml up -d --build
+
+version: "3.8"
+
+services:
+  db:
+    image: postgres:15-alpine
+    container_name: meijiaka-dev-db
+    environment:
+      POSTGRES_USER: postgres
+      POSTGRES_PASSWORD: postgres
+      POSTGRES_DB: meijiaka_dev
+    volumes:
+      - postgres_dev_data:/var/lib/postgresql/data
+    ports:
+      - "127.0.0.1:5432:5432"
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U postgres"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+    networks:
+      - dev-network
+
+  redis:
+    image: redis:7-alpine
+    container_name: meijiaka-dev-redis
+    volumes:
+      - redis_dev_data:/data
+    ports:
+      - "127.0.0.1:6379:6379"
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 5s
+      timeout: 5s
+      retries: 5
+    networks:
+      - dev-network
+
+  api:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    container_name: meijiaka-dev-api
+    environment:
+      - ENV=development
+      - DEBUG=true
+      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/meijiaka_dev
+      - REDIS_HOST=redis
+      - REDIS_PORT=6379
+      - REDIS_DB=0
+      - SECRET_KEY=dev-secret-key-do-not-use-in-prod
+      - MINIMAX_API_KEY=${MINIMAX_API_KEY}
+      - MINIMAX_BASE_URL=${MINIMAX_BASE_URL:-https://api.minimaxi.com}
+      - VIDU_API_KEY=${VIDU_API_KEY}
+      - VIDU_BASE_URL=${VIDU_BASE_URL:-https://api.vidu.cn}
+      - LOG_LEVEL=DEBUG
+    volumes:
+      - .:/app
+      - ../data:/root/Documents/Meijiaka-zj
+    ports:
+      - "8080:8000"
+    command: gunicorn app.main:app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --reload
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+    networks:
+      - dev-network
+
+  scheduler:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    container_name: meijiaka-dev-scheduler
+    environment:
+      - ENV=development
+      - DEBUG=true
+      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/meijiaka_dev
+      - REDIS_HOST=redis
+      - REDIS_PORT=6379
+      - REDIS_DB=0
+      - SECRET_KEY=dev-secret-key-do-not-use-in-prod
+    volumes:
+      - .:/app
+      - ../data:/root/Documents/Meijiaka-zj
+    command: python -m app.scheduler.main
+    depends_on:
+      db:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+    networks:
+      - dev-network
+
+volumes:
+  postgres_dev_data:
+  redis_dev_data:
+
+networks:
+  dev-network:
+    driver: bridge
@@ -0,0 +1,275 @@
+#!/usr/bin/env python3
+"""
+视频片段替换 MVP
+================
+
+基于音频文字内容，用人物视频的对应片段替换空镜视频的对应片段。
+
+前置依赖：
+    pip install openai-whisper
+
+用法示例：
+    python scripts/video-replace-mvp.py \
+        --person person.mp4 \
+        --broll  broll.mp4 \
+        --query  "水电改造要注意"
+
+原理：
+    1. Whisper 识别人物视频音频 → 输出带时间戳的文案
+    2. 文本匹配找到目标时间段 [start, end]
+    3. FFmpeg overlay 滤镜：在 [start, end] 区间用人物画面覆盖空镜画面
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import shutil
+import subprocess
+import sys
+from difflib import SequenceMatcher
+from pathlib import Path
+
+
+def check_dep(name: str) -> str | None:
+    """检查系统命令是否存在"""
+    path = shutil.which(name)
+    return path
+
+
+def ensure_whisper():
+    """确保 whisper 可用"""
+    try:
+        import whisper  # noqa: F401
+        return True
+    except ImportError:
+        print("❌ 未安装 openai-whisper")
+        print("   安装命令：pip install openai-whisper")
+        print("   （首次会自动下载模型，base 模型约 150MB）")
+        return False
+
+
+def run_whisper(video_path: str, model: str = "base") -> list[dict]:
+    """Whisper 识别，返回 segment 列表（含 start/end/text）"""
+    import whisper
+
+    print(f"   加载模型：{model}")
+    model_obj = whisper.load_model(model)
+
+    print(f"   识别中...（模型：{model}，视频：{Path(video_path).name}）")
+    result = model_obj.transcribe(
+        video_path,
+        language="zh",
+        word_timestamps=False,  # segment 级别够用了
+        fp16=False,  # CPU 友好
+    )
+    return result["segments"]
+
+
+def find_time_range(
+    segments: list[dict],
+    query: str,
+    threshold: float = 0.6,
+) -> tuple[float, float, str] | None:
+    """
+    根据查询文字匹配时间段
+
+    匹配策略（优先级递减）：
+    1. 精确子串匹配
+    2. 模糊匹配（最长公共子序列相似度 ≥ threshold）
+    """
+    query = query.strip()
+
+    # 1. 精确子串匹配
+    for seg in segments:
+        text = seg["text"].strip()
+        if query in text:
+            return seg["start"], seg["end"], text
+
+    # 2. 模糊匹配
+    best = None
+    best_score = 0.0
+    for seg in segments:
+        text = seg["text"].strip()
+        score = SequenceMatcher(None, query, text).ratio()
+        if score > best_score and score >= threshold:
+            best_score = score
+            best = seg
+
+    if best:
+        return best["start"], best["end"], best["text"].strip()
+
+    return None
+
+
+def get_video_info(video_path: str) -> dict:
+    """ffprobe 获取视频信息"""
+    cmd = [
+        "ffprobe", "-v", "error",
+        "-select_streams", "v:0",
+        "-show_entries", "stream=width,height,r_frame_rate,duration",
+        "-show_entries", "format=duration",
+        "-of", "json",
+        video_path,
+    ]
+    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
+    data = json.loads(result.stdout)
+
+    stream = data.get("streams", [{}])[0]
+    fmt = data.get("format", {})
+
+    # 解析帧率（如 "25/1" → 25.0）
+    fps_str = stream.get("r_frame_rate", "25/1")
+    if "/" in fps_str:
+        num, den = fps_str.split("/")
+        fps = float(num) / float(den)
+    else:
+        fps = float(fps_str)
+
+    return {
+        "width": stream.get("width", 1920),
+        "height": stream.get("height", 1080),
+        "fps": fps,
+        "duration": float(fmt.get("duration", stream.get("duration", 0))),
+    }
+
+
+def replace_with_overlay(
+    person_video: str,
+    broll_video: str,
+    start: float,
+    end: float,
+    output: str,
+    crf: int = 18,
+):
+    """
+    用 FFmpeg overlay 滤镜替换片段
+
+    逻辑：
+        - 输入0 (broll)：底图 + 音频
+        - 输入1 (person)：被截取的画面片段
+        - [1:v] trim → 截取 [start, end] → setpts 归零 → scale 适配分辨率
+        - [0:v][clip] overlay → 在 between(t,start,end) 时显示 clip
+        - 输出：画面 = 替换后的视频，音频 = 原 broll 音频
+    """
+    duration = end - start
+    broll_info = get_video_info(broll_video)
+    w, h = broll_info["width"], broll_info["height"]
+
+    print(f"   空镜分辨率：{w}x{h}, 帧率：{broll_info['fps']:.2f}fps")
+    print(f"   截取人物片段：{start:.3f}s ~ {end:.3f}s（{duration:.3f}s）")
+    print(f"   正在渲染...（CRF={crf}）")
+
+    # overlay filter
+    # 注意：between(t,start,end) 中的逗号需要转义
+    filter_graph = (
+        f"[1:v]trim=start={start}:end={end},"
+        f"setpts=PTS-STARTPTS,"
+        f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
+        f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2:black[clip];"
+        f"[0:v][clip]overlay="
+        f"enable='between(t\\,{start}\\,{end})':"
+        f"x=(W-w)/2:y=(H-h)/2[v]"
+    )
+
+    cmd = [
+        "ffmpeg", "-y",
+        "-i", broll_video,
+        "-i", person_video,
+        "-filter_complex", filter_graph,
+        "-map", "[v]",
+        "-map", "0:a",
+        "-c:v", "libx264", "-crf", str(crf), "-preset", "fast",
+        "-c:a", "copy",
+        "-movflags", "+faststart",
+        output,
+    ]
+
+    subprocess.run(cmd, check=True, capture_output=True)
+    print(f"✅ 输出完成：{output}")
+
+
+def save_srt(segments: list[dict], path: str):
+    """保存 SRT 字幕供人工校对"""
+    def fmt(s: float) -> str:
+        h = int(s // 3600)
+        m = int((s % 3600) // 60)
+        sec = int(s % 60)
+        ms = int((s % 1) * 1000)
+        return f"{h:02d}:{m:02d}:{sec:02d},{ms:03d}"
+
+    with open(path, "w", encoding="utf-8") as f:
+        for i, seg in enumerate(segments, 1):
+            f.write(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text'].strip()}\n\n")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="基于音频文字的视频片段替换 MVP")
+    parser.add_argument("--person", required=True, help="人物出镜视频路径（提供画面）")
+    parser.add_argument("--broll", required=True, help="空镜视频路径（提供底图+音频）")
+    parser.add_argument("--query", required=True, help="要替换的文案（如：水电改造要注意）")
+    parser.add_argument("--output", default="output_replaced.mp4", help="输出文件路径")
+    parser.add_argument("--model", default="base", choices=["tiny", "base", "small"],
+                        help="Whisper 模型，tiny 最快，small 最准")
+    parser.add_argument("--crf", type=int, default=18, help="视频质量（0=无损，23=默认，越大越小）")
+    parser.add_argument("--threshold", type=float, default=0.6,
+                        help="模糊匹配阈值（0~1），低于此值视为未匹配")
+    args = parser.parse_args()
+
+    # 0. 依赖检查
+    if not check_dep("ffmpeg"):
+        print("❌ 未找到 ffmpeg")
+        sys.exit(1)
+
+    if not check_dep("ffprobe"):
+        print("❌ 未找到 ffprobe")
+        sys.exit(1)
+
+    if not ensure_whisper():
+        sys.exit(1)
+
+    for p in (args.person, args.broll):
+        if not Path(p).exists():
+            print(f"❌ 文件不存在：{p}")
+            sys.exit(1)
+
+    # 1. ASR 识别人物视频
+    print(f"\n🎙️  Step 1/3：识别人物视频音频")
+    segments = run_whisper(args.person, args.model)
+    print(f"   识别到 {len(segments)} 句话")
+
+    # 保存字幕供参考
+    srt_path = str(Path(args.output).with_suffix(".srt"))
+    save_srt(segments, srt_path)
+    print(f"📝 字幕已保存：{srt_path}")
+
+    # 2. 文本匹配
+    print(f"\n🔍 Step 2/3：查找文案「{args.query}」")
+    result = find_time_range(segments, args.query, threshold=args.threshold)
+    if not result:
+        print(f"❌ 未找到匹配文案（阈值 {args.threshold}）")
+        print(f"   建议：查看 {srt_path} 里的实际文案，调整 --query 内容")
+        sys.exit(1)
+
+    start, end, matched_text = result
+    print(f"   匹配文案：「{matched_text}」")
+    print(f"   时间段：  {start:.3f}s ~ {end:.3f}s（时长 {end - start:.3f}s）")
+
+    # 3. FFmpeg 替换
+    print(f"\n🎬 Step 3/3：替换片段")
+    replace_with_overlay(
+        args.person,
+        args.broll,
+        start,
+        end,
+        args.output,
+        crf=args.crf,
+    )
+
+    print(f"\n🎉 全部完成！")
+    print(f"   输出文件：{args.output}")
+    print(f"   字幕参考：{srt_path}")
+
+
+if __name__ == "__main__":
+    main()
@@ -39,13 +39,16 @@ pub struct ComposeVideoResult {
    pub duration: f64,
 }

-/// 上传视频响应
+/// 通用上传响应
 #[derive(Debug, Serialize)]
 #[serde(rename_all = "camelCase")]
-pub struct UploadVideoResult {
+pub struct UploadResult {
    pub url: String,
 }

+/// 兼容旧命名
+pub type UploadVideoResult = UploadResult;
+
 /// 获取项目视频目录
 fn get_project_video_dir(project_id: &str) -> Result<std::path::PathBuf, String> {
    let docs_dir = dirs::document_dir().ok_or("无法获取文档目录")?;
@@ -166,6 +169,53 @@ pub async fn compose_video(
    }
 }

+/// 截取视频片段请求参数
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ExtractVideoSegmentArgs {
+    pub input_path: String,
+    pub start: f64,
+    pub duration: f64,
+    pub output_path: String,
+}
+
+/// 截取视频片段（FFmpeg clip_video 封装）
+#[tauri::command]
+pub async fn extract_video_segment(
+    app: AppHandle,
+    args: ExtractVideoSegmentArgs,
+) -> ApiResponse<String> {
+    let safe_output = match sanitize_output_path(&args.output_path) {
+        Ok(p) => p,
+        Err(e) => return ApiResponse { code: 500, message: e, data: None },
+    };
+
+    let safe_input = if args.input_path.starts_with("http://") || args.input_path.starts_with("https://") {
+        args.input_path.clone()
+    } else if std::path::Path::new(&args.input_path).exists() {
+        args.input_path.clone()
+    } else {
+        return ApiResponse {
+            code: 500,
+            message: format!("输入文件不存在: {}", args.input_path),
+            data: None,
+        };
+    };
+
+    match ffmpeg_cmd::clip_video(&app, &safe_input, args.start, args.duration, &safe_output).await {
+        Ok(_) => ApiResponse {
+            code: 200,
+            message: "视频片段截取成功".to_string(),
+            data: Some(safe_output),
+        },
+        Err(e) => ApiResponse {
+            code: 500,
+            message: format!("截取视频片段失败: {}", e),
+            data: None,
+        },
+    }
+}
+
 /// 上传视频请求参数
 #[derive(Debug, Deserialize)]
 #[serde(rename_all = "camelCase")]
@@ -173,47 +223,45 @@ pub struct UploadVideoArgs {
    pub local_path: String,
 }

-/// 上传本地视频到后端，后端上传到七牛云并返回 URL
-#[tauri::command]
-pub async fn upload_video_file(
-    local_path: String,
-) -> ApiResponse<UploadVideoResult> {
-    // 读取本地文件
-    let file_bytes = match std::fs::read(&local_path) {
+/// 通用文件上传（本地 → 后端 → 七牛云）
+pub async fn upload_file_to_backend(
+    local_path: &str,
+    endpoint: &str,
+    default_filename: &str,
+    mime_type: &str,
+    read_error_prefix: &str,
+) -> ApiResponse<UploadResult> {
+    let file_bytes = match std::fs::read(local_path) {
        Ok(bytes) => bytes,
        Err(e) => {
            return ApiResponse {
                code: 500,
-                message: format!("读取视频文件失败: {}", e),
+                message: format!("{}: {}", read_error_prefix, e),
                data: None,
            };
        }
    };

-    // 获取文件名
-    let filename = std::path::Path::new(&local_path)
+    let filename = std::path::Path::new(local_path)
        .file_name()
        .and_then(|n| n.to_str())
-        .unwrap_or("video.mp4")
+        .unwrap_or(default_filename)
        .to_string();

-    // 构建 multipart 请求
    let backend_url = crate::PYTHON_API_BASE_URL;
-    let upload_url = format!("{}/upload/video", backend_url);
+    let upload_url = format!("{}{}", backend_url, endpoint);

    let client = reqwest::Client::new();

-    // 构建 multipart form
    let form = reqwest::multipart::Form::new()
        .part(
            "file",
            reqwest::multipart::Part::bytes(file_bytes)
                .file_name(filename)
-                .mime_str("video/mp4")
+                .mime_str(mime_type)
                .unwrap_or_else(|_| reqwest::multipart::Part::bytes(vec![])),
        );

-    // 发送请求
    let response = match client.post(&upload_url).multipart(form).send().await {
        Ok(resp) => resp,
        Err(e) => {
@@ -235,7 +283,6 @@ pub async fn upload_video_file(
        };
    }

-    // 解析响应
    let result: serde_json::Value = match response.json().await {
        Ok(data) => data,
        Err(e) => {
@@ -247,7 +294,6 @@ pub async fn upload_video_file(
        }
    };

-    // 提取 URL
    let url = result
        .get("data")
        .and_then(|d| d.get("url"))
@@ -258,7 +304,7 @@ pub async fn upload_video_file(
        Some(url) => ApiResponse {
            code: 200,
            message: "上传成功".to_string(),
-            data: Some(UploadVideoResult { url }),
+            data: Some(UploadResult { url }),
        },
        None => ApiResponse {
            code: 500,
@@ -268,30 +314,25 @@ pub async fn upload_video_file(
    }
 }

+/// 上传本地视频到后端，后端上传到七牛云并返回 URL
+#[tauri::command]
+pub async fn upload_video_file(
+    local_path: String,
+) -> ApiResponse<UploadVideoResult> {
+    upload_file_to_backend(
+        &local_path,
+        "/upload/video",
+        "video.mp4",
+        "video/mp4",
+        "读取视频文件失败",
+    ).await
+}
+
 /// 上传本地图片到后端，后端上传到七牛云并返回 URL
 #[tauri::command]
 pub async fn upload_image_file(
    local_path: String,
 ) -> ApiResponse<UploadVideoResult> {
-    // 读取本地文件
-    let file_bytes = match std::fs::read(&local_path) {
-        Ok(bytes) => bytes,
-        Err(e) => {
-            return ApiResponse {
-                code: 500,
-                message: format!("读取图片文件失败: {}", e),
-                data: None,
-            };
-        }
-    };
-
-    // 获取文件名和扩展名，推断 mime type
-    let filename = std::path::Path::new(&local_path)
-        .file_name()
-        .and_then(|n| n.to_str())
-        .unwrap_or("image.jpg")
-        .to_string();
-
    let ext = std::path::Path::new(&local_path)
        .extension()
        .and_then(|e| e.to_str())
@@ -306,75 +347,13 @@ pub async fn upload_image_file(
        _ => "image/jpeg",
    };

-    // 构建 multipart 请求
-    let backend_url = crate::PYTHON_API_BASE_URL;
-    let upload_url = format!("{}/upload/image", backend_url);
-
-    let client = reqwest::Client::new();
-
-    // 构建 multipart form
-    let form = reqwest::multipart::Form::new()
-        .part(
-            "file",
-            reqwest::multipart::Part::bytes(file_bytes)
-                .file_name(filename)
-                .mime_str(mime_type)
-                .unwrap_or_else(|_| reqwest::multipart::Part::bytes(vec![])),
-        );
-
-    // 发送请求
-    let response = match client.post(&upload_url).multipart(form).send().await {
-        Ok(resp) => resp,
-        Err(e) => {
-            return ApiResponse {
-                code: 500,
-                message: format!("上传请求失败: {}", e),
-                data: None,
-            };
-        }
-    };
-
-    if !response.status().is_success() {
-        let status = response.status();
-        let error_text = response.text().await.unwrap_or_default();
-        return ApiResponse {
-            code: status.as_u16() as i32,
-            message: format!("上传失败: {} - {}", status, error_text),
-            data: None,
-        };
-    }
-
-    // 解析响应
-    let result: serde_json::Value = match response.json().await {
-        Ok(data) => data,
-        Err(e) => {
-            return ApiResponse {
-                code: 500,
-                message: format!("解析上传响应失败: {}", e),
-                data: None,
-            };
-        }
-    };
-
-    // 提取 URL
-    let url = result
-        .get("data")
-        .and_then(|d| d.get("url"))
-        .and_then(|u| u.as_str())
-        .map(|s| s.to_string());
-
-    match url {
-        Some(url) => ApiResponse {
-            code: 200,
-            message: "上传成功".to_string(),
-            data: Some(UploadVideoResult { url }),
-        },
-        None => ApiResponse {
-            code: 500,
-            message: "上传响应中未找到 URL".to_string(),
-            data: None,
-        },
-    }
+    upload_file_to_backend(
+        &local_path,
+        "/upload/image",
+        "image.jpg",
+        mime_type,
+        "读取图片文件失败",
+    ).await
 }

 /// 下载文件请求参数
@@ -191,3 +191,66 @@ pub async fn get_project_audios_dir(
        },
    }
 }
+
+// --------------------- 音频截取与上传 ---------------------
+
+#[derive(serde::Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ExtractAudioSegmentArgs {
+    pub input_path: String,
+    pub start: f64,
+    pub duration: f64,
+    pub output_path: String,
+}
+
+/// 截取音频片段（FFmpeg）
+#[tauri::command]
+pub async fn extract_audio_segment(
+    app: tauri::AppHandle,
+    args: ExtractAudioSegmentArgs,
+) -> ApiResponse<String> {
+    match crate::ffmpeg_cmd::extract_audio_segment(
+        &app,
+        &args.input_path,
+        args.start,
+        args.duration,
+        &args.output_path,
+    ).await {
+        Ok(_) => ApiResponse {
+            code: 200,
+            message: "Audio segment extracted successfully".to_string(),
+            data: Some(args.output_path),
+        },
+        Err(e) => ApiResponse {
+            code: 500,
+            message: format!("Failed to extract audio segment: {}", e),
+            data: None,
+        },
+    }
+}
+
+/// 上传本地音频文件到后端，后端上传到七牛云并返回 URL
+#[tauri::command]
+pub async fn upload_audio_file(
+    local_path: String,
+) -> ApiResponse<crate::commands::video_compose::UploadVideoResult> {
+    // 验证路径安全
+    let safe_path = match crate::ffmpeg_cmd::sanitize_output_path(&local_path) {
+        Ok(p) => p,
+        Err(e) => {
+            return ApiResponse {
+                code: 500,
+                message: format!("路径验证失败: {}", e),
+                data: None,
+            };
+        }
+    };
+
+    crate::commands::video_compose::upload_file_to_backend(
+        &safe_path,
+        "/upload/audio",
+        "audio.mp3",
+        "audio/mpeg",
+        "读取音频文件失败",
+    ).await
+}
@@ -129,7 +129,7 @@ pub async fn run_ffmpeg(app: &AppHandle, args: Vec<String>) -> Result<String, St
 }

 /**
- * 标准化单个视频片段 (调整为 1080:1920, 30fps, libx264, aac 44100Hz stereo)
+ * 标准化单个视频片段 (调整为 1080:1920, 25fps, libx264, aac 44100Hz stereo)
 */
 pub async fn standardize_video(app: &AppHandle, input_path: &str, output_path: &str) -> Result<(), String> {
    // 验证路径安全
@@ -138,14 +138,14 @@ pub async fn standardize_video(app: &AppHandle, input_path: &str, output_path: &
    
    let args = vec![
        "-i".to_string(), safe_input,
-        "-vf".to_string(), "fps=30,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
+        "-vf".to_string(), "fps=25,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
        "-c:v".to_string(), "libx264".to_string(),
        "-c:a".to_string(), "aac".to_string(),
        "-ar".to_string(), "44100".to_string(),
        "-ac".to_string(), "2".to_string(),
        "-preset".to_string(), "veryfast".to_string(),
        "-crf".to_string(), "23".to_string(),
-        "-r".to_string(), "30".to_string(),
+        "-r".to_string(), "25".to_string(),
        "-y".to_string(),
        safe_output
    ];
@@ -233,7 +233,6 @@ pub async fn add_audio_to_video(app: &AppHandle, video_path: &str, audio_path: &
        "-ar".to_string(), "44100".to_string(), // 统一采样率
        "-map".to_string(), "0:v:0".to_string(),
        "-map".to_string(), "1:a:0".to_string(),
-        "-shortest".to_string(),
        "-y".to_string(),
        safe_output
    ];
@@ -241,7 +240,7 @@ pub async fn add_audio_to_video(app: &AppHandle, video_path: &str, audio_path: &
 }

 /**
- * 将封面图转换为一段短视频 (0.5s, 1080x1920, 30fps)
+ * 将封面图转换为一段短视频 (0.5s, 1080x1920, 25fps)
 * 带静音音频轨道，避免 concat 时丢失后续片段音频
 */
 pub async fn create_cover_video(app: &AppHandle, input_path: &str, output_path: &str, duration: &str) -> Result<(), String> {
@@ -259,7 +258,7 @@ pub async fn create_cover_video(app: &AppHandle, input_path: &str, output_path:
        "-t".to_string(), duration.to_string(),
        "-pix_fmt".to_string(), "yuv420p".to_string(),
        "-vf".to_string(), "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,setsar=1".to_string(),
-        "-r".to_string(), "30".to_string(),
+        "-r".to_string(), "25".to_string(),
        "-shortest".to_string(),
        "-y".to_string(),
        safe_output
@@ -487,7 +486,6 @@ pub async fn replace_audio_track(
        // 只保留第一个视频流和第一个音频流
        "-map".to_string(), "0:v:0".to_string(),
        "-map".to_string(), "1:a:0".to_string(),
-        "-shortest".to_string(),
        "-y".to_string(),
        safe_output,
    ];
@@ -546,7 +544,7 @@ pub async fn mix_audio_tracks(
 /**
 * 裁剪视频片段（支持本地文件和 HTTP URL）
 *
- * 从起始时间裁剪指定时长，同时标准化输出格式（1080x1920, 30fps, libx264, aac）。
+ * 从起始时间裁剪指定时长，同时标准化输出格式（1080x1920, 25fps, libx264, aac）。
 * 适用于从人物形象素材或空镜素材中提取指定时长的片段。
 */
 pub async fn clip_video(
@@ -575,14 +573,14 @@ pub async fn clip_video(
        "-ss".to_string(), start_str,
        "-t".to_string(), duration_str,
        "-i".to_string(), safe_input,
-        "-vf".to_string(), "fps=30,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
+        "-vf".to_string(), "fps=25,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
        "-c:v".to_string(), "libx264".to_string(),
        "-preset".to_string(), "veryfast".to_string(),
        "-crf".to_string(), "23".to_string(),
        "-c:a".to_string(), "aac".to_string(),
        "-ar".to_string(), "44100".to_string(),
        "-ac".to_string(), "2".to_string(),
-        "-r".to_string(), "30".to_string(),
+        "-r".to_string(), "25".to_string(),
        "-pix_fmt".to_string(), "yuv420p".to_string(),
        "-avoid_negative_ts".to_string(), "make_zero".to_string(),
        "-y".to_string(),
@@ -592,6 +590,39 @@ pub async fn clip_video(
    run_ffmpeg(app, args).await.map(|_| ())
 }

+/**
+ * 截取音频片段
+ *
+ * 从指定起始时间截取指定时长的音频，输出为 MP3 格式。
+ */
+pub async fn extract_audio_segment(
+    app: &AppHandle,
+    input_path: &str,
+    start: f64,
+    duration: f64,
+    output_path: &str,
+) -> Result<(), String> {
+    let safe_input = validate_safe_path(input_path)?;
+    let safe_output = sanitize_output_path(output_path)?;
+
+    let start_str = format!("{:.3}", start);
+    let duration_str = format!("{:.3}", duration);
+
+    let args = vec![
+        "-ss".to_string(), start_str,
+        "-t".to_string(), duration_str,
+        "-i".to_string(), safe_input,
+        "-c:a".to_string(), "libmp3lame".to_string(),
+        "-b:a".to_string(), "192k".to_string(),
+        "-ar".to_string(), "44100".to_string(),
+        "-ac".to_string(), "2".to_string(),
+        "-vn".to_string(), // 无视频
+        "-y".to_string(),
+        safe_output,
+    ];
+    run_ffmpeg(app, args).await.map(|_| ())
+}
+
 /**
 * 转码音频为标准格式 (MP3 44.1kHz stereo 192kbps)
 */
@@ -118,12 +118,15 @@ pub fn run() {
            commands::voice::list_project_audios,
            commands::voice::delete_audio,
            commands::voice::get_project_audios_dir,
+            commands::voice::extract_audio_segment,
+            commands::voice::upload_audio_file,
            // 音色素材库
            commands::voice::load_voice_materials,
            commands::voice::save_voice_material,
            commands::voice::delete_voice_material_cmd,
            // 视频合成（Phase 2）
            commands::video_compose::compose_video,
+            commands::video_compose::extract_video_segment,
            commands::video_compose::upload_video_file,
            commands::video_compose::download_file,
            // 音频处理
@@ -0,0 +1,60 @@
+/**
+ * Caption 字幕 API 模块
+ * =====================
+ *
+ * 直接调用后端字幕相关 API（不走 Async Engine）。
+ */
+
+import { client } from '../client';
+
+export interface CaptionUtterance {
+  text: string;
+  startTime: number; // 毫秒（client.ts 自动将后端 snake_case 转为 camelCase）
+  endTime: number;   // 毫秒
+}
+
+export interface AutoAlignResult {
+  code: number;
+  message: string;
+  duration: number; // 秒
+  utterances: CaptionUtterance[];
+}
+
+/**
+ * 自动字幕打轴（完整流程，同步阻塞）
+ *
+ * 为已有音频文本配上时间轴。后端内部轮询，最多等待 120 秒。
+ *
+ * @param audioUrl 音频/视频文件 URL（七牛云）
+ * @param audioText 要打轴的完整字幕文本
+ * @returns 打轴结果，含每句话的时间轴
+ */
+export async function autoAlignCaption(
+  audioUrl: string,
+  audioText: string
+): Promise<AutoAlignResult> {
+  // client.post 已自动提取 ApiResponse.data 并做 snakeToCamel 转换
+  const result = await client.post<{
+    code: number;
+    message: string;
+    duration: number;
+    utterances: CaptionUtterance[];
+  }>('/caption/ata/align', {
+    audioUrl,
+    audioText,
+    captionType: 'speech',
+    staPuncMode: 3,
+  });
+
+  // result.code 是火山引擎打轴结果的状态码（0=成功, 2000=处理中）
+  if (result.code !== 0) {
+    throw new Error(result.message || '打轴失败');
+  }
+
+  return {
+    code: result.code,
+    message: result.message,
+    duration: result.duration,
+    utterances: result.utterances || [],
+  };
+}
@@ -67,6 +67,9 @@ export interface ProjectMeta {
  dubbingAudioUrl?: string;     // 生成后的配音音频七牛云URL
  dubbingAudioPath?: string;    // 生成后的配音音频本地路径
  dubbingVoiceId?: string;      // 生成配音使用的音色ID
+  voiceSpeed?: number;           // 配音语速
+  voiceVolume?: number;          // 配音音量
+  voicePitch?: number;           // 配音音调
  subtitleAlignment?: AlignmentResult; // 全局字幕打轴结果（单视频模式）
  burnedVideoPath?: string;     // 压制字幕后的成品视频路径
  coverConfig?: {
@@ -120,6 +123,13 @@ export interface ProjectSegment {
  alignmentResult?: AlignmentResult; // 字幕打轴结果
  burnedVideoPath?: string; // 压制字幕后的视频路径
  burnedAt?: number;        // 压制字幕的时间戳
+  audioStartTime?: number;  // 在完整配音音频中的开始时间（毫秒）
+  audioEndTime?: number;    // 在完整配音音频中的结束时间（毫秒）
+  actualDuration?: number;  // 实际时长（秒，基于字幕打轴）
+  clipAudioPath?: string;   // 截取后的音频片段本地路径
+  clipAudioUrl?: string;    // 截取后的音频片段七牛云 URL
+  lipSyncTaskId?: string;   // Vidu 对口型任务 ID
+  lipSyncState?: string;    // Vidu 对口型任务状态
 }

 /**
@@ -146,6 +156,7 @@ export const localProjectApi = {
      selectedElementId: meta.selectedElementId,
      selectedVoiceId: meta.selectedVoiceId,
      composedVideoUrl: meta.composedVideoUrl,
+      composedVideoPath: meta.composedVideoPath,
      lipSyncTaskId: meta.lipSyncTaskId,
      lipSyncState: meta.lipSyncState,
      lipSyncedVideoPath: meta.lipSyncedVideoPath,
@@ -153,6 +164,9 @@ export const localProjectApi = {
      dubbingAudioUrl: meta.dubbingAudioUrl,
      dubbingAudioPath: meta.dubbingAudioPath,
      dubbingVoiceId: meta.dubbingVoiceId,
+      voiceSpeed: meta.voiceSpeed,
+      voiceVolume: meta.voiceVolume,
+      voicePitch: meta.voicePitch,
      avatarMaterialPath: meta.avatarMaterialPath,
      avatarMaterialName: meta.avatarMaterialName,
      avatarMaterialDuration: meta.avatarMaterialDuration,
@@ -199,6 +213,11 @@ export const localProjectApi = {
      alignmentResult: s.alignmentResult,
      burnedVideoPath: s.burnedVideoPath,
      burnedAt: s.burnedAt,
+      audioStartTime: s.audioStartTime,
+      audioEndTime: s.audioEndTime,
+      actualDuration: s.actualDuration,
+      clipAudioPath: s.clipAudioPath,
+      clipAudioUrl: s.clipAudioUrl,
    }));
    const jsonContent = JSON.stringify(orderedSegments, null, 2);
    const res = await safeInvoke<ApiResponse<boolean>>('save_project_segments_raw', { 
@@ -84,6 +84,22 @@ export async function uploadImageFile(localPath: string): Promise<string> {
  return res.data!.url;
 }

+/**
+ * 上传本地音频文件到后端（后端上传到七牛云）
+ *
+ * @param localPath 本地音频文件路径
+ * @returns 七牛云 URL
+ */
+export async function uploadAudioFile(localPath: string): Promise<string> {
+  const res = await invoke<ApiResponse<UploadVideoResult>>('upload_audio_file', {
+    localPath,
+  });
+  if (res.code !== 200) {
+    throw new Error(res.message);
+  }
+  return res.data!.url;
+}
+
 /**
 * 从 URL 下载文件到本地
 *
@@ -319,3 +319,26 @@ export async function standardizeAudio(args: StandardizeAudioRequest): Promise<s
  }
  return result.data;
 }
+
+export interface ExtractAudioSegmentRequest {
+  inputPath: string;
+  start: number;
+  duration: number;
+  outputPath: string;
+}
+
+/** 截取音频片段 */
+export async function extractAudioSegment(args: ExtractAudioSegmentRequest): Promise<string> {
+  const result = await invoke<{ code: number; data?: string; message: string }>('extract_audio_segment', {
+    args: {
+      inputPath: args.inputPath,
+      start: args.start,
+      duration: args.duration,
+      outputPath: args.outputPath,
+    },
+  });
+  if (result.code !== 200 || !result.data) {
+    throw new Error(result.message || '截取音频片段失败');
+  }
+  return result.data;
+}
@@ -59,4 +59,11 @@ export interface ScriptShot {
  burnedAt?: number;        // 压制字幕的时间戳
  audioPath?: string;       // 本地配音音频文件路径
  audioUrl?: string;        // 七牛云配音音频 URL
+  audioStartTime?: number;  // 在完整配音音频中的开始时间（毫秒）
+  audioEndTime?: number;    // 在完整配音音频中的结束时间（毫秒）
+  actualDuration?: number;  // 实际时长（秒，基于字幕打轴）
+  clipAudioPath?: string;   // 截取后的音频片段本地路径
+  clipAudioUrl?: string;    // 截取后的音频片段七牛云 URL
+  lipSyncTaskId?: string;   // Vidu 对口型任务 ID
+  lipSyncState?: string;    // Vidu 对口型任务状态
 }
@@ -6,18 +6,18 @@
 * 布局：左侧操作区 + 右侧预览区（使用 step-layout 标准布局）
 */

-import { useState, useEffect, useRef, useMemo } from 'react';
+import { useState, useRef, useMemo } from 'react';
 import { invoke } from '@tauri-apps/api/core';
 import { homeDir } from '@tauri-apps/api/path';
 import { useProjectStore, saveMetaToLocalFile } from '../../store';
 import { getCurrentProjectId } from '../../api/modules/localStorage';
-import { useTask } from '../../hooks/useTask';
+
 import { useLocalVideo } from '../../hooks/useLocalVideo';
 import { useAssJsRenderer } from '../../hooks/useAssJsRenderer';
 import { generateAssFromAlignment, saveAssFile, htmlColorToAss, applyAssJsCompensation } from '../../utils/assGenerator';
 import { useProgressStore } from '../../store/progressStore';
 import { toast } from '../../store/uiStore';
-import type { AlignmentResult } from '../../api/types';
+
 import './SubtitleBurning.css';

 // 解析 Docker 容器内路径 (/root/Documents/...) 转换为本地用户路径
@@ -46,27 +46,26 @@ const SUBTITLE_PRESETS = [
 ];

 export default function SubtitleBurning() {
-  const segments = useProjectStore(state => state.segments);
+
  const projectId = getCurrentProjectId();

-  // 成品视频
-  const lipSyncedVideoUrl = useProjectStore(state => state.lipSyncedVideoUrl);
-  const lipSyncedVideoPath = useProjectStore(state => state.lipSyncedVideoPath);
+  // 成品视频（临时：只用拼接视频，对口型替换验证通过后再启用）
+  const composedVideoUrl = useProjectStore(state => state.composedVideoUrl);
+  const composedVideoPath = useProjectStore(state => state.composedVideoPath);

-  // 打轴状态
-  const storeAlignment = useProjectStore(state => state.subtitleAlignment);
-  const [alignment, setAlignment] = useState<AlignmentResult | undefined>(storeAlignment);
-  const [isAligning, setIsAligning] = useState(false);
+  const actualVideoUrl = composedVideoUrl;
+  const actualVideoPath = composedVideoPath;
+
+  // 打轴结果直接从 Step 2 复用（VoiceDubbing 已保存到 meta）
+  const alignment = useProjectStore(state => state.subtitleAlignment);
  const [isBurning, setIsBurning] = useState(false);

-  const { submit } = useTask();
-
  // 视频播放相关
  const videoRef = useRef<HTMLVideoElement>(null);
  const containerRef = useRef<HTMLDivElement>(null);

-  // 预览用七牛云 URL（加载快）
-  const { videoUrl: loadedVideoUrl } = useLocalVideo(lipSyncedVideoUrl);
+  // 预览用：优先 URL，否则回退到本地路径（useLocalVideo 支持本地路径读取）
+  const { videoUrl: loadedVideoUrl } = useLocalVideo(actualVideoUrl || actualVideoPath);

  // 字幕样式（默认值基于 1080x1920 视频）
  const [subtitleStyle, setSubtitleStyle] = useState<SubtitleStyle>({
@@ -125,13 +124,6 @@ export default function SubtitleBurning() {
    enabled: subtitleEnabled,
  });

-  // 从 store 恢复打轴结果（页面刷新后）
-  useEffect(() => {
-    if (storeAlignment) {
-      setAlignment(storeAlignment);
-    }
-  }, [storeAlignment]);
-
  // 应用预设样式
  const applyPreset = (presetId: string) => {
    const preset = SUBTITLE_PRESETS.find(p => p.id === presetId);
@@ -146,84 +138,13 @@ export default function SubtitleBurning() {
    });
  };

-  // 字幕打轴：对成品视频统一打轴
-  const handleAlign = async () => {
-    if (!projectId) {
-      toast.error('项目ID不存在');
-      return;
-    }
-    if (!lipSyncedVideoUrl) {
-      toast.error('请先完成视频生成');
-      return;
-    }
-
-    // 拼接所有分镜文案
-    const audioText = segments.map(s => s.voiceover).filter(Boolean).join('\n');
-    if (!audioText) {
-      toast.error('没有配音文案');
-      return;
-    }
-
-    setIsAligning(true);
-    useProgressStore.getState().show('字幕打轴');
-
-    const taskId = await submit(
-      'subtitle',
-      {
-        videoPath: lipSyncedVideoUrl,
-        audioText,
-        mode: 'auto_align',
-        language: 'zh',
-      },
-      {
-        showProgress: true,
-        callbacks: {
-          onComplete: (result: unknown) => {
-            const r = result as {
-              utterances?: Array<{ text: string; startTime: number; endTime: number }>;
-              duration?: number;
-            } | undefined;
-
-            const newAlignment: AlignmentResult = {
-              status: 'completed',
-              utterances: r?.utterances?.map(u => ({
-                text: u.text,
-                start_time: u.startTime,
-                end_time: u.endTime,
-              })),
-              duration: r?.duration,
-            };
-            setAlignment(newAlignment);
-            useProjectStore.setState({ subtitleAlignment: newAlignment });
-            saveMetaToLocalFile({ subtitleAlignment: newAlignment });
-            setIsAligning(false);
-          },
-          onError: (error: string) => {
-            const newAlignment: AlignmentResult = {
-              status: 'failed',
-              errorMessage: error,
-            };
-            setAlignment(newAlignment);
-            useProjectStore.setState({ subtitleAlignment: newAlignment });
-            saveMetaToLocalFile({ subtitleAlignment: newAlignment });
-            setIsAligning(false);
-          },
-        },
-      }
-    );
-
-    if (!taskId) {
-      setIsAligning(false);
-    }
-  };
-
  // 压制字幕：单次压制全局字幕到成品视频
  const handleBurn = async () => {
    if (!projectId) {
      toast.error('项目ID不存在');
      return;
    }
-    if (!lipSyncedVideoPath) {
+    if (!actualVideoPath) {
      toast.error('成品视频不存在');
      return;
    }
@@ -265,7 +186,7 @@ export default function SubtitleBurning() {
      const outputPath = outputRes.data;

      // 4. 解析视频路径
-      const resolvedVideoPath = await resolveHostPath(lipSyncedVideoPath);
+      const resolvedVideoPath = await resolveHostPath(actualVideoPath);

      // 5. 调用 Rust 压制字幕
      const burnResult = await invoke<{ code: number; data?: string; message: string }>('burn_subtitle', {
@@ -293,48 +214,10 @@ export default function SubtitleBurning() {
    }
  };

-  // 打轴状态文本
-  const alignmentStatusText = (() => {
-    if (!alignment) return '未打轴';
-    switch (alignment.status) {
-      case 'pending': return '待打轴';
-      case 'aligning': return '打轴中...';
-      case 'completed': return '✓ 已打轴';
-      case 'failed': return '✗ 打轴失败';
-      default: return '未知';
-    }
-  })();
-
  return (
    <div className="step-layout subtitle-burning">
      {/* 左侧操作区 */}
      <div className="step-panel-left">
-        {/* 打轴区 */}
-        <div className="panel-section">
-          <div className="panel-header">
-            <label className="panel-label">字幕打轴</label>
-            <div className="panel-actions">
-              <button
-                className="btn btn-primary btn-sm"
-                onClick={handleAlign}
-                disabled={isAligning || isBurning || !lipSyncedVideoUrl}
-              >
-                {isAligning ? '打轴中...' : '字幕打轴'}
-              </button>
-            </div>
-          </div>
-          <div className="shot-list-hint">
-            <span className={alignment?.status === 'failed' ? 'text-error' : ''}>
-              {alignmentStatusText}
-            </span>
-            {alignment?.status === 'failed' && alignment.errorMessage && (
-              <span style={{ color: 'var(--error)', marginLeft: 'var(--spacing-sm)', fontSize: 'var(--font-xs)' }}>
-                {alignment.errorMessage}
-              </span>
-            )}
-          </div>
-        </div>
-
        {/* 字幕样式设置 */}
        <div className="style-section">
          <label className="panel-label">字幕样式</label>
@@ -386,7 +269,7 @@ export default function SubtitleBurning() {
        <button
          className="btn btn-primary burn-btn"
          onClick={handleBurn}
-          disabled={isAligning || isBurning || alignment?.status !== 'completed'}
+          disabled={isBurning || !alignment?.utterances?.length}
        >
          {isBurning ? '压制中...' : '压制字幕'}
        </button>
@@ -409,14 +292,15 @@ export default function SubtitleBurning() {
            ) : (
              <>
                <video
+                  key={loadedVideoUrl}
                  ref={videoRef}
                  src={loadedVideoUrl}
                  className="preview-video"
                  controls
-                  autoPlay={false}
+                  autoPlay
                />
-                {/* 未打轴时显示提示遮罩 */}
-                {alignment?.status !== 'completed' && (
+                {/* 无打轴数据时显示提示遮罩 */}
+                {!alignment?.utterances?.length && (
                  <div
                    style={{
                      position: 'absolute',
@@ -440,8 +324,8 @@ export default function SubtitleBurning() {
                      <circle cx="12" cy="12" r="10" />
                      <path d="M12 6v6l4 2" />
                    </svg>
-                    <p style={{ fontSize: 'var(--font-md)', fontWeight: 600, lineHeight: 1.4, color: '#fff' }}>字幕预览需要先完成打轴</p>
-                    <p style={{ fontSize: 'var(--font-sm)', color: 'rgba(255,255,255,0.75)', marginTop: 'var(--spacing-xs)' }}>请点击左侧「字幕打轴」按钮</p>
+                    <p style={{ fontSize: 'var(--font-md)', fontWeight: 600, lineHeight: 1.4, color: '#fff' }}>字幕数据尚未就绪</p>
+                    <p style={{ fontSize: 'var(--font-sm)', color: 'rgba(255,255,255,0.75)', marginTop: 'var(--spacing-xs)' }}>请返回第2步重新生成配音</p>
                  </div>
                )}
                {/* 字幕由 ASS.js 自动渲染在 video 上方的 canvas 层 */}
@@ -6,31 +6,64 @@ import { toast } from '../../store/uiStore';
 import { useProjectStore, saveMetaToLocalFile, type ProjectState } from '../../store';
 import { useProgressStore } from '../../store/progressStore';
 import { useVideoGeneration } from '../../hooks/useVideoGeneration';
+import { useLocalVideo } from '../../hooks/useLocalVideo';
 import { matchMaterial } from '../../api/modules/materials';

 import {
  composeVideo,
  uploadVideoFile,
-  uploadImageFile,
  downloadFile,
  type ComposeSegment,
 } from '../../api/modules/videoCompose';
-import { submitLipSync, queryLipSyncTask } from '../../api/modules/vidu';
+import { submitLipSync } from '../../api/modules/vidu';
 import { invoke } from '@tauri-apps/api/core';

 import { getCurrentProjectId, localProjectApi } from '../../api/modules/localStorage';
 import './VideoGeneration.css';

-
+/**
+ * 统一填补法：计算每段在全局时间轴上的播放区间
+ * - 分镜（segment）：严格 [audioStartTime, audioEndTime]
+ * - 空镜（empty_shot）：
+ *   - assignedStart = 上一段.audioEndTime（首段为 0）
+ *   - assignedEnd = 下一段是空镜 ? 自己.audioEndTime : 下一段.audioStartTime
+ */
+function computeAssignedIntervals(
+  segs: Array<{ id: number | string; type: string; audioStartTime?: number; audioEndTime?: number; actualDuration?: number; duration?: string | number }>
+) {
+  const result: Record<string | number, { assignedStart: number; assignedEnd: number }> = {};
+  for (let i = 0; i < segs.length; i++) {
+    const seg = segs[i];
+    const prev = i > 0 ? segs[i - 1] : null;
+    const next = i < segs.length - 1 ? segs[i + 1] : null;
+    if (seg.type === 'segment') {
+      result[seg.id] = {
+        assignedStart: seg.audioStartTime ?? 0,
+        assignedEnd: seg.audioEndTime ?? ((seg.actualDuration ?? 5) * 1000),
+      };
+    } else {
+      const assignedStart = prev ? (prev.audioEndTime ?? 0) : 0;
+      let assignedEnd: number;
+      if (next && next.type === 'empty_shot') {
+        assignedEnd = seg.audioEndTime ?? ((seg.actualDuration ?? 5) * 1000);
+      } else if (next) {
+        assignedEnd = next.audioStartTime ?? (seg.audioEndTime ?? assignedStart);
+      } else {
+        assignedEnd = seg.audioEndTime ?? assignedStart;
+      }
+      result[seg.id] = { assignedStart, assignedEnd };
+    }
+  }
+  return result;
+}

 export default function VideoGeneration() {
  const segments = useProjectStore(state => state.segments);
  const selectedElementId = useProjectStore(state => state.selectedElementId);
  const selectedVoiceId = useProjectStore(state => state.selectedVoiceId);
  const dubbingAudioUrl = useProjectStore(state => state.dubbingAudioUrl);
-  const lipSyncTaskId = useProjectStore(state => state.lipSyncTaskId);
-  const lipSyncState = useProjectStore(state => state.lipSyncState);
-  const lipSyncedVideoUrl = useProjectStore(state => state.lipSyncedVideoUrl);
+  const composedVideoPath = useProjectStore(state => state.composedVideoPath);
+  const { videoUrl: composedVideoBlobUrl } = useLocalVideo(composedVideoPath);
  const updateSegment = useProjectStore(state => state.updateSegment);
  const projectId = getCurrentProjectId();

@@ -69,9 +102,6 @@ export default function VideoGeneration() {
            dubbingAudioUrl: meta.dubbingAudioUrl ?? undefined,
            composedVideoUrl: meta.composedVideoUrl ?? undefined,
            composedVideoPath: meta.composedVideoPath ?? undefined,
-            lipSyncTaskId: meta.lipSyncTaskId ?? undefined,
-            lipSyncState: meta.lipSyncState ?? undefined,
-            lipSyncedVideoUrl: meta.lipSyncedVideoUrl ?? undefined,
            lipSyncedVideoPath: meta.lipSyncedVideoPath ?? undefined,
          };

@@ -125,76 +155,15 @@ export default function VideoGeneration() {



-  // Vidu 对口型任务轮询
-  useEffect(() => {
-    if (!lipSyncTaskId || !projectId) return;
-    if (lipSyncState === 'succeeded' || lipSyncState === 'failed') return;
-
-    let canceled = false;
-    const interval = setInterval(async () => {
-      if (canceled) return;
-      try {
-        const result = await queryLipSyncTask(lipSyncTaskId);
-        console.log('[VideoGeneration] 轮询状态:', result.state);
-
-        if (result.state === 'succeeded') {
-          clearInterval(interval);
-          const videoUrl = result.videoUrl ?? result.creations?.[0]?.url;
-          if (!videoUrl) {
-            console.error('[VideoGeneration] 对口型任务成功但无视频 URL');
-            useProjectStore.setState({ lipSyncState: 'failed' });
-            await saveMetaToLocalFile({ lipSyncState: 'failed' });
-            toast.error('对口型视频 URL 为空');
-            return;
-          }
-
-          // 1. 获取本地保存路径（通过 IPC，自动处理路径展开）
-          const pathRes = await invoke<{ code: number; data?: string; message: string }>('get_video_save_path', {
-            projectId,
-            filename: `lip_synced_${Date.now()}.mp4`,
-          });
-          if (pathRes.code !== 200 || !pathRes.data) {
-            throw new Error(pathRes.message || '获取保存路径失败');
-          }
-          const localPath = pathRes.data;
-
-          // 2. 下载 Vidu 视频到本地
-          await downloadFile(videoUrl, localPath);
-          console.log('[VideoGeneration] 对口型视频下载完成:', localPath);
-
-          // 3. 上传七牛云
-          const qiniuUrl = await uploadVideoFile(localPath);
-          console.log('[VideoGeneration] 对口型视频上传完成:', qiniuUrl);
-
-          // 4. 更新 store 和 meta.json
-          useProjectStore.setState({
-            lipSyncState: 'succeeded',
-            lipSyncedVideoUrl: qiniuUrl,
-            lipSyncedVideoPath: localPath,
-          });
-          await saveMetaToLocalFile({
-            lipSyncedVideoPath: localPath,
-            lipSyncedVideoUrl: qiniuUrl,
-            lipSyncState: 'succeeded',
-          });
-          toast.success('对口型视频生成完成');
-        } else if (result.state === 'failed') {
-          clearInterval(interval);
-          useProjectStore.setState({ lipSyncState: 'failed' });
-          await saveMetaToLocalFile({ lipSyncState: 'failed' });
-          toast.error('对口型生成失败');
-        }
-        // pending/processing: 继续轮询
-      } catch (e) {
-        console.error('[VideoGeneration] 轮询失败:', e);
-      }
-    }, 3000); // 3 秒轮询
-
-    return () => {
-      canceled = true;
-      clearInterval(interval);
-    };
-  }, [lipSyncTaskId, lipSyncState, projectId]);
+  // TODO: Vidu 对口型任务轮询（待回调通知上线后处理）
+  // 当前逻辑：每个 segment 单独提交对口型任务，taskId 保存在 segment.lipSyncTaskId 中
+  // 上线后：收到 Vidu 回调通知时，根据 taskId 找到对应 segment，
+  // 下载对口型后的视频，再根据 audioStartTime/audioEndTime 替换到合成视频的对应位置
+  // useEffect(() => {
+  //   const segmentShots = shots.filter(s => s.type === 'segment' && s.lipSyncTaskId && s.lipSyncState !== 'succeeded' && s.lipSyncState !== 'failed');
+  //   if (segmentShots.length === 0 || !projectId) return;
+  //   // 轮询所有未完成的任务...
+  // }, [shots, projectId]);

  // 自动匹配空镜素材（调用后端接口）
  // 同步 activeScene 与 shots 数据
@@ -252,15 +221,20 @@ export default function VideoGeneration() {
    async function doMatch() {
      const newMap: Record<string, { url: string; duration: number } | null> = {};
      const usedUrls: string[] = [];
+      const assignedMap = computeAssignedIntervals(shots);
      // 串行匹配避免并发导致重复（同分类素材尽量分散）
      for (const shot of emptyShots) {
        if (canceled) break;
-        const duration =
-          typeof shot.duration === 'number'
-            ? shot.duration
-            : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5;
+        const assigned = assignedMap[shot.id];
+        const requiredDuration = assigned
+          ? (assigned.assignedEnd - assigned.assignedStart) / 1000
+          : (shot.actualDuration ?? (
+              typeof shot.duration === 'number'
+                ? shot.duration
+                : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
+            ));
        try {
-          const result = await matchMaterial(shot.scene || '', duration, usedUrls);
+          const result = await matchMaterial(shot.scene || '', requiredDuration, usedUrls);
          newMap[String(shot.id)] = result;
          if (result) usedUrls.push(result.url);
        } catch (err) {
@@ -409,10 +383,15 @@ export default function VideoGeneration() {
    const shot = shots.find((s) => String(s.id) === shotId);
    if (!shot) return;

-    const duration =
-      typeof shot.duration === 'number'
-        ? shot.duration
-        : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5;
+    const assignedMap = computeAssignedIntervals(shots);
+    const assigned = assignedMap[shot.id];
+    const requiredDuration = assigned
+      ? (assigned.assignedEnd - assigned.assignedStart) / 1000
+      : (shot.actualDuration ?? (
+          typeof shot.duration === 'number'
+            ? shot.duration
+            : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
+        ));

    const currentUrl = materialMatchMap[shotId]?.url;

@@ -423,7 +402,7 @@ export default function VideoGeneration() {
    if (currentUrl) usedUrls.push(currentUrl);

    try {
-      const result = await matchMaterial(shot.scene || '', duration, usedUrls, true);
+      const result = await matchMaterial(shot.scene || '', requiredDuration, usedUrls, true);
      setMaterialMatchMap((prev) => ({
        ...prev,
        [shotId]: result || prev[shotId],
@@ -516,17 +495,86 @@ export default function VideoGeneration() {
    progress.show('视频生成');

    try {
-      // Step 1: 合成视频（裁剪 + 拼接）
-      progress.update('正在处理视频素材...');
-
      // 人物视频总时长，用于计算随机截取起始点
      const avatarDuration = selectedAvatarMaterial.duration;

-      const segments: ComposeSegment[] = shots.map((shot) => {
-        const duration =
+      // ========== Step 1: 对每个 segment 提交对口型任务 ==========
+      const segmentShots = shots.filter((s) => s.type === 'segment');
+
+      for (let i = 0; i < segmentShots.length; i++) {
+        const shot = segmentShots[i];
+        progress.update(`正在提交对口型任务 (${i + 1}/${segmentShots.length})...`);
+
+        const duration = shot.actualDuration ?? (
          typeof shot.duration === 'number'
            ? shot.duration
-            : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5;
+            : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
+        );
+
+        // 1a. 获取切割后视频保存路径
+        const pathRes = await invoke<{ code: number; data?: string; message: string }>(
+          'get_video_save_path',
+          { projectId, filename: `segment_clip_${shot.id}_${Date.now()}.mp4` }
+        );
+        if (pathRes.code !== 200 || !pathRes.data) {
+          throw new Error(pathRes.message || '获取保存路径失败');
+        }
+        const clipPath = pathRes.data;
+
+        // 1b. 从人物素材中随机截取对应时长的片段
+        const maxStart = Math.max(0, avatarDuration - duration);
+        const startTime = maxStart > 0 ? Math.random() * maxStart : 0;
+
+        const clipRes = await invoke<{ code: number; message: string }>('extract_video_segment', {
+          args: {
+            inputPath: selectedAvatarMaterial.path,
+            start: startTime,
+            duration,
+            outputPath: clipPath,
+          },
+        });
+        if (clipRes.code !== 200) {
+          throw new Error(`截取视频片段失败: ${clipRes.message}`);
+        }
+
+        // 1c. 上传切割后的视频到七牛云
+        const clipUrl = await uploadVideoFile(clipPath);
+        console.log(`[VideoGeneration] Segment ${shot.id} 视频上传完成:`, clipUrl);
+
+        // 1d. 提交对口型任务（仅当该分镜有 clipAudioUrl 时）
+        if (!shot.clipAudioUrl) {
+          console.warn(`[VideoGeneration] Segment ${shot.id} 无 clipAudioUrl，跳过对口型`);
+          continue;
+        }
+
+        const lipSyncRes = await submitLipSync({
+          videoUrl: clipUrl,
+          audioUrl: shot.clipAudioUrl,
+        });
+        console.log(`[VideoGeneration] Segment ${shot.id} 对口型任务提交成功:`, lipSyncRes.taskId);
+
+        // 1e. 保存 taskId 到 segment
+        updateSegment(shot.id, { lipSyncTaskId: lipSyncRes.taskId, lipSyncState: 'processing' });
+      }
+
+      // 保存 segments.json（包含 lipSyncTaskId）
+      const currentSegments = useProjectStore.getState().segments;
+      await localProjectApi.saveSegments(projectId, currentSegments);
+
+      // ========== Step 2: 拼接所有片段 ==========
+      progress.update('正在拼接视频...');
+
+      const assignedMap = computeAssignedIntervals(shots);
+
+      const composeSegments: ComposeSegment[] = shots.map((shot) => {
+        const assigned = assignedMap[shot.id];
+        const duration = assigned
+          ? (assigned.assignedEnd - assigned.assignedStart) / 1000
+          : (shot.actualDuration ?? (
+              typeof shot.duration === 'number'
+                ? shot.duration
+                : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
+            ));

        if (shot.type === 'empty_shot') {
          const matched = materialMatchMap[String(shot.id)];
@@ -538,102 +586,81 @@ export default function VideoGeneration() {
            startTime: 0,
          };
        } else {
-          // 人物出镜：从人物视频中随机截取，确保不超出素材时长
-          const maxStart = Math.max(0, avatarDuration - duration);
+          const segDuration = shot.actualDuration ?? (
+            typeof shot.duration === 'number'
+              ? shot.duration
+              : parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
+          );
+          const maxStart = Math.max(0, avatarDuration - segDuration);
          const startTime = maxStart > 0 ? Math.random() * maxStart : 0;
          return {
            id: String(shot.id),
            type: 'segment' as const,
-            duration,
+            duration: segDuration,
            source: selectedAvatarMaterial.path,
            startTime,
          };
        }
      });

-      const composeResult = await composeVideo(projectId, segments);
-      console.log('[VideoGeneration] 视频合成完成:', composeResult.outputPath);
+      const composeResult = await composeVideo(projectId, composeSegments);
+      console.log('[VideoGeneration] 视频拼接完成:', composeResult.outputPath);

-      // Step 2: 上传拼接视频到七牛云
-      progress.update('正在智能拼接视频...');
-      const composedVideoUrl = await uploadVideoFile(composeResult.outputPath);
-      console.log('[VideoGeneration] 视频上传完成:', composedVideoUrl);
+      // ========== Step 3: 将完整音频合成到拼接视频中 ==========
+      let finalVideoPath = composeResult.outputPath;

-      // 保存拼接视频的本地路径和七牛云 URL 到 meta.json
-      // 注意：拼接视频是中间产物，不放入 products 目录（products 只存 Step 6 最终成品）
-      await saveMetaToLocalFile({
-        composedVideoUrl,
-        composedVideoPath: composeResult.outputPath,
-      });
+      if (dubbingAudioUrl) {
+        progress.update('正在合成音频...');

-      // Step 3: 截取人物素材首帧并上传，作为 Vidu 对口型的 ref_photo_url
-      let refPhotoUrl: string | undefined;
-      console.log('[VideoGeneration] selectedAvatarMaterial:', selectedAvatarMaterial);
-      if (selectedAvatarMaterial) {
-        try {
-          progress.update('正在提取人物参考图...');
-          // 获取首帧保存路径
-          const framePathRes = await invoke<{ code: number; data?: string; message: string }>(
-            'get_image_save_path',
-            {
-              projectId,
-              filename: `ref_frame_${Date.now()}.jpg`,
-            }
-          );
-          console.log('[VideoGeneration] get_image_save_path result:', framePathRes);
-          if (framePathRes.code === 200 && framePathRes.data) {
-            const framePath = framePathRes.data;
-            console.log('[VideoGeneration] 首帧保存路径:', framePath);
-            // 截取首帧
-            const extractRes = await invoke<{ code: number; data?: string; message: string }>(
-              'extract_video_first_frame',
-              {
-                request: {
-                  video_path: selectedAvatarMaterial.path,
-                  output_path: framePath,
-                },
-              }
-            );
-            console.log('[VideoGeneration] extract_video_first_frame result:', extractRes);
-            if (extractRes.code === 200) {
-              console.log('[VideoGeneration] 首帧提取成功，开始上传...');
-              // 上传首帧到七牛云
-              refPhotoUrl = await uploadImageFile(framePath);
-              console.log('[VideoGeneration] 首帧上传完成:', refPhotoUrl);
-            } else {
-              console.warn('[VideoGeneration] 首帧提取失败:', extractRes.message);
-            }
-          } else {
-            console.warn('[VideoGeneration] 获取首帧保存路径失败:', framePathRes?.message);
-          }
-        } catch (err) {
-          console.warn('[VideoGeneration] 首帧提取/上传失败，继续提交对口型任务:', err);
-          // 失败则忽略 refPhotoUrl，继续提交
+        // 下载配音音频到本地
+        const audioPathRes = await invoke<{ code: number; data?: string; message: string }>(
+          'get_video_save_path',
+          { projectId, filename: `dubbing_${Date.now()}.mp3` }
+        );
+        if (audioPathRes.code !== 200 || !audioPathRes.data) {
+          throw new Error(audioPathRes.message || '获取音频保存路径失败');
        }
-      } else {
-        console.log('[VideoGeneration] 未选择人物素材，跳过首帧提取');
+        const audioLocalPath = audioPathRes.data;
+        await downloadFile(dubbingAudioUrl, audioLocalPath);
+
+        // 获取最终输出路径
+        const outputPathRes = await invoke<{ code: number; data?: string; message: string }>(
+          'get_video_save_path',
+          { projectId, filename: `composed_with_audio_${Date.now()}.mp4` }
+        );
+        if (outputPathRes.code !== 200 || !outputPathRes.data) {
+          throw new Error(outputPathRes.message || '获取输出路径失败');
+        }
+        const composedWithAudioPath = outputPathRes.data;
+
+        // 替换音频轨道
+        const replaceRes = await invoke<{ code: number; message: string }>('replace_audio_track', {
+          request: {
+            video_path: composeResult.outputPath,
+            audio_path: audioLocalPath,
+            output_path: composedWithAudioPath,
+          },
+        });
+        if (replaceRes.code !== 200) {
+          throw new Error(`音频替换失败: ${replaceRes.message}`);
+        }
+
+        finalVideoPath = composedWithAudioPath;
+        console.log('[VideoGeneration] 音频合成完成:', finalVideoPath);
      }

-      // Step 4: 提交 Vidu 对口型任务
-      if (!dubbingAudioUrl) {
-        throw new Error('请先完成语音配音步骤');
-      }
-
-      progress.update('正在合成视频...');
-      const lipSyncRes = await submitLipSync({
-        videoUrl: composedVideoUrl,
-        audioUrl: dubbingAudioUrl,
-        refPhotoUrl,
+      // 保存最终结果到 meta.json
+      await saveMetaToLocalFile({
+        composedVideoPath: finalVideoPath,
+        currentStep: 3,
      });
-      console.log('[VideoGeneration] 对口型任务提交成功:', lipSyncRes.taskId);

-      // 保存对口型任务 ID 和状态到 meta.json
-      await saveMetaToLocalFile({ lipSyncTaskId: lipSyncRes.taskId, lipSyncState: 'processing' });
+      // TODO: 对口型结果替换（待 Vidu 回调通知上线后处理）
+      // 逻辑：收到回调通知后，前端获取对口型后的视频 URL，
+      // 根据 segment 的 audioStartTime / audioEndTime 确定时间位置和时间范围，
+      // 使用 FFmpeg 将合成视频中对应位置的视频替换为对口型后的视频。

-      // 保存当前步骤进度（finalVideoPath 只在 Step 6 最终合成时写入）
-      await saveMetaToLocalFile({ currentStep: 3 });
-
-      progress.success('视频生成任务已提交');
+      progress.success('视频生成完成');
    } catch (error) {
      const msg = error instanceof Error ? error.message : '视频生成失败';
      console.error('[VideoGeneration] 生成失败:', error);
@@ -1046,13 +1073,13 @@ export default function VideoGeneration() {
                strokeLinecap="round"
                strokeLinejoin="round"
              >
-                {lipSyncedVideoUrl ? (
+                {composedVideoPath ? (
                  <path d="M23 4v6h-6M1 20v-6h6M3.51 9a9 9 0 0114.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0020.49 15" />
                ) : (
                  <polygon points="5 3 19 12 5 21 5 3" />
                )}
              </svg>
-              {lipSyncedVideoUrl ? '重新生成' : '生成视频'}
+              {composedVideoPath ? '重新生成' : '生成视频'}
            </>
          )}
        </button>
@@ -1064,10 +1091,10 @@ export default function VideoGeneration() {
          <div className="video-preview-header">视频预览</div>
          <div className="video-preview-wrapper">
            <div className="video-preview-container">
-            {lipSyncedVideoUrl ? (
+            {composedVideoBlobUrl ? (
              <video
-                key={lipSyncedVideoUrl}
-                src={lipSyncedVideoUrl}
+                key={composedVideoBlobUrl}
+                src={composedVideoBlobUrl}
                className="preview-video"
                controls
                autoPlay
@@ -10,29 +10,34 @@ import { useProjectStore } from '../../store';
 import { useVoiceStore } from '../../store/voiceStore';
 import { getCurrentProjectId } from '../../api/modules/localStorage';
 import { saveMetaToLocalFile } from '../../store/projectStore';
-import { synthesizeTTS, saveAudio, uploadAudio } from '../../api/modules/voice';
+import { synthesizeTTS, saveAudio, uploadAudio, extractAudioSegment } from '../../api/modules/voice';
 import { toast } from '../../store/uiStore';
+import type { AlignmentResult } from '../../api/types';
 import { useProgressStore } from '../../store/progressStore';
+import { autoAlignCaption } from '../../api/modules/caption';
+import { matchSegmentsToUtterances } from '../../utils/audioAlign';
+import { uploadAudioFile } from '../../api/modules/videoCompose';
+import { localProjectApi } from '../../api/modules/localStorage';
 import './VoiceDubbing.css';

 export default function VoiceDubbing() {
  const projectId = getCurrentProjectId();
  const segments = useProjectStore(state => state.segments);
  const updateSegment = useProjectStore(state => state.updateSegment);
+  const selectedVoiceId = useProjectStore(state => state.selectedVoiceId);
+  const speed = useProjectStore(state => state.voiceSpeed);
+  const volume = useProjectStore(state => state.voiceVolume);
+  const pitch = useProjectStore(state => state.voicePitch);
+  const setSelectedVoiceId = useProjectStore(state => state.setSelectedVoiceId);
+  const setSpeed = useProjectStore(state => state.setVoiceSpeed);
+  const setVolume = useProjectStore(state => state.setVoiceVolume);
+  const setPitch = useProjectStore(state => state.setVoicePitch);

  const {
    presetVoices,
    voiceMaterials,
-    selectedVoiceId,
-    speed,
-    volume,
-    pitch,
    loadPresetVoices,
    loadVoiceMaterials,
-    setSelectedVoiceId,
-    setSpeed,
-    setVolume,
-    setPitch,
    loadProjectAudios,
    setAudioMapping,
  } = useVoiceStore();
@@ -95,23 +100,141 @@ export default function VoiceDubbing() {
    setPlayingVoiceId(voiceId);
  }, [playingVoiceId]);

+  const handleAlignAndClip = useCallback(async (
+    dubbingAudioUrl: string,
+    dubbingAudioPath: string
+  ) => {
+    if (!projectId) return;
+    const progress = useProgressStore.getState();
+
+    try {
+      // 1. 拼接完整文本用于打轴（需与 TTS 文本顺序完全一致，包括 empty_shot）
+      const fullText = segments
+        .filter(s => s.voiceover?.trim())
+        .map(s => s.voiceover!.trim())
+        .join('\n');
+      if (!fullText) return;
+
+      // 2. 字幕打轴
+      progress.update('正在处理字幕...');
+      const alignResult = await autoAlignCaption(dubbingAudioUrl, fullText);
+      if (!alignResult.utterances?.length) {
+        console.warn('[VoiceDubbing] 打轴返回空结果');
+        progress.error('字幕处理异常');
+        return;
+      }
+
+      // 3. 文本匹配
+      const matchSegments = segments
+        .filter(s => s.voiceover?.trim())
+        .map(s => ({ id: s.id, voiceover: s.voiceover || '' }));
+      const matched = matchSegmentsToUtterances(matchSegments, alignResult.utterances);
+      if (!matched.length) {
+        console.warn('[VoiceDubbing] 文本匹配无结果');
+        progress.error('音频对齐失败');
+        return;
+      }
+
+      // 4. 截取音频片段并上传
+      progress.update('正在整理音频...');
+      const audiosDir = dubbingAudioPath.replace(/\\/g, '/').split('/').slice(0, -1).join('/');
+
+      for (const m of matched) {
+        const seg = segments.find(s => s.id === m.segmentId);
+        if (!seg) continue;
+
+        // 所有分镜都保存实际时间信息（供后续步骤精确匹配素材）
+        updateSegment(m.segmentId, {
+          audioStartTime: m.startTime,
+          audioEndTime: m.endTime,
+          actualDuration: m.actualDuration,
+        });
+
+        // 只截取【分镜】类型的音频片段，空镜跳过
+        if (seg.type !== 'segment') continue;
+
+        const outputPath = `${audiosDir}/segment_${m.segmentId}.mp3`;
+
+        try {
+          await extractAudioSegment({
+            inputPath: dubbingAudioPath,
+            start: m.startTime / 1000,
+            duration: m.actualDuration,
+            outputPath,
+          });
+
+          const clipUrl = await uploadAudioFile(outputPath);
+
+          updateSegment(m.segmentId, {
+            clipAudioPath: outputPath,
+            clipAudioUrl: clipUrl,
+          });
+        } catch (e) {
+          console.error(`[VoiceDubbing] Segment ${m.segmentId} 截取/上传失败:`, e);
+        }
+      }
+
+      // 5. 保存 segments.json
+      const currentSegments = useProjectStore.getState().segments;
+      await localProjectApi.saveSegments(projectId, currentSegments);
+
+      // 6. 保存字幕打轴结果到 meta，供 Step 4 直接复用
+      const subtitleAlignment: AlignmentResult = {
+        status: 'completed',
+        utterances: alignResult.utterances.map(u => ({
+          text: u.text,
+          start_time: u.startTime,
+          end_time: u.endTime,
+        })),
+        duration: alignResult.duration,
+      };
+      useProjectStore.setState({ subtitleAlignment });
+      await saveMetaToLocalFile({ subtitleAlignment });
+
+      progress.success('音频处理完成');
+    } catch (err) {
+      console.error('[VoiceDubbing] 打轴截取流程失败:', err);
+      progress.error(err instanceof Error ? err.message : '音频处理失败');
+    }
+  }, [projectId, segments, updateSegment]);
+
  const handleGenerate = useCallback(async () => {
    if (!projectId) { toast.warning('请先创建项目'); return; }
-    const realText = segments.map(s => s.voiceover?.trim()).filter(Boolean).join('\n');
+    // 拼接 TTS 文本，根据镜头切换类型插入停顿标记
+    const realText = segments
+      .filter(s => s.voiceover?.trim())
+      .map((s, i, arr) => {
+        const text = s.voiceover!.trim();
+        if (i === arr.length - 1) return text;
+        const next = arr[i + 1];
+        // segment ↔ empty_shot 切换：长停顿，让观众看清画面
+        if (s.type !== next?.type) {
+          return text + '<#0.5#>';
+        }
+        // 同类型之间（segment→segment / empty_shot→empty_shot）：短停顿，保持节奏
+        return text + '<#0.3#>';
+      })
+      .join('\n');
    if (!realText) { toast.warning('没有需要合成的旁白文本'); return; }
-    // Kling TTS 限制单次 ≤1000 字，超长自动截断
-    const truncatedText = realText.length > 1000 ? realText.slice(0, 1000) : realText;
+    // Vidu TTS 限制单次 ≤10000 字符，超长自动截断
+    const truncatedText = realText.length > 10000 ? realText.slice(0, 10000) : realText;
+
+    // 直接从 store 获取最新音色，避免闭包捕获旧值
+    const currentVoiceId = useProjectStore.getState().selectedVoiceId;
+    const currentSpeed = useProjectStore.getState().voiceSpeed;
+    const currentVolume = useProjectStore.getState().voiceVolume;
+    const currentPitch = useProjectStore.getState().voicePitch;

    const progress = useProgressStore.getState();
    setIsGenerating(true);
    progress.show('生成配音');

    try {
-      progress.update('正在合成语音...');
-      const result = await synthesizeTTS({ text: truncatedText, voiceId: selectedVoiceId, speed, volume, pitch });
+      progress.update('正在生成配音...');
+      const result = await synthesizeTTS({ text: truncatedText, voiceId: currentVoiceId, speed: currentSpeed, volume: currentVolume, pitch: currentPitch });
      if (!result.audioUrl) throw new Error('未返回音频 URL');

-      progress.update('正在保存音频...');
+      progress.update('正在处理音频...');
      // 下载音频 blob
      const response = await fetch(result.audioUrl);
      if (!response.ok) throw new Error('下载音频失败');
@@ -135,7 +258,7 @@ export default function VoiceDubbing() {
      const audioId = `voice_${Date.now()}`;
      const meta = await saveAudio({
        projectId, audioId, audioData: base64,
-        name: `配音-${segments.length}段`, voiceId: selectedVoiceId, duration: 0,
+        name: `配音-${segments.length}段`, voiceId: currentVoiceId || 'tianxin_xiaoling', duration: 0,
        skipList: true,
      });

@@ -143,12 +266,12 @@ export default function VoiceDubbing() {
      useProjectStore.setState({
        dubbingAudioUrl: qiniuUrl,
        dubbingAudioPath: meta.filePath,
-        dubbingVoiceId: selectedVoiceId,
+        dubbingVoiceId: currentVoiceId,
      });
      await saveMetaToLocalFile({
        dubbingAudioUrl: qiniuUrl,
        dubbingAudioPath: meta.filePath,
-        dubbingVoiceId: selectedVoiceId,
+        dubbingVoiceId: currentVoiceId,
      });

      // 配音音频是项目级别的，不写入各分镜
@@ -160,13 +283,17 @@ export default function VoiceDubbing() {
      }

      setGeneratedAudioUrl(qiniuUrl);
-      progress.success('配音生成完成');
+
+      // 生成完成后自动执行打轴+截取
+      await handleAlignAndClip(qiniuUrl, meta.filePath);
+
+      progress.success('配音已就绪');
    } catch (err) {
      progress.error(err instanceof Error ? err.message : '生成失败');
    } finally {
      setIsGenerating(false);
    }
-  }, [projectId, segments, selectedVoiceId, speed, volume, pitch, setAudioMapping, updateSegment]);
+  }, [projectId, segments, setAudioMapping, updateSegment, handleAlignAndClip]);

  const handleToggleGeneratedAudio = useCallback(() => {
    if (!generatedAudioUrl) return;
@@ -215,7 +342,7 @@ export default function VoiceDubbing() {
            {activeVoiceTab === 'preset' && (
              <div className="voice-list">
                {presetVoices.map(v => (
-                  <div key={v.voiceId} className={`voice-row ${v.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(v.voiceId)}>
+                  <div key={v.voiceId} className={`voice-row ${v.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => { setSelectedVoiceId(v.voiceId); saveMetaToLocalFile({ selectedVoiceId: v.voiceId }); }}>
                    <div className="voice-row-main">
                      <div className="voice-row-info">
                        <div className="voice-row-name">
@@ -238,7 +365,7 @@ export default function VoiceDubbing() {
                  <div className="voice-empty">暂无私有音色<br /><small>去素材库上传音频并克隆音色</small></div>
                ) : (
                  voiceMaterials.filter(m => m.status === 'ready').map(m => (
-                    <div key={m.voiceId} className={`voice-row ${m.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(m.voiceId)}>
+                    <div key={m.voiceId} className={`voice-row ${m.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => { setSelectedVoiceId(m.voiceId); saveMetaToLocalFile({ selectedVoiceId: m.voiceId }); }}>
                      <div className="voice-row-main">
                        <div className="voice-row-info">
                          <div className="voice-row-name">
@@ -274,7 +401,7 @@ export default function VoiceDubbing() {
                max={20}
                step={1}
                value={Math.round(speed * 10)}
-                onChange={e => setSpeed(parseInt(e.target.value) / 10)}
+                onChange={e => { const v = parseInt(e.target.value) / 10; setSpeed(v); saveMetaToLocalFile({ voiceSpeed: v }); }}
                style={{ '--slider-percent': `${((Math.round(speed * 10) - 5) / 15) * 100}%` } as React.CSSProperties}
              />
              <span>2.0x</span>
@@ -296,7 +423,7 @@ export default function VoiceDubbing() {
                max={10}
                step={1}
                value={volume}
-                onChange={e => setVolume(parseInt(e.target.value))}
+                onChange={e => { const v = parseInt(e.target.value); setVolume(v); saveMetaToLocalFile({ voiceVolume: v }); }}
                style={{ '--slider-percent': `${(volume / 10) * 100}%` } as React.CSSProperties}
              />
              <span>10</span>
@@ -318,7 +445,7 @@ export default function VoiceDubbing() {
                max={12}
                step={1}
                value={pitch}
-                onChange={e => setPitch(parseInt(e.target.value))}
+                onChange={e => { const v = parseInt(e.target.value); setPitch(v); saveMetaToLocalFile({ voicePitch: v }); }}
                style={{ '--slider-percent': `${((pitch + 12) / 24) * 100}%` } as React.CSSProperties}
              />
              <span>12</span>
@@ -35,7 +35,7 @@ function VideoCreationContent() {
  // 直接订阅 segments，避免 project getter 无法追踪的问题
  const segments = useProjectStore(state => state.segments);
  const isLoading = useProjectStore(state => state._isLoading);
-  const lipSyncedVideoUrl = useProjectStore(state => state.lipSyncedVideoUrl);
+  const composedVideoPath = useProjectStore(state => state.composedVideoPath);
  const dubbingAudioUrl = useProjectStore(state => state.dubbingAudioUrl);

  // 页面刷新后从 meta.json 恢复项目状态（projectStore persist 不保存业务数据）
@@ -80,8 +80,8 @@ function VideoCreationContent() {
  const isStep1Complete = segments.length > 0;
  // Step 2 音频合成：必须有配音音频
  const isStep2Complete = isStep1Complete && !!dubbingAudioUrl;
-  // Step 3 视频生成：对口型成品视频已生成（七牛云 URL）
-  const isStep3Complete = isStep2Complete && !!lipSyncedVideoUrl;
+  // Step 3 视频生成：拼接视频已生成（对口型结果替换上线后再改回 lipSyncedVideoUrl）
+  const isStep3Complete = isStep2Complete && !!composedVideoPath;
  // Step 4+ 字幕压制：需要视频生成完成
  // 判断用户能否进入某步骤
  const canAccessStep = (stepId: number) => {
@@ -45,6 +45,9 @@ export interface ProjectState {
  currentStep: number; // 当前视频创作步骤 (1-6)
  scriptDuration?: number; // 脚本生成参数：视频时长
  scriptType?: string;     // 脚本生成参数：脚本类型
+  voiceSpeed: number;      // 配音语速
+  voiceVolume: number;     // 配音音量
+  voicePitch: number;      // 配音音调
 }

 interface ProjectActions {
@@ -57,12 +60,16 @@ interface ProjectActions {
  setCurrentStep: (step: number) => void;
  setSelectedHumanId: (id: string | undefined) => void;
  setSelectedAvatarInfo: (humanId?: string, elementId?: number, voiceId?: string) => void;
+  setSelectedVoiceId: (voiceId: string) => void;
  setFinalVideoPath: (path: string | undefined) => void;
  setCoverPath: (path: string | undefined) => void;
  setCoverConfig: (config: ProjectMeta['coverConfig']) => void;
  setExportedAt: (timestamp: number | undefined) => void;
  setScriptDuration: (duration: number) => void;
  setScriptType: (type: string) => void;
+  setVoiceSpeed: (speed: number) => void;
+  setVoiceVolume: (volume: number) => void;
+  setVoicePitch: (pitch: number) => void;
  setIsLoading: (loading: boolean) => void;
  setHasHydrated: (hydrated: boolean) => void;

@@ -83,6 +90,9 @@ const initialState: Omit<
  segments: [],
  currentStep: 1,
  _isLoading: false,
+  voiceSpeed: 1.0,
+  voiceVolume: 0,
+  voicePitch: 0,
  _hasHydrated: false,
 };

@@ -112,6 +122,9 @@ interface SaveState {
  burnedVideoPath?: string;
  scriptDuration?: number;
  scriptType?: string;
+  voiceSpeed?: number;
+  voiceVolume?: number;
+  voicePitch?: number;
  currentStep?: number;
  coverConfig?: ProjectMeta['coverConfig'];
  segments: ScriptShot[];
@@ -168,6 +181,9 @@ export async function saveMetaToLocalFile(overrides: Partial<Omit<SaveState, 'se
        subtitleAlignment: 'subtitleAlignment' in overrides ? overrides.subtitleAlignment : existingMeta?.subtitleAlignment,
        burnedVideoPath: 'burnedVideoPath' in overrides ? overrides.burnedVideoPath : existingMeta?.burnedVideoPath,
        coverConfig: 'coverConfig' in overrides ? overrides.coverConfig : existingMeta?.coverConfig,
+        voiceSpeed: 'voiceSpeed' in overrides ? overrides.voiceSpeed : existingMeta?.voiceSpeed,
+        voiceVolume: 'voiceVolume' in overrides ? overrides.voiceVolume : existingMeta?.voiceVolume,
+        voicePitch: 'voicePitch' in overrides ? overrides.voicePitch : existingMeta?.voicePitch,
        scriptDuration: 'scriptDuration' in overrides ? overrides.scriptDuration : existingMeta?.scriptDuration,
        scriptType: 'scriptType' in overrides ? overrides.scriptType : existingMeta?.scriptType,
      };
@@ -264,6 +280,12 @@ export const useProjectStore = create<ProjectStore>()(
            state.updatedAt = Date.now();
          });
        },
+        setSelectedVoiceId: voiceId => {
+          set(state => {
+            state.selectedVoiceId = voiceId;
+            state.updatedAt = Date.now();
+          });
+        },
        setFinalVideoPath: path => {
          set(state => {
            state.finalVideoPath = path;
@@ -302,7 +324,21 @@ export const useProjectStore = create<ProjectStore>()(
            state.scriptType = type;
            state.updatedAt = Date.now();
          }),
-
+        setVoiceSpeed: speed =>
+          set(state => {
+            state.voiceSpeed = speed;
+            state.updatedAt = Date.now();
+          }),
+        setVoiceVolume: volume =>
+          set(state => {
+            state.voiceVolume = volume;
+            state.updatedAt = Date.now();
+          }),
+        setVoicePitch: pitch =>
+          set(state => {
+            state.voicePitch = pitch;
+            state.updatedAt = Date.now();
+          }),

      }),
      {
@@ -364,6 +400,9 @@ export async function initProjectStore(projectId?: string): Promise<void> {
      dubbingAudioUrl: meta?.dubbingAudioUrl,
      dubbingAudioPath: meta?.dubbingAudioPath,
      dubbingVoiceId: meta?.dubbingVoiceId,
+      voiceSpeed: meta?.voiceSpeed ?? 1.0,
+      voiceVolume: meta?.voiceVolume ?? 0,
+      voicePitch: meta?.voicePitch ?? 0,
      avatarMaterialPath: meta?.avatarMaterialPath,
      avatarMaterialName: meta?.avatarMaterialName,
      avatarMaterialDuration: meta?.avatarMaterialDuration,
@@ -397,7 +436,10 @@ export async function createNewProject(topic?: string, segments?: ScriptShot[]):
    topic: undefined,
    selectedHumanId: undefined,
    selectedElementId: undefined,
-    selectedVoiceId: undefined,
+    selectedVoiceId: 'tianxin_xiaoling',
+    voiceSpeed: 1.0,
+    voiceVolume: 0,
+    voicePitch: 0,
    finalVideoPath: undefined,
    coverPath: undefined,
    coverConfig: undefined,
@@ -13,7 +13,6 @@ import * as voiceApi from '../api/modules/voice';
 interface VoiceState {
  // 预设音色列表
  presetVoices: VoiceInfo[];
-  selectedVoiceId: string;

  // 项目音频文件列表
  projectAudios: AudioMeta[];
@@ -25,15 +24,6 @@ interface VoiceState {
  // 当前项目 ID
  currentProjectId: string | null;

-  // 语速
-  speed: number;
-
-  // 音量 (0.5-10.0)
-  volume: number;
-
-  // 音调 (-10 到 10)
-  pitch: number;
-
  // 加载状态
  isLoadingVoices: boolean;
  isLoadingAudios: boolean;
@@ -46,16 +36,6 @@ interface VoiceState {
 interface VoiceActions {
  // 音色操作
  loadPresetVoices: () => Promise<void>;
-  setSelectedVoiceId: (id: string) => void;
-
-  // 语速
-  setSpeed: (speed: number) => void;
-
-  // 音量
-  setVolume: (volume: number) => void;
-
-  // 音调
-  setPitch: (pitch: number) => void;

  // 素材库操作
  loadVoiceMaterials: () => Promise<void>;
@@ -87,13 +67,9 @@ interface VoiceActions {

 const initialState: VoiceState = {
  presetVoices: [],
-  selectedVoiceId: 'tianxin_xiaoling', // 甜心小玲
  projectAudios: [],
  audioMapping: {},
  currentProjectId: null,
-  speed: 1.0,
-  volume: 0,
-  pitch: 0,
  isLoadingVoices: false,
  isLoadingAudios: false,
  voiceMaterials: [],
@@ -171,17 +147,6 @@ export const useVoiceStore = create<VoiceState & VoiceActions>()(
      }
    },

-    setSelectedVoiceId: (id) => set({ selectedVoiceId: id }),
-
-    // ====================== 语速 ======================
-    setSpeed: (speed: number) => set({ speed }),
-
-    // ====================== 音量 ======================
-    setVolume: (volume: number) => set({ volume }),
-
-    // ====================== 音调 ======================
-    setPitch: (pitch: number) => set({ pitch }),
-
    // ====================== 素材库操作 ======================
    loadVoiceMaterials: async () => {
      set({ isLoadingMaterials: true });
@@ -358,10 +323,8 @@ export function usePresetVoices() {
  return useVoiceStore(
    useShallow(state => ({
      voices: state.presetVoices,
-      selectedVoiceId: state.selectedVoiceId,
      isLoading: state.isLoadingVoices,
      load: state.loadPresetVoices,
-      setSelected: state.setSelectedVoiceId,
    }))
  );
 }
@@ -0,0 +1,118 @@
+/**
+ * 音频打轴文本匹配工具
+ * =====================
+ *
+ * 将火山引擎打轴返回的 utterances 按 segment voiceover 进行分组匹配，
+ * 计算每个 segment 在完整配音音频中的时间范围。
+ */
+
+export interface Utterance {
+  text: string;
+  startTime: number; // 毫秒
+  endTime: number;   // 毫秒
+}
+
+export interface SegmentAudioRange {
+  segmentId: number;
+  startTime: number; // 毫秒
+  endTime: number;   // 毫秒
+  actualDuration: number; // 秒
+  matchedText: string;
+  matchedUtteranceCount: number;
+}
+
+// 中文数字映射（只处理单个数字，不处理组合如"十五"）
+const CN_NUMBERS: Record<string, string> = {
+  '零': '0', '〇': '0', '一': '1', '二': '2', '两': '2', '三': '3',
+  '四': '4', '五': '5', '六': '6', '七': '7', '八': '8', '九': '9',
+};
+
+// 语气词列表
+const PARTICLES = new Set([
+  '啊', '呢', '吧', '嗯', '哦', '哈', '啦', '嘛', '呗', '咧', '哟',
+]);
+
+/**
+ * 文本归一化：去标点、去空格、中文数字转阿拉伯数字、去语气词
+ */
+function normalizeText(text: string): string {
+  return text
+    // 去标点（保留中文、英文、数字）
+    .replace(/[^\u4e00-\u9fa5a-zA-Z0-9]/g, '')
+    // 中文数字转阿拉伯数字
+    .replace(/[零〇一二两三四五六七八九十]/g, (match) => CN_NUMBERS[match] || match)
+    // 去语气词
+    .split('')
+    .filter((c) => !PARTICLES.has(c))
+    .join('')
+    .toLowerCase();
+}
+
+/**
+ * 将 segment voiceover 列表与打轴 utterances 进行顺序匹配
+ *
+ * @param segments 分镜列表，每个含 id 和 voiceover
+ * @param utterances 打轴返回的字幕时间轴
+ * @returns 每个 segment 对应的时间范围，匹配失败的 segment 不包含在结果中
+ */
+export function matchSegmentsToUtterances(
+  segments: Array<{ id: number; voiceover: string }>,
+  utterances: Utterance[]
+): SegmentAudioRange[] {
+  const results: SegmentAudioRange[] = [];
+  let uIdx = 0; // 当前 utterance 索引
+
+  for (const seg of segments) {
+    const targetNorm = normalizeText(seg.voiceover);
+    if (!targetNorm) continue;
+
+    let accumulatedNorm = '';
+    const matchedUtterances: Utterance[] = [];
+
+    // 贪婪累加 utterances，直到匹配或耗尽
+    while (uIdx < utterances.length) {
+      const u = utterances[uIdx];
+      const uNorm = normalizeText(u.text);
+      accumulatedNorm += uNorm;
+      matchedUtterances.push(u);
+      uIdx++;
+
+      // 检查是否包含目标文本
+      if (accumulatedNorm.includes(targetNorm)) {
+        break;
+      }
+    }
+
+    // 判断是否匹配成功
+    // 条件1：accumulated 文本包含 target 文本（最可靠）
+    // 条件2：长度差异 ≤ 10% 且 accumulated 以 target 开头（避免完全不相关的文本被错误匹配）
+    const isMatch = accumulatedNorm.includes(targetNorm);
+    const lengthDiff = Math.abs(accumulatedNorm.length - targetNorm.length);
+    const lengthTolerance = Math.max(1, Math.floor(targetNorm.length * 0.1));
+    const isApproxMatch =
+      lengthDiff <= lengthTolerance &&
+      (accumulatedNorm.startsWith(targetNorm.slice(0, Math.min(3, targetNorm.length))) ||
+       targetNorm.startsWith(accumulatedNorm.slice(0, Math.min(3, accumulatedNorm.length))));
+
+    if (isMatch || isApproxMatch) {
+      const firstU = matchedUtterances[0];
+      const lastU = matchedUtterances[matchedUtterances.length - 1];
+      results.push({
+        segmentId: seg.id,
+        startTime: firstU.startTime,
+        endTime: lastU.endTime,
+        actualDuration: (lastU.endTime - firstU.startTime) / 1000,
+        matchedText: matchedUtterances.map((u) => u.text).join(''),
+        matchedUtteranceCount: matchedUtterances.length,
+      });
+    } else {
+      // 匹配失败：已消费的 utterances 确实属于这个失败的 segment，不回退 uIdx
+      console.warn(
+        `[audioAlign] Segment ${seg.id} 匹配失败: ` +
+        `target="${targetNorm}" accumulated="${accumulatedNorm}"`
+      );
+    }
+  }
+
+  return results;
+}