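feat: integrate Vidu TTS / voice cloning / lip sync, replacing MiniMax voice capabilities

- Add ViduProvider: synchronous TTS, voice cloning, lip sync, task query
- Add ViduTTSService: business-layer wrapper with 6 curated Chinese preset voices
- Switch all Voice API routes to Vidu
- Add the asynchronous /voice/lip-sync endpoint
- Frontend adaptation: 16 voices reduced to 6, slider ranges updated, volume defaults to 0
- Add the vidu-tts-api.md development doc
- docker-compose: add the VIDU_API_KEY environment variable mapping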
2026-04-21 23:26:24 +08:00
parent bb08d0f586
commit 189fdf5ed6
9 changed files with 1715 additions and 509 deletions
@@ -0,0 +1,290 @@
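+# Vidu TTS API Development Guide
+
+> Source: https://platform.vidu.cn/docs/text-to-speech  
+> Last updated: 2026-04-21
+
+## 1. Overview
+
+Vidu (Shengshu Technology) provides text-to-speech (TTS) and voice cloning. Both are **synchronous interfaces** that return the result directly, with no polling required. Lip sync, covered later in this document, is the exception: it is asynchronous.
+
+- **Base URL**: `https://api.vidu.cn`
+- **Authentication**: `Authorization: Token {your_api_key}`
+- **Content-Type**: `application/json`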
+
+---
+
+## 2. Text-to-Speech (TTS)
+
+### Endpoint
+
+```
+POST /ent/v2/audio-tts
+```
+
+### Request headers
+
+| Field | Value | Description |
+|------|-----|------|
+| Content-Type | application/json | Data exchange format |
+| Authorization | Token {your_api_key} | API key authentication |
+
+### Request body
+
+| Parameter | Type | Required | Description |
+|----------|------|------|------|
+| text | String | Yes | Text to synthesize, **< 10000 characters**. Supports the `<#x#>` pause marker, where x is the pause duration in seconds, range [0.01, 99.99] |
+| voice_setting_voice_id | String | Yes | Voice ID |
+| voice_setting_speed | Float | No | Speed, default 1.0, range [0.5, 2] |
+| voice_setting_volume | Int | No | Volume, default 0 (normal volume), range [0, 10]; higher values are louder |
+| voice_setting_pitch | Int | No | Pitch, default 0 (original voice), range [-12, 12] |
+| voice_setting_emotion | String | No | Emotion control: `happy`/`sad`/`angry`/`fearful`/`disgusted`/`surprised`/`calm`. Usually not needed, since the model picks a matching emotion automatically |
+| pronunciation_dict_tone | list | No | Pronunciation overrides for polyphonic characters, e.g. `["燕少飞/(yan4)(shao3)(fei1)"]` |
+| payload | String | No | Pass-through parameter, up to 1048576 characters |
+
+### Response body
+
+```json
+{
+  "task_id": "your_task_id_here",
+  "state": "success",
+  "file_url": "https://...",
+  "credits": 10,
+  "payload": "",
+  "created_at": "2025-01-01T15:41:31.968916Z"
+}
+```
+
+| Field | Type | Description |
+|------|------|------|
+| task_id | String | Task ID generated by Vidu |
+| state | String | `queueing` / `success` / `failed` |
+| file_url | String | Audio file URL |
+| credits | Int | Credits consumed by this call |
+| payload | String | Pass-through parameter |
+| created_at | String | Task creation time |
+
+### Curl example
+
+```bash
+curl -X POST https://api.vidu.cn/ent/v2/audio-tts \
+  -H "Authorization: Token {your_api_key}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "text": "你好，欢迎使用vidu开放平台",
+    "voice_setting_voice_id": "female-tianmei"
+  }'
+```
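The request-body table above can be turned into a small client-side check before calling the endpoint. The following is a minimal sketch, assuming the documented ranges are enforced locally; `pause` and `build_tts_body` are hypothetical helper names, and the two-decimal formatting of the pause marker is an assumption, since the source only states the `<#x#>` shape and the [0.01, 99.99] range.

```python
from typing import Any


def pause(seconds: float) -> str:
    """Return a `<#x#>` pause marker; x must lie in [0.01, 99.99] seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be within [0.01, 99.99] seconds")
    # Two decimals is an assumption; the doc only specifies the range.
    return f"<#{seconds:.2f}#>"


def build_tts_body(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    volume: int = 0,
    pitch: int = 0,
) -> dict[str, Any]:
    """Validate the documented ranges and build the JSON body for /ent/v2/audio-tts."""
    if not 0 < len(text) < 10000:
        raise ValueError("text must be non-empty and under 10000 characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within [0.5, 2]")
    if not 0 <= volume <= 10:
        raise ValueError("volume must be within [0, 10]")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be within [-12, 12]")
    return {
        "text": text,
        "voice_setting_voice_id": voice_id,
        "voice_setting_speed": speed,
        "voice_setting_volume": volume,
        "voice_setting_pitch": pitch,
    }
```

For instance, `build_tts_body("你好" + pause(0.5) + "欢迎", "female-tianmei")` would produce a body whose `text` carries one half-second pause marker between the two phrases.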
+
+---
+
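+## 3. Voice Cloning
+
+### Endpoint
+
+```
+POST /ent/v2/audio-clone
+```
+
+### Request body
+
+| Parameter | Type | Required | Description |
+|----------|------|------|------|
+| audio_url | String | Yes | Source audio URL (must be accessible). Format: mp3/m4a/wav; duration: 10 s ~ 5 min; size: ≤20 MB |
+| voice_id | String | Yes | Custom voice ID. Length [8, 256]; the first character must be an English letter; digits, letters, hyphens, and underscores are allowed; the last character must not be `-` or `_`; must not duplicate an existing ID |
+| prompt_audio_url | String | No | Sample audio for cloning (< 8 s); can improve voice similarity and stability |
+| prompt_text | String | No | Text of the sample audio; must match the audio content and end with punctuation |
+| text | String | Yes | Preview text, ≤1000 characters. It is read aloud in the cloned voice and returned as preview audio |
+| payload | String | No | Pass-through parameter |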
+
+### Response body
+
+```json
+{
+  "task_id": "your_task_id_here",
+  "state": "success",
+  "voice_id": "vidu01",
+  "demo_audio": "https://...",
+  "payload": "",
+  "created_at": "2025-01-01T15:41:31.968916Z"
+}
+```
+
+| Field | Type | Description |
+|------|------|------|
+| task_id | String | Task ID |
+| state | String | `queueing` / `success` / `failed` |
+| voice_id | String | The user-defined voice_id (not returned if the task fails) |
+| demo_audio | String | Preview audio link (returned only when the request includes text) |
+| payload | String | Pass-through parameter |
+| created_at | String | Creation time |
+
+### Curl example
+
+```bash
+curl -X POST https://api.vidu.cn/ent/v2/audio-clone \
+  -H "Authorization: Token {your_api_key}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "audio_url": "your_audio_url",
+    "voice_id": "vidu01",
+    "text": "你好，欢迎使用vidu开放平台"
+  }'
+```
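The `voice_id` rules in the request-body table can be checked locally before submitting a clone request. This is a minimal sketch; `is_valid_voice_id` is a hypothetical helper, and the pattern simply encodes the table as written. Note that the sample ID `vidu01` used above is shorter than the stated minimum of 8 characters, so the length rule here follows the table and has not been verified against the live API.

```python
import re

# First char: English letter; middle: letters, digits, "-" or "_";
# last char: letter or digit (so it cannot be "-" or "_").
# 1 + (6..254) + 1 characters gives a total length of 8..256.
_VOICE_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{6,254}[A-Za-z0-9]")


def is_valid_voice_id(voice_id: str) -> bool:
    """Return True if voice_id satisfies the documented format rules."""
    return _VOICE_ID_RE.fullmatch(voice_id) is not None
```

Uniqueness ("must not duplicate an existing ID") cannot be checked this way; that part presumably only surfaces as an error from the API itself.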
+
+---
+
+## 4. Preset Voice List
+
+**16 Chinese (Mandarin)** voices in total, split into a standard set and a Beta (premium) set.
+
+### Standard
+
+| voice_id | Voice name |
+|----------|----------|
+| male-qn-qingse | Callow young man |
+| male-qn-jingying | Elite young man |
+| male-qn-badao | Domineering young man |
+| male-qn-daxuesheng | Young college student |
+| female-shaonv | Young girl |
+| female-yujie | Mature lady (yujie) |
+| female-chengshu | Mature woman |
+| female-tianmei | Sweet woman |
+
+### Beta (premium)
+
+| voice_id | Voice name |
+|----------|----------|
+| male-qn-qingse-jingpin | Callow young man (beta) |
+| male-qn-jingying-jingpin | Elite young man (beta) |
+| male-qn-badao-jingpin | Domineering young man (beta) |
+| male-qn-daxuesheng-jingpin | Young college student (beta) |
+| female-shaonv-jingpin | Young girl (beta) |
+| female-yujie-jingpin | Mature lady (yujie) (beta) |
+| female-chengshu-jingpin | Mature woman (beta) |
+| female-tianmei-jingpin | Sweet woman (beta) |
+
+> Preview audio URL format: `https://scene.vidu.zone/media-asset/{id}.mp3` (see the original links in the Feishu spreadsheet)
+
+---
+
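+## 5. Comparison with MiniMax (integration reference)
+
+| Dimension | Vidu | MiniMax |
+|------|------|---------|
+| Base URL | `https://api.vidu.cn` | `https://api.minimaxi.com` |
+| Authentication | `Token {key}` | `Bearer {key}` |
+| TTS endpoint | `POST /ent/v2/audio-tts` | `POST /v1/t2a_v2` |
+| Sync/async | Synchronous (TTS and cloning) | Synchronous + asynchronous |
+| Text limit | 10000 characters | 10000 characters (synchronous) |
+| Speed range | 0.5 ~ 2.0 (Float) | Int required |
+| Volume range | 0 ~ 10 (Int, 0 = normal) | Int required |
+| Pitch range | -12 ~ 12 (Int) | Int required |
+| Emotion control | 7 selectable emotions | Not supported |
+| Polyphonic characters | Supports `pronunciation_dict_tone` | Not supported |
+| Voice cloning | Synchronous, custom voice_id | Asynchronous, system-assigned voice_id |
+| Cloning audio requirements | 10 s ~ 5 min, ≤20 MB | About 10 s ~ 5 min |
+| Preset voices | 16 Chinese | 6 Chinese |
+| Response audio field | `file_url` | `audio` |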
+
+---
+
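+## 6. Lip Sync
+
+### Endpoint
+
+```
+POST /ent/v2/lip-sync
+```
+
+**⚠️ Asynchronous interface**: creation returns a task_id; poll the query endpoint or supply a callback_url to receive the result.
+
+### Request body
+
+| Parameter | Type | Required | Description |
+|----------|------|------|------|
+| video_url | String | Yes | Source video URL (must be accessible). Format: mp4/mov/avi; duration: 1~600 s (10~120 s recommended); size: ≤5 GB; resolution: 360p~4096p; codec: H.264 |
+| audio_url | String | No | Audio file URL. Format: wav/mp3/wma/m4a/aac/ogg; duration: >1 s and <600 s; size: ≤100 MB |
+| text | String | No | Text content, 4~2000 characters. When both text and audio_url are set, audio_url takes precedence |
+| speed | Float | No | Speed, default 1.0, range [0.5, 2]. Applies only to text-driven generation |
+| voice_id | String | No | Voice ID. Applies only to text-driven generation |
+| volume | Int | No | Volume, default 0 (normal volume), range [0, 10]. Applies only to text-driven generation |
+| ref_photo_url | String | No | Face reference image URL (jpg/jpeg/png/bmp/webp, 192~4096 px, ≤10 MB). When the video contains several faces, it selects the person to lip-sync |
+| callback_url | String | No | Callback URL; receives a POST whenever the task state changes |
+
+### Video material requirements
+
+- A real person on camera (cartoon characters need facial proportions close to a real person's)
+- Face toward the camera, with horizontal rotation within 45° and pitch within 15°
+- Keep the face unobstructed where possible, with stable facial lighting
+
+### Creation response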
+
+```json
+{
+  "task_id": "your_task_id_here",
+  "state": "created",
+  "payload": "",
+  "created_at": "2025-01-01T15:41:31.968916Z"
+}
+```
+
+### Query task status
+
+```
+GET /ent/v2/tasks/{task_id}/creations
+```
+
+**Response body**:
+
+| Field | Type | Description |
+|------|------|------|
+| id | String | Task ID |
+| state | String | `created`/`queueing`/`processing`/`success`/`failed` |
+| err_code | String | Error code |
+| credits | Int | Credits consumed |
+| payload | String | Pass-through parameter |
+| bgm | Bool | Whether BGM is used |
+| off_peak | Bool | Whether off-peak mode is used |
+| creations | Array | List of generated results |
+| creations[].id | String | Creation ID |
+| creations[].url | String | Creation URL (valid for 24 hours) |
+| creations[].cover_url | String | Creation cover URL (valid for 24 hours) |
+| creations[].watermarked_url | String | Watermarked creation URL |
+
+### Curl example (audio-driven)
+
+```bash
+curl -X POST https://api.vidu.cn/ent/v2/lip-sync \
+  -H "Authorization: Token {your_api_key}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "video_url": "your_video_url",
+    "audio_url": "your_audio_url"
+  }'
+```
+
+### Curl example (text-driven)
+
+```bash
+curl -X POST https://api.vidu.cn/ent/v2/lip-sync \
+  -H "Authorization: Token {your_api_key}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "video_url": "your_video_url",
+    "text": "你好，欢迎使用vidu开放平台",
+    "voice_id": "female-tianmei",
+    "speed": 1.0
+  }'
+```
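Because lip sync is asynchronous, a caller that does not use `callback_url` has to poll the task query endpoint until the state becomes `success` or `failed`. The loop below is a minimal sketch of that pattern; `wait_for_task` is a hypothetical helper, the `query` callable is injected so the sketch stays independent of any HTTP client, and the 5-second interval and 600-second timeout are assumptions, not values from the Vidu documentation.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Terminal states taken from the task-state list in this document.
TERMINAL_STATES = {"success", "failed"}


async def wait_for_task(
    query: Callable[[str], Awaitable[dict[str, Any]]],
    task_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict[str, Any]:
    """Poll `query(task_id)` until the task reaches a terminal state."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await query(task_id)
        if result.get("state") in TERMINAL_STATES:
            return result
        # Give up if the next poll would land after the deadline.
        if loop.time() + interval > deadline:
            raise TimeoutError(
                f"task {task_id} still {result.get('state')!r} after {timeout}s"
            )
        await asyncio.sleep(interval)
```

A caller would pass a coroutine that performs `GET /ent/v2/tasks/{task_id}/creations` and returns the parsed JSON. Since creation URLs are valid for 24 hours, the result is best downloaded or re-hosted soon after the task succeeds.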
+
+---
+
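+## 7. Integration Recommendations
+
+1. **Vidu strengths**: emotion control, polyphonic-character annotation, 16 voices (including the premium set), synchronous cloning, lip sync
+2. **Vidu weaknesses**: no standalone "list voices" API; the voice list is maintained in a Feishu spreadsheet
+3. **Interface type differences**:
+   - TTS / voice cloning: **synchronous**, results are returned directly
+   - Lip sync: **asynchronous**, poll `GET /ent/v2/tasks/{task_id}/creations` or use a callback
+4. **Speed/volume/pitch types**: in Vidu, speed is a **Float** while volume and pitch are **Int** (unlike MiniMax, which requires Int for all three)
+5. **Frontend adaptation**: change the speed slider range to 0.5~2.0, volume to 0~10, and pitch to -12~12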
@@ -0,0 +1,184 @@
+"""
+Vidu API Provider
+=================
+
+Wraps the Vidu voice/video HTTP APIs:
+- Synchronous TTS (/ent/v2/audio-tts)
+- Voice cloning (/ent/v2/audio-clone)
+- Lip sync (/ent/v2/lip-sync)
+- Task query (/ent/v2/tasks/{id}/creations)
+
+Authentication: Token {api_key} (Authorization header)
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+import aiohttp
+
+from app.config import get_settings
+
+logger = logging.getLogger(__name__)
+
+
+class ViduProvider:
+    """Vidu API client wrapper"""
+
+    def __init__(self, api_key: str | None = None, base_url: str | None = None):
+        settings = get_settings()
+        self.api_key = api_key or settings.VIDU_API_KEY
+        self.base_url = (base_url or settings.VIDU_BASE_URL).rstrip("/")
+
+    def _get_headers(self) -> dict[str, str]:
+        return {
+            "Authorization": f"Token {self.api_key}",
+            "Content-Type": "application/json",
+        }
+
+    # ==================== TTS speech synthesis ====================
+
+    async def tts_sync(
+        self,
+        text: str,
+        voice_id: str,
+        speed: float = 1.0,
+        volume: int = 0,
+        pitch: int = 0,
+        emotion: str | None = None,
+        pronunciation_dict_tone: list[str] | None = None,
+        payload: str | None = None,
+    ) -> dict[str, Any]:
+        """
+        Synchronous text-to-speech
+
+        POST /ent/v2/audio-tts
+        """
+        url = f"{self.base_url}/ent/v2/audio-tts"
+
+        body: dict[str, Any] = {
+            "text": text,
+            "voice_setting_voice_id": voice_id,
+            "voice_setting_speed": speed,
+            "voice_setting_volume": volume,
+            "voice_setting_pitch": pitch,
+        }
+        if emotion:
+            body["voice_setting_emotion"] = emotion
+        if pronunciation_dict_tone:
+            body["pronunciation_dict_tone"] = pronunciation_dict_tone
+        if payload:
+            body["payload"] = payload
+
+        async with aiohttp.ClientSession() as session:
+            async with session.post(url, json=body, headers=self._get_headers()) as resp:
+                data = await resp.json()
+                if resp.status != 200 or data.get("state") == "failed":
+                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
+                    raise Exception(f"Vidu TTS error: {msg}")
+                return data
+
+    # ==================== Voice cloning ====================
+
+    async def clone_voice(
+        self,
+        audio_url: str,
+        voice_id: str,
+        text: str,
+        prompt_audio_url: str | None = None,
+        prompt_text: str | None = None,
+        payload: str | None = None,
+    ) -> dict[str, Any]:
+        """
+        Voice cloning (synchronous interface)
+
+        POST /ent/v2/audio-clone
+        """
+        url = f"{self.base_url}/ent/v2/audio-clone"
+
+        body: dict[str, Any] = {
+            "audio_url": audio_url,
+            "voice_id": voice_id,
+            "text": text,
+        }
+        if prompt_audio_url:
+            body["prompt_audio_url"] = prompt_audio_url
+        if prompt_text:
+            body["prompt_text"] = prompt_text
+        if payload:
+            body["payload"] = payload
+
+        async with aiohttp.ClientSession() as session:
+            async with session.post(url, json=body, headers=self._get_headers()) as resp:
+                data = await resp.json()
+                if resp.status != 200 or data.get("state") == "failed":
+                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
+                    raise Exception(f"Vidu clone error: {msg}")
+                return data
+
+    # ==================== Lip sync ====================
+
+    async def lip_sync(
+        self,
+        video_url: str,
+        audio_url: str | None = None,
+        text: str | None = None,
+        voice_id: str | None = None,
+        speed: float = 1.0,
+        volume: int = 0,
+        ref_photo_url: str | None = None,
+        callback_url: str | None = None,
+        payload: str | None = None,
+    ) -> dict[str, Any]:
+        """
+        Lip sync (asynchronous interface)
+
+        POST /ent/v2/lip-sync
+        """
+        url = f"{self.base_url}/ent/v2/lip-sync"
+
+        body: dict[str, Any] = {"video_url": video_url}
+
+        if audio_url:
+            body["audio_url"] = audio_url
+        if text:
+            body["text"] = text
+        if voice_id:
+            body["voice_id"] = voice_id
+        if speed != 1.0:
+            body["speed"] = speed
+        if volume != 0:
+            body["volume"] = volume
+        if ref_photo_url:
+            body["ref_photo_url"] = ref_photo_url
+        if callback_url:
+            body["callback_url"] = callback_url
+        if payload:
+            body["payload"] = payload
+
+        async with aiohttp.ClientSession() as session:
+            async with session.post(url, json=body, headers=self._get_headers()) as resp:
+                data = await resp.json()
+                if resp.status != 200 or data.get("state") == "failed":
+                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
+                    raise Exception(f"Vidu lip-sync error: {msg}")
+                return data
+
+    # ==================== Task query ====================
+
+    async def query_task(self, task_id: str) -> dict[str, Any]:
+        """
+        Query task status and generated creations
+
+        GET /ent/v2/tasks/{task_id}/creations
+        """
+        url = f"{self.base_url}/ent/v2/tasks/{task_id}/creations"
+
+        async with aiohttp.ClientSession() as session:
+            async with session.get(url, headers=self._get_headers()) as resp:
+                data = await resp.json()
+                if resp.status != 200:
+                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
+                    raise Exception(f"Vidu query task error: {msg}")
+                return data
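Each provider method above calls `await resp.json()` before checking the status code. In aiohttp, `resp.json()` raises `ContentTypeError` when the response is not served as JSON (for example an HTML error page from a gateway), so that failure would bypass the `err_code` / `message` handling. One way to make this more robust, sketched below under the assumption that the caller reads the body with `await resp.text()` first, is to parse and validate in a single pure function. `ViduAPIError` and `parse_vidu_response` are hypothetical names, not part of this commit.

```python
import json
from typing import Any


class ViduAPIError(Exception):
    """Hypothetical error type for failed Vidu API calls."""


def parse_vidu_response(status: int, raw_body: str, action: str) -> dict[str, Any]:
    """Parse a Vidu response body and raise ViduAPIError on any failure."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        # Non-JSON body (e.g. a gateway error page): report the HTTP status.
        raise ViduAPIError(f"Vidu {action} error: HTTP {status}, non-JSON body") from None
    if not isinstance(data, dict):
        raise ViduAPIError(f"Vidu {action} error: HTTP {status}, unexpected JSON shape")
    if status != 200 or data.get("state") == "failed":
        msg = data.get("err_code") or data.get("message") or f"HTTP {status}"
        raise ViduAPIError(f"Vidu {action} error: {msg}")
    return data
```

With this helper, the four methods could share one code path, e.g. `return parse_vidu_response(resp.status, await resp.text(), "TTS")`, and callers could catch a specific exception type instead of a bare `Exception`.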
@@ -3,19 +3,24 @@
 =======================

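 Provides TTS synthesis, batch synthesis, voice cloning, and related features.
-Based on the Kling AI TTS and voice cloning APIs.
+Based on the MiniMax TTS and voice cloning APIs.
+(Kling AI voice code is kept but deprecated; only video/avatar cloning still uses Kling.)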
 """

 import logging
 import tempfile
+import uuid
 from pathlib import Path

-from fastapi import APIRouter, HTTPException
+from fastapi import APIRouter, File, Form, HTTPException, UploadFile
 from pydantic import BaseModel, Field

 from app.schemas.common import ApiResponse, success_response
-from app.services.tts_service import TTSService
-from app.services.voice_clone_service import VoiceCloneService
+from app.services.qiniu_service import QiniuService
+from app.services.vidu_tts_service import ViduTTSService
+from app.services.minimax_tts_service import MiniMaxTTSService  # noqa: F401 kept for backward compatibility
+from app.services.tts_service import TTSService  # noqa: F401 kept for backward compatibility
+from app.services.voice_clone_service import VoiceCloneService  # noqa: F401 kept for backward compatibility

 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/voice", tags=["Voice"])
@@ -27,10 +32,12 @@ router = APIRouter(prefix="/voice", tags=["Voice"])
 class TTSSynthesizeRequest(BaseModel):
    """TTS synthesis request"""

-    text: str = Field(..., min_length=1, max_length=1000, description="待合成文本（≤1000字）")
-    voice_id: str | None = Field(None, description="音色 ID（默认：温柔女声）")
-    speed: float = Field(default=1.0, ge=0.8, le=2.0, description="语速 0.8-2.0")
+    text: str = Field(..., min_length=1, max_length=10000, description="待合成文本（≤10000字符）")
+    voice_id: str | None = Field(None, description="音色 ID（默认：甜美女性）")
+    speed: float = Field(default=1.0, ge=0.5, le=2.0, description="语速 0.5-2.0")
    voice_language: str = Field(default="zh", description="音色语种 (zh/en)")
+    volume: int = Field(default=0, ge=0, le=10, description="音量 0-10（0=正常）")
+    pitch: int = Field(default=0, ge=-12, le=12, description="音调 -12 到 12")


 class TTSBatchSegment(BaseModel):
@@ -46,7 +53,9 @@ class TTSBatchRequest(BaseModel):

    segments: list[TTSBatchSegment] = Field(..., min_length=1, description="段落列表")
    voice_id: str | None = Field(None, description="音色 ID")
-    speed: float = Field(default=1.0, ge=0.8, le=2.0, description="语速")
+    speed: float = Field(default=1.0, ge=0.5, le=2.0, description="语速")
+    volume: int = Field(default=0, ge=0, le=10, description="音量 0-10")
+    pitch: int = Field(default=0, ge=-12, le=12, description="音调 -12 到 12")


 class VoiceCloneSubmitRequest(BaseModel):
@@ -77,6 +86,13 @@ class VoiceCloneTaskResponse(BaseModel):
    error_message: str | None = None


+class VoiceUploadResponse(BaseModel):
+    """Audio upload response"""
+
+    url: str = Field(..., description="七牛云访问 URL")
+    key: str = Field(..., description="存储 Key")
+
+
 class VoiceInfo(BaseModel):
    """Voice information"""

@@ -85,11 +101,109 @@ class VoiceInfo(BaseModel):
    description: str = ""
    language: str = "zh"
    recommended: bool = False
+    previewUrl: str | None = None
+
+
+class LipSyncRequest(BaseModel):
+    """Lip sync request"""
+
+    video_url: str = Field(..., description="原视频 URL")
+    audio_url: str | None = Field(None, description="音频 URL（与 text 二选一）")
+    text: str | None = Field(None, description="文本内容（与 audio_url 二选一）")
+    voice_id: str | None = Field(None, description="音色 ID（文字驱动时生效）")
+    speed: float = Field(default=1.0, ge=0.5, le=2.0, description="语速")
+    volume: int = Field(default=0, ge=0, le=10, description="音量")
+    ref_photo_url: str | None = Field(None, description="人脸参考图 URL")
+
+
+class LipSyncResponse(BaseModel):
+    """Lip sync response"""
+
+    task_id: str
+    state: str
+
+
+class LipSyncQueryResponse(BaseModel):
+    """Lip sync query response"""
+
+    task_id: str
+    state: str
+    video_url: str | None = None
+    cover_url: str | None = None


 # ========== API routes ==========


+@router.post("/upload", response_model=ApiResponse[VoiceUploadResponse])
+async def upload_voice_file(
+    file: UploadFile = File(...),
+    file_type: str = Form(default="audio", description="文件类型: audio | video"),
+):
+    """
+    Upload an audio/video file to Qiniu Cloud
+
+    Accepts an audio (mp3/wav) or video (mp4/mov) file, uploads it to the Qiniu media bucket,
+    and returns a public access URL.
+    """
+    try:
+        file_type = file_type.lower().strip()
+        if file_type not in ("audio", "video"):
+            raise HTTPException(status_code=400, detail="file_type 必须是 audio 或 video")
+
+        # Validate the MIME type according to the file type
+        if file_type == "audio":
+            allowed_types = {"audio/mpeg", "audio/mp3", "audio/wav"}
+            max_size = 50 * 1024 * 1024  # 50MB
+            prefix = "meijiaka-zj/voice"
+            type_label = "音频"
+        else:
+            allowed_types = {"video/mp4", "video/quicktime"}
+            max_size = 200 * 1024 * 1024  # 200MB
+            prefix = "meijiaka-zj/avatar"
+            type_label = "视频"
+
+        content_type = file.content_type or "application/octet-stream"
+        if content_type not in allowed_types:
+            raise HTTPException(
+                status_code=400,
+                detail=f"不支持的{type_label}格式: {content_type}，仅支持 {', '.join(allowed_types)}",
+            )
+
+        # Read the file content
+        content = await file.read()
+        if len(content) > max_size:
+            raise HTTPException(status_code=400, detail=f"{type_label}文件大小不能超过 {max_size // 1024 // 1024}MB")
+
+        # Generate the storage key
+        ext = content_type.split("/")[-1].replace("quicktime", "mov").replace("mpeg", "mp3")
+        key = f"{prefix}/{uuid.uuid4().hex}.{ext}"
+
+        # Upload to Qiniu Cloud
+        qiniu = QiniuService()
+        from io import BytesIO
+
+        qiniu.upload_stream(
+            stream=BytesIO(content),
+            key=key,
+            mime_type=content_type,
+        )
+
+        # Get the public URL (the media bucket uses video_domain)
+        url = qiniu.get_file_url(qiniu.video_domain, key)
+
+        return success_response(
+            data=VoiceUploadResponse(url=url, key=key),
+            message="上传成功",
+        )
+
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"[Voice] 上传失败: {e}")
+        raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")
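The upload handler derives the file extension by chaining `str.replace` calls on the MIME subtype, which works for the five allowed types but is easy to break when a new type is added. A more explicit alternative is a lookup table. This is a minimal sketch: `MIME_TO_EXT` and `build_storage_key` are hypothetical names, and the table simply mirrors the MIME types the handler already allows.

```python
import uuid

# Mirrors the audio and video MIME types accepted by the upload handler.
MIME_TO_EXT = {
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
    "audio/wav": "wav",
    "video/mp4": "mp4",
    "video/quicktime": "mov",
}


def build_storage_key(prefix: str, content_type: str) -> str:
    """Return `<prefix>/<random hex>.<ext>` for a supported MIME type."""
    ext = MIME_TO_EXT.get(content_type)
    if ext is None:
        raise ValueError(f"unsupported content type: {content_type}")
    return f"{prefix}/{uuid.uuid4().hex}.{ext}"
```

The same table could also serve as the allow-list for the MIME check, so the accepted types and their extensions cannot drift apart.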
+
+
@router.get("/voices", response_model=ApiResponse[list[VoiceInfo]])
 async def list_voices():
    """
@@ -97,13 +211,26 @@ async def list_voices():

    Returns the preset voice options; users can pick a voice they like for TTS synthesis.
    """
-    voices = TTSService.get_preset_voices()
+    voices = ViduTTSService.get_preset_voices()
    return success_response(
        data=[VoiceInfo(**v) for v in voices],
        message="获取音色列表成功",
    )


+@router.get("/preset-voices/raw", response_model=ApiResponse[list[dict]])
+async def list_preset_voices_raw():
+    """
+    [Deprecated] Official KlingAI preset voice list
+
+    Voice features have moved to Vidu; this endpoint is kept only for backward compatibility.
+    """
+    return success_response(
+        data=[],
+        message="语音功能已迁移至 Vidu，请使用 /voices 获取音色列表",
+    )
+
+
@router.post("/synthesize", response_model=ApiResponse[dict])
 async def synthesize_speech(request: TTSSynthesizeRequest):
    """
@@ -113,12 +240,13 @@ async def synthesize_speech(request: TTSSynthesizeRequest):
    Suited to short text (≤1000 characters); for long text, use /synthesize-batch.
    """
    try:
-        service = TTSService()
+        service = ViduTTSService()
        audio_url = await service.synthesize_sync(
            text=request.text,
            voice_id=request.voice_id,
            speed=request.speed,
-            voice_language=request.voice_language,
+            volume=request.volume,
+            pitch=request.pitch,
        )

        return success_response(
@@ -126,7 +254,7 @@ async def synthesize_speech(request: TTSSynthesizeRequest):
                "audio_url": audio_url,
                "format": "mp3",
                "text": request.text,
-                "voice_id": request.voice_id or "829826751244537879",
+                "voice_id": request.voice_id or ViduTTSService.DEFAULT_VOICE_ID,
            },
            message="合成成功",
        )
@@ -154,13 +282,31 @@ async def synthesize_batch(request: TTSBatchRequest):

        segments_data = [s.model_dump() for s in request.segments]

-        service = TTSService()
-        results = await service.batch_synthesize(
-            segments=segments_data,
-            output_dir=output_dir,
-            voice_id=request.voice_id,
-            speed=request.speed,
-        )
+        service = ViduTTSService()
+        # Vidu does not support batch synthesis yet, so call it segment by segment
+        results = []
+        for seg in segments_data:
+            try:
+                audio_url = await service.synthesize_sync(
+                    text=seg["text"],
+                    voice_id=request.voice_id,
+                    speed=request.speed,
+                    volume=request.volume,
+                    pitch=request.pitch,
+                )
+                results.append({
+                    "index": seg.get("index", 0),
+                    "success": True,
+                    "audio_url": audio_url,
+                    "filename": seg.get("filename"),
+                })
+            except Exception as e:
+                results.append({
+                    "index": seg.get("index", 0),
+                    "success": False,
+                    "error": str(e),
+                    "filename": seg.get("filename"),
+                })

        success_count = sum(1 for r in results if r["success"])
        failed_count = len(results) - success_count
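The batch route above synthesizes segments strictly one after another, so total latency grows linearly with the number of segments. If Vidu's rate limits allow it (this is an assumption, not something the source confirms), the calls could run with bounded concurrency instead. The sketch below keeps the same per-segment result shape; `synthesize_segments` is a hypothetical helper, and `synthesize` stands in for a call such as `service.synthesize_sync(text=..., ...)`.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def synthesize_segments(
    synthesize: Callable[[str], Awaitable[str]],
    segments: list[dict[str, Any]],
    max_concurrency: int = 3,
) -> list[dict[str, Any]]:
    """Synthesize all segments with at most `max_concurrency` calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(seg: dict[str, Any]) -> dict[str, Any]:
        base = {"index": seg.get("index", 0), "filename": seg.get("filename")}
        async with sem:
            try:
                audio_url = await synthesize(seg["text"])
                return {**base, "success": True, "audio_url": audio_url}
            except Exception as e:  # one failed segment must not abort the batch
                return {**base, "success": False, "error": str(e)}

    # gather preserves the input order of the segments.
    return list(await asyncio.gather(*(run_one(s) for s in segments)))
```

The `max_concurrency` default of 3 is arbitrary; the right value depends on the account's actual quota.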
@@ -188,20 +334,28 @@ async def synthesize_to_file(request: TTSSynthesizeRequest, output_path: str):
    Converts text to speech and saves it to the given file path.
    """
    try:
-        service = TTSService()
-        saved_path = await service.synthesize_to_file(
+        service = ViduTTSService()
+        audio_url = await service.synthesize_sync(
            text=request.text,
-            output_path=output_path,
            voice_id=request.voice_id,
            speed=request.speed,
-            voice_language=request.voice_language,
+            volume=request.volume,
+            pitch=request.pitch,
        )

+        # Download the audio and save it to the given path
+        import httpx
+        async with httpx.AsyncClient() as client:
+            response = await client.get(audio_url)
+            response.raise_for_status()
+            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
+            Path(output_path).write_bytes(response.content)
+
        return success_response(
            data={
-                "file_path": str(saved_path),
+                "file_path": output_path,
                "text": request.text,
-                "voice_id": request.voice_id or "829826751244537879",
+                "voice_id": request.voice_id or ViduTTSService.DEFAULT_VOICE_ID,
            },
            message="文件保存成功",
        )
@@ -217,26 +371,26 @@ async def synthesize_to_file(request: TTSSynthesizeRequest, output_path: str):
@router.post("/clone/submit", response_model=ApiResponse[VoiceCloneTaskResponse])
 async def submit_clone_task(request: VoiceCloneSubmitRequest):
    """
-    Submit a voice cloning task
+    Submit a voice cloning task (Vidu)

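-    Submit an audio/video URL for voice cloning; returns a task ID for later queries.
-    Three sources are supported: source_audio_url, source_video_url, video_id.
+    Vidu voice cloning is a synchronous interface and returns the result directly.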
    """
    try:
-        service = VoiceCloneService()
-        task_id = await service.submit_clone_task(
-            source_audio_url=request.source_audio_url,
-            source_video_url=request.source_video_url,
-            video_id=request.video_id,
-            voice_name=request.voice_name,
+        service = ViduTTSService()
+        result = await service.clone_voice(
+            audio_url=request.source_audio_url or "",
+            voice_id=request.voice_name or f"vidu_{uuid.uuid4().hex[:8]}",
        )

+        # Vidu returns synchronously, so the status is succeeded right away
        return success_response(
            data=VoiceCloneTaskResponse(
-                task_id=task_id,
-                status="pending",
+                task_id=result.get("task_id", ""),
+                status="succeeded",
+                voice_id=result.get("voice_id"),
+                trial_url=result.get("demo_audio"),
            ),
-            message="克隆任务已提交",
+            message="克隆成功",
        )

    except ValueError as e:
@@ -250,29 +404,17 @@ async def submit_clone_task(request: VoiceCloneSubmitRequest):
@router.get("/clone/query/{task_id}", response_model=ApiResponse[VoiceCloneTaskResponse])
 async def query_clone_task(task_id: str, blocking: bool = False):
    """
-    Query voice cloning task status
+    Query voice cloning task status (Vidu)

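-    Args:
-        task_id: Task ID
-        blocking: Whether to block until completion (default False)
+    Vidu voice cloning is synchronous; this endpoint exists only for compatibility and returns a success status directly.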
    """
-    try:
-        service = VoiceCloneService()
-        result = await service.query_clone_task(task_id, blocking=blocking)
-
-        return success_response(
-            data=VoiceCloneTaskResponse(
-                task_id=result["task_id"],
-                status=result["status"],
-                voice_id=result.get("voice_id"),
-                trial_url=result.get("trial_url"),
-                error_message=result.get("error_message"),
-            )
-        )
-
-    except Exception as e:
-        logger.error(f"[Voice] 查询克隆任务失败: {e}")
-        raise HTTPException(status_code=500, detail=f"查询失败: {str(e)}")
+    return success_response(
+        data=VoiceCloneTaskResponse(
+            task_id=task_id,
+            status="succeeded",
+        ),
+        message="克隆已完成",
+    )


@router.post("/clone/clone-and-wait", response_model=ApiResponse[VoiceCloneTaskResponse])
@@ -284,24 +426,20 @@ async def clone_and_wait(request: VoiceCloneSubmitRequest, poll_interval: float
    Suited to scenarios that need to wait for cloning to finish.
    """
    try:
-        service = VoiceCloneService()
-        result = await service.wait_for_clone(
-            source_audio_url=request.source_audio_url,
-            source_video_url=request.source_video_url,
-            video_id=request.video_id,
-            voice_name=request.voice_name,
-            poll_interval=poll_interval,
+        service = ViduTTSService()
+        result = await service.clone_voice(
+            audio_url=request.source_audio_url or "",
+            voice_id=request.voice_name or f"vidu_{uuid.uuid4().hex[:8]}",
        )

        return success_response(
            data=VoiceCloneTaskResponse(
-                task_id=result["task_id"],
-                status=result["status"],
+                task_id=result.get("task_id", ""),
+                status="succeeded",
                voice_id=result.get("voice_id"),
-                trial_url=result.get("trial_url"),
-                error_message=result.get("error_message"),
+                trial_url=result.get("demo_audio"),
            ),
-            message=f"克隆任务完成，状态: {result['status']}",
+            message="克隆成功",
        )

    except ValueError as e:
@@ -312,4 +450,73 @@ async def clone_and_wait(request: VoiceCloneSubmitRequest, poll_interval: float
        raise HTTPException(status_code=500, detail=f"克隆失败: {str(e)}")


+# ==================== Lip sync ====================
+
+
+@router.post("/lip-sync", response_model=ApiResponse[LipSyncResponse])
+async def create_lip_sync(request: LipSyncRequest):
+    """
+    Create a lip sync task (asynchronous interface)
+
+    Takes a video plus audio or text and generates a lip-synced video.
+    Returns a task_id; query the result via /lip-sync/{task_id}.
+    """
+    try:
+        if not request.audio_url and not request.text:
+            raise ValueError("audio_url 和 text 至少传一个")
+
+        service = ViduTTSService()
+        task_id = await service.lip_sync_create(
+            video_url=request.video_url,
+            audio_url=request.audio_url,
+            text=request.text,
+            voice_id=request.voice_id,
+            speed=request.speed,
+            volume=request.volume,
+            ref_photo_url=request.ref_photo_url,
+        )
+
+        return success_response(
+            data=LipSyncResponse(task_id=task_id, state="created"),
+            message="对口型任务已创建",
+        )
+
+    except ValueError as e:
+        logger.warning(f"[Voice] 对口型参数错误: {e}")
+        raise HTTPException(status_code=422, detail=str(e))
+    except Exception as e:
+        logger.error(f"[Voice] 对口型任务创建失败: {e}")
+        raise HTTPException(status_code=500, detail=f"创建失败: {str(e)}")
+
+
+@router.get("/lip-sync/{task_id}", response_model=ApiResponse[LipSyncQueryResponse])
+async def query_lip_sync(task_id: str):
+    """
+    Query lip sync task status
+
+    Returns the task state and the creation URLs (valid for 24 hours).
+    """
+    try:
+        service = ViduTTSService()
+        result = await service.lip_sync_query(task_id)
+
+        state = result.get("state", "unknown")
+        creations = result.get("creations", [])
+        video_url = creations[0].get("url") if creations else None
+        cover_url = creations[0].get("cover_url") if creations else None
+
+        return success_response(
+            data=LipSyncQueryResponse(
+                task_id=task_id,
+                state=state,
+                video_url=video_url,
+                cover_url=cover_url,
+            ),
+            message=f"任务状态: {state}",
+        )
+
+    except Exception as e:
+        logger.error(f"[Voice] 查询对口型任务失败: {e}")
+        raise HTTPException(status_code=500, detail=f"查询失败: {str(e)}")
+

@@ -119,6 +119,20 @@ class Settings(BaseSettings):
    KLINGAI_ACCESS_KEY: str | None = Field(default=None, description="KlingAI Access Key")
    KLINGAI_SECRET_KEY: str | None = Field(default=None, description="KlingAI Secret Key")

+    # MiniMax configuration
+    MINIMAX_API_KEY: str | None = Field(default=None, description="MiniMax API Key")
+    MINIMAX_BASE_URL: str = Field(
+        default="https://api.minimaxi.com",
+        description="MiniMax Base URL（国内: api.minimaxi.com, 国际: api.minimax.io）",
+    )
+
+    # Vidu configuration
+    VIDU_API_KEY: str | None = Field(default=None, description="Vidu API Key")
+    VIDU_BASE_URL: str = Field(
+        default="https://api.vidu.cn",
+        description="Vidu Base URL",
+    )
+
    # Qiniu Cloud storage configuration
    QINIU_ACCESS_KEY: str | None = Field(default=None, description="七牛云 Access Key")
    QINIU_SECRET_KEY: str | None = Field(default=None, description="七牛云 Secret Key")
@@ -0,0 +1,241 @@
+"""
+Vidu TTS service wrapper
+========================
+
+Business-layer wrapper:
+- Synchronous TTS
+- Voice cloning
+- Lip sync (asynchronous, requires polling)
+- Preset voice list
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from app.ai.providers.vidu_provider import ViduProvider
+
+logger = logging.getLogger(__name__)
+
+# Vidu preset voices (backed by MiniMax under the hood; compatible with MiniMax voice IDs)
+VIDU_PRESET_VOICES = [
+    {
+        "voice_id": "tianxin_xiaoling",
+        "name": "甜心小玲",
+        "language": "zh",
+        "description": "甜美可爱，活泼俏皮",
+        "recommended": True,
+        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/tianxin_xiaoling.mp3",
+    },
+    {
+        "voice_id": "danya_xuejie",
+        "name": "淡雅学姐",
+        "language": "zh",
+        "description": "淡雅知性，温婉柔和",
+        "recommended": False,
+        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/danya_xuejie.mp3",
+    },
+    {
+        "voice_id": "Chinese (Mandarin)_Warm_Girl",
+        "name": "温暖少女",
+        "language": "zh",
+        "description": "温暖亲切，清新自然",
+        "recommended": False,
+        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Warm_Girl.mp3",
+    },
+    {
+        "voice_id": "Chinese (Mandarin)_Radio_Host",
+        "name": "电台男主播",
+        "language": "zh",
+        "description": "专业播报，沉稳有力",
+        "recommended": False,
+        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Radio_Host.mp3",
+    },
+    {
+        "voice_id": "Chinese (Mandarin)_Straightforward_Boy",
+        "name": "率真弟弟",
+        "language": "zh",
+        "description": "率真爽朗，青春阳光",
+        "recommended": False,
+        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Straightforward_Boy.mp3",
+    },
+    {
+        "voice_id": "Chinese (Mandarin)_Gentleman",
+        "name": "温润男声",
+        "language": "zh",
+        "description": "温润如玉，低沉磁性",
+        "recommended": False,
+        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Gentleman.mp3",
+    },
+]
+
+DEFAULT_VOICE_ID = "tianxin_xiaoling"
+
+
+class ViduTTSService:
+    """Service wrapper around the Vidu TTS API."""
+
+    def __init__(self):
+        self.provider = ViduProvider()
+
+    # ==================== Preset voices ====================
+
+    @staticmethod
+    def get_preset_voices() -> list[dict]:
+        """Return the list of preset voices."""
+        return VIDU_PRESET_VOICES
+
+    @staticmethod
+    def get_voice_by_id(voice_id: str) -> dict | None:
+        """Look up a preset voice by its ID; return None if it is not found."""
+        for voice in VIDU_PRESET_VOICES:
+            if voice["voice_id"] == voice_id:
+                return voice
+        return None
+
+    # ==================== Synchronous TTS ====================
+
+    async def synthesize_sync(
+        self,
+        text: str,
+        voice_id: str | None = None,
+        speed: float = 1.0,
+        volume: int = 0,
+        pitch: int = 0,
+        **kwargs,
+    ) -> str:
+        """
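+        Synthesize speech synchronously and return the audio URL.
+
+        Args:
+            text: Text to synthesize (fewer than 10000 characters)
+            voice_id: Voice ID (defaults to DEFAULT_VOICE_ID, 甜心小玲)
+            speed: Speech rate (0.5 to 2.0)
+            volume: Volume (0 to 10; 0 = normal)
+            pitch: Pitch (-12 to 12)
+
+        Returns:
+            Audio URL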
+        """
+        if not text or not text.strip():
+            raise ValueError("text 不能为空")
+
+        voice = voice_id or DEFAULT_VOICE_ID
+
+        result = await self.provider.tts_sync(
+            text=text,
+            voice_id=voice,
+            speed=speed,
+            volume=volume,
+            pitch=pitch,
+            **kwargs,
+        )
+
+        audio_url = result.get("file_url")
+        if not audio_url:
+            raise ValueError("TTS 合成失败: 未返回音频 URL")
+
+        logger.info(f"[Vidu TTS] 合成成功: voice_id={voice}, url={audio_url[:60]}...")
+        return audio_url
+
+    # ==================== Voice cloning ====================
+
+    async def clone_voice(
+        self,
+        audio_url: str,
+        voice_id: str,
+        text: str | None = None,
+        prompt_audio_url: str | None = None,
+        prompt_text: str | None = None,
+    ) -> dict[str, Any]:
+        """
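+        Clone a voice (synchronous endpoint).
+
+        Args:
+            audio_url: URL of the source audio
+            voice_id: Custom voice_id (8 to 256 characters, must start with a letter)
+            text: Trial text (at most 1000 characters); if omitted, a default greeting is sent so a trial clip is still requested
+            prompt_audio_url: URL of a sample audio clip (under 8 seconds)
+            prompt_text: Transcript of the sample audio
+
+        Returns:
+            Result dict containing voice_id, demo_audio, etc.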
+        """
+        trial_text = text or "你好，欢迎使用vidu开放平台"
+
+        result = await self.provider.clone_voice(
+            audio_url=audio_url,
+            voice_id=voice_id,
+            text=trial_text,
+            prompt_audio_url=prompt_audio_url,
+            prompt_text=prompt_text,
+        )
+
+        logger.info(f"[Vidu Clone] 复刻成功: voice_id={result.get('voice_id')}")
+        return result
+
+    async def query_clone_task(self, voice_id: str) -> dict[str, Any]:
+        """
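+        Vidu voice cloning is a synchronous endpoint with no separate status query.
+        This method exists only for compatibility and echoes back the given voice_id.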
+        """
+        return {"voice_id": voice_id, "status": "succeeded"}
+
+    # ==================== Lip sync ====================
+
+    async def lip_sync_create(
+        self,
+        video_url: str,
+        audio_url: str | None = None,
+        text: str | None = None,
+        voice_id: str | None = None,
+        speed: float = 1.0,
+        volume: int = 0,
+        ref_photo_url: str | None = None,
+        callback_url: str | None = None,
+    ) -> str:
+        """
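+        Create a lip-sync task (asynchronous endpoint) and return its task_id.
+
+        Args:
+            video_url: URL of the source video
+            audio_url: Audio URL (provide either this or text, not both)
+            text: Text content (provide either this or audio_url, not both)
+            voice_id: Voice ID (used only when driven by text)
+            speed: Speech rate (used only when driven by text)
+            volume: Volume (used only when driven by text)
+            ref_photo_url: URL of the face reference image
+            callback_url: Callback URL
+
+        Returns:
+            task_id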
+        """
+        result = await self.provider.lip_sync(
+            video_url=video_url,
+            audio_url=audio_url,
+            text=text,
+            voice_id=voice_id,
+            speed=speed,
+            volume=volume,
+            ref_photo_url=ref_photo_url,
+            callback_url=callback_url,
+        )
+
+        task_id = result.get("task_id")
+        if not task_id:
+            raise ValueError("对口型任务创建失败: 未返回 task_id")
+
+        logger.info(f"[Vidu LipSync] 任务创建成功: task_id={task_id}")
+        return task_id
+
+    async def lip_sync_query(self, task_id: str) -> dict[str, Any]:
+        """
+        Query the status and outputs of a lip-sync task.
+
+        Returns:
+            Task status dict containing state, creations, etc.
+        """
+        result = await self.provider.query_task(task_id)
+        logger.info(f"[Vidu LipSync] 查询状态: task_id={task_id}, state={result.get('state')}")
+        return result
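Review note: the service marks lip sync as asynchronous and polling-based, but the commit leaves the polling loop to the caller. The following is a minimal sketch of such a loop, under stated assumptions. `wait_for_task` is a hypothetical helper, not part of this commit. The terminal states `success` and `failed` are taken from the TTS response table in the added document; any other state reported for a lip-sync task is treated as "still running". The interval and timeout defaults are placeholders.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumption: "success" and "failed" are the terminal states; anything else
# (for example "queueing") is treated as still in progress.
TERMINAL_STATES = frozenset({"success", "failed"})


async def wait_for_task(
    query: Callable[[str], Awaitable[dict[str, Any]]],
    task_id: str,
    interval: float = 3.0,
    timeout: float = 300.0,
) -> dict[str, Any]:
    """Poll `query(task_id)` until the task reaches a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await query(task_id)
        if result.get("state") in TERMINAL_STATES:
            return result
        # Stop before sleeping if the next attempt would start past the deadline.
        if loop.time() + interval > deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        await asyncio.sleep(interval)
```

With a `ViduTTSService` instance named `service`, the bound method can be passed directly: `await wait_for_task(service.lip_sync_query, task_id)`. Taking the query function as a parameter keeps the loop testable without network access.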
@@ -12,6 +12,10 @@ services:
      - REDIS_PORT=6379
      - REDIS_DB=1
      - SECRET_KEY=dev-secret-key-change-in-production
+      - MINIMAX_API_KEY=${MINIMAX_API_KEY}
+      - MINIMAX_BASE_URL=${MINIMAX_BASE_URL:-https://api.minimaxi.com}
+      - VIDU_API_KEY=${VIDU_API_KEY}
+      - VIDU_BASE_URL=${VIDU_BASE_URL:-https://api.vidu.cn}
    volumes:
      - .:/app
      - ~/Documents/Meijiaka-zj:/root/Documents/Meijiaka-zj
@@ -1,245 +1,306 @@
-/**
- * VoiceDubbing 样式
- * ==================
- */
+/* Voice dubbing page: follows the project style conventions */

 .voice-dubbing {
  width: 100%;
+  height: 100%;
+  display: flex;
+  flex-direction: column;
 }

+/* Two-column layout (left and right) */
 .dubbing-layout {
  display: grid;
  grid-template-columns: 1fr 1fr;
-  gap: var(--spacing-lg);
-  margin-top: var(--spacing-md);
+  gap: var(--spacing-xl);
+  flex: 1;
+  min-height: 0;
 }

-.voice-panel,
-.mapping-panel {
+/* ========== Left side ========== */
+
+.voice-sidebar {
  display: flex;
  flex-direction: column;
-  gap: var(--spacing-md);
+  gap: var(--spacing-lg);
+  min-height: 0;
+  overflow: hidden;
 }

-.panel-section {
-  background: var(--bg-card);
-  border: 1px solid var(--border-light);
-  border-radius: var(--radius-lg);
-  padding: var(--spacing-md);
+.voice-sidebar > .voice-section:first-child {
+  flex: 1;
+  min-height: 0;
+  overflow: hidden;
 }

-.panel-section h4 {
-  font-size: 13px;
+.voice-list {
+  flex: 1;
+  min-height: 0;
+  overflow-y: auto;
+}
+
+.voice-section {
+  display: flex;
+  flex-direction: column;
+  gap: var(--spacing-sm);
+}
+
+.voice-section-header {
+  display: flex;
+  justify-content: space-between;
+  align-items: center;
+}
+
+.voice-section-title {
+  font-size: var(--font-sm);
  font-weight: 600;
  color: var(--text-primary);
-  margin-bottom: var(--spacing-sm);
 }

-/* 音色网格 */
-.voice-grid {
-  display: grid;
-  grid-template-columns: 1fr 1fr;
+.link-btn {
+  font-size: var(--font-sm);
+  color: var(--primary);
+  background: none;
+  border: none;
+  cursor: pointer;
+  padding: 0;
+}
+
+.link-btn:hover {
+  text-decoration: underline;
+}
+
+/* Tabs: follow the project tab style */
+.voice-tabs {
+  display: flex;
+  gap: 0;
+  border-bottom: 1px solid var(--border-light);
+}
+
+.voice-tab {
+  padding: 6px 12px;
+  border: none;
+  border-bottom: 2px solid transparent;
+  background: none;
+  color: var(--text-secondary);
+  font-size: var(--font-sm);
+  cursor: pointer;
+  transition: all var(--transition-fast);
+}
+
+.voice-tab:hover {
+  color: var(--primary);
+}
+
+.voice-tab.active {
+  border-bottom-color: var(--primary);
+  color: var(--primary);
+  font-weight: 600;
+}
+
+/* Preview bar */
+.voice-preview-bar {
+  display: flex;
+  align-items: center;
+  gap: var(--spacing-sm);
+  padding: var(--spacing-sm);
+  background: var(--primary-light);
+  border-radius: var(--radius-md);
+}
+
+.voice-preview-audio {
+  flex: 1;
+  height: 28px;
+}
+
+/* Voice list: follows the .option-card convention */
+.voice-list {
+  display: flex;
+  flex-direction: column;
  gap: var(--spacing-xs);
 }

-.voice-card {
-  border: 1px solid var(--border-light);
+.voice-row {
+  display: flex;
+  flex-direction: column;
+  padding: var(--spacing-sm) var(--spacing-md);
  border-radius: var(--radius-md);
-  padding: var(--spacing-sm);
+  border: 1px solid var(--border-color);
+  background: var(--bg-card);
  cursor: pointer;
-  transition: all 0.15s ease;
-  background: var(--bg-primary);
+  transition: all var(--transition-fast);
 }

-.voice-card:hover {
-  border-color: var(--primary-light);
+.voice-row:hover {
+  border-color: var(--primary);
  background: var(--bg-hover);
 }

-.voice-card.selected {
+.voice-row.selected {
  border-color: var(--primary);
-  background: color-mix(in srgb, var(--primary) 5%, var(--bg-card));
+  background: var(--primary-light);
 }

-.voice-name {
-  font-size: 13px;
-  font-weight: 600;
+.voice-row-main {
+  display: flex;
+  align-items: center;
+  justify-content: space-between;
+  width: 100%;
+}
+
+.voice-row-info {
+  flex: 1;
+  min-width: 0;
+}
+
+.voice-row-name {
+  font-size: var(--font-sm);
+  font-weight: 500;
  color: var(--text-primary);
  display: flex;
  align-items: center;
  gap: 6px;
 }

-.recommended-tag {
-  font-size: 10px;
-  background: color-mix(in srgb, var(--primary) 15%, transparent);
-  color: var(--primary);
-  padding: 1px 5px;
-  border-radius: var(--radius-sm);
-  font-weight: 500;
-}
-
-.voice-desc {
-  font-size: 11px;
+.voice-row-desc {
+  font-size: var(--font-xs);
  color: var(--text-secondary);
  margin-top: 2px;
 }

-/* 试听 */
-.preview-row {
-  display: flex;
-  gap: var(--spacing-sm);
-  align-items: flex-end;
-}
-
-.preview-text {
-  flex: 1;
-  padding: var(--spacing-sm);
-  border: 1px solid var(--border-light);
-  border-radius: var(--radius-md);
-  font-size: 13px;
-  resize: none;
-  line-height: 1.5;
-  font-family: inherit;
-}
-
-.preview-audio {
-  width: 100%;
-  height: 36px;
-  margin-top: var(--spacing-sm);
-}
-
-/* 批量合成 */
-.batch-info {
-  display: flex;
-  gap: var(--spacing-md);
-  font-size: 12px;
+.voice-row-desc-inline {
+  font-size: var(--font-xs);
  color: var(--text-secondary);
-  margin-bottom: var(--spacing-sm);
-  flex-wrap: wrap;
+  margin-left: 8px;
+  font-weight: 400;
 }

-.batch-btn {
-  width: 100%;
+/* Tag: follows the global .tag style instead of replacing it */
+.voice-row-name .tag {
+  font-size: var(--font-xs);
+  padding: 1px 5px;
 }

-.progress-bar {
-  height: 4px;
-  background: var(--bg-light);
-  border-radius: 2px;
-  overflow: hidden;
-  margin-top: var(--spacing-sm);
+/* Preview button: icon-button style */
+.preview-icon {
+  width: 32px;
+  height: 32px;
+  display: inline-flex;
+  align-items: center;
+  justify-content: center;
+  border: none;
+  border-radius: var(--radius-md);
+  background: var(--bg-input);
+  color: var(--text-secondary);
+  font-size: var(--font-xs);
+  cursor: pointer;
+  flex-shrink: 0;
+  transition: all var(--transition-fast);
 }

-.progress-fill {
-  height: 100%;
+.preview-icon:hover {
  background: var(--primary);
-  transition: width 0.3s ease;
+  color: var(--text-inverse);
 }

-/* 分镜配音列表 */
-.segment-voice-list {
-  display: flex;
-  flex-direction: column;
-  gap: var(--spacing-xs);
-  max-height: 400px;
-  overflow-y: auto;
-}
-
-.seg-voice-item {
-  border: 1px solid var(--border-light);
-  border-radius: var(--radius-md);
-  padding: var(--spacing-sm);
-  background: var(--bg-primary);
-}
-
-.seg-voice-item.empty-shot {
+.preview-icon:disabled {
  opacity: 0.5;
+  cursor: not-allowed;
 }

-.seg-voice-info {
-  display: flex;
-  flex-direction: column;
-  gap: 4px;
+/* Empty state */
+.voice-empty {
+  padding: var(--spacing-xl);
+  text-align: center;
+  color: var(--text-secondary);
+  font-size: var(--font-sm);
 }

-.seg-voice-index {
-  font-size: 12px;
+.voice-empty small {
+  font-size: var(--font-xs);
+  opacity: 0.7;
+}
+
+/* Speech rate */
+.speed-value {
+  font-size: var(--font-sm);
+  color: var(--primary);
  font-weight: 600;
-  color: var(--text-primary);
 }

-.seg-has-audio {
-  display: flex;
-  flex-direction: column;
-  gap: 4px;
-}
-
-.audio-name {
-  font-size: 11px;
-  color: var(--success);
-}
-
-.seg-audio-player {
-  height: 28px;
-  width: 100%;
-}
-
-.seg-no-audio {
-  font-size: 11px;
+.speed-value small {
+  font-weight: 400;
  color: var(--text-secondary);
+  margin-left: 4px;
 }

-.seg-voiceover {
-  font-size: 11px;
-  color: var(--text-secondary);
-  margin-top: 4px;
-  line-height: 1.4;
-  overflow: hidden;
-  text-overflow: ellipsis;
-  white-space: nowrap;
-}
-
-/* 音频文件库 */
-.audio-file-list {
-  display: flex;
-  flex-direction: column;
-  gap: var(--spacing-xs);
-}
-
-.audio-file-item {
+.speed-slider-wrap {
  display: flex;
  align-items: center;
-  gap: var(--spacing-sm);
-  padding: var(--spacing-xs) 0;
-  border-bottom: 1px solid var(--border-light);
+  gap: var(--spacing-md);
+  width: 100%;
 }

-.audio-file-item:last-child {
-  border-bottom: none;
-}
-
-.audio-file-info {
-  flex: 1;
-  min-width: 0;
-}
-
-.audio-file-name {
-  font-size: 12px;
-  font-weight: 500;
-  color: var(--text-primary);
-  display: block;
-  overflow: hidden;
-  text-overflow: ellipsis;
+.speed-slider-wrap span {
+  font-size: var(--font-xs);
+  color: var(--text-tertiary);
  white-space: nowrap;
+  flex-shrink: 0;
+  min-width: 36px;
+  text-align: center;
 }

-.audio-file-size {
-  font-size: 11px;
+.speed-slider-wrap .slider-input {
+  flex: 1;
+}
+
+/* Bottom generate button: reuses the global .btn-primary and adjusts only the width */
+.voice-generate-wrap {
+  margin-top: auto;
+  padding-top: var(--spacing-md);
+}
+
+.voice-generate-wrap .btn {
+  width: 100%;
+}
+
+/* ========== Right side ========== */
+
+.script-content {
+  display: flex;
+  flex-direction: column;
+  min-height: 0;
+}
+
+.script-content-header {
+  display: flex;
+  align-items: center;
+  justify-content: space-between;
+  font-size: var(--font-sm);
+  font-weight: 600;
+  color: var(--text-primary);
+  margin-bottom: var(--spacing-sm);
+}
+
+.script-content-meta {
+  font-size: var(--font-xs);
+  font-weight: 400;
  color: var(--text-secondary);
 }

-.audio-file-player {
-  height: 28px;
-  flex-shrink: 0;
+/* The textarea fills the remaining space */
+.script-content textarea {
+  flex: 1;
+  min-height: 0;
+  line-height: 1.8;
+}
+
+/* Inline preview player */
+.voice-preview-inline {
+  margin-top: var(--spacing-sm);
+  padding-top: var(--spacing-sm);
+  border-top: 1px solid var(--border-light);
+}
+
+.voice-preview-inline .voice-preview-audio {
+  width: 100%;
 }
@@ -1,314 +1,288 @@
 /**
- * 配音管理页面
- * =============
+ * Voice dubbing page (Step 3)
+ * ===========================
 *
- * TTS 文本转语音：选择音色、批量合成旁白配音。
- * 管理项目音频文件，关联到分镜。
+ * Layout: narrow left column (voices + speed + generate button pinned to the bottom) | wide right column (voiceover script)
 */

-import { useState, useEffect, useCallback, useRef } from 'react';
+import { useState, useEffect, useMemo, useCallback } from 'react';
 import { useProjectStore } from '../../store';
 import { useVoiceStore } from '../../store/voiceStore';
 import { getCurrentProjectId } from '../../api/modules/localStorage';
-import { synthesizeTTS, synthesizeBatchTTS } from '../../api/modules/voice';
-import { saveAudio } from '../../api/modules/voice';
+import { synthesizeTTS, saveAudio, uploadAudio } from '../../api/modules/voice';
 import { toast } from '../../store/uiStore';
+import { useProgressStore } from '../../store/progressStore';
 import './VoiceDubbing.css';

 export default function VoiceDubbing() {
+  const projectId = getCurrentProjectId();
  const segments = useProjectStore(state => state.segments);
  const updateSegment = useProjectStore(state => state.updateSegment);
-  const projectId = getCurrentProjectId();

  const {
    presetVoices,
+    voiceMaterials,
    selectedVoiceId,
+    speed,
+    volume,
+    pitch,
    loadPresetVoices,
+    loadVoiceMaterials,
    setSelectedVoiceId,
-    projectAudios,
+    setSpeed,
+    setVolume,
+    setPitch,
    loadProjectAudios,
-    getAudioForSegment,
    setAudioMapping,
  } = useVoiceStore();

-  const [isSynthesizing, setIsSynthesizing] = useState(false);
-  const [synthProgress, setSynthProgress] = useState(0);
-  const [synthTotal, setSynthTotal] = useState(0);
-  const [customText, setCustomText] = useState('');
-  const [customPreviewUrl, setCustomPreviewUrl] = useState<string | null>(null);
-  const audioPreviewRef = useRef<HTMLAudioElement>(null);
+  const [isGenerating, setIsGenerating] = useState(false);
+  const [activeVoiceTab, setActiveVoiceTab] = useState<'preset' | 'clone'>('preset');
+  const [activePreviewVoiceId, setActivePreviewVoiceId] = useState<string | null>(null);

-  // 加载音色和项目音频
  useEffect(() => {
    loadPresetVoices();
-    if (projectId) {
-      loadProjectAudios(projectId);
-    }
+    loadVoiceMaterials();
+    if (projectId) loadProjectAudios(projectId);
  }, [projectId]);

-  // 获取有旁白文本的分镜（排除空镜）
-  const voicedSegments = segments.filter(s => s.type !== 'empty_shot' && s.voiceover);
-  const totalChars = voicedSegments.reduce((sum, s) => sum + (s.voiceover?.length || 0), 0);
+  const mergedText = useMemo(
+    () => segments.map(s => s.voiceover?.trim() || '【空镜】').join('\n'),
+    [segments]
+  );
+  const totalChars = mergedText.length;

-  // 批量合成所有旁白
-  const handleBatchSynthesize = useCallback(async () => {
-    if (!projectId || voicedSegments.length === 0) {
-      toast.warn('没有需要合成的旁白');
+  const handleTogglePreview = useCallback((voiceId: string, voiceName: string, e: React.MouseEvent) => {
+    e.stopPropagation();
+    // Clicking the voice that is already previewing closes the preview
+    if (activePreviewVoiceId === voiceId) {
+      setActivePreviewVoiceId(null);
      return;
    }
+    setActivePreviewVoiceId(voiceId);
+  }, [activePreviewVoiceId]);

-    setIsSynthesizing(true);
-    setSynthProgress(0);
-    setSynthTotal(voicedSegments.length);
-
-    let successCount = 0;
-    let failCount = 0;
-
-    try {
-      for (let i = 0; i < voicedSegments.length; i++) {
-        const seg = voicedSegments[i];
-        const segId = seg.id?.toString() || String(i);
-        const text = seg.voiceover || '';
-
-        setSynthProgress(i + 1);
-
-        try {
-          // 同步 TTS 合成（≤200字）
-          const result = await synthesizeTTS({
-            text,
-            voiceId: selectedVoiceId,
-            speed: 1.0,
-          });
-
-          if (!result.audioBase64) {
-            throw new Error('未返回音频数据');
-          }
-
-          // 保存到本地
-          const audioId = `tts_${segId}_${Date.now()}`;
-          const meta = await saveAudio({
-            projectId,
-            audioId,
-            audioData: result.audioBase64,
-            name: `旁白-${segId}`,
-            voiceId: selectedVoiceId,
-            duration: 0, // 暂时无法获取时长
-            segmentId: segId,
-          });
-
-          // 关联到分镜
-          setAudioMapping(segId, meta.id);
-
-          // 更新分镜 audioPath
-          updateSegment(seg.id!, { audioPath: meta.filePath });
-          successCount++;
-        } catch (err) {
-          console.error(`[VoiceDubbing] 分镜 ${segId} 合成失败:`, err);
-          failCount++;
-        }
-      }
-
-      if (successCount > 0) {
-        toast.success(`配音合成完成：成功 ${successCount} 段${failCount > 0 ? `，失败 ${failCount} 段` : ''}`);
-      } else {
-        toast.error('配音合成全部失败');
-      }
-    } finally {
-      setIsSynthesizing(false);
-      setSynthProgress(0);
-    }
-  }, [projectId, voicedSegments, selectedVoiceId, updateSegment, setAudioMapping]);
-
-  // 试听音色
-  const handlePreviewVoice = useCallback(async () => {
-    if (!customText.trim()) {
-      toast.warn('请输入要预览的文本');
-      return;
-    }
-
-    try {
-      setCustomPreviewUrl(null);
-      const result = await synthesizeTTS({
-        text: customText.slice(0, 200),
-        voiceId: selectedVoiceId,
-        speed: 1.0,
-      });
-
-      if (!result.audioBase64) {
-        throw new Error('未返回音频数据');
-      }
-
-      const audioBlob = new Blob(
-        [Uint8Array.from(atob(result.audioBase64), c => c.charCodeAt(0))],
-        { type: 'audio/mp3' }
-      );
-      const url = URL.createObjectURL(audioBlob);
-      setCustomPreviewUrl(url);
-    } catch (err) {
-      toast.error(`试听失败: ${err instanceof Error ? err.message : String(err)}`);
-    }
-  }, [customText, selectedVoiceId]);
-
-  // 将项目音频关联到分镜
-  const handleAssignToSegment = (audioId: string, segmentId: string) => {
-    setAudioMapping(segmentId, audioId);
-
-    // 同时更新分镜的 audioPath
-    const audio = projectAudios.find(a => a.id === audioId);
-    if (audio) {
-      updateSegment(parseInt(segmentId), { audioPath: audio.filePath });
-    }
-    toast.success('已关联到分镜');
+  const getPreviewUrl = (voiceId: string): string | null => {
+    const voice = presetVoices.find(v => v.voiceId === voiceId);
+    return voice?.previewUrl || null;
  };

-  const selectedVoice = presetVoices.find(v => v.voiceId === selectedVoiceId);
+  const handleGenerate = useCallback(async () => {
+    if (!projectId) { toast.warning('请先创建项目'); return; }
+    const realText = segments.map(s => s.voiceover?.trim()).filter(Boolean).join('\n');
+    if (!realText) { toast.warning('没有需要合成的旁白文本'); return; }
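+    // Cap a single request at 1000 characters and truncate longer text. The Vidu TTS doc allows text under 10000 characters, so this cap is stricter than the API requires.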
+    const truncatedText = realText.length > 1000 ? realText.slice(0, 1000) : realText;
+
+    const progress = useProgressStore.getState();
+    setIsGenerating(true);
+    progress.show('生成配音');
+
+    try {
+      progress.update('正在合成语音...');
+      const result = await synthesizeTTS({ text: truncatedText, voiceId: selectedVoiceId, speed, volume, pitch });
+      if (!result.audioUrl) throw new Error('未返回音频 URL');
+
+      progress.update('正在保存音频...');
+      // Download the audio as a blob
+      const response = await fetch(result.audioUrl);
+      if (!response.ok) throw new Error('下载音频失败');
+      const blob = await response.blob();
+
+      // Upload to Qiniu Cloud
+      const file = new File([blob], `tts_${Date.now()}.mp3`, { type: 'audio/mp3' });
+      const qiniuUrl = await uploadAudio(file);
+
+      // Save locally (base64-encode the blob first)
+      const base64 = await new Promise<string>((resolve, reject) => {
+        const reader = new FileReader();
+        reader.onloadend = () => {
+          const dataUrl = reader.result as string;
+          resolve(dataUrl.split(',')[1]);
+        };
+        reader.onerror = reject;
+        reader.readAsDataURL(blob);
+      });
+
+      const audioId = `voice_${Date.now()}`;
+      const meta = await saveAudio({
+        projectId, audioId, audioData: base64,
+        name: `配音-${segments.length}段`, voiceId: selectedVoiceId, duration: 0,
+      });
+
+      for (const seg of segments) {
+        const segId = seg.id;
+        if (segId) {
+          setAudioMapping(segId.toString(), meta.id);
+          updateSegment(segId, { audioPath: meta.filePath, audioUrl: qiniuUrl });
+        }
+      }
+      progress.success('配音生成完成');
+    } catch (err) {
+      progress.error(err instanceof Error ? err.message : '生成失败');
+    } finally {
+      setIsGenerating(false);
+    }
+  }, [projectId, segments, selectedVoiceId, speed, volume, pitch, setAudioMapping, updateSegment]);

  return (
    <div className="voice-dubbing">
-      <div className="step-header">
-        <h2>配音管理</h2>
-        <p className="step-desc">
-          {voicedSegments.length} 个分镜待配音，共 {totalChars} 字
-        </p>
-      </div>
-
      <div className="dubbing-layout">
-        {/* 左侧：音色选择 + 批量合成 */}
-        <div className="voice-panel">
+        {/* Left side: voices + speed + generate button */}
+        <div className="voice-sidebar">
          {/* Voice selection */}
-          <div className="panel-section">
-            <h4>选择音色</h4>
-            <div className="voice-grid">
-              {presetVoices.map(voice => (
-                <div
-                  key={voice.voiceId}
-                  className={`voice-card ${voice.voiceId === selectedVoiceId ? 'selected' : ''}`}
-                  onClick={() => setSelectedVoiceId(voice.voiceId)}
-                >
-                  <div className="voice-name">
-                    {voice.name}
-                    {voice.recommended && <span className="recommended-tag">推荐</span>}
-                  </div>
-                  <div className="voice-desc">{voice.description}</div>
-                </div>
-              ))}
+          <div className="voice-section">
+            <div className="voice-section-header">
+              <span className="voice-section-title">选择音色</span>
            </div>
-          </div>

-          {/* 音色试听 */}
-          <div className="panel-section">
-            <h4>试听音色</h4>
-            <div className="preview-row">
-              <textarea
-                className="preview-text"
-                value={customText}
-                onChange={e => setCustomText(e.target.value)}
-                placeholder="输入文本试听音色（≤200字）..."
-                rows={3}
-                maxLength={200}
-              />
-              <button
-                className="btn btn-secondary"
-                onClick={handlePreviewVoice}
-                disabled={!customText.trim()}
-              >
-                试听
+            <div className="voice-tabs">
+              <button className={`voice-tab ${activeVoiceTab === 'preset' ? 'active' : ''}`} onClick={() => setActiveVoiceTab('preset')}>
+                系统预设 ({presetVoices.length})
+              </button>
+              <button className={`voice-tab ${activeVoiceTab === 'clone' ? 'active' : ''}`} onClick={() => setActiveVoiceTab('clone')}>
+                私有音色 ({voiceMaterials.filter(m => m.status === 'ready').length})
              </button>
            </div>
-            {customPreviewUrl && (
-              <audio ref={audioPreviewRef} src={customPreviewUrl} controls className="preview-audio" />
-            )}
-          </div>

-          {/* 批量合成 */}
-          <div className="panel-section">
-            <h4>批量配音</h4>
-            <div className="batch-info">
-              <span>音色：{selectedVoice?.name}</span>
-              <span>分镜：{voicedSegments.length} 个</span>
-              <span>字数：约 {totalChars} 字</span>
-            </div>
-            <button
-              className="btn btn-primary batch-btn"
-              onClick={handleBatchSynthesize}
-              disabled={isSynthesizing || voicedSegments.length === 0}
-            >
-              {isSynthesizing
-                ? `合成中... ${synthProgress}/${synthTotal}`
-                : `为 ${voicedSegments.length} 个分镜生成配音`}
-            </button>
-            {isSynthesizing && (
-              <div className="progress-bar">
-                <div
-                  className="progress-fill"
-                  style={{ width: `${(synthProgress / synthTotal) * 100}%` }}
-                />
-              </div>
-            )}
-          </div>
-        </div>
-
-        {/* 右侧：分镜-配音映射 */}
-        <div className="mapping-panel">
-          <div className="panel-section">
-            <h4>分镜配音状态</h4>
-            <div className="segment-voice-list">
-              {segments.map((seg, i) => {
-                const segId = seg.id?.toString() || String(i);
-                const audio = getAudioForSegment(segId);
-                const isEmptyShot = seg.type === 'empty_shot';
-
-                return (
-                  <div key={segId} className={`seg-voice-item ${isEmptyShot ? 'empty-shot' : ''}`}>
-                    <div className="seg-voice-info">
-                      <span className="seg-voice-index">
-                        {isEmptyShot ? '🎬' : '🎙️'} 镜头 {i + 1}
-                      </span>
-                      {audio ? (
-                        <div className="seg-has-audio">
-                          <span className="audio-name">{audio.name}</span>
-                          <audio
-                            src={`file://${audio.filePath}`}
-                            controls
-                            className="seg-audio-player"
-                          />
+            {activeVoiceTab === 'preset' && (
+              <div className="voice-list">
+                {presetVoices.map(v => (
+                  <div key={v.voiceId} className={`voice-row ${v.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(v.voiceId)}>
+                    <div className="voice-row-main">
+                      <div className="voice-row-info">
+                        <div className="voice-row-name">
+                          {v.name}
+                          <span className="voice-row-desc-inline">{v.description}</span>
                        </div>
-                      ) : (
-                        <span className="seg-no-audio">
-                          {isEmptyShot ? '空镜无需配音' : '未配音'}
-                        </span>
-                      )}
+                      </div>
+                      <button className="preview-icon" onClick={e => handleTogglePreview(v.voiceId, v.name, e)}>
+                        {activePreviewVoiceId === v.voiceId ? '✕' : '▶'}
+                      </button>
                    </div>
-                    <div className="seg-voiceover">{seg.voiceover || ''}</div>
-                  </div>
-                );
-              })}
-            </div>
-          </div>
-
-          {/* 音频文件列表 */}
-          {projectAudios.length > 0 && (
-            <div className="panel-section">
-              <h4>音频文件库</h4>
-              <div className="audio-file-list">
-                {projectAudios.map(audio => (
-                  <div key={audio.id} className="audio-file-item">
-                    <div className="audio-file-info">
-                      <span className="audio-file-name">{audio.name}</span>
-                      <span className="audio-file-size">
-                        {(audio.fileSize / 1024).toFixed(1)} KB
-                      </span>
-                    </div>
-                    <audio
-                      src={`file://${audio.filePath}`}
-                      controls
-                      className="audio-file-player"
-                    />
+                    {activePreviewVoiceId === v.voiceId && v.previewUrl && (
+                      <div className="voice-preview-inline">
+                        <audio src={v.previewUrl} controls className="voice-preview-audio" autoPlay />
+                      </div>
+                    )}
                  </div>
                ))}
              </div>
+            )}
+
+            {activeVoiceTab === 'clone' && (
+              <div className="voice-list">
+                {voiceMaterials.filter(m => m.status === 'ready').length === 0 ? (
+                  <div className="voice-empty">暂无私有音色<br /><small>去素材库上传音频并克隆音色</small></div>
+                ) : (
+                  voiceMaterials.filter(m => m.status === 'ready').map(m => (
+                    <div key={m.voiceId} className={`voice-row ${m.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(m.voiceId)}>
+                      <div className="voice-row-main">
+                        <div className="voice-row-info">
+                          <div className="voice-row-name">
+                            {m.name} <span className="tag clone">克隆</span>
+                            <span className="voice-row-desc-inline">
+                              {m.createdAt ? new Date(m.createdAt).toLocaleDateString('zh-CN') : ''}
+                            </span>
+                          </div>
+                        </div>
+                        <button className="preview-icon" onClick={e => handleTogglePreview(m.voiceId, m.name, e)}>
+                          {activePreviewVoiceId === m.voiceId ? '✕' : '▶'}
+                        </button>
+                      </div>
+                      {activePreviewVoiceId === m.voiceId && m.trialUrl && (
+                        <div className="voice-preview-inline">
+                          <audio src={m.trialUrl} controls className="voice-preview-audio" autoPlay />
+                        </div>
+                      )}
+                    </div>
+                  ))
+                )}
+              </div>
+            )}
+          </div>
+
+          {/* Speech speed */}
+          <div className="voice-section">
+            <div className="voice-section-header">
+              <span className="voice-section-title">语速</span>
+              <span className="speed-value">{speed.toFixed(1)}x</span>
            </div>
-          )}
+            <div className="speed-slider-wrap">
+              <span>0.5x</span>
+              <input
+                type="range"
+                className="slider-input"
+                min={5}
+                max={20}
+                step={1}
+                value={Math.round(speed * 10)}
+                onChange={e => setSpeed(parseInt(e.target.value, 10) / 10)}
+                style={{ '--slider-percent': `${((Math.round(speed * 10) - 5) / 15) * 100}%` } as React.CSSProperties}
+              />
+              <span>2.0x</span>
+            </div>
+          </div>
+
+          {/* Volume */}
+          <div className="voice-section">
+            <div className="voice-section-header">
+              <span className="voice-section-title">音量</span>
+              <span className="speed-value">{volume}</span>
+            </div>
+            <div className="speed-slider-wrap">
+              <span>0</span>
+              <input
+                type="range"
+                className="slider-input"
+                min={0}
+                max={10}
+                step={1}
+                value={volume}
+                onChange={e => setVolume(parseInt(e.target.value, 10))}
+                style={{ '--slider-percent': `${(volume / 10) * 100}%` } as React.CSSProperties}
+              />
+              <span>10</span>
+            </div>
+          </div>
+
+          {/* Pitch */}
+          <div className="voice-section">
+            <div className="voice-section-header">
+              <span className="voice-section-title">音调</span>
+              <span className="speed-value">{pitch}</span>
+            </div>
+            <div className="speed-slider-wrap">
+              <span>-12</span>
+              <input
+                type="range"
+                className="slider-input"
+                min={-12}
+                max={12}
+                step={1}
+                value={pitch}
+                onChange={e => setPitch(parseInt(e.target.value, 10))}
+                style={{ '--slider-percent': `${((pitch + 12) / 24) * 100}%` } as React.CSSProperties}
+              />
+              <span>12</span>
+            </div>
+          </div>
+
+          {/* Bottom: generate button */}
+          <div className="voice-generate-wrap">
+            <button className="btn btn-primary generate-btn" onClick={handleGenerate} disabled={isGenerating || !mergedText.trim()}>
+              {isGenerating ? '合成中...' : '生成配音'}
+            </button>
+          </div>
+        </div>
+
+        {/* Right side: voiceover script */}
+        <div className="script-content">
+          <div className="script-content-header">
+            配音文案
+            <span className="script-content-meta">{totalChars} 字 · {segments.length} 个分镜</span>
+          </div>
+          <textarea readOnly value={mergedText} rows={20} className="script-textarea" />
        </div>
      </div>
    </div>
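
The three sliders above all follow one pattern: the range input holds integers, the store holds the real value, and `--slider-percent` is the value's position between `min` and `max`. A minimal sketch of that mapping, assuming it were pulled into helpers, could look like the following. The names `speedToSlider`, `sliderToSpeed`, and `sliderPercent` are hypothetical and do not appear in this commit.

```typescript
// Hypothetical helpers; illustrative only, not part of the commit.

/** Convert a float speed (0.5 to 2.0) to the integer slider position (5 to 20). */
function speedToSlider(speed: number): number {
  // Math.round absorbs floating-point noise such as 1.3 * 10 = 13.000000000000002.
  return Math.round(speed * 10);
}

/** Convert the slider's string value back to a float speed. */
function sliderToSpeed(raw: string): number {
  return Number.parseInt(raw, 10) / 10;
}

/** Fill percentage for the --slider-percent CSS variable, clamped to 0..100. */
function sliderPercent(value: number, min: number, max: number): number {
  if (max <= min) return 0;
  const clamped = Math.min(max, Math.max(min, value));
  return ((clamped - min) / (max - min)) * 100;
}
```

With helpers like these, the speed slider would pass `sliderPercent(speedToSlider(speed), 5, 20)`, the volume slider `sliderPercent(volume, 0, 10)`, and the pitch slider `sliderPercent(pitch, -12, 12)`, which matches the inline expressions in the diff for in-range values.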
@@ -7,7 +7,7 @@

 import { create } from 'zustand';
 import { useShallow } from 'zustand/react/shallow';
-import type { VoiceInfo, AudioMeta } from '../api/modules/voice';
+import type { VoiceInfo, AudioMeta, VoiceMaterial, AvatarMaterial } from '../api/modules/voice';
 import * as voiceApi from '../api/modules/voice';

 interface VoiceState {
@@ -25,9 +25,26 @@ interface VoiceState {
  // 当前项目 ID
  currentProjectId: string | null;

+  // Speech speed multiplier (0.5 to 2.0)
+  speed: number;
+
+  // Volume (0 to 10)
+  volume: number;
+
+  // Pitch (-12 to 12)
+  pitch: number;
+
  // 加载状态
  isLoadingVoices: boolean;
  isLoadingAudios: boolean;
+
+  // Voice material library (user-uploaded cloned voices)
+  voiceMaterials: VoiceMaterial[];
+  isLoadingMaterials: boolean;
+
+  // Video material library
+  avatarMaterials: AvatarMaterial[];
+  isLoadingAvatarMaterials: boolean;
 }

 interface VoiceActions {
@@ -35,6 +52,28 @@ interface VoiceActions {
  loadPresetVoices: () => Promise<void>;
  setSelectedVoiceId: (id: string) => void;

+  // Speech speed
+  setSpeed: (speed: number) => void;
+
+  // Volume
+  setVolume: (volume: number) => void;
+
+  // Pitch
+  setPitch: (pitch: number) => void;
+
+  // Voice material library operations
+  loadVoiceMaterials: () => Promise<void>;
+  addVoiceMaterial: (file: File, name: string) => Promise<VoiceMaterial>;
+  updateVoiceMaterialStatus: (id: string, status: VoiceMaterial['status'], voiceId?: string, trialUrl?: string) => void;
+  renameVoiceMaterial: (id: string, name: string) => Promise<void>;
+  deleteVoiceMaterial: (materialId: string) => Promise<void>;
+
+  // Video material library operations
+  loadAvatarMaterials: () => Promise<void>;
+  addAvatarMaterial: (file: File, name: string) => Promise<AvatarMaterial>;
+  renameAvatarMaterial: (id: string, name: string) => Promise<void>;
+  deleteAvatarMaterial: (materialId: string) => Promise<void>;
+
  // 项目音频操作
  loadProjectAudios: (projectId: string) => Promise<void>;
  saveAudio: (args: {
@@ -58,12 +97,19 @@ interface VoiceActions {

 const initialState: VoiceState = {
  presetVoices: [],
-  selectedVoiceId: '829826751244537879', // 温柔女声（Kling 预设音色）
+  selectedVoiceId: 'tianxin_xiaoling', // 甜心小玲
  projectAudios: [],
  audioMapping: {},
  currentProjectId: null,
+  speed: 1.0,
+  volume: 0,
+  pitch: 0,
  isLoadingVoices: false,
  isLoadingAudios: false,
+  voiceMaterials: [],
+  isLoadingMaterials: false,
+  avatarMaterials: [],
+  isLoadingAvatarMaterials: false,
 };

 export const useVoiceStore = create<VoiceState & VoiceActions>()(
@@ -79,14 +125,57 @@ export const useVoiceStore = create<VoiceState & VoiceActions>()(
        set({ presetVoices: voices });
      } catch (err) {
        console.error('[VoiceStore] 加载音色列表失败:', err);
-        // 静默失败，使用默认值（Kling 预设音色）
+        // Fail silently and fall back to the built-in preset voices
        set({
          presetVoices: [
-            { voiceId: '829826751244537879', name: '温柔女声', description: '温柔细腻', recommended: true, language: 'zh' },
-            { voiceId: '829824295735410756', name: '钓系女友', description: '甜美撒娇', recommended: false, language: 'zh' },
-            { voiceId: '829826792415842333', name: '播报男声', description: '沉稳播报', recommended: false, language: 'zh' },
-            { voiceId: '829826834144964676', name: '盐系少年', description: '清新少年', recommended: false, language: 'zh' },
-            { voiceId: '829826884271091753', name: '撒娇女友', description: '可爱撒娇', recommended: false, language: 'zh' },
+            {
+              voiceId: 'tianxin_xiaoling',
+              name: '甜心小玲',
+              description: '甜美可爱，活泼俏皮',
+              recommended: true,
+              language: 'zh',
+              previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/tianxin_xiaoling.mp3',
+            },
+            {
+              voiceId: 'danya_xuejie',
+              name: '淡雅学姐',
+              description: '淡雅知性，温婉柔和',
+              recommended: false,
+              language: 'zh',
+              previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/danya_xuejie.mp3',
+            },
+            {
+              voiceId: 'Chinese (Mandarin)_Warm_Girl',
+              name: '温暖少女',
+              description: '温暖亲切，清新自然',
+              recommended: false,
+              language: 'zh',
+              previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Warm_Girl.mp3',
+            },
+            {
+              voiceId: 'Chinese (Mandarin)_Radio_Host',
+              name: '电台男主播',
+              description: '专业播报，沉稳有力',
+              recommended: false,
+              language: 'zh',
+              previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Radio_Host.mp3',
+            },
+            {
+              voiceId: 'Chinese (Mandarin)_Straightforward_Boy',
+              name: '率真弟弟',
+              description: '率真爽朗，青春阳光',
+              recommended: false,
+              language: 'zh',
+              previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Straightforward_Boy.mp3',
+            },
+            {
+              voiceId: 'Chinese (Mandarin)_Gentleman',
+              name: '温润男声',
+              description: '温润如玉，低沉磁性',
+              recommended: false,
+              language: 'zh',
+              previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Gentleman.mp3',
+            },
          ],
        });
      } finally {
@@ -96,6 +185,144 @@ export const useVoiceStore = create<VoiceState & VoiceActions>()(

    setSelectedVoiceId: (id) => set({ selectedVoiceId: id }),

+    // ====================== Speech speed ======================
+    setSpeed: (speed: number) => set({ speed }),
+
+    // ====================== Volume ======================
+    setVolume: (volume: number) => set({ volume }),
+
+    // ====================== Pitch ======================
+    setPitch: (pitch: number) => set({ pitch }),
+
+    // ====================== Voice material library operations ======================
+    loadVoiceMaterials: async () => {
+      set({ isLoadingMaterials: true });
+      try {
+        const materials = await voiceApi.loadVoiceMaterials();
+        set({ voiceMaterials: materials });
+      } catch (err) {
+        console.error('[VoiceStore] 加载素材库失败:', err);
+      } finally {
+        set({ isLoadingMaterials: false });
+      }
+    },
+
+    addVoiceMaterial: async (file: File, name: string) => {
+      // 1. Upload the source audio to Qiniu Cloud
+      const sourceUrl = await voiceApi.uploadAudio(file);
+
+      // 2. Submit the Vidu voice-clone task
+      const cloneResult = await voiceApi.submitCloneTask({
+        sourceAudioUrl: sourceUrl,
+        voiceName: name,
+      });
+
+      // 3. Create the local record
+      const material: VoiceMaterial = {
+        id: cloneResult.taskId,
+        name,
+        voiceId: '',
+        sourceUrl,
+        trialUrl: undefined,
+        status: 'pending',
+        createdAt: new Date().toISOString(),
+      };
+
+      // 4. Persist to the local JSON store
+      await voiceApi.saveVoiceMaterial(material);
+      set(state => ({ voiceMaterials: [material, ...state.voiceMaterials] }));
+
+      return material;
+    },
+
+    updateVoiceMaterialStatus: (id: string, status: VoiceMaterial['status'], voiceId?: string, trialUrl?: string) => {
+      set(state => {
+        const updated: VoiceMaterial[] = state.voiceMaterials.map((m): VoiceMaterial => {
+          if (m.id !== id) return m;
+          return {
+            ...m,
+            status,
+            voiceId: voiceId || m.voiceId,
+            trialUrl: trialUrl || m.trialUrl,
+          };
+        });
+        // Mirror the change to local storage (fire-and-forget; failures are only logged)
+        const target = updated.find(m => m.id === id);
+        if (target) {
+          voiceApi.saveVoiceMaterial(target).catch(err => {
+            console.error('[VoiceStore] 保存素材状态失败:', err);
+          });
+        }
+        return { voiceMaterials: updated };
+      });
+    },
+
+    renameVoiceMaterial: async (id: string, name: string) => {
+      set(state => {
+        const updated = state.voiceMaterials.map(m => m.id === id ? { ...m, name } : m);
+        const target = updated.find(m => m.id === id);
+        if (target) {
+          voiceApi.saveVoiceMaterial(target).catch(err => {
+            console.error('[VoiceStore] 重命名素材失败:', err);
+          });
+        }
+        return { voiceMaterials: updated };
+      });
+    },
+
+    deleteVoiceMaterial: async (materialId: string) => {
+      await voiceApi.deleteVoiceMaterial(materialId);
+      set(state => ({
+        voiceMaterials: state.voiceMaterials.filter(m => m.id !== materialId),
+      }));
+    },
+
+    // ====================== Video material library operations ======================
+    loadAvatarMaterials: async () => {
+      set({ isLoadingAvatarMaterials: true });
+      try {
+        const materials = await voiceApi.loadAvatarMaterials();
+        set({ avatarMaterials: materials });
+      } catch (err) {
+        console.error('[VoiceStore] 加载视频素材失败:', err);
+      } finally {
+        set({ isLoadingAvatarMaterials: false });
+      }
+    },
+
+    addAvatarMaterial: async (file: File, name: string) => {
+      const videoUrl = await voiceApi.uploadVideo(file);
+      const material: AvatarMaterial = {
+        id: `avatar_${Date.now()}`,
+        name,
+        videoUrl,
+        createdAt: new Date().toISOString(),
+      };
+      await voiceApi.saveAvatarMaterial(material);
+      set(state => ({ avatarMaterials: [material, ...state.avatarMaterials] }));
+      return material;
+    },
+
+    renameAvatarMaterial: async (id: string, name: string) => {
+      set(state => {
+        const updated = state.avatarMaterials.map(m => m.id === id ? { ...m, name } : m);
+        const target = updated.find(m => m.id === id);
+        if (target) {
+          voiceApi.saveAvatarMaterial(target).catch(err => {
+            console.error('[VoiceStore] 重命名素材失败:', err);
+          });
+        }
+        return { avatarMaterials: updated };
+      });
+    },
+
+    deleteAvatarMaterial: async (materialId: string) => {
+      await voiceApi.deleteAvatarMaterial(materialId);
+      set(state => ({
+        avatarMaterials: state.avatarMaterials.filter(m => m.id !== materialId),
+      }));
+    },
+
    // ====================== 项目音频操作 ======================

    loadProjectAudios: async (projectId) => {