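feat: integrate Vidu TTS, voice cloning, and lip sync, replacing the MiniMax voice features

- Add ViduProvider: synchronous TTS, voice cloning, lip sync, task query
- Add ViduTTSService: business-layer wrapper with 6 curated Chinese preset voices
- Switch all Voice API routes to Vidu
- Add the asynchronous /voice/lip-sync endpoint
- Frontend changes: 16 voices reduced to 6, slider ranges updated, default volume 0
- Add the vidu-tts-api.md developer documentation
- docker-compose: add the VIDU_API_KEY environment variable mapping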
小鱼开发
2026-04-21 23:26:24 +08:00
parent bb08d0f586
commit 189fdf5ed6
9 changed files with 1715 additions and 509 deletions
+290
@@ -0,0 +1,290 @@
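# Vidu TTS API Developer Documentation
> Source: https://platform.vidu.cn/docs/text-to-speech
> Last updated: 2026-04-21
## 1. Overview
Vidu (Shengshu Technology) provides text-to-speech (TTS), voice cloning, and lip-sync capabilities. TTS and voice cloning are **synchronous endpoints** that return the result directly, with no polling; lip sync (section 6) is asynchronous.
- **Base URL**: `https://api.vidu.cn`
- **Authentication**: `Authorization: Token {your_api_key}`
- **Content-Type**: `application/json`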
---
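## 2. Text-to-Speech (TTS)
### Endpoint
```
POST /ent/v2/audio-tts
```
### Request headers
| Field | Value | Description |
|------|-----|------|
| Content-Type | application/json | Data exchange format |
| Authorization | Token {your_api_key} | API key authentication |
### Request body
| Parameter | Type | Required | Description |
|----------|------|------|------|
| text | String | Yes | Text to synthesize, **< 10000 characters**. Supports the `<#x#>` pause marker, where x is the pause duration in seconds, range [0.01, 99.99] |
| voice_setting_voice_id | String | Yes | Voice ID |
| voice_setting_speed | Float | No | Speech rate, default 1.0, range [0.5, 2] |
| voice_setting_volume | Int | No | Volume, default 0 (normal volume), range [0, 10]; higher values are louder |
| voice_setting_pitch | Int | No | Pitch, default 0 (original voice), range [-12, 12] |
| voice_setting_emotion | String | No | Emotion control: `happy`/`sad`/`angry`/`fearful`/`disgusted`/`surprised`/`calm`. Usually does not need to be set manually; the model matches it automatically |
| pronunciation_dict_tone | list | No | Pronunciation overrides for polyphonic characters, e.g. `["燕少飞/(yan4)(shao3)(fei1)"]` |
| payload | String | No | Pass-through parameter, up to 1048576 characters |
### Response body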
```json
{
"task_id": "your_task_id_here",
"state": "success",
"file_url": "https://...",
"credits": 10,
"payload": "",
"created_at": "2025-01-01T15:41:31.968916Z"
}
```
| Field | Type | Description |
|------|------|------|
| task_id | String | Task ID generated by Vidu |
| state | String | `queueing` / `success` / `failed` |
| file_url | String | URL of the audio file |
| credits | Int | Credits consumed by this call |
| payload | String | Pass-through parameter |
| created_at | String | Task creation time |
### Curl example
```bash
curl -X POST https://api.vidu.cn/ent/v2/audio-tts \
-H "Authorization: Token {your_api_key}" \
-H "Content-Type: application/json" \
-d '{
"text": "你好,欢迎使用vidu开放平台",
"voice_setting_voice_id": "female-tianmei"
}'
```
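The limits in the request-body table above can be checked on the client before the request is sent. The following is a minimal sketch, not part of any Vidu SDK: the helper names `pause_marker` and `build_tts_body` are hypothetical, the checks only restate the documented ranges, and the two-decimal formatting of the pause marker is an assumption about what the server accepts.

```python
from typing import Any, Optional

# Emotion values listed in the request-body table.
ALLOWED_EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"}


def pause_marker(seconds: float) -> str:
    """Build a `<#x#>` pause marker (two-decimal formatting is an assumption)."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be within [0.01, 99.99] seconds")
    return f"<#{seconds:.2f}#>"


def build_tts_body(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    volume: int = 0,
    pitch: int = 0,
    emotion: Optional[str] = None,
) -> dict[str, Any]:
    """Validate the documented ranges and return the JSON body for /ent/v2/audio-tts."""
    if not text or len(text) >= 10000:
        raise ValueError("text must be non-empty and shorter than 10000 characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within [0.5, 2]")
    if not 0 <= volume <= 10:
        raise ValueError("volume must be within [0, 10]")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be within [-12, 12]")
    if emotion is not None and emotion not in ALLOWED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")

    body: dict[str, Any] = {
        "text": text,
        "voice_setting_voice_id": voice_id,
        "voice_setting_speed": speed,
        "voice_setting_volume": volume,
        "voice_setting_pitch": pitch,
    }
    # The emotion field is optional, so it is only sent when explicitly requested.
    if emotion is not None:
        body["voice_setting_emotion"] = emotion
    return body
```

A caller could combine both helpers, for example `build_tts_body("你好" + pause_marker(0.5) + "世界", "female-tianmei")`, and pass the result as the JSON body of the POST request.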
---
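## 3. Voice Cloning
### Endpoint
```
POST /ent/v2/audio-clone
```
### Request body
| Parameter | Type | Required | Description |
|----------|------|------|------|
| audio_url | String | Yes | Source audio URL (must be accessible). Format: mp3/m4a/wav; duration: 10 seconds to 5 minutes; size: ≤ 20 MB |
| voice_id | String | Yes | Custom voice ID. Length [8, 256]; the first character must be an English letter; digits, letters, hyphens, and underscores are allowed; the last character must not be `-` or `_`; must not duplicate an existing ID |
| prompt_audio_url | String | No | Sample audio for cloning (< 8 seconds); can improve voice similarity and stability |
| prompt_text | String | No | Text of the sample audio; must match the audio content and end with punctuation |
| text | String | Yes | Preview text, ≤ 1000 characters. It is read aloud in the cloned voice and returned as preview audio |
| payload | String | No | Pass-through parameter |
### Response body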
```json
{
"task_id": "your_task_id_here",
"state": "success",
"voice_id": "vidu01",
"demo_audio": "https://...",
"payload": "",
"created_at": "2025-01-01T15:41:31.968916Z"
}
```
| Field | Type | Description |
|------|------|------|
| task_id | String | Task ID |
| state | String | `queueing` / `success` / `failed` |
| voice_id | String | User-defined voice_id (not returned when the task fails) |
| demo_audio | String | Preview audio link (returned only when the request includes text) |
| payload | String | Pass-through parameter |
| created_at | String | Creation time |
### Curl example
```bash
curl -X POST https://api.vidu.cn/ent/v2/audio-clone \
-H "Authorization: Token {your_api_key}" \
-H "Content-Type: application/json" \
-d '{
"audio_url": "your_audio_url",
"voice_id": "vidu01",
"text": "你好,欢迎使用vidu开放平台"
}'
```
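The `voice_id` rules in the request-body table can be checked locally before calling the clone endpoint. This is a hedged sketch: `is_valid_voice_id` is a hypothetical helper, and the regular expression is one reading of the documented rules (length 8 to 256, leading English letter, only letters, digits, `-` and `_`, no trailing `-` or `_`). Uniqueness against existing IDs can only be checked by the server.

```python
import re

# First char: a letter; middle: 6..254 allowed chars; last char: a letter or digit.
# Total length is therefore 8..256, matching the documented range.
_VOICE_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{6,254}[A-Za-z0-9]")


def is_valid_voice_id(voice_id: str) -> bool:
    """Return True when voice_id satisfies the documented format rules."""
    return _VOICE_ID_RE.fullmatch(voice_id) is not None
```

Under this reading, the sample ID `vidu01` used in the examples is shorter than the documented minimum of 8 characters, so real calls may need a longer ID such as `vidu_0123abcd`.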
---
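## 4. Preset Voice List
There are **16 Chinese (Mandarin)** voices, split into a standard tier and a Beta (premium) tier.
### Standard tier
| voice_id | Voice name |
|----------|----------|
| male-qn-qingse | Boyish young male voice |
| male-qn-jingying | Elite young male voice |
| male-qn-badao | Domineering young male voice |
| male-qn-daxuesheng | Male college student voice |
| female-shaonv | Young girl voice |
| female-yujie | Mature sister (yujie) voice |
| female-chengshu | Mature female voice |
| female-tianmei | Sweet female voice |
### Beta (premium) tier
| voice_id | Voice name |
|----------|----------|
| male-qn-qingse-jingpin | Boyish young male voice-beta |
| male-qn-jingying-jingpin | Elite young male voice-beta |
| male-qn-badao-jingpin | Domineering young male voice-beta |
| male-qn-daxuesheng-jingpin | Male college student voice-beta |
| female-shaonv-jingpin | Young girl voice-beta |
| female-yujie-jingpin | Mature sister (yujie) voice-beta |
| female-chengshu-jingpin | Mature female voice-beta |
| female-tianmei-jingpin | Sweet female voice-beta |
> Voice preview sample URL format: `https://scene.vidu.zone/media-asset/{id}.mp3` (see the original links in the Feishu spreadsheet)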
---
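## 5. Comparison with MiniMax (integration reference)
| Aspect | Vidu | MiniMax |
|------|------|---------|
| Base URL | `https://api.vidu.cn` | `https://api.minimaxi.com` |
| Authentication | `Token {key}` | `Bearer {key}` |
| TTS endpoint | `POST /ent/v2/audio-tts` | `POST /v1/t2a_v2` |
| Sync/async | Synchronous | Synchronous + asynchronous |
| Text limit | 10000 characters | 10000 characters (synchronous) |
| Speed range | 0.5 ~ 2.0 (Float) | Int required |
| Volume range | 0 ~ 10 (Int, 0 = normal) | Int required |
| Pitch range | -12 ~ 12 (Int) | Int required |
| Emotion control | 7 emotions available | Not supported |
| Polyphonic characters | Supported via `pronunciation_dict_tone` | Not supported |
| Voice cloning | Synchronous, custom voice_id | Asynchronous, system-assigned voice_id |
| Clone audio requirements | 10 seconds to 5 minutes, ≤ 20 MB | About 10 seconds to 5 minutes |
| Preset voices | 16 Chinese | 6 Chinese |
| Response audio field | `file_url` | `audio` |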
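The authentication row of the comparison table is the easiest difference to get wrong when switching providers, because only the scheme word changes. A minimal sketch, assuming a hypothetical helper `auth_headers` and the two scheme names exactly as listed in the table:

```python
# Authorization scheme per provider, as listed in the comparison table.
AUTH_SCHEMES = {"vidu": "Token", "minimax": "Bearer"}


def auth_headers(provider: str, api_key: str) -> dict[str, str]:
    """Return JSON request headers with the provider-specific Authorization scheme."""
    try:
        scheme = AUTH_SCHEMES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return {
        "Authorization": f"{scheme} {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the scheme in one lookup table avoids scattering `Token`/`Bearer` string literals across call sites.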
---
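## 6. Lip Sync
### Endpoint
```
POST /ent/v2/lip-sync
```
**⚠️ Asynchronous endpoint**: creation returns a task_id; poll the query endpoint or use callback_url to receive a callback.
### Request body
| Parameter | Type | Required | Description |
|----------|------|------|------|
| video_url | String | Yes | Source video URL (must be accessible). Format: mp4/mov/avi; duration: 1~600 seconds (10~120 seconds recommended); size: ≤ 5 GB; resolution: 360p~4096p; codec: H.264 |
| audio_url | String | No | Audio file URL. Format: wav/mp3/wma/m4a/aac/ogg; duration: > 1 s and < 600 s; size: ≤ 100 MB |
| text | String | No | Text content, 4~2000 characters. When both text and audio_url are set, audio_url takes precedence |
| speed | Float | No | Speech rate, default 1.0, range [0.5, 2]. Applies only to text-driven generation |
| voice_id | String | No | Voice ID. Applies only to text-driven generation |
| volume | Int | No | Volume, default 0 (normal volume), range [0, 10]. Applies only to text-driven generation |
| ref_photo_url | String | No | Face reference image URL (jpg/jpeg/png/bmp/webp, 192~4096 px, ≤ 10 MB). When the video contains multiple faces, it selects the person to lip-sync |
| callback_url | String | No | Callback URL; a POST callback is sent when the task state changes |
### Video material requirements
- A real person on camera (cartoon characters need near-human facial proportions)
- Face toward the camera, with horizontal rotation of at most 45° and pitch of at most 15°
- Keep the face unobstructed where possible, with stable facial lighting
### Creation response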
```json
{
"task_id": "your_task_id_here",
"state": "created",
"payload": "",
"created_at": "2025-01-01T15:41:31.968916Z"
}
```
### Query task status
```
GET /ent/v2/tasks/{task_id}/creations
```
**Response body**
| Field | Type | Description |
|------|------|------|
| id | String | Task ID |
| state | String | `created`/`queueing`/`processing`/`success`/`failed` |
| err_code | String | Error code |
| credits | Int | Credits consumed |
| payload | String | Pass-through parameter |
| bgm | Bool | Whether BGM is used |
| off_peak | Bool | Whether off-peak mode is used |
| creations | Array | List of generated results |
| creations[].id | String | Creation ID |
| creations[].url | String | Creation URL (valid for 24 hours) |
| creations[].cover_url | String | Creation cover URL (valid for 24 hours) |
| creations[].watermarked_url | String | Watermarked creation URL |
### Curl example (audio-driven)
```bash
curl -X POST https://api.vidu.cn/ent/v2/lip-sync \
-H "Authorization: Token {your_api_key}" \
-H "Content-Type: application/json" \
-d '{
"video_url": "your_video_url",
"audio_url": "your_audio_url"
}'
```
### Curl example (text-driven)
```bash
curl -X POST https://api.vidu.cn/ent/v2/lip-sync \
-H "Authorization: Token {your_api_key}" \
-H "Content-Type: application/json" \
-d '{
"video_url": "your_video_url",
"text": "你好,欢迎使用vidu开放平台",
"voice_id": "female-tianmei",
"speed": 1.0
}'
```
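Because lip sync is asynchronous, a caller without a callback URL has to poll the task query endpoint until the state becomes `success` or `failed`. The sketch below shows one way to structure that loop. The `query` argument is a hypothetical async callable standing in for whatever performs `GET /ent/v2/tasks/{task_id}/creations`; the interval and timeout defaults are illustrative assumptions, not values from the Vidu documentation.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

# States after which the task will not change any more.
TERMINAL_STATES = {"success", "failed"}


async def wait_for_task(
    query: Callable[[str], Awaitable[dict[str, Any]]],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict[str, Any]:
    """Poll `query(task_id)` until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while True:
        result = await query(task_id)
        if result.get("state") in TERMINAL_STATES:
            return result
        # Stop if another sleep would reach or pass the deadline.
        if time.monotonic() + interval >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        await asyncio.sleep(interval)
```

Injecting `query` keeps the loop independent of the HTTP client, so it can be exercised with a fake in tests. Since creation URLs are documented as valid for 24 hours, a caller would normally download or re-host the result soon after the loop returns.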
---
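## 7. Integration Notes
1. **Vidu strengths**: emotion control, polyphonic-character annotation, 16 voices (including the premium tier), synchronous cloning, lip sync
2. **Vidu weaknesses**: no dedicated "list voices" API; the voice list is maintained in a Feishu spreadsheet
3. **Endpoint type differences**:
   - TTS / voice cloning: **synchronous endpoints** that return the result directly
   - Lip sync: **asynchronous endpoint**; poll `GET /ent/v2/tasks/{id}/creations` or use a callback
4. **Speed/volume/pitch types**: Vidu's speed is a **Float**, while volume and pitch are **Int** (unlike MiniMax, which requires Int for all three)
5. **Frontend changes**: change the speed slider range to 0.5~2.0, the volume range to 0~10, and the pitch range to -12~12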
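Values saved under the previous slider ranges may fall outside the new limits, so it can help to normalize them once before they reach the Vidu request. A minimal sketch, assuming a hypothetical helper `clamp_voice_params`; it clamps to the ranges listed above and rounds volume and pitch to integers, which is one possible policy rather than anything Vidu prescribes:

```python
def clamp_voice_params(speed: float, volume: float, pitch: float) -> tuple[float, int, int]:
    """Clamp speed to [0.5, 2.0] (float) and volume/pitch to [0, 10] / [-12, 12] (int)."""
    speed_out = min(max(float(speed), 0.5), 2.0)
    volume_out = min(max(int(round(volume)), 0), 10)
    pitch_out = min(max(int(round(pitch)), -12), 12)
    return speed_out, volume_out, pitch_out
```

Clamping silently changes out-of-range input; rejecting such values with a validation error, as the Pydantic `Field(ge=..., le=...)` constraints in the route models do, is the stricter alternative.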
@@ -0,0 +1,184 @@
"""
Vidu API Provider
=================

Wraps the Vidu voice/video HTTP APIs:
- Synchronous TTS (/ent/v2/audio-tts)
- Voice cloning (/ent/v2/audio-clone)
- Lip sync (/ent/v2/lip-sync)
- Task query (/ent/v2/tasks/{id}/creations)

Authentication: Token {api_key} (Authorization header)
"""

from __future__ import annotations

import logging
from typing import Any

import aiohttp

from app.config import get_settings

logger = logging.getLogger(__name__)


class ViduProvider:
    """Vidu API client wrapper."""

    def __init__(self, api_key: str | None = None, base_url: str | None = None):
        settings = get_settings()
        self.api_key = api_key or settings.VIDU_API_KEY
        self.base_url = (base_url or settings.VIDU_BASE_URL).rstrip("/")

    def _get_headers(self) -> dict[str, str]:
        return {
            "Authorization": f"Token {self.api_key}",
            "Content-Type": "application/json",
        }

    # ==================== TTS ====================

    async def tts_sync(
        self,
        text: str,
        voice_id: str,
        speed: float = 1.0,
        volume: int = 0,
        pitch: int = 0,
        emotion: str | None = None,
        pronunciation_dict_tone: list[str] | None = None,
        payload: str | None = None,
    ) -> dict[str, Any]:
        """
        Synchronous text-to-speech.

        POST /ent/v2/audio-tts
        """
        url = f"{self.base_url}/ent/v2/audio-tts"
        body: dict[str, Any] = {
            "text": text,
            "voice_setting_voice_id": voice_id,
            "voice_setting_speed": speed,
            "voice_setting_volume": volume,
            "voice_setting_pitch": pitch,
        }
        # Optional fields are sent only when they carry a value.
        if emotion:
            body["voice_setting_emotion"] = emotion
        if pronunciation_dict_tone:
            body["pronunciation_dict_tone"] = pronunciation_dict_tone
        if payload:
            body["payload"] = payload

        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=body, headers=self._get_headers()) as resp:
                data = await resp.json()
                if resp.status != 200 or data.get("state") == "failed":
                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
                    raise Exception(f"Vidu TTS error: {msg}")
                return data

    # ==================== Voice cloning ====================

    async def clone_voice(
        self,
        audio_url: str,
        voice_id: str,
        text: str,
        prompt_audio_url: str | None = None,
        prompt_text: str | None = None,
        payload: str | None = None,
    ) -> dict[str, Any]:
        """
        Voice cloning (synchronous endpoint).

        POST /ent/v2/audio-clone
        """
        url = f"{self.base_url}/ent/v2/audio-clone"
        body: dict[str, Any] = {
            "audio_url": audio_url,
            "voice_id": voice_id,
            "text": text,
        }
        if prompt_audio_url:
            body["prompt_audio_url"] = prompt_audio_url
        if prompt_text:
            body["prompt_text"] = prompt_text
        if payload:
            body["payload"] = payload

        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=body, headers=self._get_headers()) as resp:
                data = await resp.json()
                if resp.status != 200 or data.get("state") == "failed":
                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
                    raise Exception(f"Vidu clone error: {msg}")
                return data

    # ==================== Lip sync ====================

    async def lip_sync(
        self,
        video_url: str,
        audio_url: str | None = None,
        text: str | None = None,
        voice_id: str | None = None,
        speed: float = 1.0,
        volume: int = 0,
        ref_photo_url: str | None = None,
        callback_url: str | None = None,
        payload: str | None = None,
    ) -> dict[str, Any]:
        """
        Lip sync (asynchronous endpoint).

        POST /ent/v2/lip-sync
        """
        url = f"{self.base_url}/ent/v2/lip-sync"
        body: dict[str, Any] = {"video_url": video_url}
        if audio_url:
            body["audio_url"] = audio_url
        if text:
            body["text"] = text
        if voice_id:
            body["voice_id"] = voice_id
        # speed and volume are sent only when they differ from the API defaults.
        if speed != 1.0:
            body["speed"] = speed
        if volume != 0:
            body["volume"] = volume
        if ref_photo_url:
            body["ref_photo_url"] = ref_photo_url
        if callback_url:
            body["callback_url"] = callback_url
        if payload:
            body["payload"] = payload

        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=body, headers=self._get_headers()) as resp:
                data = await resp.json()
                if resp.status != 200 or data.get("state") == "failed":
                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
                    raise Exception(f"Vidu lip-sync error: {msg}")
                return data

    # ==================== Task query ====================

    async def query_task(self, task_id: str) -> dict[str, Any]:
        """
        Query task state and its creations.

        GET /ent/v2/tasks/{task_id}/creations
        """
        url = f"{self.base_url}/ent/v2/tasks/{task_id}/creations"
        async with aiohttp.ClientSession() as session:
            async with session.get(url, headers=self._get_headers()) as resp:
                data = await resp.json()
                if resp.status != 200:
                    msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
                    raise Exception(f"Vidu query task error: {msg}")
                return data
+277 -70
@@ -3,19 +3,24 @@
=======================
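Provides TTS synthesis, batch synthesis, voice cloning, and related features.
Based on the Kling AI TTS and voice cloning APIs.
Based on the MiniMax TTS and voice cloning APIs.
(Kling AI voice code is kept but deprecated; only video/avatar cloning still uses Kling.)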
"""
import logging
import tempfile
import uuid
from pathlib import Path
from fastapi import APIRouter, HTTPException
from fastapi import APIRouter, File, Form, HTTPException, UploadFile
from pydantic import BaseModel, Field
from app.schemas.common import ApiResponse, success_response
from app.services.tts_service import TTSService
from app.services.voice_clone_service import VoiceCloneService
from app.services.qiniu_service import QiniuService
from app.services.vidu_tts_service import ViduTTSService
from app.services.minimax_tts_service import MiniMaxTTSService # noqa: F401 历史兼容
from app.services.tts_service import TTSService # noqa: F401 历史兼容
from app.services.voice_clone_service import VoiceCloneService # noqa: F401 历史兼容
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/voice", tags=["Voice"])
@@ -27,10 +32,12 @@ router = APIRouter(prefix="/voice", tags=["Voice"])
class TTSSynthesizeRequest(BaseModel):
"""TTS 合成请求"""
text: str = Field(..., min_length=1, max_length=1000, description="待合成文本(≤1000)")
voice_id: str | None = Field(None, description="音色 ID(默认:温柔女声)")
speed: float = Field(default=1.0, ge=0.8, le=2.0, description="语速 0.8-2.0")
text: str = Field(..., min_length=1, max_length=10000, description="待合成文本(≤10000字符)")
voice_id: str | None = Field(None, description="音色 ID(默认:甜美女性)")
speed: float = Field(default=1.0, ge=0.5, le=2.0, description="语速 0.5-2.0")
voice_language: str = Field(default="zh", description="音色语种 (zh/en)")
volume: int = Field(default=0, ge=0, le=10, description="音量 0-10(0=正常)")
pitch: int = Field(default=0, ge=-12, le=12, description="音调 -12 到 12")
class TTSBatchSegment(BaseModel):
@@ -46,7 +53,9 @@ class TTSBatchRequest(BaseModel):
segments: list[TTSBatchSegment] = Field(..., min_length=1, description="段落列表")
voice_id: str | None = Field(None, description="音色 ID")
speed: float = Field(default=1.0, ge=0.8, le=2.0, description="语速")
speed: float = Field(default=1.0, ge=0.5, le=2.0, description="语速")
volume: int = Field(default=0, ge=0, le=10, description="音量 0-10")
pitch: int = Field(default=0, ge=-12, le=12, description="音调 -12 到 12")
class VoiceCloneSubmitRequest(BaseModel):
@@ -77,6 +86,13 @@ class VoiceCloneTaskResponse(BaseModel):
error_message: str | None = None
class VoiceUploadResponse(BaseModel):
"""音频上传响应"""
url: str = Field(..., description="七牛云访问 URL")
key: str = Field(..., description="存储 Key")
class VoiceInfo(BaseModel):
"""音色信息"""
@@ -85,11 +101,109 @@ class VoiceInfo(BaseModel):
description: str = ""
language: str = "zh"
recommended: bool = False
previewUrl: str | None = None
class LipSyncRequest(BaseModel):
"""对口型请求"""
video_url: str = Field(..., description="原视频 URL")
audio_url: str | None = Field(None, description="音频 URL(与 text 二选一)")
text: str | None = Field(None, description="文本内容(与 audio_url 二选一)")
voice_id: str | None = Field(None, description="音色 ID(文字驱动时生效)")
speed: float = Field(default=1.0, ge=0.5, le=2.0, description="语速")
volume: int = Field(default=0, ge=0, le=10, description="音量")
ref_photo_url: str | None = Field(None, description="人脸参考图 URL")
class LipSyncResponse(BaseModel):
"""对口型响应"""
task_id: str
state: str
class LipSyncQueryResponse(BaseModel):
"""对口型查询响应"""
task_id: str
state: str
video_url: str | None = None
cover_url: str | None = None
# ========== API 路由 ==========
@router.post("/upload", response_model=ApiResponse[VoiceUploadResponse])
async def upload_voice_file(
file: UploadFile = File(...),
file_type: str = Form(default="audio", description="文件类型: audio | video"),
):
"""
Upload an audio/video file to Qiniu Cloud.
Accepts an audio (mp3/wav) or video (mp4/mov) file, uploads it to the Qiniu media bucket,
and returns a publicly accessible URL.
"""
try:
file_type = file_type.lower().strip()
if file_type not in ("audio", "video"):
raise HTTPException(status_code=400, detail="file_type 必须是 audio 或 video")
# Validate the MIME type according to file_type
if file_type == "audio":
allowed_types = {"audio/mpeg", "audio/mp3", "audio/wav"}
max_size = 50 * 1024 * 1024 # 50MB
prefix = "meijiaka-zj/voice"
type_label = "音频"
else:
allowed_types = {"video/mp4", "video/quicktime"}
max_size = 200 * 1024 * 1024 # 200MB
prefix = "meijiaka-zj/avatar"
type_label = "视频"
content_type = file.content_type or "application/octet-stream"
if content_type not in allowed_types:
raise HTTPException(
status_code=400,
detail=f"不支持的{type_label}格式: {content_type},仅支持 {', '.join(allowed_types)}",
)
# Read the file content
content = await file.read()
if len(content) > max_size:
raise HTTPException(status_code=400, detail=f"{type_label}文件大小不能超过 {max_size // 1024 // 1024}MB")
# Generate the storage key
ext = content_type.split("/")[-1].replace("quicktime", "mov").replace("mpeg", "mp3")
key = f"{prefix}/{uuid.uuid4().hex}.{ext}"
# Upload to Qiniu Cloud
qiniu = QiniuService()
from io import BytesIO
qiniu.upload_stream(
stream=BytesIO(content),
key=key,
mime_type=content_type,
)
# Get the public URL (the media bucket uses video_domain)
url = qiniu.get_file_url(qiniu.video_domain, key)
return success_response(
data=VoiceUploadResponse(url=url, key=key),
message="上传成功",
)
except HTTPException:
raise
except Exception as e:
logger.error(f"[Voice] 上传失败: {e}")
raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")
@router.get("/voices", response_model=ApiResponse[list[VoiceInfo]])
async def list_voices():
"""
@@ -97,13 +211,26 @@ async def list_voices():
返回预设的音色选项,用户可选择喜欢的音色进行 TTS 合成。
"""
voices = TTSService.get_preset_voices()
voices = ViduTTSService.get_preset_voices()
return success_response(
data=[VoiceInfo(**v) for v in voices],
message="获取音色列表成功",
)
@router.get("/preset-voices/raw", response_model=ApiResponse[list[dict]])
async def list_preset_voices_raw():
"""
【已废弃】KlingAI 官方预置音色列表
语音功能已迁移至 Vidu,此端点保留仅作历史兼容。
"""
return success_response(
data=[],
message="语音功能已迁移至 Vidu,请使用 /voices 获取音色列表",
)
@router.post("/synthesize", response_model=ApiResponse[dict])
async def synthesize_speech(request: TTSSynthesizeRequest):
"""
@@ -113,12 +240,13 @@ async def synthesize_speech(request: TTSSynthesizeRequest):
适用于短文本(≤1000字),长文本建议使用 /synthesize-batch。
"""
try:
service = TTSService()
service = ViduTTSService()
audio_url = await service.synthesize_sync(
text=request.text,
voice_id=request.voice_id,
speed=request.speed,
voice_language=request.voice_language,
volume=request.volume,
pitch=request.pitch,
)
return success_response(
@@ -126,7 +254,7 @@ async def synthesize_speech(request: TTSSynthesizeRequest):
"audio_url": audio_url,
"format": "mp3",
"text": request.text,
"voice_id": request.voice_id or "829826751244537879",
"voice_id": request.voice_id or ViduTTSService.DEFAULT_VOICE_ID,
},
message="合成成功",
)
@@ -154,13 +282,31 @@ async def synthesize_batch(request: TTSBatchRequest):
segments_data = [s.model_dump() for s in request.segments]
service = TTSService()
results = await service.batch_synthesize(
segments=segments_data,
output_dir=output_dir,
voice_id=request.voice_id,
speed=request.speed,
)
service = ViduTTSService()
# Vidu does not support batch synthesis yet, so each segment is synthesized in turn
results = []
for seg in segments_data:
try:
audio_url = await service.synthesize_sync(
text=seg["text"],
voice_id=request.voice_id,
speed=request.speed,
volume=request.volume,
pitch=request.pitch,
)
results.append({
"index": seg.get("index", 0),
"success": True,
"audio_url": audio_url,
"filename": seg.get("filename"),
})
except Exception as e:
results.append({
"index": seg.get("index", 0),
"success": False,
"error": str(e),
"filename": seg.get("filename"),
})
success_count = sum(1 for r in results if r["success"])
failed_count = len(results) - success_count
@@ -188,20 +334,28 @@ async def synthesize_to_file(request: TTSSynthesizeRequest, output_path: str):
将文本转换为语音并保存到指定文件路径。
"""
try:
service = TTSService()
saved_path = await service.synthesize_to_file(
service = ViduTTSService()
audio_url = await service.synthesize_sync(
text=request.text,
output_path=output_path,
voice_id=request.voice_id,
speed=request.speed,
voice_language=request.voice_language,
volume=request.volume,
pitch=request.pitch,
)
# Download the audio and save it to the requested path
import httpx
async with httpx.AsyncClient() as client:
response = await client.get(audio_url)
response.raise_for_status()
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
Path(output_path).write_bytes(response.content)
return success_response(
data={
"file_path": str(saved_path),
"file_path": output_path,
"text": request.text,
"voice_id": request.voice_id or "829826751244537879",
"voice_id": request.voice_id or ViduTTSService.DEFAULT_VOICE_ID,
},
message="文件保存成功",
)
@@ -217,26 +371,26 @@ async def synthesize_to_file(request: TTSSynthesizeRequest, output_path: str):
@router.post("/clone/submit", response_model=ApiResponse[VoiceCloneTaskResponse])
async def submit_clone_task(request: VoiceCloneSubmitRequest):
"""
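Submit a voice cloning task
Submit a voice cloning task (Vidu)
Submits an audio/video URL for voice cloning and returns a task ID for later queries.
Three sources are supported: source_audio_url, source_video_url, video_id.
Vidu voice cloning is a synchronous endpoint and returns the result directly.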
"""
try:
service = VoiceCloneService()
task_id = await service.submit_clone_task(
source_audio_url=request.source_audio_url,
source_video_url=request.source_video_url,
video_id=request.video_id,
voice_name=request.voice_name,
service = ViduTTSService()
result = await service.clone_voice(
audio_url=request.source_audio_url or "",
voice_id=request.voice_name or f"vidu_{uuid.uuid4().hex[:8]}",
)
# Vidu returns synchronously, so the status is succeeded right away
return success_response(
data=VoiceCloneTaskResponse(
task_id=task_id,
status="pending",
task_id=result.get("task_id", ""),
status="succeeded",
voice_id=result.get("voice_id"),
trial_url=result.get("demo_audio"),
),
message="克隆任务已提交",
message="克隆成功",
)
except ValueError as e:
@@ -250,29 +404,17 @@ async def submit_clone_task(request: VoiceCloneSubmitRequest):
@router.get("/clone/query/{task_id}", response_model=ApiResponse[VoiceCloneTaskResponse])
async def query_clone_task(task_id: str, blocking: bool = False):
"""
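Query the status of a voice cloning task
Query the status of a voice cloning task (Vidu)
Args:
task_id: task ID
blocking: whether to block until the task completes (default False)
Vidu voice cloning is a synchronous endpoint; this route exists only for compatibility and always returns a succeeded status.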
"""
try:
service = VoiceCloneService()
result = await service.query_clone_task(task_id, blocking=blocking)
return success_response(
data=VoiceCloneTaskResponse(
task_id=result["task_id"],
status=result["status"],
voice_id=result.get("voice_id"),
trial_url=result.get("trial_url"),
error_message=result.get("error_message"),
)
)
except Exception as e:
logger.error(f"[Voice] 查询克隆任务失败: {e}")
raise HTTPException(status_code=500, detail=f"查询失败: {str(e)}")
return success_response(
data=VoiceCloneTaskResponse(
task_id=task_id,
status="succeeded",
),
message="克隆已完成",
)
@router.post("/clone/clone-and-wait", response_model=ApiResponse[VoiceCloneTaskResponse])
@@ -284,24 +426,20 @@ async def clone_and_wait(request: VoiceCloneSubmitRequest, poll_interval: float
适用于需要等待克隆完成的场景。
"""
try:
service = VoiceCloneService()
result = await service.wait_for_clone(
source_audio_url=request.source_audio_url,
source_video_url=request.source_video_url,
video_id=request.video_id,
voice_name=request.voice_name,
poll_interval=poll_interval,
service = ViduTTSService()
result = await service.clone_voice(
audio_url=request.source_audio_url or "",
voice_id=request.voice_name or f"vidu_{uuid.uuid4().hex[:8]}",
)
return success_response(
data=VoiceCloneTaskResponse(
task_id=result["task_id"],
status=result["status"],
task_id=result.get("task_id", ""),
status="succeeded",
voice_id=result.get("voice_id"),
trial_url=result.get("trial_url"),
error_message=result.get("error_message"),
trial_url=result.get("demo_audio"),
),
message=f"克隆任务完成,状态: {result['status']}",
message="克隆成功",
)
except ValueError as e:
@@ -312,4 +450,73 @@ async def clone_and_wait(request: VoiceCloneSubmitRequest, poll_interval: float
raise HTTPException(status_code=500, detail=f"克隆失败: {str(e)}")
# ==================== 对口型 ====================
@router.post("/lip-sync", response_model=ApiResponse[LipSyncResponse])
async def create_lip_sync(request: LipSyncRequest):
"""
Create a lip-sync task (asynchronous endpoint).
Takes a video plus audio or text and generates a lip-synced video.
Returns a task_id; query the result via /lip-sync/{task_id}.
"""
try:
if not request.audio_url and not request.text:
raise ValueError("audio_url 和 text 至少传一个")
service = ViduTTSService()
task_id = await service.lip_sync_create(
video_url=request.video_url,
audio_url=request.audio_url,
text=request.text,
voice_id=request.voice_id,
speed=request.speed,
volume=request.volume,
ref_photo_url=request.ref_photo_url,
)
return success_response(
data=LipSyncResponse(task_id=task_id, state="created"),
message="对口型任务已创建",
)
except ValueError as e:
logger.warning(f"[Voice] 对口型参数错误: {e}")
raise HTTPException(status_code=422, detail=str(e))
except Exception as e:
logger.error(f"[Voice] 对口型任务创建失败: {e}")
raise HTTPException(status_code=500, detail=f"创建失败: {str(e)}")
@router.get("/lip-sync/{task_id}", response_model=ApiResponse[LipSyncQueryResponse])
async def query_lip_sync(task_id: str):
"""
Query the status of a lip-sync task.
Returns the task state and the creation URL (valid for 24 hours).
"""
try:
service = ViduTTSService()
result = await service.lip_sync_query(task_id)
state = result.get("state", "unknown")
creations = result.get("creations", [])
video_url = creations[0].get("url") if creations else None
cover_url = creations[0].get("cover_url") if creations else None
return success_response(
data=LipSyncQueryResponse(
task_id=task_id,
state=state,
video_url=video_url,
cover_url=cover_url,
),
message=f"任务状态: {state}",
)
except Exception as e:
logger.error(f"[Voice] 查询对口型任务失败: {e}")
raise HTTPException(status_code=500, detail=f"查询失败: {str(e)}")
+14
@@ -119,6 +119,20 @@ class Settings(BaseSettings):
    KLINGAI_ACCESS_KEY: str | None = Field(default=None, description="KlingAI Access Key")
    KLINGAI_SECRET_KEY: str | None = Field(default=None, description="KlingAI Secret Key")

    # MiniMax configuration
    MINIMAX_API_KEY: str | None = Field(default=None, description="MiniMax API Key")
    MINIMAX_BASE_URL: str = Field(
        default="https://api.minimaxi.com",
        description="MiniMax Base URL(国内: api.minimaxi.com, 国际: api.minimax.io)",
    )

    # Vidu configuration
    VIDU_API_KEY: str | None = Field(default=None, description="Vidu API Key")
    VIDU_BASE_URL: str = Field(
        default="https://api.vidu.cn",
        description="Vidu Base URL",
    )

    # Qiniu Cloud storage configuration
    QINIU_ACCESS_KEY: str | None = Field(default=None, description="七牛云 Access Key")
    QINIU_SECRET_KEY: str | None = Field(default=None, description="七牛云 Secret Key")
+241
@@ -0,0 +1,241 @@
"""
Vidu TTS service wrapper
========================

Business-layer wrapper for:
- Synchronous TTS
- Voice cloning
- Lip sync (asynchronous, requires polling)
- Preset voice list
"""

from __future__ import annotations

import logging
from typing import Any

from app.ai.providers.vidu_provider import ViduProvider

logger = logging.getLogger(__name__)

# Vidu preset voices (backed by MiniMax under the hood; compatible with MiniMax voice IDs)
VIDU_PRESET_VOICES = [
    {
        "voice_id": "tianxin_xiaoling",
        "name": "甜心小玲",
        "language": "zh",
        "description": "甜美可爱,活泼俏皮",
        "recommended": True,
        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/tianxin_xiaoling.mp3",
    },
    {
        "voice_id": "danya_xuejie",
        "name": "淡雅学姐",
        "language": "zh",
        "description": "淡雅知性,温婉柔和",
        "recommended": False,
        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/danya_xuejie.mp3",
    },
    {
        "voice_id": "Chinese (Mandarin)_Warm_Girl",
        "name": "温暖少女",
        "language": "zh",
        "description": "温暖亲切,清新自然",
        "recommended": False,
        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Warm_Girl.mp3",
    },
    {
        "voice_id": "Chinese (Mandarin)_Radio_Host",
        "name": "电台男主播",
        "language": "zh",
        "description": "专业播报,沉稳有力",
        "recommended": False,
        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Radio_Host.mp3",
    },
    {
        "voice_id": "Chinese (Mandarin)_Straightforward_Boy",
        "name": "率真弟弟",
        "language": "zh",
        "description": "率真爽朗,青春阳光",
        "recommended": False,
        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Straightforward_Boy.mp3",
    },
    {
        "voice_id": "Chinese (Mandarin)_Gentleman",
        "name": "温润男声",
        "language": "zh",
        "description": "温润如玉,低沉磁性",
        "recommended": False,
        "previewUrl": "https://media.liche.cn/meijiaka-zj/voice/Gentleman.mp3",
    },
]

DEFAULT_VOICE_ID = "tianxin_xiaoling"


class ViduTTSService:
    """Vidu TTS service wrapper."""

    # Exposed on the class because the voice routes read ViduTTSService.DEFAULT_VOICE_ID.
    DEFAULT_VOICE_ID = DEFAULT_VOICE_ID

    def __init__(self):
        self.provider = ViduProvider()

    # ==================== Preset voices ====================

    @staticmethod
    def get_preset_voices() -> list[dict]:
        """Return the preset voice list."""
        return VIDU_PRESET_VOICES

    @staticmethod
    def get_voice_by_id(voice_id: str) -> dict | None:
        """Look up a preset voice by ID."""
        for voice in VIDU_PRESET_VOICES:
            if voice["voice_id"] == voice_id:
                return voice
        return None

    # ==================== Synchronous TTS ====================

    async def synthesize_sync(
        self,
        text: str,
        voice_id: str | None = None,
        speed: float = 1.0,
        volume: int = 0,
        pitch: int = 0,
        **kwargs,
    ) -> str:
        """
        Synchronous text-to-speech; returns the audio URL.

        Args:
            text: text to synthesize (≤10000 characters)
            voice_id: voice ID (default: tianxin_xiaoling)
            speed: speech rate (0.5-2.0)
            volume: volume (0-10, 0 = normal)
            pitch: pitch (-12~12)

        Returns:
            Audio URL
        """
        if not text or not text.strip():
            raise ValueError("text 不能为空")

        voice = voice_id or DEFAULT_VOICE_ID
        result = await self.provider.tts_sync(
            text=text,
            voice_id=voice,
            speed=speed,
            volume=volume,
            pitch=pitch,
            **kwargs,
        )
        audio_url = result.get("file_url")
        if not audio_url:
            raise ValueError("TTS 合成失败: 未返回音频 URL")
        logger.info(f"[Vidu TTS] 合成成功: voice_id={voice}, url={audio_url[:60]}...")
        return audio_url

    # ==================== Voice cloning ====================

    async def clone_voice(
        self,
        audio_url: str,
        voice_id: str,
        text: str | None = None,
        prompt_audio_url: str | None = None,
        prompt_text: str | None = None,
    ) -> dict[str, Any]:
        """
        Voice cloning (synchronous endpoint).

        Args:
            audio_url: source audio URL
            voice_id: custom voice_id (8~256 characters, starting with a letter)
            text: preview text (≤1000 characters; a default sample sentence is used when omitted)
            prompt_audio_url: sample audio URL (<8 seconds)
            prompt_text: text of the sample audio

        Returns:
            Clone result dict, including voice_id, demo_audio, etc.
        """
        trial_text = text or "你好,欢迎使用vidu开放平台"
        result = await self.provider.clone_voice(
            audio_url=audio_url,
            voice_id=voice_id,
            text=trial_text,
            prompt_audio_url=prompt_audio_url,
            prompt_text=prompt_text,
        )
        logger.info(f"[Vidu Clone] 复刻成功: voice_id={result.get('voice_id')}")
        return result

    async def query_clone_task(self, voice_id: str) -> dict[str, Any]:
        """
        Vidu voice cloning is synchronous and has no separate query endpoint.
        This method exists only for compatibility and echoes the known voice_id.
        """
        return {"voice_id": voice_id, "status": "succeeded"}

    # ==================== Lip sync ====================

    async def lip_sync_create(
        self,
        video_url: str,
        audio_url: str | None = None,
        text: str | None = None,
        voice_id: str | None = None,
        speed: float = 1.0,
        volume: int = 0,
        ref_photo_url: str | None = None,
        callback_url: str | None = None,
    ) -> str:
        """
        Create a lip-sync task (asynchronous endpoint); returns the task_id.

        Args:
            video_url: source video URL
            audio_url: audio URL (provide this or text)
            text: text content (provide this or audio_url)
            voice_id: voice ID (text-driven only)
            speed: speech rate (text-driven only)
            volume: volume (text-driven only)
            ref_photo_url: face reference image URL
            callback_url: callback URL

        Returns:
            task_id
        """
        result = await self.provider.lip_sync(
            video_url=video_url,
            audio_url=audio_url,
            text=text,
            voice_id=voice_id,
            speed=speed,
            volume=volume,
            ref_photo_url=ref_photo_url,
            callback_url=callback_url,
        )
        task_id = result.get("task_id")
        if not task_id:
            raise ValueError("对口型任务创建失败: 未返回 task_id")
        logger.info(f"[Vidu LipSync] 任务创建成功: task_id={task_id}")
        return task_id

    async def lip_sync_query(self, task_id: str) -> dict[str, Any]:
        """
        Query the state and creations of a lip-sync task.

        Returns:
            Task state dict, including state, creations, etc.
        """
        result = await self.provider.query_task(task_id)
        logger.info(f"[Vidu LipSync] 查询状态: task_id={task_id}, state={result.get('state')}")
        return result
+8
@@ -12,6 +12,14 @@ services:
- REDIS_PORT=6379
- REDIS_DB=1
- SECRET_KEY=dev-secret-key-change-in-production
- MINIMAX_API_KEY=${MINIMAX_API_KEY}
- MINIMAX_BASE_URL=${MINIMAX_BASE_URL:-https://api.minimaxi.com}
- VIDU_API_KEY=${VIDU_API_KEY}
- VIDU_BASE_URL=${VIDU_BASE_URL:-https://api.vidu.cn}
- MINIMAX_API_KEY=${MINIMAX_API_KEY}
- MINIMAX_BASE_URL=${MINIMAX_BASE_URL:-https://api.minimaxi.com}
- VIDU_API_KEY=${VIDU_API_KEY}
- VIDU_BASE_URL=${VIDU_BASE_URL:-https://api.vidu.cn}
volumes:
- .:/app
- ~/Documents/Meijiaka-zj:/root/Documents/Meijiaka-zj
+231 -170
@@ -1,245 +1,306 @@
/**
* VoiceDubbing styles
* ==================
*/
/* Voice dubbing page: follows the project style conventions */
.voice-dubbing {
width: 100%;
height: 100%;
display: flex;
flex-direction: column;
}
/* Two-column layout */
.dubbing-layout {
display: grid;
grid-template-columns: 1fr 1fr;
gap: var(--spacing-lg);
margin-top: var(--spacing-md);
gap: var(--spacing-xl);
flex: 1;
min-height: 0;
}
.voice-panel,
.mapping-panel {
/* ========== Left side ========== */
.voice-sidebar {
display: flex;
flex-direction: column;
gap: var(--spacing-md);
gap: var(--spacing-lg);
min-height: 0;
overflow: hidden;
}
.panel-section {
background: var(--bg-card);
border: 1px solid var(--border-light);
border-radius: var(--radius-lg);
padding: var(--spacing-md);
.voice-sidebar > .voice-section:first-child {
flex: 1;
min-height: 0;
overflow: hidden;
}
.panel-section h4 {
font-size: 13px;
.voice-list {
flex: 1;
min-height: 0;
overflow-y: auto;
}
.voice-section {
display: flex;
flex-direction: column;
gap: var(--spacing-sm);
}
.voice-section-header {
display: flex;
justify-content: space-between;
align-items: center;
}
.voice-section-title {
font-size: var(--font-sm);
font-weight: 600;
color: var(--text-primary);
margin-bottom: var(--spacing-sm);
}
/* Voice grid */
.voice-grid {
display: grid;
grid-template-columns: 1fr 1fr;
.link-btn {
font-size: var(--font-sm);
color: var(--primary);
background: none;
border: none;
cursor: pointer;
padding: 0;
}
.link-btn:hover {
text-decoration: underline;
}
/* Tabs: follow the project's tab style */
.voice-tabs {
display: flex;
gap: 0;
border-bottom: 1px solid var(--border-light);
}
.voice-tab {
padding: 6px 12px;
border: none;
border-bottom: 2px solid transparent;
background: none;
color: var(--text-secondary);
font-size: var(--font-sm);
cursor: pointer;
transition: all var(--transition-fast);
}
.voice-tab:hover {
color: var(--primary);
}
.voice-tab.active {
border-bottom-color: var(--primary);
color: var(--primary);
font-weight: 600;
}
/* Preview bar */
.voice-preview-bar {
display: flex;
align-items: center;
gap: var(--spacing-sm);
padding: var(--spacing-sm);
background: var(--primary-light);
border-radius: var(--radius-md);
}
.voice-preview-audio {
flex: 1;
height: 28px;
}
/* Voice list: follows the .option-card conventions */
.voice-list {
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.voice-card {
border: 1px solid var(--border-light);
.voice-row {
display: flex;
flex-direction: column;
padding: var(--spacing-sm) var(--spacing-md);
border-radius: var(--radius-md);
padding: var(--spacing-sm);
border: 1px solid var(--border-color);
background: var(--bg-card);
cursor: pointer;
transition: all 0.15s ease;
background: var(--bg-primary);
transition: all var(--transition-fast);
}
.voice-card:hover {
border-color: var(--primary-light);
.voice-row:hover {
border-color: var(--primary);
background: var(--bg-hover);
}
.voice-card.selected {
.voice-row.selected {
border-color: var(--primary);
background: color-mix(in srgb, var(--primary) 5%, var(--bg-card));
background: var(--primary-light);
}
.voice-name {
font-size: 13px;
font-weight: 600;
.voice-row-main {
display: flex;
align-items: center;
justify-content: space-between;
width: 100%;
}
.voice-row-info {
flex: 1;
min-width: 0;
}
.voice-row-name {
font-size: var(--font-sm);
font-weight: 500;
color: var(--text-primary);
display: flex;
align-items: center;
gap: 6px;
}
.recommended-tag {
font-size: 10px;
background: color-mix(in srgb, var(--primary) 15%, transparent);
color: var(--primary);
padding: 1px 5px;
border-radius: var(--radius-sm);
font-weight: 500;
}
.voice-desc {
font-size: 11px;
.voice-row-desc {
font-size: var(--font-xs);
color: var(--text-secondary);
margin-top: 2px;
}
/* Preview */
.preview-row {
display: flex;
gap: var(--spacing-sm);
align-items: flex-end;
}
.preview-text {
flex: 1;
padding: var(--spacing-sm);
border: 1px solid var(--border-light);
border-radius: var(--radius-md);
font-size: 13px;
resize: none;
line-height: 1.5;
font-family: inherit;
}
.preview-audio {
width: 100%;
height: 36px;
margin-top: var(--spacing-sm);
}
/* Batch synthesis */
.batch-info {
display: flex;
gap: var(--spacing-md);
font-size: 12px;
.voice-row-desc-inline {
font-size: var(--font-xs);
color: var(--text-secondary);
margin-bottom: var(--spacing-sm);
flex-wrap: wrap;
margin-left: 8px;
font-weight: 400;
}
.batch-btn {
width: 100%;
/* Tag: follows the global .tag style without overriding it */
.voice-row-name .tag {
font-size: var(--font-xs);
padding: 1px 5px;
}
.progress-bar {
height: 4px;
background: var(--bg-light);
border-radius: 2px;
overflow: hidden;
margin-top: var(--spacing-sm);
/* Preview button: icon-button style */
.preview-icon {
width: 32px;
height: 32px;
display: inline-flex;
align-items: center;
justify-content: center;
border: none;
border-radius: var(--radius-md);
background: var(--bg-input);
color: var(--text-secondary);
font-size: var(--font-xs);
cursor: pointer;
flex-shrink: 0;
transition: all var(--transition-fast);
}
.progress-fill {
height: 100%;
.preview-icon:hover {
background: var(--primary);
transition: width 0.3s ease;
color: var(--text-inverse);
}
/* Per-segment dubbing list */
.segment-voice-list {
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
max-height: 400px;
overflow-y: auto;
}
.seg-voice-item {
border: 1px solid var(--border-light);
border-radius: var(--radius-md);
padding: var(--spacing-sm);
background: var(--bg-primary);
}
.seg-voice-item.empty-shot {
.preview-icon:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.seg-voice-info {
display: flex;
flex-direction: column;
gap: 4px;
/* Empty state */
.voice-empty {
padding: var(--spacing-xl);
text-align: center;
color: var(--text-secondary);
font-size: var(--font-sm);
}
.seg-voice-index {
font-size: 12px;
.voice-empty small {
font-size: var(--font-xs);
opacity: 0.7;
}
/* Speed */
.speed-value {
font-size: var(--font-sm);
color: var(--primary);
font-weight: 600;
color: var(--text-primary);
}
.seg-has-audio {
display: flex;
flex-direction: column;
gap: 4px;
}
.audio-name {
font-size: 11px;
color: var(--success);
}
.seg-audio-player {
height: 28px;
width: 100%;
}
.seg-no-audio {
font-size: 11px;
.speed-value small {
font-weight: 400;
color: var(--text-secondary);
margin-left: 4px;
}
.seg-voiceover {
font-size: 11px;
color: var(--text-secondary);
margin-top: 4px;
line-height: 1.4;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
/* Audio file library */
.audio-file-list {
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.audio-file-item {
.speed-slider-wrap {
display: flex;
align-items: center;
gap: var(--spacing-sm);
padding: var(--spacing-xs) 0;
border-bottom: 1px solid var(--border-light);
gap: var(--spacing-md);
width: 100%;
}
.audio-file-item:last-child {
border-bottom: none;
}
.audio-file-info {
flex: 1;
min-width: 0;
}
.audio-file-name {
font-size: 12px;
font-weight: 500;
color: var(--text-primary);
display: block;
overflow: hidden;
text-overflow: ellipsis;
.speed-slider-wrap span {
font-size: var(--font-xs);
color: var(--text-tertiary);
white-space: nowrap;
flex-shrink: 0;
min-width: 36px;
text-align: center;
}
.audio-file-size {
font-size: 11px;
.speed-slider-wrap .slider-input {
flex: 1;
}
/* Bottom generate button: reuses the global .btn-primary, width adjustment only */
.voice-generate-wrap {
margin-top: auto;
padding-top: var(--spacing-md);
}
.voice-generate-wrap .btn {
width: 100%;
}
/* ========== Right column ========== */
.script-content {
display: flex;
flex-direction: column;
min-height: 0;
}
.script-content-header {
display: flex;
align-items: center;
justify-content: space-between;
font-size: var(--font-sm);
font-weight: 600;
color: var(--text-primary);
margin-bottom: var(--spacing-sm);
}
.script-content-meta {
font-size: var(--font-xs);
font-weight: 400;
color: var(--text-secondary);
}
.audio-file-player {
height: 28px;
flex-shrink: 0;
/* textarea fills the remaining space */
.script-content textarea {
flex: 1;
min-height: 0;
line-height: 1.8;
}
/* Inline preview player */
.voice-preview-inline {
margin-top: var(--spacing-sm);
padding-top: var(--spacing-sm);
border-top: 1px solid var(--border-light);
}
.voice-preview-inline .voice-preview-audio {
width: 100%;
}
+235 -261
@@ -1,314 +1,288 @@
/**
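* Dubbing management page
* =======================
* Voice dubbing page (Step 3)
* ===========================
*
* TTS text-to-speech: pick a voice and batch-synthesize the narration.
* Manages the project's audio files and links them to segments.
* Layout: narrow left column (voice + speed + generate button pinned to the bottom) | wide right column (dubbing script)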
*/
import { useState, useEffect, useCallback, useRef } from 'react';
import { useState, useEffect, useMemo, useCallback } from 'react';
import { useProjectStore } from '../../store';
import { useVoiceStore } from '../../store/voiceStore';
import { getCurrentProjectId } from '../../api/modules/localStorage';
import { synthesizeTTS, synthesizeBatchTTS } from '../../api/modules/voice';
import { saveAudio } from '../../api/modules/voice';
import { synthesizeTTS, saveAudio, uploadAudio } from '../../api/modules/voice';
import { toast } from '../../store/uiStore';
import { useProgressStore } from '../../store/progressStore';
import './VoiceDubbing.css';
export default function VoiceDubbing() {
const projectId = getCurrentProjectId();
const segments = useProjectStore(state => state.segments);
const updateSegment = useProjectStore(state => state.updateSegment);
const projectId = getCurrentProjectId();
const {
presetVoices,
voiceMaterials,
selectedVoiceId,
speed,
volume,
pitch,
loadPresetVoices,
loadVoiceMaterials,
setSelectedVoiceId,
projectAudios,
setSpeed,
setVolume,
setPitch,
loadProjectAudios,
getAudioForSegment,
setAudioMapping,
} = useVoiceStore();
const [isSynthesizing, setIsSynthesizing] = useState(false);
const [synthProgress, setSynthProgress] = useState(0);
const [synthTotal, setSynthTotal] = useState(0);
const [customText, setCustomText] = useState('');
const [customPreviewUrl, setCustomPreviewUrl] = useState<string | null>(null);
const audioPreviewRef = useRef<HTMLAudioElement>(null);
const [isGenerating, setIsGenerating] = useState(false);
const [activeVoiceTab, setActiveVoiceTab] = useState<'preset' | 'clone'>('preset');
const [activePreviewVoiceId, setActivePreviewVoiceId] = useState<string | null>(null);
// Load voices and project audio
useEffect(() => {
loadPresetVoices();
if (projectId) {
loadProjectAudios(projectId);
}
loadVoiceMaterials();
if (projectId) loadProjectAudios(projectId);
}, [projectId]);
// Segments that have narration text (empty shots excluded)
const voicedSegments = segments.filter(s => s.type !== 'empty_shot' && s.voiceover);
const totalChars = voicedSegments.reduce((sum, s) => sum + (s.voiceover?.length || 0), 0);
const mergedText = useMemo(
() => segments.map(s => s.voiceover?.trim() || '【空镜】').join('\n'),
[segments]
);
const totalChars = mergedText.length;
// Batch-synthesize all narration
const handleBatchSynthesize = useCallback(async () => {
if (!projectId || voicedSegments.length === 0) {
toast.warn('没有需要合成的旁白');
const handleTogglePreview = useCallback((voiceId: string, voiceName: string, e: React.MouseEvent) => {
e.stopPropagation();
// Clicking the voice that is already previewing closes its preview
if (activePreviewVoiceId === voiceId) {
setActivePreviewVoiceId(null);
return;
}
setActivePreviewVoiceId(voiceId);
}, [activePreviewVoiceId]);
setIsSynthesizing(true);
setSynthProgress(0);
setSynthTotal(voicedSegments.length);
let successCount = 0;
let failCount = 0;
try {
for (let i = 0; i < voicedSegments.length; i++) {
const seg = voicedSegments[i];
const segId = seg.id?.toString() || String(i);
const text = seg.voiceover || '';
setSynthProgress(i + 1);
try {
// Synchronous TTS synthesis (≤200 characters)
const result = await synthesizeTTS({
text,
voiceId: selectedVoiceId,
speed: 1.0,
});
if (!result.audioBase64) {
throw new Error('未返回音频数据');
}
// Save locally
const audioId = `tts_${segId}_${Date.now()}`;
const meta = await saveAudio({
projectId,
audioId,
audioData: result.audioBase64,
name: `旁白-${segId}`,
voiceId: selectedVoiceId,
duration: 0, // duration is not available yet
segmentId: segId,
});
// Link to the segment
setAudioMapping(segId, meta.id);
// Update the segment's audioPath
updateSegment(seg.id!, { audioPath: meta.filePath });
successCount++;
} catch (err) {
console.error(`[VoiceDubbing] 分镜 ${segId} 合成失败:`, err);
failCount++;
}
}
if (successCount > 0) {
toast.success(`配音合成完成:成功 ${successCount}${failCount > 0 ? `,失败 ${failCount}` : ''}`);
} else {
toast.error('配音合成全部失败');
}
} finally {
setIsSynthesizing(false);
setSynthProgress(0);
}
}, [projectId, voicedSegments, selectedVoiceId, updateSegment, setAudioMapping]);
// Preview the voice
const handlePreviewVoice = useCallback(async () => {
if (!customText.trim()) {
toast.warn('请输入要预览的文本');
return;
}
try {
setCustomPreviewUrl(null);
const result = await synthesizeTTS({
text: customText.slice(0, 200),
voiceId: selectedVoiceId,
speed: 1.0,
});
if (!result.audioBase64) {
throw new Error('未返回音频数据');
}
const audioBlob = new Blob(
[Uint8Array.from(atob(result.audioBase64), c => c.charCodeAt(0))],
{ type: 'audio/mp3' }
);
const url = URL.createObjectURL(audioBlob);
setCustomPreviewUrl(url);
} catch (err) {
toast.error(`试听失败: ${err instanceof Error ? err.message : String(err)}`);
}
}, [customText, selectedVoiceId]);
// Link a project audio file to a segment
const handleAssignToSegment = (audioId: string, segmentId: string) => {
setAudioMapping(segmentId, audioId);
// Also update the segment's audioPath
const audio = projectAudios.find(a => a.id === audioId);
if (audio) {
updateSegment(parseInt(segmentId), { audioPath: audio.filePath });
}
toast.success('已关联到分镜');
const getPreviewUrl = (voiceId: string): string | null => {
const voice = presetVoices.find(v => v.voiceId === voiceId);
return voice?.previewUrl || null;
};
const selectedVoice = presetVoices.find(v => v.voiceId === selectedVoiceId);
const handleGenerate = useCallback(async () => {
if (!projectId) { toast.warning('请先创建项目'); return; }
const realText = segments.map(s => s.voiceover?.trim()).filter(Boolean).join('\n');
if (!realText) { toast.warning('没有需要合成的旁白文本'); return; }
// Cap each request at 1000 characters and truncate anything longer (the Vidu TTS docs allow text under 10000 characters)
const truncatedText = realText.length > 1000 ? realText.slice(0, 1000) : realText;
const progress = useProgressStore.getState();
setIsGenerating(true);
progress.show('生成配音');
try {
progress.update('正在合成语音...');
const result = await synthesizeTTS({ text: truncatedText, voiceId: selectedVoiceId, speed, volume, pitch });
if (!result.audioUrl) throw new Error('未返回音频 URL');
progress.update('正在保存音频...');
// Download the audio blob
const response = await fetch(result.audioUrl);
if (!response.ok) throw new Error('下载音频失败');
const blob = await response.blob();
// Upload to Qiniu Cloud
const file = new File([blob], `tts_${Date.now()}.mp3`, { type: 'audio/mp3' });
const qiniuUrl = await uploadAudio(file);
// Save locally
const base64 = await new Promise<string>((resolve, reject) => {
const reader = new FileReader();
reader.onloadend = () => {
const dataUrl = reader.result as string;
resolve(dataUrl.split(',')[1]);
};
reader.onerror = reject;
reader.readAsDataURL(blob);
});
const audioId = `voice_${Date.now()}`;
const meta = await saveAudio({
projectId, audioId, audioData: base64,
name: `配音-${segments.length}`, voiceId: selectedVoiceId, duration: 0,
});
for (const seg of segments) {
const segId = seg.id;
if (segId) {
setAudioMapping(segId.toString(), meta.id);
updateSegment(segId, { audioPath: meta.filePath, audioUrl: qiniuUrl });
}
}
progress.success('配音生成完成');
} catch (err) {
progress.error(err instanceof Error ? err.message : '生成失败');
} finally {
setIsGenerating(false);
}
}, [projectId, segments, selectedVoiceId, speed, volume, pitch, setAudioMapping, updateSegment]);
return (
<div className="voice-dubbing">
<div className="step-header">
<h2></h2>
<p className="step-desc">
{voicedSegments.length} {totalChars}
</p>
</div>
<div className="dubbing-layout">
{/* Left: voice selection + batch synthesis */}
<div className="voice-panel">
{/* Left: voice + speed + generate button */}
<div className="voice-sidebar">
{/* Voice selection */}
<div className="panel-section">
<h4></h4>
<div className="voice-grid">
{presetVoices.map(voice => (
<div
key={voice.voiceId}
className={`voice-card ${voice.voiceId === selectedVoiceId ? 'selected' : ''}`}
onClick={() => setSelectedVoiceId(voice.voiceId)}
>
<div className="voice-name">
{voice.name}
{voice.recommended && <span className="recommended-tag"></span>}
</div>
<div className="voice-desc">{voice.description}</div>
</div>
))}
<div className="voice-section">
<div className="voice-section-header">
<span className="voice-section-title"></span>
</div>
</div>
{/* Voice preview */}
<div className="panel-section">
<h4></h4>
<div className="preview-row">
<textarea
className="preview-text"
value={customText}
onChange={e => setCustomText(e.target.value)}
placeholder="输入文本试听音色(≤200字)..."
rows={3}
maxLength={200}
/>
<button
className="btn btn-secondary"
onClick={handlePreviewVoice}
disabled={!customText.trim()}
>
<div className="voice-tabs">
<button className={`voice-tab ${activeVoiceTab === 'preset' ? 'active' : ''}`} onClick={() => setActiveVoiceTab('preset')}>
({presetVoices.length})
</button>
<button className={`voice-tab ${activeVoiceTab === 'clone' ? 'active' : ''}`} onClick={() => setActiveVoiceTab('clone')}>
({voiceMaterials.filter(m => m.status === 'ready').length})
</button>
</div>
{customPreviewUrl && (
<audio ref={audioPreviewRef} src={customPreviewUrl} controls className="preview-audio" />
)}
</div>
{/* Batch synthesis */}
<div className="panel-section">
<h4></h4>
<div className="batch-info">
<span>{selectedVoice?.name}</span>
<span>{voicedSegments.length} </span>
<span> {totalChars} </span>
</div>
<button
className="btn btn-primary batch-btn"
onClick={handleBatchSynthesize}
disabled={isSynthesizing || voicedSegments.length === 0}
>
{isSynthesizing
? `合成中... ${synthProgress}/${synthTotal}`
: `${voicedSegments.length} 个分镜生成配音`}
</button>
{isSynthesizing && (
<div className="progress-bar">
<div
className="progress-fill"
style={{ width: `${(synthProgress / synthTotal) * 100}%` }}
/>
</div>
)}
</div>
</div>
{/* Right: segment-to-audio mapping */}
<div className="mapping-panel">
<div className="panel-section">
<h4></h4>
<div className="segment-voice-list">
{segments.map((seg, i) => {
const segId = seg.id?.toString() || String(i);
const audio = getAudioForSegment(segId);
const isEmptyShot = seg.type === 'empty_shot';
return (
<div key={segId} className={`seg-voice-item ${isEmptyShot ? 'empty-shot' : ''}`}>
<div className="seg-voice-info">
<span className="seg-voice-index">
{isEmptyShot ? '🎬' : '🎙️'} {i + 1}
</span>
{audio ? (
<div className="seg-has-audio">
<span className="audio-name">{audio.name}</span>
<audio
src={`file://${audio.filePath}`}
controls
className="seg-audio-player"
/>
{activeVoiceTab === 'preset' && (
<div className="voice-list">
{presetVoices.map(v => (
<div key={v.voiceId} className={`voice-row ${v.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(v.voiceId)}>
<div className="voice-row-main">
<div className="voice-row-info">
<div className="voice-row-name">
{v.name}
<span className="voice-row-desc-inline">{v.description}</span>
</div>
) : (
<span className="seg-no-audio">
{isEmptyShot ? '空镜无需配音' : '未配音'}
</span>
)}
</div>
<button className="preview-icon" onClick={e => handleTogglePreview(v.voiceId, v.name, e)}>
{activePreviewVoiceId === v.voiceId ? '✕' : '▶'}
</button>
</div>
<div className="seg-voiceover">{seg.voiceover || ''}</div>
</div>
);
})}
</div>
</div>
{/* Audio file list */}
{projectAudios.length > 0 && (
<div className="panel-section">
<h4></h4>
<div className="audio-file-list">
{projectAudios.map(audio => (
<div key={audio.id} className="audio-file-item">
<div className="audio-file-info">
<span className="audio-file-name">{audio.name}</span>
<span className="audio-file-size">
{(audio.fileSize / 1024).toFixed(1)} KB
</span>
</div>
<audio
src={`file://${audio.filePath}`}
controls
className="audio-file-player"
/>
{activePreviewVoiceId === v.voiceId && v.previewUrl && (
<div className="voice-preview-inline">
<audio src={v.previewUrl} controls className="voice-preview-audio" autoPlay />
</div>
)}
</div>
))}
</div>
)}
{activeVoiceTab === 'clone' && (
<div className="voice-list">
{voiceMaterials.filter(m => m.status === 'ready').length === 0 ? (
<div className="voice-empty"><br /><small></small></div>
) : (
voiceMaterials.filter(m => m.status === 'ready').map(m => (
<div key={m.voiceId} className={`voice-row ${m.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(m.voiceId)}>
<div className="voice-row-main">
<div className="voice-row-info">
<div className="voice-row-name">
{m.name} <span className="tag clone"></span>
<span className="voice-row-desc-inline">
{m.createdAt ? new Date(m.createdAt).toLocaleDateString('zh-CN') : ''}
</span>
</div>
</div>
<button className="preview-icon" onClick={e => handleTogglePreview(m.voiceId, m.name, e)}>
{activePreviewVoiceId === m.voiceId ? '✕' : '▶'}
</button>
</div>
{activePreviewVoiceId === m.voiceId && m.trialUrl && (
<div className="voice-preview-inline">
<audio src={m.trialUrl} controls className="voice-preview-audio" autoPlay />
</div>
)}
</div>
))
)}
</div>
)}
</div>
{/* Speed */}
<div className="voice-section">
<div className="voice-section-header">
<span className="voice-section-title"></span>
<span className="speed-value">{speed.toFixed(1)}x</span>
</div>
)}
<div className="speed-slider-wrap">
<span>0.5x</span>
<input
type="range"
className="slider-input"
min={5}
max={20}
step={1}
value={Math.round(speed * 10)}
onChange={e => setSpeed(parseInt(e.target.value) / 10)}
style={{ '--slider-percent': `${((Math.round(speed * 10) - 5) / 15) * 100}%` } as React.CSSProperties}
/>
<span>2.0x</span>
</div>
</div>
{/* Volume */}
<div className="voice-section">
<div className="voice-section-header">
<span className="voice-section-title"></span>
<span className="speed-value">{volume}</span>
</div>
<div className="speed-slider-wrap">
<span>0</span>
<input
type="range"
className="slider-input"
min={0}
max={10}
step={1}
value={volume}
onChange={e => setVolume(parseInt(e.target.value))}
style={{ '--slider-percent': `${(volume / 10) * 100}%` } as React.CSSProperties}
/>
<span>10</span>
</div>
</div>
{/* Pitch */}
<div className="voice-section">
<div className="voice-section-header">
<span className="voice-section-title"></span>
<span className="speed-value">{pitch}</span>
</div>
<div className="speed-slider-wrap">
<span>-12</span>
<input
type="range"
className="slider-input"
min={-12}
max={12}
step={1}
value={pitch}
onChange={e => setPitch(parseInt(e.target.value))}
style={{ '--slider-percent': `${((pitch + 12) / 24) * 100}%` } as React.CSSProperties}
/>
<span>12</span>
</div>
</div>
{/* Bottom generate button */}
<div className="voice-generate-wrap">
<button className="btn btn-primary generate-btn" onClick={handleGenerate} disabled={isGenerating || !mergedText.trim()}>
{isGenerating ? '合成中...' : '生成配音'}
</button>
</div>
</div>
{/* Right: dubbing script */}
<div className="script-content">
<div className="script-content-header">
<span className="script-content-meta">{totalChars} · {segments.length} </span>
</div>
<textarea readOnly value={mergedText} rows={20} className="script-textarea" />
</div>
</div>
</div>
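`handleGenerate` above sends only the first 1000 characters of the merged narration, so anything beyond that is silently dropped. One alternative is to split the text into several requests instead of truncating. The sketch below is an illustration under stated assumptions, not part of this commit: `splitForTTS` is a hypothetical helper, it prefers to break at line boundaries (the segments are joined with `\n`), and it counts UTF-16 code units via `String.length`, which may differ from how the API counts characters. Stitching the resulting audio files together is out of scope here.

```typescript
// Splits `text` into chunks of at most `maxChars`, breaking at newlines where possible.
function splitForTTS(text: string, maxChars = 1000): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";
  for (const line of text.split("\n")) {
    // A single line longer than the limit is hard-split into fixed-size pieces.
    const pieces: string[] = [];
    for (let i = 0; i < line.length; i += maxChars) {
      pieces.push(line.slice(i, i + maxChars));
    }
    // Empty lines yield no pieces and are skipped.
    for (const piece of pieces) {
      const candidate = current ? `${current}\n${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep accumulating
      } else {
        chunks.push(current); // flush the full chunk and start a new one
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk could then be passed to `synthesizeTTS` in order, with the same `voiceId`, `speed`, `volume`, and `pitch`.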
+235 -8
@@ -7,7 +7,7 @@
import { create } from 'zustand';
import { useShallow } from 'zustand/react/shallow';
import type { VoiceInfo, AudioMeta } from '../api/modules/voice';
import type { VoiceInfo, AudioMeta, VoiceMaterial, AvatarMaterial } from '../api/modules/voice';
import * as voiceApi from '../api/modules/voice';
interface VoiceState {
@@ -25,9 +25,26 @@ interface VoiceState {
// Current project ID
currentProjectId: string | null;
// Speed
speed: number;
// Volume (0 to 10)
volume: number;
// Pitch (-12 to 12)
pitch: number;
// Loading state
isLoadingVoices: boolean;
isLoadingAudios: boolean;
// Material library (user-uploaded cloned voices)
voiceMaterials: VoiceMaterial[];
isLoadingMaterials: boolean;
// Video material library
avatarMaterials: AvatarMaterial[];
isLoadingAvatarMaterials: boolean;
}
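The store keeps `speed`, `volume`, and `pitch` as plain numbers, so nothing prevents an out-of-range value from reaching the API if a setter is called outside the sliders. A minimal sketch of a guard follows, assuming the ranges given in the Vidu TTS document added by this commit (speed [0.5, 2], volume [0, 10], pitch [-12, 12], with volume and pitch as integers). `TTSParams` and `normalizeTTSParams` are hypothetical names, and rounding speed to one decimal is an assumption that mirrors the slider's 0.1 step.

```typescript
interface TTSParams {
  speed: number;
  volume: number;
  pitch: number;
}

// Restricts `value` to the closed interval [min, max].
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Clamps each parameter to its documented range and rounds to the expected precision.
function normalizeTTSParams(p: TTSParams): TTSParams {
  return {
    speed: Math.round(clamp(p.speed, 0.5, 2) * 10) / 10, // float, one decimal
    volume: Math.round(clamp(p.volume, 0, 10)), // integer
    pitch: Math.round(clamp(p.pitch, -12, 12)), // integer
  };
}
```

Such a helper could be applied once, just before building the request, rather than inside every setter.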
interface VoiceActions {
@@ -35,6 +52,28 @@ interface VoiceActions {
loadPresetVoices: () => Promise<void>;
setSelectedVoiceId: (id: string) => void;
// Speed
setSpeed: (speed: number) => void;
// Volume
setVolume: (volume: number) => void;
// Pitch
setPitch: (pitch: number) => void;
// Material library actions
loadVoiceMaterials: () => Promise<void>;
addVoiceMaterial: (file: File, name: string) => Promise<VoiceMaterial>;
updateVoiceMaterialStatus: (id: string, status: VoiceMaterial['status'], voiceId?: string, trialUrl?: string) => void;
renameVoiceMaterial: (id: string, name: string) => Promise<void>;
deleteVoiceMaterial: (materialId: string) => Promise<void>;
// Video material library actions
loadAvatarMaterials: () => Promise<void>;
addAvatarMaterial: (file: File, name: string) => Promise<AvatarMaterial>;
renameAvatarMaterial: (id: string, name: string) => Promise<void>;
deleteAvatarMaterial: (materialId: string) => Promise<void>;
// Project audio actions
loadProjectAudios: (projectId: string) => Promise<void>;
saveAudio: (args: {
@@ -58,12 +97,19 @@ interface VoiceActions {
const initialState: VoiceState = {
presetVoices: [],
selectedVoiceId: '829826751244537879', // Gentle female voice (Kling preset voice)
selectedVoiceId: 'tianxin_xiaoling', // 甜心小玲
projectAudios: [],
audioMapping: {},
currentProjectId: null,
speed: 1.0,
volume: 0,
pitch: 0,
isLoadingVoices: false,
isLoadingAudios: false,
voiceMaterials: [],
isLoadingMaterials: false,
avatarMaterials: [],
isLoadingAvatarMaterials: false,
};
export const useVoiceStore = create<VoiceState & VoiceActions>()(
@@ -79,14 +125,57 @@ export const useVoiceStore = create<VoiceState & VoiceActions>()(
set({ presetVoices: voices });
} catch (err) {
console.error('[VoiceStore] 加载音色列表失败:', err);
// Fail silently and fall back to the defaults (Kling preset voices)
// Fail silently and fall back to the preset voices
set({
presetVoices: [
{ voiceId: '829826751244537879', name: '温柔女声', description: '温柔细腻', recommended: true, language: 'zh' },
{ voiceId: '829824295735410756', name: '钓系女友', description: '甜美撒娇', recommended: false, language: 'zh' },
{ voiceId: '829826792415842333', name: '播报男声', description: '沉稳播报', recommended: false, language: 'zh' },
{ voiceId: '829826834144964676', name: '盐系少年', description: '清新少年', recommended: false, language: 'zh' },
{ voiceId: '829826884271091753', name: '撒娇女友', description: '可爱撒娇', recommended: false, language: 'zh' },
{
voiceId: 'tianxin_xiaoling',
name: '甜心小玲',
description: '甜美可爱,活泼俏皮',
recommended: true,
language: 'zh',
previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/tianxin_xiaoling.mp3',
},
{
voiceId: 'danya_xuejie',
name: '淡雅学姐',
description: '淡雅知性,温婉柔和',
recommended: false,
language: 'zh',
previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/danya_xuejie.mp3',
},
{
voiceId: 'Chinese (Mandarin)_Warm_Girl',
name: '温暖少女',
description: '温暖亲切,清新自然',
recommended: false,
language: 'zh',
previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Warm_Girl.mp3',
},
{
voiceId: 'Chinese (Mandarin)_Radio_Host',
name: '电台男主播',
description: '专业播报,沉稳有力',
recommended: false,
language: 'zh',
previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Radio_Host.mp3',
},
{
voiceId: 'Chinese (Mandarin)_Straightforward_Boy',
name: '率真弟弟',
description: '率真爽朗,青春阳光',
recommended: false,
language: 'zh',
previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Straightforward_Boy.mp3',
},
{
voiceId: 'Chinese (Mandarin)_Gentleman',
name: '温润男声',
description: '温润如玉,低沉磁性',
recommended: false,
language: 'zh',
previewUrl: 'https://media.liche.cn/meijiaka-zj/voice/Gentleman.mp3',
},
],
});
} finally {
@@ -96,6 +185,144 @@ export const useVoiceStore = create<VoiceState & VoiceActions>()(
setSelectedVoiceId: (id) => set({ selectedVoiceId: id }),
// ====================== Speed ======================
setSpeed: (speed: number) => set({ speed }),
// ====================== Volume ======================
setVolume: (volume: number) => set({ volume }),
// ====================== Pitch ======================
setPitch: (pitch: number) => set({ pitch }),
// ====================== Material library actions ======================
loadVoiceMaterials: async () => {
set({ isLoadingMaterials: true });
try {
const materials = await voiceApi.loadVoiceMaterials();
set({ voiceMaterials: materials });
} catch (err) {
console.error('[VoiceStore] 加载素材库失败:', err);
} finally {
set({ isLoadingMaterials: false });
}
},
addVoiceMaterial: async (file: File, name: string) => {
// 1. Upload to Qiniu Cloud
const sourceUrl = await voiceApi.uploadAudio(file);
// 2. Submit the voice-clone task
const cloneResult = await voiceApi.submitCloneTask({
sourceAudioUrl: sourceUrl,
voiceName: name,
});
// 3. Create the local record
const material: VoiceMaterial = {
id: cloneResult.taskId,
name,
voiceId: '',
sourceUrl,
trialUrl: undefined,
status: 'pending',
createdAt: new Date().toISOString(),
};
// 4. Save to the local JSON file
await voiceApi.saveVoiceMaterial(material);
set(state => ({ voiceMaterials: [material, ...state.voiceMaterials] }));
return material;
},
updateVoiceMaterialStatus: (id: string, status: VoiceMaterial['status'], voiceId?: string, trialUrl?: string) => {
set(state => {
const updated: VoiceMaterial[] = state.voiceMaterials.map((m): VoiceMaterial => {
if (m.id !== id) return m;
return {
...m,
status,
voiceId: voiceId || m.voiceId,
trialUrl: trialUrl || m.trialUrl,
};
});
// Also persist the change locally
const target = updated.find(m => m.id === id);
if (target) {
voiceApi.saveVoiceMaterial(target).catch(err => {
console.error('[VoiceStore] 保存素材状态失败:', err);
});
}
return { voiceMaterials: updated };
});
},
renameVoiceMaterial: async (id: string, name: string) => {
set(state => {
const updated = state.voiceMaterials.map(m => m.id === id ? { ...m, name } : m);
const target = updated.find(m => m.id === id);
if (target) {
voiceApi.saveVoiceMaterial(target).catch(err => {
console.error('[VoiceStore] 重命名素材失败:', err);
});
}
return { voiceMaterials: updated };
});
},
deleteVoiceMaterial: async (materialId: string) => {
await voiceApi.deleteVoiceMaterial(materialId);
set(state => ({
voiceMaterials: state.voiceMaterials.filter(m => m.id !== materialId),
}));
},
// ====================== Video material library actions ======================
loadAvatarMaterials: async () => {
set({ isLoadingAvatarMaterials: true });
try {
const materials = await voiceApi.loadAvatarMaterials();
set({ avatarMaterials: materials });
} catch (err) {
console.error('[VoiceStore] 加载视频素材失败:', err);
} finally {
set({ isLoadingAvatarMaterials: false });
}
},
addAvatarMaterial: async (file: File, name: string) => {
const videoUrl = await voiceApi.uploadVideo(file);
const material: AvatarMaterial = {
id: `avatar_${Date.now()}`,
name,
videoUrl,
createdAt: new Date().toISOString(),
};
await voiceApi.saveAvatarMaterial(material);
set(state => ({ avatarMaterials: [material, ...state.avatarMaterials] }));
return material;
},
renameAvatarMaterial: async (id: string, name: string) => {
set(state => {
const updated = state.avatarMaterials.map(m => m.id === id ? { ...m, name } : m);
const target = updated.find(m => m.id === id);
if (target) {
voiceApi.saveAvatarMaterial(target).catch(err => {
console.error('[VoiceStore] 重命名素材失败:', err);
});
}
return { avatarMaterials: updated };
});
},
deleteAvatarMaterial: async (materialId: string) => {
await voiceApi.deleteAvatarMaterial(materialId);
set(state => ({
avatarMaterials: state.avatarMaterials.filter(m => m.id !== materialId),
}));
},
// ====================== Project audio actions ======================
loadProjectAudios: async (projectId) => {