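feat: end-to-end optimization of the video creation workflow

- Backend: update Vidu Provider, System API, Upload API, and material service
- Frontend: improve subtitle burn-in, video generation, dubbing, local storage, and type definitions
- Rust: update FFmpeg commands, video composition, voice commands, and library registration
- Store: improve project state and voice state management
- Added: lip-sync replacement doc, health checker, caption API module, audio alignment tool
- Removed: deprecated polish prompt templates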
小鱼开发
2026-04-26 21:24:42 +08:00
parent 3766a977e2
commit bc724810a6
28 changed files with 1603 additions and 563 deletions
+122
@@ -0,0 +1,122 @@
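# MVP experiment: replacing B-roll segments with lip-synced video
## Goal
Use audio alignment to locate timestamps, then replace the matching span of the B-roll video with a lip-synced clip of the presenter.
## Full workflow
### 1. Audio alignment (Whisper)
Run speech recognition on the voice-over audio to produce an SRT subtitle file with timestamps.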
```bash
cd /Users/0fun/work/meijiaka-zj/python-api
source .venv/bin/activate
whisper "/Users/0fun/Documents/Meijiaka-zj/projects/.../audios/voice_xxx.mp3" \
--model base --language zh --output_format srt
```
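**Output**: 56 sentences, total duration ~75s; generates `voice_xxx.srt`
### 2. Locating the script text
Given the target script text, match it against the sentences in the SRT to get a time range.
**Example**
- Script text: `"新房装修这七个时间,你必须在场盯着"` ("These are the seven moments in a new-home renovation when you must be on site")
- Match result: time range `0.000s ~ 4.120s`
- Segments involved: segment 1~2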
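The matching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `parse_srt`, `normalize`, and `locate` are hypothetical helper names, punctuation is stripped before comparison, and the query is allowed to span several consecutive SRT entries.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")
# Whitespace plus common Chinese and ASCII punctuation (an assumed, non-exhaustive set)
PUNCT_RE = re.compile(r"[\s,。!?、;:,.!?;:]")


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:00:04,120 to seconds."""
    h, m, s, ms = map(int, TIME_RE.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of {start, end, text} dicts."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed entries
        start, end = lines[1].split("-->")
        segments.append({
            "start": to_seconds(start),
            "end": to_seconds(end),
            "text": "".join(lines[2:]).strip(),
        })
    return segments


def normalize(text):
    """Drop whitespace and punctuation so only spoken characters are compared."""
    return PUNCT_RE.sub("", text)


def locate(segments, query):
    """Return (start, end) of the consecutive segments that contain the query."""
    q = normalize(query)
    if not q:
        return None
    for i, first in enumerate(segments):
        first_len = len(normalize(first["text"]))
        joined = ""
        for j in range(i, len(segments)):
            joined += normalize(segments[j]["text"])
            pos = joined.find(q)
            if pos != -1:
                # Accept only if the match begins inside segment i;
                # otherwise a later starting segment gives a tighter range.
                if pos < first_len:
                    return first["start"], segments[j]["end"]
                break
    return None
```

With two SRT entries covering `0.000 ~ 2.000` and `2.000 ~ 4.120`, a query that spans both would resolve to the combined range, which mirrors the "segment 1~2" result in the example.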
### 3. Extracting the clips
#### Presenter video (silent picture + lip-synced audio)
```bash
# Video: 0~4.12s
ffmpeg -y -ss 0 -t 4.12 -i video.mp4 -c:v libx264 -an clip_video.mp4
# Audio: 0~4.12s
ffmpeg -y -ss 0 -t 4.12 -i video.mp4 -vn -c:a copy clip_audio.mp3
```
**Audio check**
- mean_volume: -20.8 dB
- max_volume: -3.8 dB
- Format: 24000 Hz / mono / 69 kb/s
### 4. Picture replacement (FFmpeg overlay)
Overlay the presenter video's picture onto the matching time span of the composed video.
```bash
ffmpeg -y \
-i composed.mp4 \
-i video.mp4 \
-filter_complex \
"[1:v]setpts=PTS-STARTPTS[clip];
[0:v][clip]overlay=enable='between(t\,0\,4.12)':x=0:y=0[v]" \
-map "[v]" -map 0:a \
-c:v libx264 -crf 18 -preset fast \
-c:a copy \
composed_overlay.mp4
```
### 5. Audio splicing (key step)
Insert the lip-synced audio at 0~4.12s and append the original composed audio after it.
#### ❌ First attempt (failed)
```bash
[a_rep]atrim=start=0:end=4.12...;
[a_tail]atrim=start=4.12...;
[a_rep][a_tail]concat=n=2:v=0:a=1[a]
```
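**Problem**: concat requires every input stream to have the **same sample rate and channel count**.
- Lip-synced audio: 24000 Hz / mono
- composed audio: 44100 Hz / stereo
Result: after splicing, the first 4 seconds were silent.
#### ✅ Fix
Normalize the audio format before concat: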
```bash
[1:a]aresample=44100,pan=stereo|c0=c0|c1=c0,atrim=start=0:end=4.12,asetpts=PTS-STARTPTS[a_rep];
[0:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,atrim=start=4.12,asetpts=PTS-STARTPTS[a_tail];
[a_rep][a_tail]concat=n=2:v=0:a=1[a]
```
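Because the cut point appears in several places in this filter graph, it may be convenient to generate the graph from parameters. The following is a minimal sketch, assuming the lip-synced clip is input 1 and the composed video is input 0 as in the commands here; `build_audio_splice_filter` is a hypothetical helper name, and the filters are the same ones used in the fix.

```python
def build_audio_splice_filter(cut, rate=44100):
    """Build the audio part of a -filter_complex graph for splicing at `cut` seconds.

    Both branches are brought to the same sample rate and stereo layout
    before concat, which is the condition the failed attempt violated.
    """
    replacement = (
        f"[1:a]aresample={rate},pan=stereo|c0=c0|c1=c0,"
        f"atrim=start=0:end={cut},asetpts=PTS-STARTPTS[a_rep]"
    )
    tail = (
        f"[0:a]aformat=sample_fmts=fltp:sample_rates={rate}:channel_layouts=stereo,"
        f"atrim=start={cut},asetpts=PTS-STARTPTS[a_tail]"
    )
    concat = "[a_rep][a_tail]concat=n=2:v=0:a=1[a]"
    return ";".join([replacement, tail, concat])


if __name__ == "__main__":
    print(build_audio_splice_filter(4.12))
```

The returned string would be appended to the video chain and passed as a single `-filter_complex` argument.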
### 6. Full replacement command (final version)
```bash
ffmpeg -y \
-i "composed.mp4" \
-i "video.mp4" \
-filter_complex \
"[1:v]setpts=PTS-STARTPTS[clip];
[0:v][clip]overlay=enable='between(t\,0\,4.12)':x=0:y=0[v];
[1:a]aresample=44100,pan=stereo|c0=c0|c1=c0,atrim=start=0:end=4.12,asetpts=PTS-STARTPTS[a_rep];
[0:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,atrim=start=4.12,asetpts=PTS-STARTPTS[a_tail];
[a_rep][a_tail]concat=n=2:v=0:a=1[a]" \
-map "[v]" \
-map "[a]" \
-c:v libx264 -crf 18 -preset fast \
-c:a aac -b:a 128k \
"composed_replaced.mp4"
```
**Output verification**
- Video: H.264 / 1080x1920 / 29.82 fps
- Audio: AAC / 44100 Hz / stereo / 37 kb/s
- Volume over the first 4.12s: mean -20.9 dB / max -3.7 dB (audible, as expected)
- Total duration: 74.27 s
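## Pitfalls
| Problem | Cause | Fix |
|------|------|----------|
| Silent opening after audio splicing | concat input streams differ in sample rate/channel count | Use `aresample` + `pan` to normalize to 44100 Hz stereo |
| Picture out of sync | Wrong overlay time range | Confirm `between(t,0,4.12)` matches the time range from the text-locating step |
| Audio format mismatch | Lip-synced audio is 24000 Hz mono; composed audio is 44100 Hz stereo | Force a common format before splicing |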
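## Possible follow-ups
1. **Audio transition**: concat is a hard cut; `acrossfade` could be added at the splice point for a crossfade
2. **Volume balance**: the lip-synced audio and the composed audio differ widely in level; the `volume` filter could even them out
3. **Automation**: wrap the steps above in a Python/Rust function that takes the script text and performs locate → extract → replace automatically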
@@ -1,13 +0,0 @@
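You are an expert in talking-head short videos. Polish the following B-roll shot description so it is better suited to AI video generation:
【Original】
{content}
【Requirements】
- Keep the original meaning; refine the details
- Emphasize the setting, spatial atmosphere, lighting, and material texture
- Static objects, renovation details, and spatial layout may be described
- Do not use cinematography terms such as "shot", "close-up", or "camera position"
- Control the length: the character count must not differ from the original by more than 20
Output the polished description directly, without adding any explanation: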
@@ -1,13 +0,0 @@
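You are an expert in 【talking-head short videos】. Polish the following storyboard shot description so it is better suited to AI video generation:
【Original】
{content}
【Requirements】
- Keep the original meaning; refine the details
- Emphasize the presenter's expression, demeanor, movement, and posture
- Describe how the presenter interacts with the audience in front of the camera
- Do not use cinematography terms such as "shot", "close-up", or "camera position"
- Control the length: the character count must not differ from the original by more than 20
Output the polished description directly, without adding any explanation: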
@@ -0,0 +1,71 @@
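You are a professional scriptwriter for 【talking-head short videos】, specializing in home decoration/renovation content for Douyin and WeChat Channels.
【Platform requirements】
1. Vertical video (9:16), with the presenter as the main subject of the frame
2. Lines should be colloquial and down-to-earth, like chatting with a friend; avoid written-style phrases such as "in summary" or "studies show"
3. Slightly fast, rhythmic delivery: 4 characters per second (punctuation excluded), 15-25 characters per sentence (3.75-6.25 seconds), each sentence delivered in one breath without dragging
4. Avoid piling up jargon; use plain language that homeowners understand
5. Match new-media viewing habits: the first 3 seconds decide everything, so keep the pace tight
【Copy requirements】
Strictly follow the fixed structure below to write talking-head copy about supervising a renovation site. The language should be colloquial and cautionary, from the homeowner's point of view. Keep the structure exactly as given; the content centers on "the 7 moments in a new-home renovation when you must be on site". Every part must be complete, and the whole copy, punctuation included, must not exceed 450 characters:
Opening: state the core warning: there are 7 moments in a new-home renovation when you must be on site, and the last one in particular decides whether your home ends up full of formaldehyde. Urge viewers to watch to the end so they avoid rework and costly mistakes. Tone: direct and urgent.
Points (7 points, strictly in this order and format):
Write each point as "supervision scenario + what must be checked + the main risk of not watching", in down-to-earth, advisory language; avoid stiff lecturing:
Point 1: Be on site during wall demolition; make sure the workers seal the drain openings so the pipes do not clog and you do not have to go downstairs to unclog them later
Point 2: Be on site when the windows are enclosed; make sure the workers build a sloped waterproof sill so rainwater does not flow back indoors
Point 3: Be on site for the plumbing and electrical inspection; check outlet positions and how the high- and low-voltage wiring is wrapped, and take photos for the record to avoid later rework
Point 4: Be on site for the waterproofing and tile inspection; run a ponding test to check for leaks and verify the tile model, so color mismatch does not force re-tiling
Point 5: Be on site during tiling; check flatness and the hollow-tile rate, and make sure internal and external corners are square and joints are even
Point 6: Be on site for carpentry and ceiling work; require whole boards at corners and V-grooves at joints to prevent cracks in the latex paint later
Point 7: Be on site when putty is applied; forbid the workers from adding glue to the putty, to avoid excess formaldehyde and a toxic home
Closing call to action: add a tip: for anyone about to renovate, I have put together a pitfall-avoidance handbook; reply "避坑" in the comments to get it and avoid renovation pitfalls. Tone: warm and in line with homeowners' needs.
Note: the overall style should be plain and memorable, relatable, and within an ordinary homeowner's understanding. Avoid excessive jargon, keep every part substantial, do not omit any key point, and match the structure above exactly without adding or removing sections.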
【Material library titles】
网红开篇
铺砖施工
吊顶施工
美缝施工
水电施工
壁纸施工
刮腻子
木工施工
柜子安装
乳胶漆
签合同
背景墙
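【Storyboard structure】
Opening storyboard: viral-style opening (网红开头) + presenter on camera for 3 seconds + B-roll
All points use B-roll
Closing: presenter on camera for 3 seconds + B-roll
Each shot lasts no less than 3 seconds and no more than 8 seconds; one decimal place is allowed, e.g. 3.1 seconds
The voice-over copy of each shot should correspond to 4-5 characters per second (punctuation excluded)
Total storyboard duration = total character count of the copy / 4
"segment" (presenter speaking on camera) corresponds to "人物出镜" and lasts 3 seconds (about 12 characters, punctuation excluded)
The copy for a presenter shot need not be a complete sentence; the sentence may continue into the next shot
"empty_shot" (B-roll) corresponds to a material library title
Voice-over copy must use punctuation to break sentences and avoid long run-ons, e.g.: 水电装错毁一生,错一个,返工就要好几万。
【Output format】
The output must contain the following two parts
I. Storyboard content
- id:1
- type: "segment" (presenter speaking on camera) or "empty_shot" (B-roll)
- scene: "人物出镜" or a material library title
- voiceover: voice-over copy (required; colloquial, 15-25 characters per sentence, i.e. 4-6 seconds)
- duration: duration (e.g. "5s"; derived from the character count at exactly 4 characters per second, punctuation excluded; up to 2 decimal places, e.g. 12 characters → 3.00s, 17 characters → 4.25s, 19 characters → 4.75s)
Note: output pure JSON only; do not include markdown code fences or any other explanatory text.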
【Example】
[
{
"id": 1,
"type": "empty_shot",
"scene": "网红开篇",
"voiceover": "装修签合同别踩坑!固定模板千万别直接签!",
"duration": 3
},
{
"id": 2,
"type": "segment",
"scene": "人物出镜",
"voiceover": "这8条内容,必须白纸黑字写进合同里!",
"duration": 3
}
]
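The timing rule in this prompt (4 characters per second, punctuation excluded, each shot between 3 and 8 seconds) can be checked mechanically on model output. The sketch below is illustrative only: `count_spoken_chars`, `expected_duration`, and `check_shot` are hypothetical names, and punctuation is detected through Unicode categories, which is an assumption about what "punctuation" means here.

```python
import unicodedata

CHARS_PER_SECOND = 4  # rate stated in the prompt


def count_spoken_chars(text):
    """Count characters that are not punctuation, separators, symbols, or control codes."""
    return sum(
        1 for ch in text
        if unicodedata.category(ch)[0] not in ("P", "Z", "S", "C")
    )


def expected_duration(text):
    """Duration in seconds implied by the character count, rounded to 2 decimals."""
    return round(count_spoken_chars(text) / CHARS_PER_SECOND, 2)


def check_shot(shot, min_s=3.0, max_s=8.0):
    """Return a list of human-readable problems for one storyboard entry."""
    problems = []
    if shot.get("type") not in ("segment", "empty_shot"):
        problems.append("unknown type")
    expected = expected_duration(shot.get("voiceover", ""))
    if not (min_s <= expected <= max_s):
        problems.append(f"duration {expected}s outside {min_s}-{max_s}s")
    return problems


if __name__ == "__main__":
    sample = "水电装错毁一生,错一个,返工就要好几万。"
    print(count_spoken_chars(sample), expected_duration(sample))
```

For the sample sentence quoted in the prompt, this yields 17 characters and 4.25 seconds, which agrees with the "17 characters → 4.25s" figure in the format rules.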
+3 -1
@@ -74,12 +74,14 @@ class ViduProvider:
if payload:
body["payload"] = payload
logger.info(f"[Vidu TTS] 请求参数: text_length={len(text)}, body={body}")
async with aiohttp.ClientSession() as session:
async with session.post(url, json=body, headers=self._get_headers()) as resp:
data = await resp.json()
if resp.status != 200 or data.get("state") == "failed":
msg = data.get("err_code") or data.get("message") or f"HTTP {resp.status}"
logger.error(f"[Vidu TTS] 请求失败: url={url}, status={resp.status}, headers={self._get_headers()}, body={body}, response={data}")
logger.error(f"[Vidu TTS] 请求失败: url={url}, status={resp.status}, response={data}")
raise Exception(f"Vidu TTS error: {msg}")
return data
+29 -9
@@ -3,8 +3,10 @@
============
"""
from fastapi import APIRouter
from fastapi import APIRouter, status
from fastapi.responses import JSONResponse
from app.core.health_checker import check_database, check_redis
from app.schemas.common import ApiResponse, success_response
router = APIRouter()
@@ -13,16 +15,34 @@ router = APIRouter()
@router.get("/health", response_model=ApiResponse[dict])
async def system_health():
"""System health check (detailed)"""
return success_response(
data={
"status": "healthy",
"services": {
"api": "up",
"database": "unknown", # TODO: 检查数据库连接
"redis": "unknown", # TODO: 检查 Redis 连接
db_ok, db_msg = await check_database()
redis_ok, redis_msg = await check_redis()
services = {
"api": "up",
"database": "connected" if db_ok else db_msg,
"redis": "connected" if redis_ok else redis_msg,
}
if not db_ok:
return JSONResponse(
status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
content={
"code": status.HTTP_503_SERVICE_UNAVAILABLE,
"message": "数据库连接异常",
"data": {"status": "unhealthy", "services": services},
},
},
)
if not redis_ok:
return success_response(
message="Redis 连接异常,服务降级",
data={"status": "degraded", "services": services},
)
return success_response(
message="系统运行正常",
data={"status": "healthy", "services": services},
)
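The decision logic in the new `/health` handler reduces to a small mapping from the two probe results to an HTTP status and a status label. A minimal sketch of that mapping follows; `overall_status` is a hypothetical name, and the 200 code for the non-failing branches assumes `success_response` is served with HTTP 200.

```python
def overall_status(db_ok, redis_ok):
    """Map probe results to (http_status, label).

    The database is treated as critical (503 when down); Redis is treated
    as optional, so its failure only degrades the service.
    """
    if not db_ok:
        return 503, "unhealthy"
    if not redis_ok:
        return 200, "degraded"
    return 200, "healthy"


if __name__ == "__main__":
    for db_ok in (True, False):
        for redis_ok in (True, False):
            print(db_ok, redis_ok, overall_status(db_ok, redis_ok))
```

Keeping the database check first means a simultaneous outage of both services reports as `unhealthy` rather than `degraded`.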
+84
@@ -187,3 +187,87 @@ async def upload_image(
except Exception as e:
logger.error(f"[Upload] 图片上传失败: {e}")
raise HTTPException(status_code=500, detail=f"上传失败: {e}")
@router.post("/audio", response_model=ApiResponse[UploadResponse])
async def upload_audio(
    file: UploadFile = File(..., description="Audio file"),
):
    """
    Upload audio to Qiniu Cloud

    Supported formats: mp3, wav, aac, m4a, ogg, flac
    """
    try:
        allowed_types = {
            "audio/mpeg",
            "audio/mp3",
            "audio/wav",
            "audio/x-wav",
            "audio/aac",
            "audio/mp4",
            "audio/ogg",
            "audio/flac",
            "audio/x-flac",
        }
        content_type = file.content_type or ""
        if not content_type:
            # No Content-Type supplied: infer it from the file extension
            ext = Path(file.filename or "").suffix.lower()
            ext_to_mime = {
                ".mp3": "audio/mpeg",
                ".wav": "audio/wav",
                ".aac": "audio/aac",
                ".m4a": "audio/mp4",
                ".ogg": "audio/ogg",
                ".flac": "audio/flac",
            }
            content_type = ext_to_mime.get(ext, "")
        if content_type not in allowed_types:
            raise HTTPException(
                status_code=400,
                detail=f"Unsupported audio format: {content_type}; please upload mp3/wav/aac/m4a/ogg/flac",
            )
        content = await file.read()
        if not content:
            raise HTTPException(status_code=400, detail="File is empty")
        ext = Path(file.filename or "audio.mp3").suffix or ".mp3"
        unique_name = f"{uuid.uuid4().hex[:16]}{ext}"
        qiniu = get_qiniu_service()
        # Reuse the video bucket (or use an audio bucket, depending on configuration)
        bucket, domain = qiniu._get_bucket_and_domain("video")
        key = qiniu.generate_key("audio", unique_name)
        stream = io.BytesIO(content)
        result = qiniu.upload_stream(
            stream=stream,
            key=key,
            mime_type=content_type or "audio/mpeg",
            bucket=bucket,
            domain=domain,
        )
        url = result.get("url")
        key = result.get("key")
        if not url:
            raise HTTPException(status_code=500, detail="Upload to Qiniu Cloud failed: no URL returned")
        logger.info(f"[Upload] Audio uploaded: {url[:80]}..., size={len(content)}")
        return success_response(
            data=UploadResponse(
                url=url,
                key=key or unique_name,
                size=len(content),
            )
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"[Upload] Audio upload failed: {e}")
        raise HTTPException(status_code=500, detail=f"Upload failed: {e}")
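The content-type handling in `upload_audio` (trust the declared type if present, otherwise infer it from the extension, then check it against an allow-list) can be isolated into a pure function that is easy to test. This is a sketch only; `resolve_audio_mime` is a hypothetical helper, and the two tables simply restate the values used in the endpoint.

```python
from pathlib import Path

ALLOWED_TYPES = {
    "audio/mpeg", "audio/mp3", "audio/wav", "audio/x-wav", "audio/aac",
    "audio/mp4", "audio/ogg", "audio/flac", "audio/x-flac",
}
EXT_TO_MIME = {
    ".mp3": "audio/mpeg", ".wav": "audio/wav", ".aac": "audio/aac",
    ".m4a": "audio/mp4", ".ogg": "audio/ogg", ".flac": "audio/flac",
}


def resolve_audio_mime(content_type, filename):
    """Return the accepted MIME type, or None if the upload should be rejected."""
    mime = content_type or EXT_TO_MIME.get(Path(filename or "").suffix.lower(), "")
    return mime if mime in ALLOWED_TYPES else None


if __name__ == "__main__":
    print(resolve_audio_mime(None, "voice.M4A"))
    print(resolve_audio_mime("video/mp4", "voice.mp3"))
```

Note that, as in the endpoint, the extension is consulted only when no content type is declared, so a declared non-audio type is rejected even if the file name looks like audio.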
+34
@@ -0,0 +1,34 @@
"""
Health checks
=============
"""
import asyncio

from sqlalchemy import text

from app.core.redis_client import get_redis_client
from app.db.session import async_engine


async def check_database(timeout: float = 2.0) -> tuple[bool, str]:
    """Check the database connection."""
    try:
        async with asyncio.timeout(timeout):
            async with async_engine.connect() as conn:
                await conn.execute(text("SELECT 1"))
                await conn.commit()
        return True, "connected"
    except Exception as e:
        # A timeout raises TimeoutError with an empty message, so fall back to the class name
        return False, str(e) or type(e).__name__


async def check_redis(timeout: float = 2.0) -> tuple[bool, str]:
    """Check the Redis connection."""
    try:
        async with asyncio.timeout(timeout):
            redis = get_redis_client()
            await redis.ping()
        return True, "connected"
    except Exception as e:
        # A timeout raises TimeoutError with an empty message, so fall back to the class name
        return False, str(e) or type(e).__name__
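The timeout pattern used by these checks can be exercised without a real database or Redis. The sketch below is an assumption-laden stand-in: `timed_check` is a hypothetical helper, the probes are dummies, and `asyncio.wait_for` is used instead of `asyncio.timeout` so the example also runs on Python versions older than 3.11. It also shows why falling back to the exception class name is useful: a timeout carries no message of its own.

```python
import asyncio


async def timed_check(probe, timeout=2.0):
    """Run an async probe with a deadline and return (ok, message)."""
    try:
        await asyncio.wait_for(probe(), timeout)
        return True, "connected"
    except Exception as exc:
        # str(TimeoutError()) is empty, so report the class name in that case
        return False, str(exc) or type(exc).__name__


async def ok_probe():
    return None


async def slow_probe():
    await asyncio.sleep(1)


async def failing_probe():
    raise ConnectionError("refused")


if __name__ == "__main__":
    print(asyncio.run(timed_check(ok_probe)))
    print(asyncio.run(timed_check(slow_probe, timeout=0.01)))
    print(asyncio.run(timed_check(failing_probe)))
```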
+12 -24
@@ -64,9 +64,9 @@ def match_material(scene: str, required_duration: float, exclude_urls: list[str]
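Match B-roll material by scene description and required duration
Strategy:
1. Collect all materials that satisfy the duration requirement (duration >= required_duration)
2. Collect the 5 materials whose duration is closest overall
3. Merge and deduplicate, then pick at random from the candidate pool, preferring unused materials
1. Match the category strictly (scene must exactly equal a keyword in keywords)
2. Filter out materials shorter than required_duration
3. Exclude already-used materials from the rest and pick at random
Args:
scene: storyboard scene description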
@@ -78,38 +78,26 @@ def match_material(scene: str, required_duration: float, exclude_urls: list[str]
"""
exclude_urls = exclude_urls or []
for keyword, slug in _keywords.items():
if keyword in scene:
if keyword == scene:
all_materials = _materials.get(slug, [])
if not all_materials:
return None
# 1. Materials that satisfy the duration requirement
# 1. Filter out materials shorter than required_duration
matching = [m for m in all_materials if m["duration"] >= required_duration]
if not matching:
return None
# 2. The 5 materials with the closest duration (overall)
sorted_by_diff = sorted(all_materials, key=lambda m: abs(m["duration"] - required_duration))
closest_5 = sorted_by_diff[:5]
# 3. Merge the candidate pool and deduplicate (matching comes first, so materials that satisfy the duration are kept preferentially)
candidate_pool = []
seen = set()
for m in matching + closest_5:
if m["url"] not in seen:
candidate_pool.append(m)
seen.add(m["url"])
# 4. Exclude already-used materials and pick at random
unused = [m for m in candidate_pool if m["url"] not in exclude_urls]
# 2. Exclude already-used materials and pick at random
unused = [m for m in matching if m["url"] not in exclude_urls]
if unused:
return random.choice(unused)
# 5. Strict mode never returns an excluded material
# 3. Strict mode never returns an excluded material
if strict:
return None
# 6. Non-strict mode: once everything is used, allow repeats
if candidate_pool:
return random.choice(candidate_pool)
# 4. Non-strict mode: once everything is used, allow repeats
return random.choice(matching)
return None
return None
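The simplified selection strategy (filter by duration, prefer unused materials, fall back to repeats only outside strict mode) can be shown on its own. This is a self-contained sketch rather than the service code: `pick_material` is a hypothetical name, the category lookup is left out, and materials are assumed to be dicts with `url` and `duration` keys as in the hunk above.

```python
import random


def pick_material(materials, required_duration, exclude_urls=(), strict=False):
    """Pick a material that is long enough, avoiding already-used URLs when possible."""
    # 1. Keep only materials that are at least as long as required
    matching = [m for m in materials if m["duration"] >= required_duration]
    if not matching:
        return None
    # 2. Prefer materials that have not been used yet
    unused = [m for m in matching if m["url"] not in exclude_urls]
    if unused:
        return random.choice(unused)
    # 3. Strict mode never returns an excluded material
    if strict:
        return None
    # 4. Otherwise allow a repeat
    return random.choice(matching)


if __name__ == "__main__":
    library = [{"url": "a", "duration": 3.0}, {"url": "b", "duration": 6.0}]
    print(pick_material(library, 5.0))
    print(pick_material(library, 5.0, exclude_urls=["b"], strict=True))
```

One consequence of dropping the "closest 5" fallback is that a request longer than every material in the category now returns `None` instead of a shorter clip.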
+4 -1
@@ -25,13 +25,16 @@
"灯槽灯带": "ceiling",
"乳胶漆色卡": "paint",
"墙面工艺": "paint",
"刮腻子": "paint",
"艺术漆选样": "paint",
"腻子打磨": "putty",
"橱柜": "cabinet",
"木工施工": "cabinet",
"验收标准": "final",
"网红开篇": "intro",
"合同": "contract"
"壁纸施工": "wallpaper",
"合同": "contract",
"签合同": "contract"
},
"materials": {
"ceiling": [
+105
@@ -0,0 +1,105 @@
# 美家卡智剪 - development server configuration
# Self-contained: PostgreSQL + Redis + API + Scheduler
# usage: docker compose -f docker-compose.dev.yml up -d --build
version: "3.8"
services:
  db:
    image: postgres:15-alpine
    container_name: meijiaka-dev-db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: meijiaka_dev
    volumes:
      - postgres_dev_data:/var/lib/postgresql/data
    ports:
      - "127.0.0.1:5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - dev-network
  redis:
    image: redis:7-alpine
    container_name: meijiaka-dev-redis
    volumes:
      - redis_dev_data:/data
    ports:
      - "127.0.0.1:6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - dev-network
  api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: meijiaka-dev-api
    environment:
      - ENV=development
      - DEBUG=true
      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/meijiaka_dev
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_DB=0
      - SECRET_KEY=dev-secret-key-do-not-use-in-prod
      - MINIMAX_API_KEY=${MINIMAX_API_KEY}
      - MINIMAX_BASE_URL=${MINIMAX_BASE_URL:-https://api.minimaxi.com}
      - VIDU_API_KEY=${VIDU_API_KEY}
      - VIDU_BASE_URL=${VIDU_BASE_URL:-https://api.vidu.cn}
      - LOG_LEVEL=DEBUG
    volumes:
      - .:/app
      - ../data:/root/Documents/Meijiaka-zj
    ports:
      - "8080:8000"
    command: gunicorn app.main:app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --reload
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - dev-network
  scheduler:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: meijiaka-dev-scheduler
    environment:
      - ENV=development
      - DEBUG=true
      - DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/meijiaka_dev
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_DB=0
      - SECRET_KEY=dev-secret-key-do-not-use-in-prod
    volumes:
      - .:/app
      - ../data:/root/Documents/Meijiaka-zj
    command: python -m app.scheduler.main
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - dev-network
volumes:
  postgres_dev_data:
  redis_dev_data:
networks:
  dev-network:
    driver: bridge
+275
@@ -0,0 +1,275 @@
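#!/usr/bin/env python3
"""
Video segment replacement MVP
=============================
Based on the spoken text in the audio, replace a span of the B-roll video
with the matching span of the presenter video.

Prerequisites:
    pip install openai-whisper

Usage example:
    python scripts/video-replace-mvp.py \
        --person person.mp4 \
        --broll broll.mp4 \
        --query "水电改造要注意"

How it works:
    1. Whisper transcribes the presenter video's audio into text with timestamps
    2. Text matching finds the target time range [start, end]
    3. FFmpeg overlay filter: within [start, end], the presenter picture covers the B-roll picture
"""
from __future__ import annotations

import argparse
import json
import shutil
import subprocess
import sys
from difflib import SequenceMatcher
from pathlib import Path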
def check_dep(name: str) -> str | None:
    """Return the path of a system command, or None if it is not installed."""
    return shutil.which(name)


def ensure_whisper() -> bool:
    """Make sure whisper can be imported."""
    try:
        import whisper  # noqa: F401
        return True
    except ImportError:
        print("❌ openai-whisper is not installed")
        print("   Install with: pip install openai-whisper")
        print("   (the model is downloaded on first use; the base model is about 150MB)")
        return False
def run_whisper(video_path: str, model: str = "base") -> list[dict]:
    """Transcribe with Whisper and return the segment list (with start/end/text)."""
    import whisper

    print(f"   Loading model: {model}")
    model_obj = whisper.load_model(model)
    print(f"   Transcribing... (model: {model}, video: {Path(video_path).name})")
    result = model_obj.transcribe(
        video_path,
        language="zh",
        word_timestamps=False,  # segment-level timestamps are enough
        fp16=False,  # CPU friendly
    )
    return result["segments"]
def find_time_range(
    segments: list[dict],
    query: str,
    threshold: float = 0.6,
) -> tuple[float, float, str] | None:
    """
    Find the time range that matches the query text.

    Matching strategy (in decreasing priority):
    1. Exact substring match
    2. Fuzzy match (difflib.SequenceMatcher ratio >= threshold)
    """
    query = query.strip()

    # 1. Exact substring match
    for seg in segments:
        text = seg["text"].strip()
        if query in text:
            return seg["start"], seg["end"], text

    # 2. Fuzzy match: keep the best-scoring segment at or above the threshold
    best = None
    best_score = 0.0
    for seg in segments:
        text = seg["text"].strip()
        score = SequenceMatcher(None, query, text).ratio()
        if score > best_score and score >= threshold:
            best_score = score
            best = seg
    if best:
        return best["start"], best["end"], best["text"].strip()
    return None
def get_video_info(video_path: str) -> dict:
    """Get video information with ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,r_frame_rate,duration",
        "-show_entries", "format=duration",
        "-of", "json",
        video_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    data = json.loads(result.stdout)
    stream = data.get("streams", [{}])[0]
    fmt = data.get("format", {})

    # Parse the frame rate (e.g. "25/1" → 25.0)
    fps_str = stream.get("r_frame_rate", "25/1")
    if "/" in fps_str:
        num, den = fps_str.split("/")
        fps = float(num) / float(den)
    else:
        fps = float(fps_str)

    return {
        "width": stream.get("width", 1920),
        "height": stream.get("height", 1080),
        "fps": fps,
        "duration": float(fmt.get("duration", stream.get("duration", 0))),
    }
def replace_with_overlay(
    person_video: str,
    broll_video: str,
    start: float,
    end: float,
    output: str,
    crf: int = 18,
):
    """
    Replace a segment using the FFmpeg overlay filter.

    Logic:
    - Input 0 (broll): base picture + audio
    - Input 1 (person): source of the clipped picture
    - [1:v] trim → cut [start, end] → setpts shifts the clip so it begins at `start`
      on the output timeline → scale to the B-roll resolution
    - [0:v][clip] overlay → show clip while between(t,start,end)
    - Output: picture = video with the segment replaced, audio = original broll audio
    """
    duration = end - start
    broll_info = get_video_info(broll_video)
    w, h = broll_info["width"], broll_info["height"]
    print(f"   B-roll resolution: {w}x{h}, frame rate: {broll_info['fps']:.2f}fps")
    print(f"   Clipping presenter segment: {start:.3f}s ~ {end:.3f}s ({duration:.3f}s)")
    print(f"   Rendering... (CRF={crf})")

    # overlay filter
    # Note: the commas inside between(t,start,end) need to be escaped.
    # The clip's timestamps are offset by `start` (PTS-STARTPTS+start/TB); resetting them
    # to zero would leave the clip outside the enable window whenever start > 0.
    filter_graph = (
        f"[1:v]trim=start={start}:end={end},"
        f"setpts=PTS-STARTPTS+{start}/TB,"
        f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
        f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2:black[clip];"
        f"[0:v][clip]overlay="
        f"enable='between(t\\,{start}\\,{end})':"
        f"x=(W-w)/2:y=(H-h)/2[v]"
    )
    cmd = [
        "ffmpeg", "-y",
        "-i", broll_video,
        "-i", person_video,
        "-filter_complex", filter_graph,
        "-map", "[v]",
        "-map", "0:a",
        "-c:v", "libx264", "-crf", str(crf), "-preset", "fast",
        "-c:a", "copy",
        "-movflags", "+faststart",
        output,
    ]
    subprocess.run(cmd, check=True, capture_output=True)
    print(f"✅ Output written: {output}")
def save_srt(segments: list[dict], path: str):
    """Save an SRT subtitle file for manual proofreading."""

    def fmt(s: float) -> str:
        h = int(s // 3600)
        m = int((s % 3600) // 60)
        sec = int(s % 60)
        ms = int((s % 1) * 1000)
        return f"{h:02d}:{m:02d}:{sec:02d},{ms:03d}"

    with open(path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, 1):
            f.write(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text'].strip()}\n\n")
def main():
    parser = argparse.ArgumentParser(description="Video segment replacement MVP based on audio text")
    parser.add_argument("--person", required=True, help="Path to the presenter video (provides the picture)")
    parser.add_argument("--broll", required=True, help="Path to the B-roll video (provides the base picture + audio)")
    parser.add_argument("--query", required=True, help="Script text to replace (e.g. 水电改造要注意)")
    parser.add_argument("--output", default="output_replaced.mp4", help="Output file path")
    parser.add_argument("--model", default="base", choices=["tiny", "base", "small"],
                        help="Whisper model: tiny is fastest, small is most accurate")
    parser.add_argument("--crf", type=int, default=18,
                        help="Video quality (0 = lossless, 23 = default; larger values give smaller files)")
    parser.add_argument("--threshold", type=float, default=0.6,
                        help="Fuzzy-match threshold (0~1); scores below it count as no match")
    args = parser.parse_args()

    # 0. Dependency checks
    if not check_dep("ffmpeg"):
        print("❌ ffmpeg not found")
        sys.exit(1)
    if not check_dep("ffprobe"):
        print("❌ ffprobe not found")
        sys.exit(1)
    if not ensure_whisper():
        sys.exit(1)
    for p in (args.person, args.broll):
        if not Path(p).exists():
            print(f"❌ File does not exist: {p}")
            sys.exit(1)

    # 1. Run ASR on the presenter video
    print("\n🎙️ Step 1/3: transcribing the presenter video's audio")
    segments = run_whisper(args.person, args.model)
    print(f"   Recognized {len(segments)} sentences")

    # Save the subtitles for reference
    srt_path = str(Path(args.output).with_suffix(".srt"))
    save_srt(segments, srt_path)
    print(f"📝 Subtitles saved: {srt_path}")

    # 2. Text matching
    print(f"\n🔍 Step 2/3: looking for script text \"{args.query}\"")
    result = find_time_range(segments, args.query, threshold=args.threshold)
    if not result:
        print(f"❌ No matching text found (threshold {args.threshold})")
        print(f"   Tip: check the actual text in {srt_path} and adjust --query")
        sys.exit(1)
    start, end, matched_text = result
    print(f"   Matched text: \"{matched_text}\"")
    print(f"   Time range: {start:.3f}s ~ {end:.3f}s (duration {end - start:.3f}s)")

    # 3. FFmpeg replacement
    print("\n🎬 Step 3/3: replacing the segment")
    replace_with_overlay(
        args.person,
        args.broll,
        start,
        end,
        args.output,
        crf=args.crf,
    )
    print("\n🎉 All done!")
    print(f"   Output file: {args.output}")
    print(f"   Subtitle reference: {srt_path}")


if __name__ == "__main__":
    main()
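The fuzzy fallback in `find_time_range` relies on `difflib.SequenceMatcher.ratio()`, which returns 2·M/T, where M is the number of matched characters and T is the combined length of both strings. A small standalone sketch (with the hypothetical helper name `best_fuzzy`) shows how a threshold of 0.6 behaves when Whisper's transcript differs slightly from the query.

```python
from difflib import SequenceMatcher


def best_fuzzy(texts, query, threshold=0.6):
    """Return (best_text, score); best_text is None when nothing reaches the threshold."""
    scored = [(SequenceMatcher(None, query, t).ratio(), t) for t in texts]
    score, text = max(scored, default=(0.0, None))
    return (text, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    transcript = ["水电改造一定要注意", "今天天气不错"]
    print(best_fuzzy(transcript, "水电改造要注意"))
```

Here the query shares 7 characters with the first sentence (7 + 9 = 16 characters in total), so the ratio is 14/16 = 0.875 and the sentence is accepted; the second sentence shares nothing and scores 0.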
+88 -109
@@ -39,13 +39,16 @@ pub struct ComposeVideoResult {
pub duration: f64,
}
/// Upload video response
/// Generic upload response
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct UploadVideoResult {
pub struct UploadResult {
pub url: String,
}
/// Alias kept for compatibility with the old name
pub type UploadVideoResult = UploadResult;
/// 获取项目视频目录
fn get_project_video_dir(project_id: &str) -> Result<std::path::PathBuf, String> {
let docs_dir = dirs::document_dir().ok_or("无法获取文档目录")?;
@@ -166,6 +169,53 @@ pub async fn compose_video(
}
}
/// 截取视频片段请求参数
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ExtractVideoSegmentArgs {
pub input_path: String,
pub start: f64,
pub duration: f64,
pub output_path: String,
}
/// Extract a video segment (wrapper around FFmpeg clip_video)
#[tauri::command]
pub async fn extract_video_segment(
app: AppHandle,
args: ExtractVideoSegmentArgs,
) -> ApiResponse<String> {
let safe_output = match sanitize_output_path(&args.output_path) {
Ok(p) => p,
Err(e) => return ApiResponse { code: 500, message: e, data: None },
};
let safe_input = if args.input_path.starts_with("http://") || args.input_path.starts_with("https://") {
args.input_path.clone()
} else if std::path::Path::new(&args.input_path).exists() {
args.input_path.clone()
} else {
return ApiResponse {
code: 500,
message: format!("输入文件不存在: {}", args.input_path),
data: None,
};
};
match ffmpeg_cmd::clip_video(&app, &safe_input, args.start, args.duration, &safe_output).await {
Ok(_) => ApiResponse {
code: 200,
message: "视频片段截取成功".to_string(),
data: Some(safe_output),
},
Err(e) => ApiResponse {
code: 500,
message: format!("截取视频片段失败: {}", e),
data: None,
},
}
}
/// 上传视频请求参数
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
@@ -173,47 +223,45 @@ pub struct UploadVideoArgs {
pub local_path: String,
}
/// 上传本地视频到后端,后端上传到七牛云并返回 URL
#[tauri::command]
pub async fn upload_video_file(
local_path: String,
) -> ApiResponse<UploadVideoResult> {
// 读取本地文件
let file_bytes = match std::fs::read(&local_path) {
/// Generic file upload (local → backend → Qiniu Cloud)
pub async fn upload_file_to_backend(
local_path: &str,
endpoint: &str,
default_filename: &str,
mime_type: &str,
read_error_prefix: &str,
) -> ApiResponse<UploadResult> {
let file_bytes = match std::fs::read(local_path) {
Ok(bytes) => bytes,
Err(e) => {
return ApiResponse {
code: 500,
message: format!("读取视频文件失败: {}", e),
message: format!("{}: {}", read_error_prefix, e),
data: None,
};
}
};
// 获取文件名
let filename = std::path::Path::new(&local_path)
let filename = std::path::Path::new(local_path)
.file_name()
.and_then(|n| n.to_str())
.unwrap_or("video.mp4")
.unwrap_or(default_filename)
.to_string();
// 构建 multipart 请求
let backend_url = crate::PYTHON_API_BASE_URL;
let upload_url = format!("{}/upload/video", backend_url);
let upload_url = format!("{}{}", backend_url, endpoint);
let client = reqwest::Client::new();
// 构建 multipart form
let form = reqwest::multipart::Form::new()
.part(
"file",
reqwest::multipart::Part::bytes(file_bytes)
.file_name(filename)
.mime_str("video/mp4")
.mime_str(mime_type)
.unwrap_or_else(|_| reqwest::multipart::Part::bytes(vec![])),
);
// 发送请求
let response = match client.post(&upload_url).multipart(form).send().await {
Ok(resp) => resp,
Err(e) => {
@@ -235,7 +283,6 @@ pub async fn upload_video_file(
};
}
// 解析响应
let result: serde_json::Value = match response.json().await {
Ok(data) => data,
Err(e) => {
@@ -247,7 +294,6 @@ pub async fn upload_video_file(
}
};
// 提取 URL
let url = result
.get("data")
.and_then(|d| d.get("url"))
@@ -258,7 +304,7 @@ pub async fn upload_video_file(
Some(url) => ApiResponse {
code: 200,
message: "上传成功".to_string(),
data: Some(UploadVideoResult { url }),
data: Some(UploadResult { url }),
},
None => ApiResponse {
code: 500,
@@ -268,30 +314,25 @@ pub async fn upload_video_file(
}
}
/// 上传本地视频到后端,后端上传到七牛云并返回 URL
#[tauri::command]
pub async fn upload_video_file(
local_path: String,
) -> ApiResponse<UploadVideoResult> {
upload_file_to_backend(
&local_path,
"/upload/video",
"video.mp4",
"video/mp4",
"读取视频文件失败",
).await
}
/// 上传本地图片到后端,后端上传到七牛云并返回 URL
#[tauri::command]
pub async fn upload_image_file(
local_path: String,
) -> ApiResponse<UploadVideoResult> {
// 读取本地文件
let file_bytes = match std::fs::read(&local_path) {
Ok(bytes) => bytes,
Err(e) => {
return ApiResponse {
code: 500,
message: format!("读取图片文件失败: {}", e),
data: None,
};
}
};
// 获取文件名和扩展名,推断 mime type
let filename = std::path::Path::new(&local_path)
.file_name()
.and_then(|n| n.to_str())
.unwrap_or("image.jpg")
.to_string();
let ext = std::path::Path::new(&local_path)
.extension()
.and_then(|e| e.to_str())
@@ -306,75 +347,13 @@ pub async fn upload_image_file(
_ => "image/jpeg",
};
// 构建 multipart 请求
let backend_url = crate::PYTHON_API_BASE_URL;
let upload_url = format!("{}/upload/image", backend_url);
let client = reqwest::Client::new();
// 构建 multipart form
let form = reqwest::multipart::Form::new()
.part(
"file",
reqwest::multipart::Part::bytes(file_bytes)
.file_name(filename)
.mime_str(mime_type)
.unwrap_or_else(|_| reqwest::multipart::Part::bytes(vec![])),
);
// 发送请求
let response = match client.post(&upload_url).multipart(form).send().await {
Ok(resp) => resp,
Err(e) => {
return ApiResponse {
code: 500,
message: format!("上传请求失败: {}", e),
data: None,
};
}
};
if !response.status().is_success() {
let status = response.status();
let error_text = response.text().await.unwrap_or_default();
return ApiResponse {
code: status.as_u16() as i32,
message: format!("上传失败: {} - {}", status, error_text),
data: None,
};
}
// 解析响应
let result: serde_json::Value = match response.json().await {
Ok(data) => data,
Err(e) => {
return ApiResponse {
code: 500,
message: format!("解析上传响应失败: {}", e),
data: None,
};
}
};
// 提取 URL
let url = result
.get("data")
.and_then(|d| d.get("url"))
.and_then(|u| u.as_str())
.map(|s| s.to_string());
match url {
Some(url) => ApiResponse {
code: 200,
message: "上传成功".to_string(),
data: Some(UploadVideoResult { url }),
},
None => ApiResponse {
code: 500,
message: "上传响应中未找到 URL".to_string(),
data: None,
},
}
upload_file_to_backend(
&local_path,
"/upload/image",
"image.jpg",
mime_type,
"读取图片文件失败",
).await
}
/// 下载文件请求参数
+63
@@ -191,3 +191,66 @@ pub async fn get_project_audios_dir(
},
}
}
// --------------------- 音频截取与上传 ---------------------
#[derive(serde::Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ExtractAudioSegmentArgs {
pub input_path: String,
pub start: f64,
pub duration: f64,
pub output_path: String,
}
/// Extract an audio segment (FFmpeg)
#[tauri::command]
pub async fn extract_audio_segment(
app: tauri::AppHandle,
args: ExtractAudioSegmentArgs,
) -> ApiResponse<String> {
match crate::ffmpeg_cmd::extract_audio_segment(
&app,
&args.input_path,
args.start,
args.duration,
&args.output_path,
).await {
Ok(_) => ApiResponse {
code: 200,
message: "Audio segment extracted successfully".to_string(),
data: Some(args.output_path),
},
Err(e) => ApiResponse {
code: 500,
message: format!("Failed to extract audio segment: {}", e),
data: None,
},
}
}
/// 上传本地音频文件到后端,后端上传到七牛云并返回 URL
#[tauri::command]
pub async fn upload_audio_file(
local_path: String,
) -> ApiResponse<crate::commands::video_compose::UploadVideoResult> {
// 验证路径安全
let safe_path = match crate::ffmpeg_cmd::sanitize_output_path(&local_path) {
Ok(p) => p,
Err(e) => {
return ApiResponse {
code: 500,
message: format!("路径验证失败: {}", e),
data: None,
};
}
};
crate::commands::video_compose::upload_file_to_backend(
&safe_path,
"/upload/audio",
"audio.mp3",
"audio/mpeg",
"读取音频文件失败",
).await
}
+41 -10
@@ -129,7 +129,7 @@ pub async fn run_ffmpeg(app: &AppHandle, args: Vec<String>) -> Result<String, St
}
/**
* Standardize a single video clip (convert to 1080:1920, 30fps, libx264, aac 44100Hz stereo)
* Standardize a single video clip (convert to 1080:1920, 25fps, libx264, aac 44100Hz stereo)
*/
pub async fn standardize_video(app: &AppHandle, input_path: &str, output_path: &str) -> Result<(), String> {
// 验证路径安全
@@ -138,14 +138,14 @@ pub async fn standardize_video(app: &AppHandle, input_path: &str, output_path: &
let args = vec![
"-i".to_string(), safe_input,
"-vf".to_string(), "fps=30,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
"-vf".to_string(), "fps=25,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
"-c:v".to_string(), "libx264".to_string(),
"-c:a".to_string(), "aac".to_string(),
"-ar".to_string(), "44100".to_string(),
"-ac".to_string(), "2".to_string(),
"-preset".to_string(), "veryfast".to_string(),
"-crf".to_string(), "23".to_string(),
"-r".to_string(), "30".to_string(),
"-r".to_string(), "25".to_string(),
"-y".to_string(),
safe_output
];
@@ -233,7 +233,6 @@ pub async fn add_audio_to_video(app: &AppHandle, video_path: &str, audio_path: &
"-ar".to_string(), "44100".to_string(), // 统一采样率
"-map".to_string(), "0:v:0".to_string(),
"-map".to_string(), "1:a:0".to_string(),
"-shortest".to_string(),
"-y".to_string(),
safe_output
];
@@ -241,7 +240,7 @@ pub async fn add_audio_to_video(app: &AppHandle, video_path: &str, audio_path: &
}
/**
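* Convert the cover image into a short video (0.5s, 1080x1920, 30fps)
* Convert the cover image into a short video (0.5s, 1080x1920, 25fps)
* Includes a silent audio track so later clips do not lose their audio during concat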
*/
pub async fn create_cover_video(app: &AppHandle, input_path: &str, output_path: &str, duration: &str) -> Result<(), String> {
@@ -259,7 +258,7 @@ pub async fn create_cover_video(app: &AppHandle, input_path: &str, output_path:
"-t".to_string(), duration.to_string(),
"-pix_fmt".to_string(), "yuv420p".to_string(),
"-vf".to_string(), "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,setsar=1".to_string(),
"-r".to_string(), "30".to_string(),
"-r".to_string(), "25".to_string(),
"-shortest".to_string(),
"-y".to_string(),
safe_output
@@ -487,7 +486,6 @@ pub async fn replace_audio_track(
// 只保留第一个视频流和第一个音频流
"-map".to_string(), "0:v:0".to_string(),
"-map".to_string(), "1:a:0".to_string(),
"-shortest".to_string(),
"-y".to_string(),
safe_output,
];
@@ -546,7 +544,7 @@ pub async fn mix_audio_tracks(
/**
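* Clip a video segment (supports local files and HTTP URLs)
*
* Clips the given duration from the start time and standardizes the output format (1080x1920, 30fps, libx264, aac).
* Clips the given duration from the start time and standardizes the output format (1080x1920, 25fps, libx264, aac).
* Suitable for extracting a segment of a given duration from presenter footage or B-roll material.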
*/
pub async fn clip_video(
@@ -575,14 +573,14 @@ pub async fn clip_video(
"-ss".to_string(), start_str,
"-t".to_string(), duration_str,
"-i".to_string(), safe_input,
"-vf".to_string(), "fps=30,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
"-vf".to_string(), "fps=25,scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,format=yuv420p".to_string(),
"-c:v".to_string(), "libx264".to_string(),
"-preset".to_string(), "veryfast".to_string(),
"-crf".to_string(), "23".to_string(),
"-c:a".to_string(), "aac".to_string(),
"-ar".to_string(), "44100".to_string(),
"-ac".to_string(), "2".to_string(),
"-r".to_string(), "30".to_string(),
"-r".to_string(), "25".to_string(),
"-pix_fmt".to_string(), "yuv420p".to_string(),
"-avoid_negative_ts".to_string(), "make_zero".to_string(),
"-y".to_string(),
@@ -592,6 +590,39 @@ pub async fn clip_video(
run_ffmpeg(app, args).await.map(|_| ())
}
/**
* Extract an audio segment
*
* Extracts audio of the given duration from the given start time and outputs it as MP3.
*/
pub async fn extract_audio_segment(
app: &AppHandle,
input_path: &str,
start: f64,
duration: f64,
output_path: &str,
) -> Result<(), String> {
let safe_input = validate_safe_path(input_path)?;
let safe_output = sanitize_output_path(output_path)?;
let start_str = format!("{:.3}", start);
let duration_str = format!("{:.3}", duration);
let args = vec![
"-ss".to_string(), start_str,
"-t".to_string(), duration_str,
"-i".to_string(), safe_input,
"-c:a".to_string(), "libmp3lame".to_string(),
"-b:a".to_string(), "192k".to_string(),
"-ar".to_string(), "44100".to_string(),
"-ac".to_string(), "2".to_string(),
"-vn".to_string(), // 无视频
"-y".to_string(),
safe_output,
];
run_ffmpeg(app, args).await.map(|_| ())
}
/**
* Transcode audio to the standard format (MP3 44.1kHz stereo 192kbps)
*/
+3
@@ -118,12 +118,15 @@ pub fn run() {
commands::voice::list_project_audios,
commands::voice::delete_audio,
commands::voice::get_project_audios_dir,
commands::voice::extract_audio_segment,
commands::voice::upload_audio_file,
// Voice material library
commands::voice::load_voice_materials,
commands::voice::save_voice_material,
commands::voice::delete_voice_material_cmd,
// Video composition (Phase 2)
commands::video_compose::compose_video,
commands::video_compose::extract_video_segment,
commands::video_compose::upload_video_file,
commands::video_compose::download_file,
// Audio processing
+60
@@ -0,0 +1,60 @@
/**
* Caption (subtitle) API module
* =============================
*
* Calls the backend caption APIs directly (bypassing the Async Engine).
*/
import { client } from '../client';
export interface CaptionUtterance {
text: string;
startTime: number; // milliseconds (client.ts automatically converts backend snake_case to camelCase)
endTime: number; // milliseconds
}
export interface AutoAlignResult {
code: number;
message: string;
duration: number; // seconds
utterances: CaptionUtterance[];
}
/**
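* Automatic caption alignment (full flow, synchronous and blocking).
*
* Adds timestamps to the text of existing audio. The backend polls internally and waits at most 120 seconds.
*
* @param audioUrl URL of the audio/video file (Qiniu Cloud)
* @param audioText Full caption text to align
* @returns Alignment result, including the timestamps of each utterance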
*/
export async function autoAlignCaption(
audioUrl: string,
audioText: string
): Promise<AutoAlignResult> {
// client.post already extracts ApiResponse.data and applies the snakeToCamel conversion
const result = await client.post<{
code: number;
message: string;
duration: number;
utterances: CaptionUtterance[];
}>('/caption/ata/align', {
audioUrl,
audioText,
captionType: 'speech',
staPuncMode: 3,
});
// result.code is the status code of the Volcano Engine alignment result (0 = success, 2000 = processing)
if (result.code !== 0) {
throw new Error(result.message || '打轴失败');
}
return {
code: result.code,
message: result.message,
duration: result.duration,
utterances: result.utterances || [],
};
}
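Because `startTime` and `endTime` arrive in milliseconds while `duration` is in seconds, it can help to dump an alignment result in a readable form when debugging. The sketch below renders utterances as SRT cues. `msToSrtTime` and `utterancesToSrt` are hypothetical helpers that are not part of this commit, and the sample data is made up.

```typescript
// Shape copied from the module above so the sketch is self-contained.
interface CaptionUtterance {
  text: string;
  startTime: number; // milliseconds
  endTime: number; // milliseconds
}

// Left-pad a non-negative integer with zeros.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = '0' + s;
  return s;
}

// Hypothetical helper: format milliseconds as an SRT timestamp (HH:MM:SS,mmm).
function msToSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  const milli = total % 1000;
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)},${zeroPad(milli, 3)}`;
}

// Hypothetical helper: render utterances as numbered SRT cues.
function utterancesToSrt(utterances: CaptionUtterance[]): string {
  return utterances
    .map((u, i) => `${i + 1}\n${msToSrtTime(u.startTime)} --> ${msToSrtTime(u.endTime)}\n${u.text}`)
    .join('\n\n');
}

// Made-up sample data for illustration only.
const sampleUtterances: CaptionUtterance[] = [
  { text: 'first sentence', startTime: 0, endTime: 4120 },
  { text: 'second sentence', startTime: 4120, endTime: 61005 },
];
console.log(utterancesToSrt(sampleUtterances));
```

The pipeline in this commit generates ASS rather than SRT, so this is only a debugging aid, not a replacement for `generateAssFromAlignment`.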
+19
@@ -67,6 +67,9 @@ export interface ProjectMeta {
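dubbingAudioUrl?: string; // Qiniu Cloud URL of the generated dubbing audio
dubbingAudioPath?: string; // Local path of the generated dubbing audio
dubbingVoiceId?: string; // Voice ID used to generate the dubbing
voiceSpeed?: number; // Dubbing speech rate
voiceVolume?: number; // Dubbing volume
voicePitch?: number; // Dubbing pitch
subtitleAlignment?: AlignmentResult; // Global caption alignment result (single-video mode)
burnedVideoPath?: string; // Path of the final video with burned-in subtitles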
coverConfig?: {
@@ -120,6 +123,13 @@ export interface ProjectSegment {
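alignmentResult?: AlignmentResult; // Caption alignment result
burnedVideoPath?: string; // Path of the video with burned-in subtitles
burnedAt?: number; // Timestamp of the subtitle burn
audioStartTime?: number; // Start time within the full dubbing audio (milliseconds)
audioEndTime?: number; // End time within the full dubbing audio (milliseconds)
actualDuration?: number; // Actual duration (seconds, based on caption alignment)
clipAudioPath?: string; // Local path of the extracted audio segment
clipAudioUrl?: string; // Qiniu Cloud URL of the extracted audio segment
lipSyncTaskId?: string; // Vidu lip-sync task ID
lipSyncState?: string; // Vidu lip-sync task state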
}
/**
@@ -146,6 +156,7 @@ export const localProjectApi = {
selectedElementId: meta.selectedElementId,
selectedVoiceId: meta.selectedVoiceId,
composedVideoUrl: meta.composedVideoUrl,
composedVideoPath: meta.composedVideoPath,
lipSyncTaskId: meta.lipSyncTaskId,
lipSyncState: meta.lipSyncState,
lipSyncedVideoPath: meta.lipSyncedVideoPath,
@@ -153,6 +164,9 @@ export const localProjectApi = {
dubbingAudioUrl: meta.dubbingAudioUrl,
dubbingAudioPath: meta.dubbingAudioPath,
dubbingVoiceId: meta.dubbingVoiceId,
voiceSpeed: meta.voiceSpeed,
voiceVolume: meta.voiceVolume,
voicePitch: meta.voicePitch,
avatarMaterialPath: meta.avatarMaterialPath,
avatarMaterialName: meta.avatarMaterialName,
avatarMaterialDuration: meta.avatarMaterialDuration,
@@ -199,6 +213,11 @@ export const localProjectApi = {
alignmentResult: s.alignmentResult,
burnedVideoPath: s.burnedVideoPath,
burnedAt: s.burnedAt,
audioStartTime: s.audioStartTime,
audioEndTime: s.audioEndTime,
actualDuration: s.actualDuration,
clipAudioPath: s.clipAudioPath,
clipAudioUrl: s.clipAudioUrl,
}));
const jsonContent = JSON.stringify(orderedSegments, null, 2);
const res = await safeInvoke<ApiResponse<boolean>>('save_project_segments_raw', {
+16
@@ -84,6 +84,22 @@ export async function uploadImageFile(localPath: string): Promise<string> {
return res.data!.url;
}
/**
* Upload a local audio file to the backend (the backend uploads it to Qiniu Cloud)
*
* @param localPath Local audio file path
* @returns Qiniu Cloud URL
*/
export async function uploadAudioFile(localPath: string): Promise<string> {
const res = await invoke<ApiResponse<UploadVideoResult>>('upload_audio_file', {
localPath,
});
if (res.code !== 200) {
throw new Error(res.message);
}
return res.data!.url;
}
/**
* Download a file from a URL to local storage
*
+23
@@ -319,3 +319,26 @@ export async function standardizeAudio(args: StandardizeAudioRequest): Promise<s
}
return result.data;
}
export interface ExtractAudioSegmentRequest {
inputPath: string;
start: number;
duration: number;
outputPath: string;
}
/** Extract an audio segment */
export async function extractAudioSegment(args: ExtractAudioSegmentRequest): Promise<string> {
const result = await invoke<{ code: number; data?: string; message: string }>('extract_audio_segment', {
args: {
inputPath: args.inputPath,
start: args.start,
duration: args.duration,
outputPath: args.outputPath,
},
});
if (result.code !== 200 || !result.data) {
throw new Error(result.message || '截取音频片段失败');
}
return result.data;
}
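The request takes `start` and `duration` in seconds, while the alignment data elsewhere in this commit is in milliseconds. The sketch below shows one way to build a request from millisecond timestamps. `buildClipRequest` is a hypothetical helper that is not part of this commit, and deriving the duration from `endMs - startMs` is an assumption (the commit itself passes `actualDuration`).

```typescript
// Shape copied from the diff above.
interface ExtractAudioSegmentRequest {
  inputPath: string;
  start: number; // seconds
  duration: number; // seconds
  outputPath: string;
}

// Hypothetical helper: build a request from alignment times given in milliseconds.
function buildClipRequest(
  inputPath: string,
  segmentId: number | string,
  startMs: number,
  endMs: number
): ExtractAudioSegmentRequest {
  // Normalize Windows separators and drop the file name to get the audio directory.
  const dir = inputPath.replace(/\\/g, '/').split('/').slice(0, -1).join('/');
  return {
    inputPath,
    start: startMs / 1000,
    // Assumption: duration is derived from the interval and clamped at zero.
    duration: Math.max(0, endMs - startMs) / 1000,
    outputPath: `${dir}/segment_${segmentId}.mp3`,
  };
}

// Made-up path and times for illustration only.
const clipRequest = buildClipRequest('C:\\proj\\audios\\voice.mp3', 3, 4120, 9500);
console.log(clipRequest);
```

The resulting object could then be passed to `extractAudioSegment`; keeping the unit conversion in one place avoids mixing milliseconds and seconds at call sites.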
+7
@@ -59,4 +59,11 @@ export interface ScriptShot {
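burnedAt?: number; // Timestamp of the subtitle burn
audioPath?: string; // Local path of the dubbing audio file
audioUrl?: string; // Qiniu Cloud URL of the dubbing audio
audioStartTime?: number; // Start time within the full dubbing audio (milliseconds)
audioEndTime?: number; // End time within the full dubbing audio (milliseconds)
actualDuration?: number; // Actual duration (seconds, based on caption alignment)
clipAudioPath?: string; // Local path of the extracted audio segment
clipAudioUrl?: string; // Qiniu Cloud URL of the extracted audio segment
lipSyncTaskId?: string; // Vidu lip-sync task ID
lipSyncState?: string; // Vidu lip-sync task state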
}
@@ -6,18 +6,18 @@
* Layout: left action panel + right preview panel (uses the standard step-layout)
*/
import { useState, useEffect, useRef, useMemo } from 'react';
import { useState, useRef, useMemo } from 'react';
import { invoke } from '@tauri-apps/api/core';
import { homeDir } from '@tauri-apps/api/path';
import { useProjectStore, saveMetaToLocalFile } from '../../store';
import { getCurrentProjectId } from '../../api/modules/localStorage';
import { useTask } from '../../hooks/useTask';
import { useLocalVideo } from '../../hooks/useLocalVideo';
import { useAssJsRenderer } from '../../hooks/useAssJsRenderer';
import { generateAssFromAlignment, saveAssFile, htmlColorToAss, applyAssJsCompensation } from '../../utils/assGenerator';
import { useProgressStore } from '../../store/progressStore';
import { toast } from '../../store/uiStore';
import type { AlignmentResult } from '../../api/types';
import './SubtitleBurning.css';
// Resolve a Docker container path (/root/Documents/...) to the local user path
@@ -46,27 +46,26 @@ const SUBTITLE_PRESETS = [
];
export default function SubtitleBurning() {
const segments = useProjectStore(state => state.segments);
const projectId = getCurrentProjectId();
// Final video
const lipSyncedVideoUrl = useProjectStore(state => state.lipSyncedVideoUrl);
const lipSyncedVideoPath = useProjectStore(state => state.lipSyncedVideoPath);
// Final video (temporary: use only the composed video; re-enable the lip-synced one once lip-sync replacement has been validated)
const composedVideoUrl = useProjectStore(state => state.composedVideoUrl);
const composedVideoPath = useProjectStore(state => state.composedVideoPath);
// Alignment state
const storeAlignment = useProjectStore(state => state.subtitleAlignment);
const [alignment, setAlignment] = useState<AlignmentResult | undefined>(storeAlignment);
const [isAligning, setIsAligning] = useState(false);
const actualVideoUrl = composedVideoUrl;
const actualVideoPath = composedVideoPath;
// Reuse the alignment result from Step 2 directly (VoiceDubbing has already saved it to meta)
const alignment = useProjectStore(state => state.subtitleAlignment);
const [isBurning, setIsBurning] = useState(false);
const { submit } = useTask();
// Video playback
const videoRef = useRef<HTMLVideoElement>(null);
const containerRef = useRef<HTMLDivElement>(null);
// Use the Qiniu Cloud URL for preview (loads quickly)
const { videoUrl: loadedVideoUrl } = useLocalVideo(lipSyncedVideoUrl);
// For preview: prefer the URL, otherwise fall back to the local path (useLocalVideo supports reading local paths)
const { videoUrl: loadedVideoUrl } = useLocalVideo(actualVideoUrl || actualVideoPath);
// Subtitle style (defaults are based on a 1080x1920 video)
const [subtitleStyle, setSubtitleStyle] = useState<SubtitleStyle>({
@@ -125,13 +124,6 @@ export default function SubtitleBurning() {
enabled: subtitleEnabled,
});
// Restore the alignment result from the store (after a page refresh)
useEffect(() => {
if (storeAlignment) {
setAlignment(storeAlignment);
}
}, [storeAlignment]);
// Apply a preset style
const applyPreset = (presetId: string) => {
const preset = SUBTITLE_PRESETS.find(p => p.id === presetId);
@@ -146,84 +138,13 @@ export default function SubtitleBurning() {
});
};
// Caption alignment: align the final video in a single pass
const handleAlign = async () => {
if (!projectId) {
toast.error('项目ID不存在');
return;
}
if (!lipSyncedVideoUrl) {
toast.error('请先完成视频生成');
return;
}
// Concatenate the voiceover text of all shots
const audioText = segments.map(s => s.voiceover).filter(Boolean).join('\n');
if (!audioText) {
toast.error('没有配音文案');
return;
}
setIsAligning(true);
useProgressStore.getState().show('字幕打轴');
const taskId = await submit(
'subtitle',
{
videoPath: lipSyncedVideoUrl,
audioText,
mode: 'auto_align',
language: 'zh',
},
{
showProgress: true,
callbacks: {
onComplete: (result: unknown) => {
const r = result as {
utterances?: Array<{ text: string; startTime: number; endTime: number }>;
duration?: number;
} | undefined;
const newAlignment: AlignmentResult = {
status: 'completed',
utterances: r?.utterances?.map(u => ({
text: u.text,
start_time: u.startTime,
end_time: u.endTime,
})),
duration: r?.duration,
};
setAlignment(newAlignment);
useProjectStore.setState({ subtitleAlignment: newAlignment });
saveMetaToLocalFile({ subtitleAlignment: newAlignment });
setIsAligning(false);
},
onError: (error: string) => {
const newAlignment: AlignmentResult = {
status: 'failed',
errorMessage: error,
};
setAlignment(newAlignment);
useProjectStore.setState({ subtitleAlignment: newAlignment });
saveMetaToLocalFile({ subtitleAlignment: newAlignment });
setIsAligning(false);
},
},
}
);
if (!taskId) {
setIsAligning(false);
}
};
// Burn subtitles: burn the global subtitles into the final video in a single pass
const handleBurn = async () => {
if (!projectId) {
toast.error('项目ID不存在');
return;
}
if (!lipSyncedVideoPath) {
if (!actualVideoPath) {
toast.error('成品视频不存在');
return;
}
@@ -265,7 +186,7 @@ export default function SubtitleBurning() {
const outputPath = outputRes.data;
// 4. Resolve the video path
const resolvedVideoPath = await resolveHostPath(lipSyncedVideoPath);
const resolvedVideoPath = await resolveHostPath(actualVideoPath);
// 5. Call Rust to burn the subtitles
const burnResult = await invoke<{ code: number; data?: string; message: string }>('burn_subtitle', {
@@ -293,48 +214,10 @@ export default function SubtitleBurning() {
}
};
// Alignment status text
const alignmentStatusText = (() => {
if (!alignment) return '未打轴';
switch (alignment.status) {
case 'pending': return '待打轴';
case 'aligning': return '打轴中...';
case 'completed': return '✓ 已打轴';
case 'failed': return '✗ 打轴失败';
default: return '未知';
}
})();
return (
<div className="step-layout subtitle-burning">
{/* Left action panel */}
<div className="step-panel-left">
{/* Alignment section */}
<div className="panel-section">
<div className="panel-header">
<label className="panel-label"></label>
<div className="panel-actions">
<button
className="btn btn-primary btn-sm"
onClick={handleAlign}
disabled={isAligning || isBurning || !lipSyncedVideoUrl}
>
{isAligning ? '打轴中...' : '字幕打轴'}
</button>
</div>
</div>
<div className="shot-list-hint">
<span className={alignment?.status === 'failed' ? 'text-error' : ''}>
{alignmentStatusText}
</span>
{alignment?.status === 'failed' && alignment.errorMessage && (
<span style={{ color: 'var(--error)', marginLeft: 'var(--spacing-sm)', fontSize: 'var(--font-xs)' }}>
{alignment.errorMessage}
</span>
)}
</div>
</div>
{/* Subtitle style settings */}
<div className="style-section">
<label className="panel-label"></label>
@@ -386,7 +269,7 @@ export default function SubtitleBurning() {
<button
className="btn btn-primary burn-btn"
onClick={handleBurn}
disabled={isAligning || isBurning || alignment?.status !== 'completed'}
disabled={isBurning || !alignment?.utterances?.length}
>
{isBurning ? '压制中...' : '压制字幕'}
</button>
@@ -409,14 +292,15 @@ export default function SubtitleBurning() {
) : (
<>
<video
key={loadedVideoUrl}
ref={videoRef}
src={loadedVideoUrl}
className="preview-video"
controls
autoPlay={false}
autoPlay
/>
{/* Show a hint overlay while alignment has not completed */}
{alignment?.status !== 'completed' && (
{/* Show a hint overlay when there is no alignment data */}
{!alignment?.utterances?.length && (
<div
style={{
position: 'absolute',
@@ -440,8 +324,8 @@ export default function SubtitleBurning() {
<circle cx="12" cy="12" r="10" />
<path d="M12 6v6l4 2" />
</svg>
<p style={{ fontSize: 'var(--font-md)', fontWeight: 600, lineHeight: 1.4, color: '#fff' }}></p>
<p style={{ fontSize: 'var(--font-sm)', color: 'rgba(255,255,255,0.75)', marginTop: 'var(--spacing-xs)' }}></p>
<p style={{ fontSize: 'var(--font-md)', fontWeight: 600, lineHeight: 1.4, color: '#fff' }}></p>
<p style={{ fontSize: 'var(--font-sm)', color: 'rgba(255,255,255,0.75)', marginTop: 'var(--spacing-xs)' }}>2</p>
</div>
)}
{/* Subtitles are rendered automatically by ASS.js on a canvas layer above the video */}
@@ -6,31 +6,64 @@ import { toast } from '../../store/uiStore';
import { useProjectStore, saveMetaToLocalFile, type ProjectState } from '../../store';
import { useProgressStore } from '../../store/progressStore';
import { useVideoGeneration } from '../../hooks/useVideoGeneration';
import { useLocalVideo } from '../../hooks/useLocalVideo';
import { matchMaterial } from '../../api/modules/materials';
import {
composeVideo,
uploadVideoFile,
uploadImageFile,
downloadFile,
type ComposeSegment,
} from '../../api/modules/videoCompose';
import { submitLipSync, queryLipSyncTask } from '../../api/modules/vidu';
import { submitLipSync } from '../../api/modules/vidu';
import { invoke } from '@tauri-apps/api/core';
import { getCurrentProjectId, localProjectApi } from '../../api/modules/localStorage';
import './VideoGeneration.css';
/**
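* Unified gap-filling: compute each shot's playback interval on the global timeline
* - Segment (segment): strictly [audioStartTime, audioEndTime]
* - Empty shot (empty_shot):
* - assignedStart = previous shot's audioEndTime (0 for the first shot)
* - assignedEnd = own audioEndTime if the next shot is an empty shot or there is no next shot, otherwise the next shot's audioStartTime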
*/
function computeAssignedIntervals(
segs: Array<{ id: number | string; type: string; audioStartTime?: number; audioEndTime?: number; actualDuration?: number; duration?: string | number }>
) {
const result: Record<string | number, { assignedStart: number; assignedEnd: number }> = {};
for (let i = 0; i < segs.length; i++) {
const seg = segs[i];
const prev = i > 0 ? segs[i - 1] : null;
const next = i < segs.length - 1 ? segs[i + 1] : null;
if (seg.type === 'segment') {
result[seg.id] = {
assignedStart: seg.audioStartTime ?? 0,
assignedEnd: seg.audioEndTime ?? ((seg.actualDuration ?? 5) * 1000),
};
} else {
const assignedStart = prev ? (prev.audioEndTime ?? 0) : 0;
let assignedEnd: number;
if (next && next.type === 'empty_shot') {
assignedEnd = seg.audioEndTime ?? ((seg.actualDuration ?? 5) * 1000);
} else if (next) {
assignedEnd = next.audioStartTime ?? (seg.audioEndTime ?? assignedStart);
} else {
assignedEnd = seg.audioEndTime ?? assignedStart;
}
result[seg.id] = { assignedStart, assignedEnd };
}
}
return result;
}
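A small worked example may make the gap-filling rule easier to follow. The sketch below is a simplified restatement of `computeAssignedIntervals`, not the committed function: `assignIntervals`, `Shot`, and `Interval` are hypothetical names, the `actualDuration` fallbacks are omitted, and the timestamps are made up.

```typescript
interface Shot {
  id: number | string;
  type: 'segment' | 'empty_shot';
  audioStartTime?: number; // milliseconds
  audioEndTime?: number; // milliseconds
}

interface Interval {
  assignedStart: number;
  assignedEnd: number;
}

// Simplified restatement of the rule; fallbacks based on actualDuration are left out.
function assignIntervals(shots: Shot[]): Record<string, Interval> {
  const result: Record<string, Interval> = {};
  shots.forEach((shot, i) => {
    const prev = i > 0 ? shots[i - 1] : undefined;
    const next = i < shots.length - 1 ? shots[i + 1] : undefined;
    if (shot.type === 'segment') {
      // Segments keep exactly their own audio interval.
      result[String(shot.id)] = {
        assignedStart: shot.audioStartTime ?? 0,
        assignedEnd: shot.audioEndTime ?? 0,
      };
      return;
    }
    // Empty shots start where the previous shot's audio ended...
    const assignedStart = prev?.audioEndTime ?? 0;
    const ownEnd = shot.audioEndTime ?? assignedStart;
    // ...and stretch up to the next segment's start, absorbing the pause in between.
    const assignedEnd =
      next && next.type === 'segment' ? (next.audioStartTime ?? ownEnd) : ownEnd;
    result[String(shot.id)] = { assignedStart, assignedEnd };
  });
  return result;
}

// Made-up timeline: the pauses between utterances are 500, 300, and 500 ms.
const demoShots: Shot[] = [
  { id: 1, type: 'segment', audioStartTime: 0, audioEndTime: 4000 },
  { id: 2, type: 'empty_shot', audioStartTime: 4500, audioEndTime: 8000 },
  { id: 3, type: 'empty_shot', audioStartTime: 8300, audioEndTime: 12000 },
  { id: 4, type: 'segment', audioStartTime: 12500, audioEndTime: 16000 },
];
console.log(assignIntervals(demoShots));
```

In this example the empty shots are assigned 4000 to 8000 and 8000 to 12500, so the four intervals tile the timeline from 0 to 16000 ms without gaps while both segments keep their exact audio intervals.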
export default function VideoGeneration() {
const segments = useProjectStore(state => state.segments);
const selectedElementId = useProjectStore(state => state.selectedElementId);
const selectedVoiceId = useProjectStore(state => state.selectedVoiceId);
const dubbingAudioUrl = useProjectStore(state => state.dubbingAudioUrl);
const lipSyncTaskId = useProjectStore(state => state.lipSyncTaskId);
const lipSyncState = useProjectStore(state => state.lipSyncState);
const lipSyncedVideoUrl = useProjectStore(state => state.lipSyncedVideoUrl);
const composedVideoPath = useProjectStore(state => state.composedVideoPath);
const { videoUrl: composedVideoBlobUrl } = useLocalVideo(composedVideoPath);
const updateSegment = useProjectStore(state => state.updateSegment);
const projectId = getCurrentProjectId();
@@ -69,9 +102,6 @@ export default function VideoGeneration() {
dubbingAudioUrl: meta.dubbingAudioUrl ?? undefined,
composedVideoUrl: meta.composedVideoUrl ?? undefined,
composedVideoPath: meta.composedVideoPath ?? undefined,
lipSyncTaskId: meta.lipSyncTaskId ?? undefined,
lipSyncState: meta.lipSyncState ?? undefined,
lipSyncedVideoUrl: meta.lipSyncedVideoUrl ?? undefined,
lipSyncedVideoPath: meta.lipSyncedVideoPath ?? undefined,
};
@@ -125,76 +155,15 @@ export default function VideoGeneration() {
// Poll the Vidu lip-sync task
useEffect(() => {
if (!lipSyncTaskId || !projectId) return;
if (lipSyncState === 'succeeded' || lipSyncState === 'failed') return;
let canceled = false;
const interval = setInterval(async () => {
if (canceled) return;
try {
const result = await queryLipSyncTask(lipSyncTaskId);
console.log('[VideoGeneration] 轮询状态:', result.state);
if (result.state === 'succeeded') {
clearInterval(interval);
const videoUrl = result.videoUrl ?? result.creations?.[0]?.url;
if (!videoUrl) {
console.error('[VideoGeneration] 对口型任务成功但无视频 URL');
useProjectStore.setState({ lipSyncState: 'failed' });
await saveMetaToLocalFile({ lipSyncState: 'failed' });
toast.error('对口型视频 URL 为空');
return;
}
// 1. Get the local save path (via IPC, which handles path expansion automatically)
const pathRes = await invoke<{ code: number; data?: string; message: string }>('get_video_save_path', {
projectId,
filename: `lip_synced_${Date.now()}.mp4`,
});
if (pathRes.code !== 200 || !pathRes.data) {
throw new Error(pathRes.message || '获取保存路径失败');
}
const localPath = pathRes.data;
// 2. Download the Vidu video to local storage
await downloadFile(videoUrl, localPath);
console.log('[VideoGeneration] 对口型视频下载完成:', localPath);
// 3. Upload to Qiniu Cloud
const qiniuUrl = await uploadVideoFile(localPath);
console.log('[VideoGeneration] 对口型视频上传完成:', qiniuUrl);
// 4. Update the store and meta.json
useProjectStore.setState({
lipSyncState: 'succeeded',
lipSyncedVideoUrl: qiniuUrl,
lipSyncedVideoPath: localPath,
});
await saveMetaToLocalFile({
lipSyncedVideoPath: localPath,
lipSyncedVideoUrl: qiniuUrl,
lipSyncState: 'succeeded',
});
toast.success('对口型视频生成完成');
} else if (result.state === 'failed') {
clearInterval(interval);
useProjectStore.setState({ lipSyncState: 'failed' });
await saveMetaToLocalFile({ lipSyncState: 'failed' });
toast.error('对口型生成失败');
}
// pending/processing: keep polling
} catch (e) {
console.error('[VideoGeneration] 轮询失败:', e);
}
}, 3000); // poll every 3 seconds
return () => {
canceled = true;
clearInterval(interval);
};
}, [lipSyncTaskId, lipSyncState, projectId]);
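// TODO: Vidu lip-sync task polling (to be handled once callback notifications go live)
// Current logic: each segment submits its own lip-sync task, and the taskId is stored in segment.lipSyncTaskId
// After launch: when a Vidu callback notification arrives, find the matching segment by taskId,
// download the lip-synced video, then replace the corresponding span of the composed video based on audioStartTime/audioEndTime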
// useEffect(() => {
// const segmentShots = shots.filter(s => s.type === 'segment' && s.lipSyncTaskId && s.lipSyncState !== 'succeeded' && s.lipSyncState !== 'failed');
// if (segmentShots.length === 0 || !projectId) return;
// // Poll all unfinished tasks...
// }, [shots, projectId]);
// Automatically match empty-shot footage (calls the backend API)
// Keep activeScene in sync with the shots data
@@ -252,15 +221,20 @@ export default function VideoGeneration() {
async function doMatch() {
const newMap: Record<string, { url: string; duration: number } | null> = {};
const usedUrls: string[] = [];
const assignedMap = computeAssignedIntervals(shots);
// Match serially to avoid duplicates caused by concurrency (spread footage of the same category as much as possible)
for (const shot of emptyShots) {
if (canceled) break;
const duration =
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5;
const assigned = assignedMap[shot.id];
const requiredDuration = assigned
? (assigned.assignedEnd - assigned.assignedStart) / 1000
: (shot.actualDuration ?? (
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
));
try {
const result = await matchMaterial(shot.scene || '', duration, usedUrls);
const result = await matchMaterial(shot.scene || '', requiredDuration, usedUrls);
newMap[String(shot.id)] = result;
if (result) usedUrls.push(result.url);
} catch (err) {
@@ -409,10 +383,15 @@ export default function VideoGeneration() {
const shot = shots.find((s) => String(s.id) === shotId);
if (!shot) return;
const duration =
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5;
const assignedMap = computeAssignedIntervals(shots);
const assigned = assignedMap[shot.id];
const requiredDuration = assigned
? (assigned.assignedEnd - assigned.assignedStart) / 1000
: (shot.actualDuration ?? (
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
));
const currentUrl = materialMatchMap[shotId]?.url;
@@ -423,7 +402,7 @@ export default function VideoGeneration() {
if (currentUrl) usedUrls.push(currentUrl);
try {
const result = await matchMaterial(shot.scene || '', duration, usedUrls, true);
const result = await matchMaterial(shot.scene || '', requiredDuration, usedUrls, true);
setMaterialMatchMap((prev) => ({
...prev,
[shotId]: result || prev[shotId],
@@ -516,17 +495,86 @@ export default function VideoGeneration() {
progress.show('视频生成');
try {
// Step 1: Compose the video (clip + concatenate)
progress.update('正在处理视频素材...');
// Total duration of the avatar video, used to compute a random clip start point
const avatarDuration = selectedAvatarMaterial.duration;
const segments: ComposeSegment[] = shots.map((shot) => {
const duration =
// ========== Step 1: Submit a lip-sync task for each segment ==========
const segmentShots = shots.filter((s) => s.type === 'segment');
for (let i = 0; i < segmentShots.length; i++) {
const shot = segmentShots[i];
progress.update(`正在提交对口型任务 (${i + 1}/${segmentShots.length})...`);
const duration = shot.actualDuration ?? (
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5;
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
);
// 1a. Get the save path for the clipped video
const pathRes = await invoke<{ code: number; data?: string; message: string }>(
'get_video_save_path',
{ projectId, filename: `segment_clip_${shot.id}_${Date.now()}.mp4` }
);
if (pathRes.code !== 200 || !pathRes.data) {
throw new Error(pathRes.message || '获取保存路径失败');
}
const clipPath = pathRes.data;
// 1b. Clip a segment of the matching duration from a random position in the avatar footage
const maxStart = Math.max(0, avatarDuration - duration);
const startTime = maxStart > 0 ? Math.random() * maxStart : 0;
const clipRes = await invoke<{ code: number; message: string }>('extract_video_segment', {
args: {
inputPath: selectedAvatarMaterial.path,
start: startTime,
duration,
outputPath: clipPath,
},
});
if (clipRes.code !== 200) {
throw new Error(`截取视频片段失败: ${clipRes.message}`);
}
// 1c. Upload the clipped video to Qiniu Cloud
const clipUrl = await uploadVideoFile(clipPath);
console.log(`[VideoGeneration] Segment ${shot.id} 视频上传完成:`, clipUrl);
// 1d. Submit the lip-sync task (only if this segment has a clipAudioUrl)
if (!shot.clipAudioUrl) {
console.warn(`[VideoGeneration] Segment ${shot.id} 无 clipAudioUrl,跳过对口型`);
continue;
}
const lipSyncRes = await submitLipSync({
videoUrl: clipUrl,
audioUrl: shot.clipAudioUrl,
});
console.log(`[VideoGeneration] Segment ${shot.id} 对口型任务提交成功:`, lipSyncRes.taskId);
// 1e. Save the taskId on the segment
updateSegment(shot.id, { lipSyncTaskId: lipSyncRes.taskId, lipSyncState: 'processing' });
}
// Save segments.json (including lipSyncTaskId)
const currentSegments = useProjectStore.getState().segments;
await localProjectApi.saveSegments(projectId, currentSegments);
// ========== Step 2: Concatenate all clips ==========
progress.update('正在拼接视频...');
const assignedMap = computeAssignedIntervals(shots);
const composeSegments: ComposeSegment[] = shots.map((shot) => {
const assigned = assignedMap[shot.id];
const duration = assigned
? (assigned.assignedEnd - assigned.assignedStart) / 1000
: (shot.actualDuration ?? (
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
));
if (shot.type === 'empty_shot') {
const matched = materialMatchMap[String(shot.id)];
@@ -538,102 +586,81 @@ export default function VideoGeneration() {
startTime: 0,
};
} else {
// Avatar on screen: clip from a random position in the avatar video without exceeding the footage duration
const maxStart = Math.max(0, avatarDuration - duration);
const segDuration = shot.actualDuration ?? (
typeof shot.duration === 'number'
? shot.duration
: parseFloat(String(shot.duration).replace(/[^0-9.]/g, '')) || 5
);
const maxStart = Math.max(0, avatarDuration - segDuration);
const startTime = maxStart > 0 ? Math.random() * maxStart : 0;
return {
id: String(shot.id),
type: 'segment' as const,
duration,
duration: segDuration,
source: selectedAvatarMaterial.path,
startTime,
};
}
});
const composeResult = await composeVideo(projectId, segments);
console.log('[VideoGeneration] 视频合成完成:', composeResult.outputPath);
const composeResult = await composeVideo(projectId, composeSegments);
console.log('[VideoGeneration] 视频拼接完成:', composeResult.outputPath);
// Step 2: Upload the composed video to Qiniu Cloud
progress.update('正在智能拼接视频...');
const composedVideoUrl = await uploadVideoFile(composeResult.outputPath);
console.log('[VideoGeneration] 视频上传完成:', composedVideoUrl);
// ========== Step 3: Mux the full audio into the composed video ==========
let finalVideoPath = composeResult.outputPath;
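// Save the composed video's local path and Qiniu Cloud URL to meta.json
// Note: the composed video is an intermediate artifact and is not placed in the products directory (products only holds the Step 6 final output)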
await saveMetaToLocalFile({
composedVideoUrl,
composedVideoPath: composeResult.outputPath,
});
if (dubbingAudioUrl) {
progress.update('正在合成音频...');
// Step 3: Extract the first frame of the avatar footage and upload it as the ref_photo_url for Vidu lip-sync
let refPhotoUrl: string | undefined;
console.log('[VideoGeneration] selectedAvatarMaterial:', selectedAvatarMaterial);
if (selectedAvatarMaterial) {
try {
progress.update('正在提取人物参考图...');
// Get the save path for the first frame
const framePathRes = await invoke<{ code: number; data?: string; message: string }>(
'get_image_save_path',
{
projectId,
filename: `ref_frame_${Date.now()}.jpg`,
}
);
console.log('[VideoGeneration] get_image_save_path result:', framePathRes);
if (framePathRes.code === 200 && framePathRes.data) {
const framePath = framePathRes.data;
console.log('[VideoGeneration] 首帧保存路径:', framePath);
// Extract the first frame
const extractRes = await invoke<{ code: number; data?: string; message: string }>(
'extract_video_first_frame',
{
request: {
video_path: selectedAvatarMaterial.path,
output_path: framePath,
},
}
);
console.log('[VideoGeneration] extract_video_first_frame result:', extractRes);
if (extractRes.code === 200) {
console.log('[VideoGeneration] 首帧提取成功,开始上传...');
// Upload the first frame to Qiniu Cloud
refPhotoUrl = await uploadImageFile(framePath);
console.log('[VideoGeneration] 首帧上传完成:', refPhotoUrl);
} else {
console.warn('[VideoGeneration] 首帧提取失败:', extractRes.message);
}
} else {
console.warn('[VideoGeneration] 获取首帧保存路径失败:', framePathRes?.message);
}
} catch (err) {
console.warn('[VideoGeneration] 首帧提取/上传失败,继续提交对口型任务:', err);
// On failure, ignore refPhotoUrl and continue submitting
// Download the dubbing audio to local storage
const audioPathRes = await invoke<{ code: number; data?: string; message: string }>(
'get_video_save_path',
{ projectId, filename: `dubbing_${Date.now()}.mp3` }
);
if (audioPathRes.code !== 200 || !audioPathRes.data) {
throw new Error(audioPathRes.message || '获取音频保存路径失败');
}
} else {
console.log('[VideoGeneration] 未选择人物素材,跳过首帧提取');
const audioLocalPath = audioPathRes.data;
await downloadFile(dubbingAudioUrl, audioLocalPath);
// Get the final output path
const outputPathRes = await invoke<{ code: number; data?: string; message: string }>(
'get_video_save_path',
{ projectId, filename: `composed_with_audio_${Date.now()}.mp4` }
);
if (outputPathRes.code !== 200 || !outputPathRes.data) {
throw new Error(outputPathRes.message || '获取输出路径失败');
}
const composedWithAudioPath = outputPathRes.data;
// Replace the audio track
const replaceRes = await invoke<{ code: number; message: string }>('replace_audio_track', {
request: {
video_path: composeResult.outputPath,
audio_path: audioLocalPath,
output_path: composedWithAudioPath,
},
});
if (replaceRes.code !== 200) {
throw new Error(`音频替换失败: ${replaceRes.message}`);
}
finalVideoPath = composedWithAudioPath;
console.log('[VideoGeneration] 音频合成完成:', finalVideoPath);
}
// Step 4: Submit the Vidu lip-sync task
if (!dubbingAudioUrl) {
throw new Error('请先完成语音配音步骤');
}
progress.update('正在合成视频...');
const lipSyncRes = await submitLipSync({
videoUrl: composedVideoUrl,
audioUrl: dubbingAudioUrl,
refPhotoUrl,
// Save the final result to meta.json
await saveMetaToLocalFile({
composedVideoPath: finalVideoPath,
currentStep: 3,
});
console.log('[VideoGeneration] 对口型任务提交成功:', lipSyncRes.taskId);
// Save the lip-sync task ID and state to meta.json
await saveMetaToLocalFile({ lipSyncTaskId: lipSyncRes.taskId, lipSyncState: 'processing' });
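// TODO: Lip-sync result replacement (to be handled once Vidu callback notifications go live)
// Logic: after the callback notification arrives, the frontend fetches the lip-synced video URL,
// uses the segment's audioStartTime / audioEndTime to determine the position and time range,
// and uses FFmpeg to replace the corresponding span of the composed video with the lip-synced video.
// Save the current step progress (finalVideoPath is written only during the Step 6 final composition)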
await saveMetaToLocalFile({ currentStep: 3 });
progress.success('视频生成任务已提交');
progress.success('视频生成完成');
} catch (error) {
const msg = error instanceof Error ? error.message : '视频生成失败';
console.error('[VideoGeneration] 生成失败:', error);
@@ -1046,13 +1073,13 @@ export default function VideoGeneration() {
strokeLinecap="round"
strokeLinejoin="round"
>
{lipSyncedVideoUrl ? (
{composedVideoPath ? (
<path d="M23 4v6h-6M1 20v-6h6M3.51 9a9 9 0 0114.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0020.49 15" />
) : (
<polygon points="5 3 19 12 5 21 5 3" />
)}
</svg>
{lipSyncedVideoUrl ? '重新生成' : '生成视频'}
{composedVideoPath ? '重新生成' : '生成视频'}
</>
)}
</button>
@@ -1064,10 +1091,10 @@ export default function VideoGeneration() {
<div className="video-preview-header"></div>
<div className="video-preview-wrapper">
<div className="video-preview-container">
{lipSyncedVideoUrl ? (
{composedVideoBlobUrl ? (
<video
key={lipSyncedVideoUrl}
src={lipSyncedVideoUrl}
key={composedVideoBlobUrl}
src={composedVideoBlobUrl}
className="preview-video"
controls
autoPlay
@@ -10,29 +10,34 @@ import { useProjectStore } from '../../store';
import { useVoiceStore } from '../../store/voiceStore';
import { getCurrentProjectId } from '../../api/modules/localStorage';
import { saveMetaToLocalFile } from '../../store/projectStore';
import { synthesizeTTS, saveAudio, uploadAudio } from '../../api/modules/voice';
import { synthesizeTTS, saveAudio, uploadAudio, extractAudioSegment } from '../../api/modules/voice';
import { toast } from '../../store/uiStore';
import type { AlignmentResult } from '../../api/types';
import { useProgressStore } from '../../store/progressStore';
import { autoAlignCaption } from '../../api/modules/caption';
import { matchSegmentsToUtterances } from '../../utils/audioAlign';
import { uploadAudioFile } from '../../api/modules/videoCompose';
import { localProjectApi } from '../../api/modules/localStorage';
import './VoiceDubbing.css';
export default function VoiceDubbing() {
const projectId = getCurrentProjectId();
const segments = useProjectStore(state => state.segments);
const updateSegment = useProjectStore(state => state.updateSegment);
const selectedVoiceId = useProjectStore(state => state.selectedVoiceId);
const speed = useProjectStore(state => state.voiceSpeed);
const volume = useProjectStore(state => state.voiceVolume);
const pitch = useProjectStore(state => state.voicePitch);
const setSelectedVoiceId = useProjectStore(state => state.setSelectedVoiceId);
const setSpeed = useProjectStore(state => state.setVoiceSpeed);
const setVolume = useProjectStore(state => state.setVoiceVolume);
const setPitch = useProjectStore(state => state.setVoicePitch);
const {
presetVoices,
voiceMaterials,
selectedVoiceId,
speed,
volume,
pitch,
loadPresetVoices,
loadVoiceMaterials,
setSelectedVoiceId,
setSpeed,
setVolume,
setPitch,
loadProjectAudios,
setAudioMapping,
} = useVoiceStore();
@@ -95,23 +100,141 @@ export default function VoiceDubbing() {
setPlayingVoiceId(voiceId);
}, [playingVoiceId]);
const handleAlignAndClip = useCallback(async (
dubbingAudioUrl: string,
dubbingAudioPath: string
) => {
if (!projectId) return;
const progress = useProgressStore.getState();
try {
// 1. Concatenate the full text for alignment (must match the TTS text order exactly, including empty_shot)
const fullText = segments
.filter(s => s.voiceover?.trim())
.map(s => s.voiceover!.trim())
.join('\n');
if (!fullText) return;
// 2. Caption alignment
progress.update('正在处理字幕...');
const alignResult = await autoAlignCaption(dubbingAudioUrl, fullText);
if (!alignResult.utterances?.length) {
console.warn('[VoiceDubbing] 打轴返回空结果');
progress.error('字幕处理异常');
return;
}
// 3. Text matching
const matchSegments = segments
.filter(s => s.voiceover?.trim())
.map(s => ({ id: s.id, voiceover: s.voiceover || '' }));
const matched = matchSegmentsToUtterances(matchSegments, alignResult.utterances);
if (!matched.length) {
console.warn('[VoiceDubbing] 文本匹配无结果');
progress.error('音频对齐失败');
return;
}
// 4. Extract audio segments and upload them
progress.update('正在整理音频...');
const audiosDir = dubbingAudioPath.replace(/\\/g, '/').split('/').slice(0, -1).join('/');
for (const m of matched) {
const seg = segments.find(s => s.id === m.segmentId);
if (!seg) continue;
// Store the actual timing for every shot (later steps use it to match footage precisely)
updateSegment(m.segmentId, {
audioStartTime: m.startTime,
audioEndTime: m.endTime,
actualDuration: m.actualDuration,
});
// Extract clips only for shots of type 'segment'; empty shots are skipped
if (seg.type !== 'segment') continue;
const outputPath = `${audiosDir}/segment_${m.segmentId}.mp3`;
try {
await extractAudioSegment({
inputPath: dubbingAudioPath,
start: m.startTime / 1000,
duration: m.actualDuration,
outputPath,
});
const clipUrl = await uploadAudioFile(outputPath);
updateSegment(m.segmentId, {
clipAudioPath: outputPath,
clipAudioUrl: clipUrl,
});
} catch (e) {
console.error(`[VoiceDubbing] Segment ${m.segmentId} 截取/上传失败:`, e);
}
}
// 5. Save segments.json
const currentSegments = useProjectStore.getState().segments;
await localProjectApi.saveSegments(projectId, currentSegments);
// 6. Save the caption alignment result to meta so Step 4 can reuse it directly
const subtitleAlignment: AlignmentResult = {
status: 'completed',
utterances: alignResult.utterances.map(u => ({
text: u.text,
start_time: u.startTime,
end_time: u.endTime,
})),
duration: alignResult.duration,
};
useProjectStore.setState({ subtitleAlignment });
await saveMetaToLocalFile({ subtitleAlignment });
progress.success('音频处理完成');
} catch (err) {
console.error('[VoiceDubbing] 打轴截取流程失败:', err);
progress.error(err instanceof Error ? err.message : '音频处理失败');
}
}, [projectId, segments, updateSegment]);
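A minimal sketch of the unit handling in step 4 above, under stated assumptions: `startTime` is in milliseconds and `actualDuration` is already in seconds (as the `SegmentAudioRange` comments in this commit state), and the extraction call takes seconds, which is inferred from the call site rather than confirmed. `ClipRequest` and `toClipRequest` are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes for illustration; only the field names mirror the diff.
interface SegmentAudioRange {
  segmentId: number;
  startTime: number;      // milliseconds
  endTime: number;        // milliseconds
  actualDuration: number; // seconds
}

interface ClipRequest {
  inputPath: string;
  start: number;    // seconds
  duration: number; // seconds
  outputPath: string;
}

// Derives the clip parameters for one matched segment.
function toClipRequest(range: SegmentAudioRange, dubbingAudioPath: string): ClipRequest {
  // Normalize Windows separators, then drop the file name to get the audio directory.
  const audiosDir = dubbingAudioPath.replace(/\\/g, '/').split('/').slice(0, -1).join('/');
  return {
    inputPath: dubbingAudioPath,
    start: range.startTime / 1000, // ms -> s
    duration: range.actualDuration, // already in seconds
    outputPath: `${audiosDir}/segment_${range.segmentId}.mp3`,
  };
}
```

Keeping the conversion in one pure function makes the mixed units (milliseconds for start, seconds for duration) easy to test in isolation.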
const handleGenerate = useCallback(async () => {
if (!projectId) { toast.warning('请先创建项目'); return; }
const realText = segments.map(s => s.voiceover?.trim()).filter(Boolean).join('\n');
// Build the TTS text, inserting pause markers according to the shot-type transition
const realText = segments
.filter(s => s.voiceover?.trim())
.map((s, i, arr) => {
const text = s.voiceover!.trim();
if (i === arr.length - 1) return text;
const next = arr[i + 1];
// segment ↔ empty_shot transition: long pause so viewers can take in the picture
if (s.type !== next?.type) {
return text + '<#0.5#>';
}
// Same type on both sides (segment→segment / empty_shot→empty_shot): short pause to keep the pace
return text + '<#0.3#>';
})
.join('\n');
if (!realText) { toast.warning('没有需要合成的旁白文本'); return; }
// Kling TTS allows at most 1000 characters per request; longer text is truncated automatically
const truncatedText = realText.length > 1000 ? realText.slice(0, 1000) : realText;
// Vidu TTS allows at most 10000 characters per request; longer text is truncated automatically
const truncatedText = realText.length > 10000 ? realText.slice(0, 10000) : realText;
// Read the latest voice settings straight from the store so the closure does not capture stale values
const currentVoiceId = useProjectStore.getState().selectedVoiceId;
const currentSpeed = useProjectStore.getState().voiceSpeed;
const currentVolume = useProjectStore.getState().voiceVolume;
const currentPitch = useProjectStore.getState().voicePitch;
const progress = useProgressStore.getState();
setIsGenerating(true);
progress.show('生成配音');
try {
progress.update('正在合成语音...');
const result = await synthesizeTTS({ text: truncatedText, voiceId: selectedVoiceId, speed, volume, pitch });
progress.update('正在生成配音...');
const result = await synthesizeTTS({ text: truncatedText, voiceId: currentVoiceId, speed: currentSpeed, volume: currentVolume, pitch: currentPitch });
if (!result.audioUrl) throw new Error('未返回音频 URL');
progress.update('正在保存音频...');
progress.update('正在处理音频...');
// Download the audio blob
const response = await fetch(result.audioUrl);
if (!response.ok) throw new Error('下载音频失败');
@@ -135,7 +258,7 @@ export default function VoiceDubbing() {
const audioId = `voice_${Date.now()}`;
const meta = await saveAudio({
projectId, audioId, audioData: base64,
name: `配音-${segments.length}`, voiceId: selectedVoiceId, duration: 0,
name: `配音-${segments.length}`, voiceId: currentVoiceId || 'tianxin_xiaoling', duration: 0,
skipList: true,
});
@@ -143,12 +266,12 @@ export default function VoiceDubbing() {
useProjectStore.setState({
dubbingAudioUrl: qiniuUrl,
dubbingAudioPath: meta.filePath,
dubbingVoiceId: selectedVoiceId,
dubbingVoiceId: currentVoiceId,
});
await saveMetaToLocalFile({
dubbingAudioUrl: qiniuUrl,
dubbingAudioPath: meta.filePath,
dubbingVoiceId: selectedVoiceId,
dubbingVoiceId: currentVoiceId,
});
// The dubbing audio is project-level and is not written into individual shots
@@ -160,13 +283,17 @@ export default function VoiceDubbing() {
}
setGeneratedAudioUrl(qiniuUrl);
progress.success('配音生成完成');
// Once generation finishes, run alignment and clipping automatically
await handleAlignAndClip(qiniuUrl, meta.filePath);
progress.success('配音已就绪');
} catch (err) {
progress.error(err instanceof Error ? err.message : '生成失败');
} finally {
setIsGenerating(false);
}
}, [projectId, segments, selectedVoiceId, speed, volume, pitch, setAudioMapping, updateSegment]);
}, [projectId, segments, setAudioMapping, updateSegment, handleAlignAndClip]);
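The pause-marker logic in `handleGenerate` above can be sketched as a pure function. This is an illustration under assumptions, not the component's actual code: `Shot`, `buildTtsText`, and the two constants are hypothetical names, the `<#0.5#>` / `<#0.3#>` marker syntax is copied from the diff without checking how the TTS provider interprets it, and the 10000-character limit is the one quoted in the comment.

```typescript
// Hypothetical minimal shot shape; the real ScriptShot type has more fields.
type ShotType = 'segment' | 'empty_shot';
interface Shot {
  type: ShotType;
  voiceover?: string;
}

const LONG_PAUSE = '<#0.5#>';  // inserted when the shot type changes
const SHORT_PAUSE = '<#0.3#>'; // inserted between shots of the same type

// Joins non-empty voiceovers with a pause marker after every line except the last.
function buildTtsText(shots: Shot[], maxChars = 10000): string {
  const text = shots
    .filter(s => s.voiceover?.trim())
    .map((s, i, arr) => {
      const line = s.voiceover!.trim();
      if (i === arr.length - 1) return line; // no marker after the final line
      const marker = s.type !== arr[i + 1].type ? LONG_PAUSE : SHORT_PAUSE;
      return line + marker;
    })
    .join('\n');
  // Plain slice, as in the diff: a cut could land inside a marker.
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```

Because the neighbor lookup runs on the filtered array, a shot with a blank voiceover does not count as a type change between the shots around it.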
const handleToggleGeneratedAudio = useCallback(() => {
if (!generatedAudioUrl) return;
@@ -215,7 +342,7 @@ export default function VoiceDubbing() {
{activeVoiceTab === 'preset' && (
<div className="voice-list">
{presetVoices.map(v => (
<div key={v.voiceId} className={`voice-row ${v.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(v.voiceId)}>
<div key={v.voiceId} className={`voice-row ${v.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => { setSelectedVoiceId(v.voiceId); saveMetaToLocalFile({ selectedVoiceId: v.voiceId }); }}>
<div className="voice-row-main">
<div className="voice-row-info">
<div className="voice-row-name">
@@ -238,7 +365,7 @@ export default function VoiceDubbing() {
<div className="voice-empty"><br /><small></small></div>
) : (
voiceMaterials.filter(m => m.status === 'ready').map(m => (
<div key={m.voiceId} className={`voice-row ${m.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => setSelectedVoiceId(m.voiceId)}>
<div key={m.voiceId} className={`voice-row ${m.voiceId === selectedVoiceId ? 'selected' : ''}`} onClick={() => { setSelectedVoiceId(m.voiceId); saveMetaToLocalFile({ selectedVoiceId: m.voiceId }); }}>
<div className="voice-row-main">
<div className="voice-row-info">
<div className="voice-row-name">
@@ -274,7 +401,7 @@ export default function VoiceDubbing() {
max={20}
step={1}
value={Math.round(speed * 10)}
onChange={e => setSpeed(parseInt(e.target.value) / 10)}
onChange={e => { const v = parseInt(e.target.value) / 10; setSpeed(v); saveMetaToLocalFile({ voiceSpeed: v }); }}
style={{ '--slider-percent': `${((Math.round(speed * 10) - 5) / 15) * 100}%` } as React.CSSProperties}
/>
<span>2.0x</span>
@@ -296,7 +423,7 @@ export default function VoiceDubbing() {
max={10}
step={1}
value={volume}
onChange={e => setVolume(parseInt(e.target.value))}
onChange={e => { const v = parseInt(e.target.value); setVolume(v); saveMetaToLocalFile({ voiceVolume: v }); }}
style={{ '--slider-percent': `${(volume / 10) * 100}%` } as React.CSSProperties}
/>
<span>10</span>
@@ -318,7 +445,7 @@ export default function VoiceDubbing() {
max={12}
step={1}
value={pitch}
onChange={e => setPitch(parseInt(e.target.value))}
onChange={e => { const v = parseInt(e.target.value); setPitch(v); saveMetaToLocalFile({ voicePitch: v }); }}
style={{ '--slider-percent': `${((pitch + 12) / 24) * 100}%` } as React.CSSProperties}
/>
<span>12</span>
+3 -3
@@ -35,7 +35,7 @@ function VideoCreationContent() {
// Subscribe to segments directly, because changes behind the project getter cannot be tracked
const segments = useProjectStore(state => state.segments);
const isLoading = useProjectStore(state => state._isLoading);
const lipSyncedVideoUrl = useProjectStore(state => state.lipSyncedVideoUrl);
const composedVideoPath = useProjectStore(state => state.composedVideoPath);
const dubbingAudioUrl = useProjectStore(state => state.dubbingAudioUrl);
// After a page refresh, restore project state from meta.json (projectStore persist does not save business data)
@@ -80,8 +80,8 @@ function VideoCreationContent() {
const isStep1Complete = segments.length > 0;
// Step 2, audio synthesis: a dubbing audio track must exist
const isStep2Complete = isStep1Complete && !!dubbingAudioUrl;
// Step 3, video generation: the finished lip-synced video has been generated (Qiniu Cloud URL)
const isStep3Complete = isStep2Complete && !!lipSyncedVideoUrl;
// Step 3, video generation: the composed video has been generated (switch back to lipSyncedVideoUrl once lip-sync replacement ships)
const isStep3Complete = isStep2Complete && !!composedVideoPath;
// Step 4+, subtitle burn-in: requires video generation to be complete
// Decide whether the user may enter a given step
const canAccessStep = (stepId: number) => {
+44 -2
@@ -45,6 +45,9 @@ export interface ProjectState {
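currentStep: number; // Current video-creation step (1-6)
scriptDuration?: number; // Script generation parameter: video duration
scriptType?: string; // Script generation parameter: script type
voiceSpeed: number; // Dubbing speech rate
voiceVolume: number; // Dubbing volume
voicePitch: number; // Dubbing pitch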
}
interface ProjectActions {
@@ -57,12 +60,16 @@ interface ProjectActions {
setCurrentStep: (step: number) => void;
setSelectedHumanId: (id: string | undefined) => void;
setSelectedAvatarInfo: (humanId?: string, elementId?: number, voiceId?: string) => void;
setSelectedVoiceId: (voiceId: string) => void;
setFinalVideoPath: (path: string | undefined) => void;
setCoverPath: (path: string | undefined) => void;
setCoverConfig: (config: ProjectMeta['coverConfig']) => void;
setExportedAt: (timestamp: number | undefined) => void;
setScriptDuration: (duration: number) => void;
setScriptType: (type: string) => void;
setVoiceSpeed: (speed: number) => void;
setVoiceVolume: (volume: number) => void;
setVoicePitch: (pitch: number) => void;
setIsLoading: (loading: boolean) => void;
setHasHydrated: (hydrated: boolean) => void;
@@ -83,6 +90,9 @@ const initialState: Omit<
segments: [],
currentStep: 1,
_isLoading: false,
voiceSpeed: 1.0,
voiceVolume: 0,
voicePitch: 0,
_hasHydrated: false,
};
@@ -112,6 +122,9 @@ interface SaveState {
burnedVideoPath?: string;
scriptDuration?: number;
scriptType?: string;
voiceSpeed?: number;
voiceVolume?: number;
voicePitch?: number;
currentStep?: number;
coverConfig?: ProjectMeta['coverConfig'];
segments: ScriptShot[];
@@ -168,6 +181,9 @@ export async function saveMetaToLocalFile(overrides: Partial<Omit<SaveState, 'se
subtitleAlignment: 'subtitleAlignment' in overrides ? overrides.subtitleAlignment : existingMeta?.subtitleAlignment,
burnedVideoPath: 'burnedVideoPath' in overrides ? overrides.burnedVideoPath : existingMeta?.burnedVideoPath,
coverConfig: 'coverConfig' in overrides ? overrides.coverConfig : existingMeta?.coverConfig,
voiceSpeed: 'voiceSpeed' in overrides ? overrides.voiceSpeed : existingMeta?.voiceSpeed,
voiceVolume: 'voiceVolume' in overrides ? overrides.voiceVolume : existingMeta?.voiceVolume,
voicePitch: 'voicePitch' in overrides ? overrides.voicePitch : existingMeta?.voicePitch,
scriptDuration: 'scriptDuration' in overrides ? overrides.scriptDuration : existingMeta?.scriptDuration,
scriptType: 'scriptType' in overrides ? overrides.scriptType : existingMeta?.scriptType,
};
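The `'key' in overrides ? … : existingMeta?.…` pattern used for the new voice fields differs from a `??` fallback: an override that is present but `undefined` clears the stored value instead of keeping it. A small sketch of that difference, using a hypothetical `Meta` shape and hypothetical helper names (`mergeWithIn`, `mergeWithNullish`) rather than the real `SaveState`:

```typescript
// Hypothetical subset of the persisted meta, for illustration only.
interface Meta {
  voiceSpeed?: number;
  coverPath?: string;
}

// Mirrors the pattern in the diff: key presence decides, not the value.
function mergeWithIn(existing: Meta | undefined, overrides: Partial<Meta>): Meta {
  return {
    voiceSpeed: 'voiceSpeed' in overrides ? overrides.voiceSpeed : existing?.voiceSpeed,
    coverPath: 'coverPath' in overrides ? overrides.coverPath : existing?.coverPath,
  };
}

// Alternative for comparison: an undefined override falls back to the stored value.
function mergeWithNullish(existing: Meta | undefined, overrides: Partial<Meta>): Meta {
  return {
    voiceSpeed: overrides.voiceSpeed ?? existing?.voiceSpeed,
    coverPath: overrides.coverPath ?? existing?.coverPath,
  };
}

const stored: Meta = { voiceSpeed: 1.5, coverPath: 'cover.png' };
const cleared = mergeWithIn(stored, { coverPath: undefined });   // coverPath is cleared
const kept = mergeWithNullish(stored, { coverPath: undefined }); // coverPath stays 'cover.png'
```

With the `in` check, callers such as the slider handlers can pass only the field they changed and leave every other stored field untouched, while still being able to reset a field explicitly.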
@@ -264,6 +280,12 @@ export const useProjectStore = create<ProjectStore>()(
state.updatedAt = Date.now();
});
},
setSelectedVoiceId: voiceId => {
set(state => {
state.selectedVoiceId = voiceId;
state.updatedAt = Date.now();
});
},
setFinalVideoPath: path => {
set(state => {
state.finalVideoPath = path;
@@ -302,7 +324,21 @@ export const useProjectStore = create<ProjectStore>()(
state.scriptType = type;
state.updatedAt = Date.now();
}),
setVoiceSpeed: speed =>
set(state => {
state.voiceSpeed = speed;
state.updatedAt = Date.now();
}),
setVoiceVolume: volume =>
set(state => {
state.voiceVolume = volume;
state.updatedAt = Date.now();
}),
setVoicePitch: pitch =>
set(state => {
state.voicePitch = pitch;
state.updatedAt = Date.now();
}),
}),
{
@@ -364,6 +400,9 @@ export async function initProjectStore(projectId?: string): Promise<void> {
dubbingAudioUrl: meta?.dubbingAudioUrl,
dubbingAudioPath: meta?.dubbingAudioPath,
dubbingVoiceId: meta?.dubbingVoiceId,
voiceSpeed: meta?.voiceSpeed ?? 1.0,
voiceVolume: meta?.voiceVolume ?? 0,
voicePitch: meta?.voicePitch ?? 0,
avatarMaterialPath: meta?.avatarMaterialPath,
avatarMaterialName: meta?.avatarMaterialName,
avatarMaterialDuration: meta?.avatarMaterialDuration,
@@ -397,7 +436,10 @@ export async function createNewProject(topic?: string, segments?: ScriptShot[]):
topic: undefined,
selectedHumanId: undefined,
selectedElementId: undefined,
selectedVoiceId: undefined,
selectedVoiceId: 'tianxin_xiaoling',
voiceSpeed: 1.0,
voiceVolume: 0,
voicePitch: 0,
finalVideoPath: undefined,
coverPath: undefined,
coverConfig: undefined,
-37
@@ -13,7 +13,6 @@ import * as voiceApi from '../api/modules/voice';
interface VoiceState {
// Preset voice list
presetVoices: VoiceInfo[];
selectedVoiceId: string;
// Project audio file list
projectAudios: AudioMeta[];
@@ -25,15 +24,6 @@ interface VoiceState {
// Current project ID
currentProjectId: string | null;
// Speech rate
speed: number;
// Volume (0.5-10.0)
volume: number;
// Pitch (-10 to 10)
pitch: number;
// Loading state
isLoadingVoices: boolean;
isLoadingAudios: boolean;
@@ -46,16 +36,6 @@ interface VoiceState {
interface VoiceActions {
// Voice operations
loadPresetVoices: () => Promise<void>;
setSelectedVoiceId: (id: string) => void;
// Speech rate
setSpeed: (speed: number) => void;
// Volume
setVolume: (volume: number) => void;
// Pitch
setPitch: (pitch: number) => void;
// Material library operations
loadVoiceMaterials: () => Promise<void>;
@@ -87,13 +67,9 @@ interface VoiceActions {
const initialState: VoiceState = {
presetVoices: [],
selectedVoiceId: 'tianxin_xiaoling', // 甜心小玲
projectAudios: [],
audioMapping: {},
currentProjectId: null,
speed: 1.0,
volume: 0,
pitch: 0,
isLoadingVoices: false,
isLoadingAudios: false,
voiceMaterials: [],
@@ -171,17 +147,6 @@ export const useVoiceStore = create<VoiceState & VoiceActions>()(
}
},
setSelectedVoiceId: (id) => set({ selectedVoiceId: id }),
// ====================== Speech rate ======================
setSpeed: (speed: number) => set({ speed }),
// ====================== Volume ======================
setVolume: (volume: number) => set({ volume }),
// ====================== Pitch ======================
setPitch: (pitch: number) => set({ pitch }),
// ====================== Material library operations ======================
loadVoiceMaterials: async () => {
set({ isLoadingMaterials: true });
@@ -358,10 +323,8 @@ export function usePresetVoices() {
return useVoiceStore(
useShallow(state => ({
voices: state.presetVoices,
selectedVoiceId: state.selectedVoiceId,
isLoading: state.isLoadingVoices,
load: state.loadPresetVoices,
setSelected: state.setSelectedVoiceId,
}))
);
}
+118
@@ -0,0 +1,118 @@
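/**
 * Text-matching utility for audio alignment
 * =========================================
 *
 * Groups the utterances returned by Volcano Engine (火山引擎) alignment by segment voiceover
 * and computes each segment's time range within the full dubbing audio.
 */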
export interface Utterance {
text: string;
startTime: number; // milliseconds
endTime: number; // milliseconds
}
export interface SegmentAudioRange {
segmentId: number;
startTime: number; // milliseconds
endTime: number; // milliseconds
actualDuration: number; // seconds
matchedText: string;
matchedUtteranceCount: number;
}
// Chinese numeral mapping (single digits only; compounds such as "十五" are not handled)
const CN_NUMBERS: Record<string, string> = {
'零': '0', '〇': '0', '一': '1', '二': '2', '两': '2', '三': '3',
'四': '4', '五': '5', '六': '6', '七': '7', '八': '8', '九': '9',
};
// Modal particles to ignore
const PARTICLES = new Set([
'啊', '呢', '吧', '嗯', '哦', '哈', '啦', '嘛', '呗', '咧', '哟',
]);
/**
 * Text normalization: strips punctuation and whitespace, converts Chinese numerals to Arabic digits, removes modal particles
 */
function normalizeText(text: string): string {
return text
// Strip punctuation (keep Chinese characters, Latin letters, and digits)
.replace(/[^\u4e00-\u9fa5a-zA-Z0-9]/g, '')
// Convert Chinese numerals to Arabic digits
.replace(/[零〇一二两三四五六七八九十]/g, (match) => CN_NUMBERS[match] || match)
// Remove modal particles
.split('')
.filter((c) => !PARTICLES.has(c))
.join('')
.toLowerCase();
}
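/**
 * Matches the list of segment voiceovers against the aligned utterances, in order.
 *
 * @param segments   shots to match, each with an id and a voiceover
 * @param utterances caption timeline returned by alignment
 * @returns the time range for each segment; segments that fail to match are omitted
 */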
export function matchSegmentsToUtterances(
segments: Array<{ id: number; voiceover: string }>,
utterances: Utterance[]
): SegmentAudioRange[] {
const results: SegmentAudioRange[] = [];
let uIdx = 0; // index of the current utterance
for (const seg of segments) {
const targetNorm = normalizeText(seg.voiceover);
if (!targetNorm) continue;
let accumulatedNorm = '';
const matchedUtterances: Utterance[] = [];
// Greedily accumulate utterances until the target matches or the list is exhausted
while (uIdx < utterances.length) {
const u = utterances[uIdx];
const uNorm = normalizeText(u.text);
accumulatedNorm += uNorm;
matchedUtterances.push(u);
uIdx++;
// Check whether the accumulated text contains the target text
if (accumulatedNorm.includes(targetNorm)) {
break;
}
}
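// Decide whether the match succeeded
// Condition 1: the accumulated text contains the target text (most reliable)
// Condition 2: the length difference is ≤ 10% and both texts share the same leading characters (up to 3), so unrelated text is not matched by mistake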
const isMatch = accumulatedNorm.includes(targetNorm);
const lengthDiff = Math.abs(accumulatedNorm.length - targetNorm.length);
const lengthTolerance = Math.max(1, Math.floor(targetNorm.length * 0.1));
const isApproxMatch =
lengthDiff <= lengthTolerance &&
(accumulatedNorm.startsWith(targetNorm.slice(0, Math.min(3, targetNorm.length))) ||
targetNorm.startsWith(accumulatedNorm.slice(0, Math.min(3, accumulatedNorm.length))));
if (isMatch || isApproxMatch) {
const firstU = matchedUtterances[0];
const lastU = matchedUtterances[matchedUtterances.length - 1];
results.push({
segmentId: seg.id,
startTime: firstU.startTime,
endTime: lastU.endTime,
actualDuration: (lastU.endTime - firstU.startTime) / 1000,
matchedText: matchedUtterances.map((u) => u.text).join(''),
matchedUtteranceCount: matchedUtterances.length,
});
} else {
// Match failed: the consumed utterances are treated as belonging to this segment, so uIdx is not rewound
console.warn(
`[audioAlign] Segment ${seg.id} 匹配失败: ` +
`target="${targetNorm}" accumulated="${accumulatedNorm}"`
);
}
}
return results;
}
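One property of the loop above is worth illustrating: its only `break` is the `includes` check, so a segment whose text never appears (for example, after a misrecognized character) consumes every remaining utterance, and later segments have nothing left to match. The sketch below shows one possible guard, not the author's method: it gives up on a segment once the accumulated text is about as long as the target. `matchBounded`, `normalize`, and `TimeRange` are hypothetical names, and `normalize` is a simplified stand-in that only strips punctuation.

```typescript
interface Utterance {
  text: string;
  startTime: number; // milliseconds
  endTime: number;   // milliseconds
}

interface TimeRange {
  segmentId: number;
  startTime: number; // milliseconds
  endTime: number;   // milliseconds
}

// Simplified normalizer (assumption): keeps CJK characters, Latin letters, digits.
const normalize = (s: string): string =>
  s.replace(/[^\u4e00-\u9fa5a-zA-Z0-9]/g, '').toLowerCase();

// Sequential matching with a length guard that stops a failed segment early.
function matchBounded(
  segments: Array<{ id: number; voiceover: string }>,
  utterances: Utterance[]
): TimeRange[] {
  const results: TimeRange[] = [];
  let uIdx = 0;
  for (const seg of segments) {
    const target = normalize(seg.voiceover);
    if (!target) continue;
    const tolerance = Math.max(1, Math.floor(target.length * 0.1));
    const first = uIdx;
    let accumulated = '';
    let matched = false;
    while (uIdx < utterances.length) {
      accumulated += normalize(utterances[uIdx].text);
      uIdx++;
      if (accumulated.includes(target)) {
        matched = true;
        break;
      }
      // Guard: enough text has been consumed for this segment, so stop here
      // and leave the remaining utterances for the following segments.
      if (accumulated.length >= target.length - tolerance) break;
    }
    if (matched) {
      results.push({
        segmentId: seg.id,
        startTime: utterances[first].startTime,
        endTime: utterances[uIdx - 1].endTime,
      });
    }
  }
  return results;
}
```

The guard trades recall for isolation: a segment whose utterance boundaries do not line up with its text may now fail where the unbounded loop would eventually succeed, but one bad segment no longer prevents all later ones from matching.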