# Volcengine Audio/Video Caption API Developer Guide

> Last updated: 2026-04-09
>
> Official documentation: https://www.volcengine.com/docs/6561/80907

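---

## Product overview

The Volcengine audio/video caption service offers two capabilities:

1. **Caption generation** - recognizes speech or lyrics in the audio and produces captions with timestamps
2. **Automatic caption alignment** - adds timestamps to caption text you already have
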
---

## Basics

| Item | Value |
|------|------|
| Base URL | `https://openspeech.bytedance.com/api/v1/vc` |
| Auth header | `Authorization: Bearer; {token}` (the semicolon after `Bearer` is part of the format) |
| File limits | ≤200 MB; supported formats: WAV, M4A, MP3, MP4, MOV, OGG |

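The base URL and auth header from the table above can be combined in a small helper. This is a minimal sketch using only the standard library; `auth_headers` and `endpoint` are hypothetical names chosen for illustration, not part of any SDK.

```python
from urllib.parse import urlencode

BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"


def auth_headers(token: str) -> dict:
    """Build the Authorization header in the `Bearer; {token}` form used in this guide."""
    return {"Authorization": f"Bearer; {token}"}


def endpoint(path: str, **params) -> str:
    """Join BASE_URL with a path such as "/submit" and append query parameters."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return f"{url}?{urlencode(params)}" if params else url


# Placeholder credentials, for illustration only.
print(endpoint("/submit", appid="your_appid", language="zh-CN"))
print(auth_headers("your_token"))
```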
---

## API endpoints

### 1. Caption generation

#### Submit a task
```http
POST /submit?appid={appid}&language=zh-CN&use_punc=True
Content-Type: application/json
Authorization: Bearer; {token}

{"url": "https://example.com/audio.mp3"}
```

**Key parameters:**
- `language` - language: `zh-CN`, `en-US`, `ja-JP`, `ko-KR`, `es-MX`, `ru-RU`, `fr-FR`, `yue`, `wuu`, `nan`, `ug`
- `caption_type` - recognition type: `auto` (default), `speech`, `singing`
- `use_punc` - automatic punctuation: `True` or `False`
- `use_itn` - number normalization: `True` converts Chinese numerals to Arabic digits
- `words_per_line` - characters per line, default 46
- `max_lines` - lines per screen, default 1

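The parameters above travel in the query string. A minimal sketch of assembling them is shown below. The helper `build_submit_params` is hypothetical, and the client-side checks are an assumption based only on the values listed in this guide; the service itself may accept more.

```python
ALLOWED_LANGUAGES = {
    "zh-CN", "en-US", "ja-JP", "ko-KR", "es-MX", "ru-RU", "fr-FR",
    "yue", "wuu", "nan", "ug",
}
ALLOWED_CAPTION_TYPES = {"auto", "speech", "singing"}


def build_submit_params(appid, language="zh-CN", caption_type="auto",
                        use_punc=None, use_itn=None,
                        words_per_line=None, max_lines=None):
    """Return the query parameters for a submit call as a dict of strings."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language not listed in this guide: {language!r}")
    if caption_type not in ALLOWED_CAPTION_TYPES:
        raise ValueError(f"caption_type not listed in this guide: {caption_type!r}")

    params = {"appid": appid, "language": language, "caption_type": caption_type}

    # Boolean flags are written as "True" / "False", matching the examples in this guide.
    for name, flag in (("use_punc", use_punc), ("use_itn", use_itn)):
        if flag is not None:
            params[name] = str(bool(flag))

    # Layout options are only sent when set, so the service defaults apply otherwise.
    for name, count in (("words_per_line", words_per_line), ("max_lines", max_lines)):
        if count is not None:
            if count < 1:
                raise ValueError(f"{name} must be at least 1")
            params[name] = str(count)

    return params


print(build_submit_params("your_appid", use_punc=True, words_per_line=20))
```

The returned dict can be passed as `params=` to `requests.post`, in place of the inline dict used in the Python example later in this guide.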
#### Query the result
```http
GET /query?appid={appid}&id={task_id}&blocking=1
Authorization: Bearer; {token}
```

**Response:**
```json
{
  "code": 0,
  "message": "Success",
  "duration": 5.32,
  "utterances": [
    {
      "text": "recognized text",
      "start_time": 0,
      "end_time": 3197,
      "words": [
        {"text": "word", "start_time": 0, "end_time": 208}
      ]
    }
  ]
}
```

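The response nests word timings inside each utterance. The sketch below flattens them into `(text, start, end)` tuples. It assumes `start_time` and `end_time` are in milliseconds, which is how the SRT conversion in this guide treats them; `iter_words` is a hypothetical helper name.

```python
import json

# Sample payload with the same shape as the response shown in this guide.
SAMPLE = """
{
  "code": 0,
  "message": "Success",
  "duration": 5.32,
  "utterances": [
    {
      "text": "recognized text",
      "start_time": 0,
      "end_time": 3197,
      "words": [
        {"text": "word", "start_time": 0, "end_time": 208}
      ]
    }
  ]
}
"""


def iter_words(response):
    """Yield (text, start_seconds, end_seconds) for every word of every utterance."""
    for utterance in response.get("utterances", []):
        for word in utterance.get("words", []):
            # Assumption: timestamps are milliseconds, so divide by 1000 for seconds.
            yield word["text"], word["start_time"] / 1000, word["end_time"] / 1000


response = json.loads(SAMPLE)
for text, start, end in iter_words(response):
    print(f"{start:.3f}-{end:.3f} {text}")
```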

---

### 2. Automatic caption alignment

#### Submit a task
```http
POST /ata/submit?appid={appid}&caption_type=speech
Content-Type: application/json
Authorization: Bearer; {token}

{
  "url": "https://example.com/audio.mp3",
  "audio_text": "Caption text to be aligned"
}
```

**Parameters:**
- `caption_type` - `speech` (spoken audio) or `singing` (lyrics)
- `sta_punc_mode` - punctuation mode: `1` (omit sentence-final punctuation), `2` (replace punctuation with spaces), `3` (keep full punctuation)

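An alignment request combines query parameters with a JSON body. The sketch below only assembles the pieces and sends nothing. `build_ata_request` is a hypothetical helper, and passing `sta_punc_mode` in the query string alongside `caption_type` is an assumption; this guide does not state where that parameter goes.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"


def build_ata_request(appid, token, audio_url, audio_text,
                      caption_type="speech", sta_punc_mode=None):
    """Return the method, URL, headers, and body bytes for an /ata/submit call."""
    if caption_type not in ("speech", "singing"):
        raise ValueError("caption_type must be 'speech' or 'singing'")

    params = {"appid": appid, "caption_type": caption_type}
    if sta_punc_mode is not None:
        if sta_punc_mode not in (1, 2, 3):
            raise ValueError("sta_punc_mode must be 1, 2, or 3")
        # Assumption: sent as a query parameter, like caption_type.
        params["sta_punc_mode"] = sta_punc_mode

    body = {"url": audio_url, "audio_text": audio_text}
    return {
        "method": "POST",
        "url": f"{BASE_URL}/ata/submit?{urlencode(params)}",
        "headers": {
            "Authorization": f"Bearer; {token}",
            "Content-Type": "application/json",
        },
        # ensure_ascii=False keeps non-ASCII caption text readable in the payload.
        "body": json.dumps(body, ensure_ascii=False).encode("utf-8"),
    }


request = build_ata_request(
    "your_appid", "your_token",
    "https://example.com/audio.mp3", "Caption text to be aligned",
    sta_punc_mode=3,
)
print(request["url"])
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload before any network call is made.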
#### Query the result
```http
GET /ata/query?appid={appid}&id={task_id}&blocking=1
Authorization: Bearer; {token}
```

---

## Error codes

| Code | Meaning | Action |
|----|------|------|
| 0 | Success | - |
| 2000 | Processing | Keep polling |
| 1001 | Invalid parameters | Check required parameters |
| 1002 | No permission | Check the token |
| 1003 | Rate limit exceeded | Reduce the call rate |
| 1010 | Audio too long | Shorten the audio |
| 1011 | Audio too large | Compress the audio (≤200 MB) |
| 1012 | Invalid format | Check the audio format |
| 1013 | Silent audio | Check the audio content |

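A polling loop has to decide what to do with each code in the table above. One possible mapping is sketched here. The step names (`done`, `poll`, `backoff`, `fail`) are illustrative, and treating 1003 as "wait longer, then retry" is an assumption drawn from the "reduce the call rate" advice, not documented behavior.

```python
# Codes copied from the error table in this guide.
TERMINAL_ERRORS = {
    1001: "invalid parameters",
    1002: "no permission",
    1010: "audio too long",
    1011: "audio too large",
    1012: "invalid format",
    1013: "silent audio",
}


def next_step(code: int) -> str:
    """Map a response code to one of: "done", "poll", "backoff", "fail"."""
    if code == 0:
        return "done"
    if code == 2000:
        return "poll"      # still processing, query again
    if code == 1003:
        return "backoff"   # assumption: retry after a longer pause
    return "fail"          # listed terminal errors and any unknown code


def describe(code: int) -> str:
    """Return a short description for a failure code."""
    return TERMINAL_ERRORS.get(code, f"unknown code {code}")


for code in (0, 2000, 1003, 1012):
    print(code, next_step(code))
```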

---

## Python example

```python
import time

import requests

TOKEN = "your_token"
APPID = "your_appid"
BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"
HEADERS = {"Authorization": f"Bearer; {TOKEN}"}


def submit(audio_url, language="zh-CN", use_punc=True):
    """Submit a caption generation task and return the task id."""
    resp = requests.post(
        f"{BASE_URL}/submit",
        # The flag is sent as the string "True" or "False".
        params={"appid": APPID, "language": language, "use_punc": str(use_punc)},
        json={"url": audio_url},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    if "id" not in data:
        raise RuntimeError(f"Submit failed: {data.get('message')}")
    return data["id"]


def query(task_id):
    """Query the result of a task."""
    resp = requests.get(
        f"{BASE_URL}/query",
        params={"appid": APPID, "id": task_id, "blocking": "1"},
        headers=HEADERS,
        timeout=60,  # generous, since the query may not return immediately
    )
    resp.raise_for_status()
    return resp.json()


def generate_caption(audio_url, language="zh-CN"):
    """Full flow: submit, poll, return the utterances."""
    task_id = submit(audio_url, language)

    for _ in range(60):  # at most 60 polls, one second apart
        result = query(task_id)
        if result["code"] == 0:
            return result["utterances"]
        if result["code"] != 2000:  # anything other than "processing" is a failure
            raise RuntimeError(f"Task failed: {result['message']}")
        time.sleep(1)

    raise TimeoutError("Task did not finish within 60 polls")


def to_srt(utterances):
    """Convert utterances to SRT subtitle text."""
    def ms_to_time(ms):
        # Split milliseconds into hours, minutes, seconds, and the remainder.
        h, rest = divmod(ms, 3600000)
        m, rest = divmod(rest, 60000)
        s, millis = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{millis:03d}"

    lines = []
    for i, u in enumerate(utterances, 1):
        lines.append(f"{i}")
        lines.append(f"{ms_to_time(u['start_time'])} --> {ms_to_time(u['end_time'])}")
        lines.append(u["text"])
        lines.append("")
    return "\n".join(lines)


# Usage example
if __name__ == "__main__":
    utterances = generate_caption("https://example.com/audio.mp3")
    srt_content = to_srt(utterances)
    print(srt_content)
```

---

## cURL example

```bash
# Assumes TOKEN, APPID, and AUDIO_URL are set in the shell; requires jq.

# 1. Submit the task
TASK_ID=$(curl -s -X POST \
  -H "Authorization: Bearer; ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"url": "'"${AUDIO_URL}"'"}' \
  "https://openspeech.bytedance.com/api/v1/vc/submit?appid=${APPID}&language=zh-CN" \
  | jq -r '.id')

# 2. Query the result
curl -s -X GET \
  -H "Authorization: Bearer; ${TOKEN}" \
  "https://openspeech.bytedance.com/api/v1/vc/query?appid=${APPID}&id=${TASK_ID}&blocking=1"
```