Volcengine Audio/Video Caption API Developer Guide

Last updated: 2026-04-09
Official documentation: https://www.volcengine.com/docs/6561/80907


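Product Overview

The Volcengine audio/video caption service provides two capabilities:

  1. Audio/video caption generation - automatically recognizes speech or lyrics in the audio and generates captions with a timeline
  2. Automatic caption alignment - automatically adds a timeline to existing caption text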

Basic Information

Item          Value
Base URL      https://openspeech.bytedance.com/api/v1/vc
Auth header   Authorization: Bearer; {token}
File limits   ≤200MB; supported formats: WAV/M4A/MP3/MP4/MOV/OGG
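
The limits above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the official API: `auth_headers` and `check_media` are hypothetical helper names, and the sketch assumes that 200MB means 200 × 1024 × 1024 bytes and that the format can be inferred from the file extension.

```python
from pathlib import PurePosixPath

# Assumption: "200MB" is interpreted as 200 MiB.
MAX_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".wav", ".m4a", ".mp3", ".mp4", ".mov", ".ogg"}


def auth_headers(token: str) -> dict:
    """Build the Authorization header in the documented 'Bearer; {token}' form."""
    return {"Authorization": f"Bearer; {token}"}


def check_media(filename: str, size_bytes: int) -> list:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    ext = PurePosixPath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Note that the header value contains a semicolon after `Bearer`, exactly as listed in the table.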

API Endpoints

1. Audio/Video Caption Generation

Submit a task

POST /submit?appid={appid}&language=zh-CN&use_punc=True
Content-Type: application/json
Authorization: Bearer; {token}

{"url": "https://example.com/audio.mp3"}

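Key parameters:

  • language - language: zh-CN, en-US, ja-JP, ko-KR, es-MX, ru-RU, fr-FR, yue, wuu, nan, ug
  • caption_type - recognition type: auto (default), speech, singing
  • use_punc - automatic punctuation: True or False
  • use_itn - number conversion (inverse text normalization): True converts Chinese numerals to Arabic numerals
  • words_per_line - characters per line, default 46
  • max_lines - lines per screen, default 1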
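
As an illustration of how these parameters fit into the query string, here is a small sketch. `build_submit_params` is a hypothetical helper, not part of the API. It assumes that every parameter listed above is passed as a query parameter, and it omits any option the caller leaves unset so that the server-side defaults apply.

```python
from urllib.parse import urlencode

CAPTION_TYPES = {"auto", "speech", "singing"}


def build_submit_params(appid, language="zh-CN", caption_type=None,
                        use_punc=None, use_itn=None,
                        words_per_line=None, max_lines=None):
    """Build the query parameters for /submit, skipping options left as None."""
    if caption_type is not None and caption_type not in CAPTION_TYPES:
        raise ValueError(f"unknown caption_type: {caption_type!r}")

    params = {"appid": appid, "language": language}
    optional = {
        "caption_type": caption_type,
        "use_punc": use_punc,
        "use_itn": use_itn,
        "words_per_line": words_per_line,
        "max_lines": max_lines,
    }
    for name, value in optional.items():
        if value is None:
            continue
        # Booleans are sent as the strings "True"/"False", as in the request line above.
        params[name] = str(value) if isinstance(value, bool) else value
    return params


if __name__ == "__main__":
    print(urlencode(build_submit_params("your_appid", use_punc=True, words_per_line=20)))
```

The resulting dictionary can be passed directly as `params=` to an HTTP client.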

Query the result

GET /query?appid={appid}&id={task_id}&blocking=1
Authorization: Bearer; {token}

Response:

{
    "code": 0,
    "message": "Success",
    "duration": 5.32,
    "utterances": [
        {
            "text": "recognized text",
            "start_time": 0,
            "end_time": 3197,
            "words": [
                {"text": "word", "start_time": 0, "end_time": 208}
            ]
        }
    ]
}
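
To show how the nested response can be consumed, the sketch below flattens it into word-level timings. `word_timings` is a hypothetical helper. It assumes that `start_time` and `end_time` are in milliseconds, which is how the `to_srt` function later in this document treats them.

```python
def word_timings(response: dict) -> list:
    """Flatten a query response into (text, start_seconds, end_seconds) tuples."""
    rows = []
    for utterance in response.get("utterances", []):
        for word in utterance.get("words", []):
            # Assumption: timestamps are milliseconds, so divide by 1000 for seconds.
            rows.append((word["text"],
                         word["start_time"] / 1000,
                         word["end_time"] / 1000))
    return rows


if __name__ == "__main__":
    sample = {
        "code": 0,
        "message": "Success",
        "duration": 5.32,
        "utterances": [
            {
                "text": "recognized text",
                "start_time": 0,
                "end_time": 3197,
                "words": [{"text": "word", "start_time": 0, "end_time": 208}],
            }
        ],
    }
    for text, start, end in word_timings(sample):
        print(f"{start:.3f}-{end:.3f}s {text}")
```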

2. Automatic Caption Alignment

Submit a task

POST /ata/submit?appid={appid}&caption_type=speech
Content-Type: application/json
Authorization: Bearer; {token}

{
    "url": "https://example.com/audio.mp3",
    "audio_text": "This is the caption text to be aligned"
}

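Parameters:

  • caption_type - speech (spoken audio) or singing (lyrics)
  • sta_punc_mode - punctuation mode: 1 (omit sentence-final punctuation), 2 (replace punctuation with spaces), 3 (keep full punctuation)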
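
The alignment request can be assembled as plain data before it is sent. This is a sketch under stated assumptions: `build_ata_submit` is a hypothetical helper, and it assumes `sta_punc_mode` is passed in the query string alongside `caption_type`, which this document does not state explicitly.

```python
BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"


def build_ata_submit(appid, token, audio_url, audio_text,
                     caption_type="speech", sta_punc_mode=None):
    """Return the pieces of an /ata/submit request without sending it."""
    if caption_type not in {"speech", "singing"}:
        raise ValueError(f"unknown caption_type: {caption_type!r}")

    params = {"appid": appid, "caption_type": caption_type}
    if sta_punc_mode is not None:
        if sta_punc_mode not in (1, 2, 3):
            raise ValueError("sta_punc_mode must be 1, 2, or 3")
        # Assumption: sta_punc_mode travels in the query string like caption_type.
        params["sta_punc_mode"] = sta_punc_mode

    return {
        "method": "POST",
        "url": f"{BASE_URL}/ata/submit",
        "params": params,
        "headers": {"Authorization": f"Bearer; {token}"},
        "json": {"url": audio_url, "audio_text": audio_text},
    }
```

With the `requests` library installed, the returned dictionary can be sent as `requests.request(**build_ata_submit(...))`, since its keys match that function's keyword arguments.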

Query the result

GET /ata/query?appid={appid}&id={task_id}&blocking=1
Authorization: Bearer; {token}

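Error Codes

Code  Meaning              Action
0     Success              -
2000  Processing           Keep polling
1001  Invalid parameter    Check required parameters
1002  No permission        Check the token
1003  Rate limit exceeded  Reduce the call frequency
1010  Audio too long       Shorten the audio
1011  Audio too large      Compress the audio (≤200MB)
1012  Invalid format       Check the audio format
1013  Silent audio         Check the audio content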
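
One way to act on these codes in client code is to map each one to a next step. The sketch below is illustrative only: `next_action` and `describe` are hypothetical helpers, and treating 1003 as a signal to back off and retry is an assumption based on the suggested action in the table.

```python
# Meanings and suggested actions copied from the error code table above.
ERROR_CODES = {
    0: ("Success", None),
    2000: ("Processing", "keep polling"),
    1001: ("Invalid parameter", "check required parameters"),
    1002: ("No permission", "check the token"),
    1003: ("Rate limit exceeded", "reduce the call frequency"),
    1010: ("Audio too long", "shorten the audio"),
    1011: ("Audio too large", "compress the audio"),
    1012: ("Invalid format", "check the audio format"),
    1013: ("Silent audio", "check the audio content"),
}


def next_action(code: int) -> str:
    """Return 'done', 'poll', 'backoff', or 'fail' for a response code."""
    if code == 0:
        return "done"
    if code == 2000:
        return "poll"
    if code == 1003:
        # Assumption: rate limiting is transient, so wait longer and retry.
        return "backoff"
    return "fail"


def describe(code: int) -> str:
    """Format a code with its meaning and suggested action for logs or errors."""
    meaning, action = ERROR_CODES.get(
        code, ("Unknown code", "consult the official documentation"))
    return f"{code}: {meaning}" + (f" ({action})" if action else "")
```

A polling loop can then branch on `next_action(result["code"])` and include `describe(...)` in any exception it raises.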

Python Example

import requests
import time

TOKEN = "your_token"
APPID = "your_appid"
BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"

def submit(audio_url, language="zh-CN", use_punc=True):
    """Submit a caption generation task and return its task ID."""
    resp = requests.post(
        f"{BASE_URL}/submit",
        params={"appid": APPID, "language": language, "use_punc": str(use_punc)},
        json={"url": audio_url},
        headers={"Authorization": f"Bearer; {TOKEN}"},
    )
    resp.raise_for_status()  # surface HTTP-level errors before parsing the body
    return resp.json()["id"]

def query(task_id):
    """Query the result of a task."""
    resp = requests.get(
        f"{BASE_URL}/query",
        params={"appid": APPID, "id": task_id, "blocking": "1"},
        headers={"Authorization": f"Bearer; {TOKEN}"},
    )
    resp.raise_for_status()  # surface HTTP-level errors before parsing the body
    return resp.json()

def generate_caption(audio_url, language="zh-CN"):
    """Full workflow: submit, poll, then return the utterances."""
    task_id = submit(audio_url, language)

    for _ in range(60):  # at most 60 polling attempts
        result = query(task_id)
        if result["code"] == 0:
            return result["utterances"]
        if result["code"] != 2000:  # 2000 means the task is still processing
            raise RuntimeError(f"Task failed: {result['message']}")
        time.sleep(1)

    raise TimeoutError("Task did not finish within 60 polling attempts")

def to_srt(utterances):
    """Convert utterances to the SRT subtitle format."""
    def ms_to_time(ms):
        # Split milliseconds into hours, minutes, seconds, and leftover milliseconds.
        h, rem = divmod(int(ms), 3600000)
        m, rem = divmod(rem, 60000)
        s, millis = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{millis:03d}"

    lines = []
    for i, u in enumerate(utterances, 1):
        lines.append(f"{i}")
        lines.append(f"{ms_to_time(u['start_time'])} --> {ms_to_time(u['end_time'])}")
        lines.append(u['text'])
        lines.append("")
    return "\n".join(lines)

# Usage example
if __name__ == "__main__":
    utterances = generate_caption("https://example.com/audio.mp3")
    srt_content = to_srt(utterances)
    print(srt_content)

cURL Example

# 1. Submit the task
TASK_ID=$(curl -s -X POST \
  -H "Authorization: Bearer; ${TOKEN}" \
  -H "content-type: application/json" \
  -d '{"url": "'"${AUDIO_URL}"'"}' \
  "https://openspeech.bytedance.com/api/v1/vc/submit?appid=${APPID}&language=zh-CN" \
  | jq -r '.id')

# 2. Query the result
curl -s -X GET \
  -H "Authorization: Bearer; ${TOKEN}" \
  "https://openspeech.bytedance.com/api/v1/vc/query?appid=${APPID}&id=${TASK_ID}&blocking=1"