# Volcengine Audio/Video Subtitle API Developer Documentation

> Last updated: 2026-04-09
> Official documentation: https://www.volcengine.com/docs/6561/80907

---

## Product Overview

The Volcengine audio/video subtitle service provides two capabilities:

1. **Subtitle generation** - automatically recognizes speech or lyrics in the audio and produces subtitles with a timeline
2. **Automatic subtitle alignment** - adds a timeline to subtitle text you already have

---

## Basic Information

| Item | Value |
|------|------|
| Base URL | `https://openspeech.bytedance.com/api/v1/vc` |
| Auth header | `Authorization: Bearer; {token}` |
| File limits | ≤200 MB; supports WAV/M4A/MP3/MP4/MOV/OGG |

Note that the auth header has a semicolon after `Bearer`.

---

## API Endpoints

### 1. Subtitle Generation

#### Submit a task

```http
POST /submit?appid={appid}&language=zh-CN&use_punc=True
Content-Type: application/json
Authorization: Bearer; {token}

{"url": "https://example.com/audio.mp3"}
```

**Key parameters:**

- `language` - language: `zh-CN`, `en-US`, `ja-JP`, `ko-KR`, `es-MX`, `ru-RU`, `fr-FR`, `yue`, `wuu`, `nan`, `ug`
- `caption_type` - recognition type: `auto` (default), `speech`, `singing`
- `use_punc` - automatic punctuation: `True`, `False`
- `use_itn` - number conversion: `True` converts Chinese numerals to Arabic digits
- `words_per_line` - characters per line, default 46
- `max_lines` - lines per screen, default 1

#### Query the result

```http
GET /query?appid={appid}&id={task_id}&blocking=1
Authorization: Bearer; {token}
```

**Response:**

```json
{
  "code": 0,
  "message": "Success",
  "duration": 5.32,
  "utterances": [
    {
      "text": "识别文本",
      "start_time": 0,
      "end_time": 3197,
      "words": [
        {"text": "单字", "start_time": 0, "end_time": 208}
      ]
    }
  ]
}
```

The Python example below treats `start_time` and `end_time` as milliseconds.

---

### 2. Automatic Subtitle Alignment

#### Submit a task

```http
POST /ata/submit?appid={appid}&caption_type=speech
Content-Type: application/json
Authorization: Bearer; {token}

{
  "url": "https://example.com/audio.mp3",
  "audio_text": "这是要被打轴的字幕文本"
}
```

**Parameters:**

- `caption_type` - `speech` (spoken audio) or `singing` (lyrics)
- `sta_punc_mode` - punctuation mode: `1` (omit sentence-final punctuation), `2` (replace punctuation with spaces), `3` (keep all punctuation)

#### Query the result

```http
GET /ata/query?appid={appid}&id={task_id}&blocking=1
Authorization: Bearer; {token}
```

---

## Error Codes

| Code | Meaning | Action |
|----|------|------|
| 0 | Success | - |
| 2000 | Processing | Keep polling |
| 1001 | Invalid parameter | Check the required parameters |
| 1002 | No permission | Check the token |
| 1003 | Rate limit exceeded | Reduce the call frequency |
| 1010 | Audio too long | Shorten the audio |
| 1011 | Audio too large | Compress the audio (≤200 MB) |
| 1012 | Invalid format | Check the audio format |
| 1013 | Silent audio | Check the audio content |

---

## Python Example

```python
import time

import requests

TOKEN = "your_token"
APPID = "your_appid"
BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"
# Note the semicolon after "Bearer": this API expects "Bearer; {token}".
HEADERS = {"Authorization": f"Bearer; {TOKEN}"}


def submit(audio_url, language="zh-CN", use_punc=True):
    """Submit a subtitle generation task and return its task ID."""
    resp = requests.post(
        f"{BASE_URL}/submit",
        # use_punc is sent as the literal string "True" or "False".
        params={"appid": APPID, "language": language, "use_punc": str(use_punc)},
        json={"url": audio_url},
        headers=HEADERS,
        timeout=30,
    )
    data = resp.json()
    if "id" not in data:
        # Fail with the server's message instead of a bare KeyError.
        raise RuntimeError(f"Submit failed: {data.get('message', data)}")
    return data["id"]


def query(task_id):
    """Query the task result and return the decoded JSON response."""
    resp = requests.get(
        f"{BASE_URL}/query",
        params={"appid": APPID, "id": task_id, "blocking": "1"},
        headers=HEADERS,
        timeout=300,  # generous: a blocking=1 request may be slow to return
    )
    return resp.json()


def generate_caption(audio_url, language="zh-CN"):
    """Full flow: submit -> poll -> return the utterances."""
    task_id = submit(audio_url, language)
    for _ in range(60):  # poll at most 60 times, pausing 1 s between attempts
        result = query(task_id)
        if result["code"] == 0:
            return result["utterances"]
        if result["code"] != 2000:  # anything other than "processing" is an error
            raise RuntimeError(f"Task failed: {result['message']}")
        time.sleep(1)
    raise TimeoutError("Task did not finish within 60 polls")


def to_srt(utterances):
    """Convert utterances to SRT subtitle text."""

    def ms_to_time(total_ms):
        # SRT timestamps use the form HH:MM:SS,mmm.
        seconds, ms = divmod(total_ms, 1000)
        minutes, s = divmod(seconds, 60)
        h, m = divmod(minutes, 60)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    lines = []
    for i, u in enumerate(utterances, 1):
        lines.append(str(i))
        lines.append(f"{ms_to_time(u['start_time'])} --> {ms_to_time(u['end_time'])}")
        lines.append(u["text"])
        lines.append("")  # a blank line separates SRT cues
    return "\n".join(lines)


# Usage example
if __name__ == "__main__":
    utterances = generate_caption("https://example.com/audio.mp3")
    srt_content = to_srt(utterances)
    print(srt_content)
```

---

## cURL Example

```bash
# 1. Submit the task
TASK_ID=$(curl -s -X POST \
  -H "Authorization: Bearer; ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"${AUDIO_URL}\"}" \
  "https://openspeech.bytedance.com/api/v1/vc/submit?appid=${APPID}&language=zh-CN" \
  | jq -r '.id')

# 2. Query the result
curl -s -X GET \
  -H "Authorization: Bearer; ${TOKEN}" \
  "https://openspeech.bytedance.com/api/v1/vc/query?appid=${APPID}&id=${TASK_ID}&blocking=1"
```
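
The examples above cover only subtitle generation. For the automatic subtitle alignment endpoints (`/ata/submit` and `/ata/query`), a minimal Python sketch might look like the following. Several things here are assumptions rather than documented behavior: the function name `align_captions` and its `http` and `sleep` parameters are hypothetical, the submit response is assumed to carry an `id` field as `/submit` does, a successful query is assumed to return `utterances` in the same shape as the generation response, and doubling the wait after error code 1003 is one possible way to "reduce the call frequency", not something the API prescribes.

```python
import time

BASE_URL = "https://openspeech.bytedance.com/api/v1/vc"
PENDING = 2000       # "processing" in the error-code table
RATE_LIMITED = 1003  # "rate limit exceeded" in the error-code table


def align_captions(audio_url, audio_text, appid, token, caption_type="speech",
                   max_polls=60, http=None, sleep=time.sleep):
    """Submit an alignment task, then poll /ata/query until it finishes.

    `http` and `sleep` are hypothetical injection points so the flow can be
    exercised without network access; by default `requests` is used.
    """
    if http is None:
        import requests  # third-party client, loaded only when no stub is given
        http = requests

    headers = {"Authorization": f"Bearer; {token}"}
    resp = http.post(
        f"{BASE_URL}/ata/submit",
        params={"appid": appid, "caption_type": caption_type},
        json={"url": audio_url, "audio_text": audio_text},
        headers=headers,
        timeout=30,
    )
    data = resp.json()
    if "id" not in data:  # assumption: the task ID is returned under "id"
        raise RuntimeError(f"Submit failed: {data.get('message', data)}")
    task_id = data["id"]

    delay = 1.0
    for _ in range(max_polls):
        result = http.get(
            f"{BASE_URL}/ata/query",
            params={"appid": appid, "id": task_id, "blocking": "1"},
            headers=headers,
            timeout=300,
        ).json()
        code = result.get("code")
        if code == 0:
            # Assumption: same "utterances" shape as the generation response.
            return result.get("utterances", [])
        if code == RATE_LIMITED:
            delay = min(delay * 2, 30.0)  # back off when throttled (a design choice)
        elif code != PENDING:
            raise RuntimeError(f"Alignment failed ({code}): {result.get('message')}")
        sleep(delay)
    raise TimeoutError(f"Alignment task {task_id} did not finish in {max_polls} polls")
```

Calling `align_captions("https://example.com/audio.mp3", "这是要被打轴的字幕文本", APPID, TOKEN)` should return a list that can be passed straight to `to_srt`, provided the response shape assumption holds. Other parameters such as `sta_punc_mode` could be added to the `params` dictionary in the same way.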