Translating Subtitle Files with the Tencent Cloud API (Python)

Original article: Translating Subtitle Files with the Tencent Cloud API (Python)

Introduction

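This article uses Tencent Cloud Machine Translation (TMT) to translate English subtitle files. The API requires a SecretId and a SecretKey, which you need to obtain yourself from the Tencent Cloud console at https://console.cloud.tencent.com/cam/capi. The code was run with Python 3.8; if you use Python 2, read the comments in the code and adjust it accordingly. The program also needs the Tencent Cloud Python SDK: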

pip install tencentcloud-sdk-python
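The script hard-codes "your SecretId" and "your SecretKey" inside `translate()`. A minimal sketch, assuming you would rather keep the keys out of the source file, reads them from environment variables instead. The helper `load_credentials` and the variable names `TENCENTCLOUD_SECRET_ID` / `TENCENTCLOUD_SECRET_KEY` are assumptions for this example, not part of the original script.

```python
import os


def load_credentials(id_var="TENCENTCLOUD_SECRET_ID", key_var="TENCENTCLOUD_SECRET_KEY"):
    """Read the SecretId/SecretKey pair from environment variables.

    The default variable names are assumptions made for this sketch;
    use whatever names your environment actually defines.
    """
    secret_id = os.environ.get(id_var)
    secret_key = os.environ.get(key_var)
    if not secret_id or not secret_key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError("set %s and %s before running" % (id_var, key_var))
    return secret_id, secret_key
```

The returned pair can then be passed as `credential.Credential(*load_credentials())`, which keeps both strings out of version control.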

Sample file before translation

WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:161632

1
00:00:01.070 --> 00:00:02.970
<v Don>Greetings ladies and gentlemen, this is Don Murdoch</v>

2
00:00:02.970 --> 00:00:05.070
and I'm going to be doing a talk this afternoon here

3
00:00:05.070 --> 00:00:07.960
at the RSA conference on adversary simulation.

4
00:00:07.960 --> 00:00:10.170
We're going to go through this process

5
00:00:10.170 --> 00:00:11.645
and what we want to be able to do

6
00:00:11.645 --> 00:00:13.100
throughout this presentation has help you close the gaps

7
00:00:13.100 --> 00:00:15.070
in your security posture.

8
00:00:15.070 --> 00:00:17.480
So, by way of introduction, I've been in IT

9
00:00:17.480 --> 00:00:20.260
for well over 25 years, about 17 years

10
00:00:20.260 --> 00:00:21.860
in information security.
......
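In this sample each cue is an identifier, a timing line, one text line and a blank line, which is why the script skips a 5-line header and then takes every 4th line. A minimal sketch of a more tolerant alternative, assuming only that cue text follows a line containing `-->` and ends at the next blank line, locates the text lines directly. `cue_text_indices` is a hypothetical helper, not part of the original code.

```python
def cue_text_indices(lines):
    """Return the indices of cue text lines in a WebVTT file read with readlines().

    A line counts as cue text if it is non-blank and comes after a timing
    line (one containing '-->') but before the blank line that ends the cue,
    so cues whose text spans several lines are handled too.
    """
    indices = []
    in_cue = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if "-->" in stripped:
            in_cue = True       # timing line: cue text starts on the next line
        elif not stripped:
            in_cue = False      # blank line ends the current cue
        elif in_cue:
            indices.append(i)   # non-blank line inside a cue is text
    return indices
```

On the sample above it yields 5, 9, 13, ..., the same positions `main()` reaches through `content[5:]` with a step of 4, but it does not depend on a fixed header length or on one text line per cue.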

Sample file after translation

WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:161632

1
00:00:01.070 --> 00:00:02.970
<v Don>女士们先生们，大家好，我是Don Murdoch</v>

2
00:00:02.970 --> 00:00:05.070
今天下午我要在这里做一个演讲

3
00:00:05.070 --> 00:00:07.960
在RSA关于对手模拟的会议上。

4
00:00:07.960 --> 00:00:10.170
我们要经历这个过程

5
00:00:10.170 --> 00:00:11.645
我们想要做的是

6
00:00:11.645 --> 00:00:13.100
整个演示文稿帮助您缩小差距

7
00:00:13.100 --> 00:00:15.070
以你的安全姿态。

8
00:00:15.070 --> 00:00:17.480
所以，顺便介绍一下，我在IT行业

9
00:00:17.480 --> 00:00:20.260
已经超过25年了，大约17年

10
00:00:20.260 --> 00:00:21.860
在信息安全方面。
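Only the text lines differ between the two samples; the header, cue identifiers and timings are copied unchanged. A small sketch for checking that on real files, assuming both files are loaded with `readlines()`; `structure_preserved` is a hypothetical helper, not part of the original script.

```python
def structure_preserved(original, translated):
    """Check that two subtitle files (lists of lines) differ only in cue text.

    Header lines, cue identifiers, timing lines ('-->') and blank lines must
    be identical; only lines inside a cue's text block may change.
    """
    if len(original) != len(translated):
        return False
    in_cue = False
    for a, b in zip(original, translated):
        stripped = a.strip()
        if "-->" in stripped:
            in_cue = True       # timing line: compared below, text follows
        elif not stripped:
            in_cue = False      # blank line ends the cue, compared below
        elif in_cue:
            continue            # cue text: allowed to differ
        if a != b:
            return False        # a structural line was changed
    return True
```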

Code

# coding:utf-8
'''
@author: Duckweeds7  20210527
@todo: translate subtitle files with the Tencent Cloud API
'''
import json
import os
from time import sleep

from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.tmt.v20180321 import tmt_client, models


class TencentTranslate:

    def translate(self, t):
        '''
        Translation interface; the input is a list of sentences to translate
        '''
        try:
            cred = credential.Credential("your SecretId", "your SecretKey")
            httpProfile = HttpProfile()
            httpProfile.endpoint = "tmt.tencentcloudapi.com"

            clientProfile = ClientProfile()
            clientProfile.httpProfile = httpProfile
            client = tmt_client.TmtClient(cred, "ap-guangzhou", clientProfile)

            req = models.TextTranslateBatchRequest()
            params = {
                "Source": "auto",
                "Target": "zh",
                "ProjectId": 0,
                "SourceTextList": t
            }
            req.from_json_string(json.dumps(params))

            resp = client.TextTranslateBatch(req)
            return json.loads(resp.to_json_string())

        except TencentCloudSDKException as err:
            print(err)
            raise  # returning None here would make main() fail later with a confusing TypeError

    def main(self, path):
        '''
        Program entry point
        '''
        # Read the subtitle file to translate into a list of lines
        with open(path, 'r', encoding='utf-8') as f:
            content = f.readlines()
        # Python 2: content = open(path, 'r').readlines()
        head, context = content[:5], content[5:]  # split the untranslated header from the body; adjust the header line count to your file
        new_context = context[:]  # copy of the body whose text lines will be replaced with translations

        wait_for_translate = []  # text lines waiting to be translated
        for c in range(0, len(context), 4):  # every 4th body line is cue text; add it without its newline
            wait_for_translate.append(context[c].replace('\n', ''))

        # Split the text into batches of 40 lines, because Tencent Cloud's batch text
        # translation API rejects requests over 2000 characters. Tune 40 to your file:
        # lower it if the sentences are long, raise it if they are short.
        wail_list = []
        wail_tmp = []
        for l in range(len(wait_for_translate)):
            wail_tmp.append(wait_for_translate[l])
            if len(wail_tmp) == 40 or l == len(wait_for_translate) - 1:
                wail_list.append(wail_tmp)
                wail_tmp = []

        translater = []
        for w in range(len(wail_list)):  # translate batch by batch
            translater.extend(self.translate(wail_list[w])['TargetTextList'])
            sleep(0.21)  # pause between calls because the API limits how often it can be called

        count = 0
        for c in range(0, len(context), 4):
            new_context[c] = translater[count] + '\n'  # put the translation back and restore the newline
            count += 1
            if count == len(translater):
                break

        # Output name: xxx_en.vtt -> xxx_zh.vtt. Only a trailing "_en" is replaced;
        # path.replace('en', 'zh') would also alter directories such as "/home/ben".
        root, ext = os.path.splitext(path)
        name = (root[:-3] if root.endswith('_en') else root) + '_zh' + ext
        with open(name, 'w', encoding='utf-8') as f:
            f.writelines(head + new_context)
        return name


if __name__ == '__main__':
    TencentTranslate().main('xxx_en.vtt')
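The script sends a fixed 40 lines per request to stay under the 2000-character limit its comments mention, so the batch size has to be tuned by hand. A sketch of batching by cumulative length instead, assuming `len()` of a Python string approximates how the API counts characters (check the current TextTranslateBatch documentation for the exact limit and counting rule). `chunk_by_chars` is a hypothetical helper, not part of the original script.

```python
def chunk_by_chars(texts, max_chars=2000, max_items=None):
    """Greedily group texts into batches whose total length stays within max_chars.

    max_chars mirrors the 2000-character limit cited in the script's comments;
    max_items optionally caps the number of texts per batch as well.
    A single text longer than max_chars is put in a batch of its own.
    """
    batches, current, size = [], [], 0
    for text in texts:
        over_chars = size + len(text) > max_chars
        over_items = max_items is not None and len(current) >= max_items
        if current and (over_chars or over_items):
            batches.append(current)   # close the batch before it exceeds a limit
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)       # flush the last, partially filled batch
    return batches
```

Inside `main()`, the 40-line loop could then be replaced by `wail_list = chunk_by_chars(wait_for_translate)`, so long and short sentences are packed as tightly as the limit allows.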