Bypassing Youdao Translate's anti-scraping measures to call its translation endpoint for words and short sentences

Source code (Python implementation)

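This uses Python to get around Youdao Translate's anti-scraping measures and call its translation endpoint, translating words and short sentences and showing their dictionary usages. The result is shown in the figure.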


Without further ado, here is the code.

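```python
import hashlib
import random
import time

import requests


def salt_sign(e):
    """Port of the fanyi.min.js helper: returns ts, bv, salt and sign for the text e."""
    # navigator.appVersion is the User-Agent string without the leading "Mozilla/"
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))  # millisecond timestamp, like (new Date).getTime()
    i = r + str(random.randint(0, 9))  # parseInt(10 * Math.random(), 10) yields 0-9
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest(),
    }


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = salt_sign(word)
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "lts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers, timeout=10).json()
    if res.get('errorCode') != 0:
        # errorCode 0 means success; a non-zero code means the request was rejected
        print("Request failed, errorCode:", res.get('errorCode'))
        return
    print("Translation of " + word + ": " + res['translateResult'][0][0]['tgt'])
    print("Translation type: " + res['type'])
    # entries[0] is often empty and smartResult may be missing (e.g. for sentences),
    # so print every non-empty entry instead of indexing [0] and [1]
    entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
    for n, entry in enumerate(entries, 1):
        print("Usage ({}): {}".format(n, entry))


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter a word or short sentence to translate: ")
            translate(word)
        except (EOFError, KeyboardInterrupt):
            break  # stop cleanly on Ctrl+D / Ctrl+C instead of looping forever
        except Exception as e:
            print("Error:", e)
```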

Implementation

Finding the endpoint

Target site: Youdao Translate (http://fanyi.youdao.com/)

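Translate any word, press F12 to open the developer tools, switch to the Network tab and filter by XHR. The translation endpoint (translate_o) shows up right away.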


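Looking at the request, the parameters are sent as ordinary form data (application/x-www-form-urlencoded) and the response comes back as JSON, so the request is easy to replay from Python. A first attempt: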
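For reference, a dict passed as `data=` to `requests.post` is sent as an `application/x-www-form-urlencoded` body, the same format a browser form uses. A minimal sketch of what such a body looks like, using only the standard library's `urllib.parse.urlencode`; the three fields are a trimmed subset of the real form, chosen for illustration:

```python
from urllib.parse import urlencode

# A trimmed subset of the Youdao form fields, for illustration only
form = {"i": "你好", "from": "AUTO", "to": "AUTO"}

# urlencode percent-encodes each value as UTF-8 and joins the pairs with '&',
# which is the application/x-www-form-urlencoded body format
body = urlencode(form)
print(body)  # i=%E4%BD%A0%E5%A5%BD&from=AUTO&to=AUTO
```

Non-ASCII input such as Chinese text is therefore sent as percent-encoded UTF-8 bytes, which is why the signature code later encodes strings as UTF-8 before hashing.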

```python
import requests


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    # salt, sign, lts and bv are copied verbatim from one captured browser request
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": "16632096342368",
        "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
        "lts": "1663209634236",
        "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print(res)


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter the text to look up: ")
            translate(word)
        except Exception as e:
            print("Error:", e)
```

Entering an arbitrary word returns an error instead of a translation. Why?


Cracking the anti-scraping measure

Let's take a closer look at the form data:

```python
data = {
    "i": word,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": r["salt"],
    "sign": r["sign"],
    "lts": r["ts"],
    "bv": r["bv"],
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME"
}
```

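Changing the input word shows that salt, sign and lts change on every request (bv stays the same for a given browser), and sign and bv are clearly 32-character MD5 hex digests. This is probably Youdao's anti-scraping measure, much like the token in my earlier post "Brute-Force Cracking: Bypassing the Token" (暴力破解之token绕过).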
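One quick way to back up the "looks like MD5" observation: a digest produced by `hashlib`'s `hexdigest()` is always exactly 32 lowercase hexadecimal characters. The helper `looks_like_md5` below is a hypothetical name and only checks the shape of a string, not how it was computed; the values are the ones captured in the first attempt above.

```python
import re

# An MD5 hex digest is exactly 32 hex characters (hashlib emits lowercase)
_MD5_HEX = re.compile(r"[0-9a-f]{32}")


def looks_like_md5(value: str) -> bool:
    """Heuristic: does this string have the shape of an MD5 hex digest?"""
    return _MD5_HEX.fullmatch(value) is not None


# Values copied from the captured browser request
captured = {
    "salt": "16632096342368",
    "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
    "lts": "1663209634236",
    "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
}
for name, value in captured.items():
    print(name, looks_like_md5(value))

# salt starts with lts, a first hint that salt is built from the timestamp
print(captured["salt"].startswith(captured["lts"]))
```

Only sign and bv pass the check; salt and lts are plain digit strings, and salt is lts with one extra digit appended.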

Let's look at the Youdao Translate page source to find out how these values are generated.

The page loads a script named fanyi.min.js, which is the likely source of these four values. Opening it reveals a dense, unformatted wall of minified code.

Searching it for sign, salt and the like turns up the relevant section. After formatting, the key code is:


```javascript
var r = function (e) {
    var t = n.md5(navigator.appVersion),
        r = "" + (new Date).getTime(),
        i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")
    }
};
t.recordUpdate = function (e) {
    var t = e.i,
        i = r(t);
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/bettertranslation",
        data: {
            i: e.i,
            client: "fanyideskweb",
            salt: i.salt,
            sign: i.sign,
            lts: i.ts,
            bv: i.bv,
            tgt: e.tgt,
            modifiedTgt: e.modifiedTgt,
            from: e.from,
            to: e.to
        },
        success: function (e) { },
        error: function (e) { }
    })
}, t.recordMoreResultLog_get = function (e) {
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/ctlog",
        data: {
            i: e.i,
            action: "GET_MORE_TRANSLATION",
            from: e.from,
            to: e.to
        },
        success: function (e) { },
        error: function (e) { }
    })
}, t.recordMoreResultLog_choose = function (e) {
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/ctlog",
        data: {
            i: e.i,
            tgt: e.tgt,
            systemName: e.systemName,
            pos: e.pos,
            action: "SELECT_OTHER_TRANSLATION",
            from: e.from,
            to: e.to
        },
        success: function (e) { },
        error: function (e) { }
    })
};
```
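If a later build of the site changes the constant string, this search has to be repeated. A minimal sketch of automating it, assuming the minified script still contains a call shaped like `n.md5("fanyideskweb" + e + i + "...")`. The function name `extract_sign_secret` and the regular expression are assumptions of this sketch and will stop matching if the call's structure changes.

```python
import re
from typing import Optional

# Assumed shape of the signing call in fanyi.min.js:
#   n.md5("fanyideskweb" + e + i + "<secret>")
# \w+ tolerates renamed single-letter variables; whitespace is optional.
_SIGN_CALL = re.compile(
    r'\w+\.md5\(\s*"fanyideskweb"\s*\+\s*\w+\s*\+\s*\w+\s*\+\s*"([^"]+)"\s*\)'
)


def extract_sign_secret(js_source: str) -> Optional[str]:
    """Return the constant appended when computing sign, or None if not found."""
    match = _SIGN_CALL.search(js_source)
    return match.group(1) if match else None


snippet = 'sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")'
print(extract_sign_secret(snippet))  # Ygy_4c=r#e#4EX^NUGUc5
```

In practice the `js_source` argument would be the downloaded text of fanyi.min.js; the pattern also matches the unspaced minified form `n.md5("fanyideskweb"+e+i+"...")`.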

From here it is straightforward. Reading this code, the form parameters mean:

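  • i: the string to translate
  • from: source language
  • to: target language
  • smartresult: smart result, fixed value
  • client: client name, fixed value
  • salt: salt used in signing, to be determined
  • sign: signature string, to be determined
  • lts: millisecond timestamp
  • bv: an MD5 value of unknown origin, fixed value
  • doctype: document type, fixed value
  • version: version, fixed value
  • keyfrom: key source, fixed value
  • action: action name, fixed value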
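Because most fields are constants, one way to keep the request code readable is to store the fixed fields once and merge in the per-request ones. A sketch: the names `FIXED_FIELDS` and `build_form` are illustrative, and the values are copied from the captured form.

```python
from typing import Dict

# Fields that stay the same across requests (values from the captured form)
FIXED_FIELDS: Dict[str, str] = {
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME",  # kept exactly as captured (note the lowercase "l")
}


def build_form(word: str, salt: str, sign: str, lts: str, bv: str) -> Dict[str, str]:
    """Return a new dict: the text, the fixed fields, then the per-request values."""
    return {"i": word, **FIXED_FIELDS, "salt": salt, "sign": sign, "lts": lts, "bv": bv}


form = build_form("hello", "16632096342368", "f68df14c3fd6c01e6820cd3ffd826e55",
                  "1663209634236", "47edca4d7e6ec9bf4fca7156ea36b8ef")
print(len(form))  # 13, one entry per parameter listed above
```

Merging into a fresh dict each time means `FIXED_FIELDS` is never mutated between requests.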

From the source, salt, sign, lts and bv are generated as follows.

```javascript
// The key code
var r = function (e) {
    var t = n.md5(navigator.appVersion),
        r = "" + (new Date).getTime(),
        i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")
    }
};
```
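  • salt: the current millisecond timestamp followed by one random digit (0 to 9)
  • sign: the MD5 of "fanyideskweb" + i + salt + "Ygy_4c=r#e#4EX^NUGUc5", where i is the text to translate
  • ts: the current millisecond timestamp (sent as lts)
  • bv: the MD5 of the browser version string (navigator.appVersion)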
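The four rules above can be written as a pure function, which makes them easy to check: the clock reading and the random digit are passed in as arguments instead of being read inside. This is a sketch; `make_params` and `md5_hex` are hypothetical names, and it assumes the `fanyideskweb` prefix and the constant string are still the ones found in fanyi.min.js.

```python
import hashlib

SECRET = "Ygy_4c=r#e#4EX^NUGUc5"  # constant found in fanyi.min.js at the time of writing


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_params(word: str, now_ms: int, digit: int, app_version: str) -> dict:
    """Derive ts/bv/salt/sign deterministically from explicit inputs."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be 0-9, like parseInt(10 * Math.random(), 10)")
    ts = str(now_ms)        # millisecond timestamp, sent as lts
    salt = ts + str(digit)  # timestamp followed by one random digit
    return {
        "ts": ts,
        "bv": md5_hex(app_version),  # MD5 of navigator.appVersion
        "salt": salt,
        "sign": md5_hex("fanyideskweb" + word + salt + SECRET),
    }


# Hypothetical, shortened appVersion string for the demo
ua = "5.0 (Windows NT 10.0; Win64; x64)"
p1 = make_params("hello", 1663209634236, 8, ua)
p2 = make_params("world", 1663209634236, 8, ua)
print(p1["salt"])                # 16632096342368, same shape as the captured salt
print(p1["sign"] != p2["sign"])  # True: sign depends on the word
```

Because sign covers the input text, a sign captured for one word cannot be reused for another, which explains why the first attempt with hard-coded values failed.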

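So the bypass is easy. The following function, a direct port of the JavaScript above, produces the salt, sign, lts and bv values. It is simple enough to need little explanation.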

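```python
import hashlib
import random
import time


def salt_sign(e):
    """Port of the fanyi.min.js helper: returns ts, bv, salt and sign for the text e."""
    # navigator.appVersion is the User-Agent string without the leading "Mozilla/"
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))  # millisecond timestamp, like (new Date).getTime()
    i = r + str(random.randint(0, 9))  # parseInt(10 * Math.random(), 10) yields 0-9
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest(),
    }
```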

Formatting the returned JSON

The response is JSON. Since the goal is a translation tool, it is worth parsing it into something readable. A sample response:

```python
{
    'errorCode': 0,
    'translateResult': [[{'tgt': '你好', 'src': 'hello'}]],
    'type': 'en2zh-CHS',
    'smartResult': {'entries': ['', 'int. 喂,你好(用于问候或打招呼);喂,你好(打电话时的招呼语);喂,你好(引起别人注 意的招呼语);<非正式>喂,嘿 (认为别人说了蠢话或分心);<英,旧>嘿(表示惊讶)\r\n', 'n. 招呼,问候;(Hello)(法、印、 美、俄)埃洛(人名)\r\n', 'v. 说(或大声说)“喂”;打招呼\r\n'], 'type': 1}
}
```

Parsing it:

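```python
res = requests.post(url=url, data=data, headers=headers).json()
print("Translation of " + word + ": " + res['translateResult'][0][0]['tgt'])
print("Translation type: " + res['type'])
# entries[0] is often empty and smartResult may be missing (e.g. for sentences),
# so print every non-empty entry instead of indexing [0] and [1]
entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
for n, entry in enumerate(entries, 1):
    print("Usage ({}): {}".format(n, entry))
```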
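Parsing can be tested without touching the network by moving it into a function that takes the decoded JSON. A sketch, assuming the response shape shown in the sample above; `format_result` is a hypothetical helper, and treating any non-zero `errorCode` as a failure is an assumption based on the sample returning `0` on success.

```python
from typing import Any, Dict, List


def format_result(word: str, res: Dict[str, Any]) -> List[str]:
    """Turn a decoded Youdao response into printable lines."""
    if res.get("errorCode") != 0:
        return [f"Request failed, errorCode={res.get('errorCode')}"]
    lines = [
        f"Translation of {word}: {res['translateResult'][0][0]['tgt']}",
        f"Translation type: {res['type']}",
    ]
    # smartResult may be absent; skip empty entries such as entries[0]
    entries = res.get("smartResult", {}).get("entries", [])
    usages = [e.strip() for e in entries if e.strip()]
    lines += [f"Usage ({n}): {u}" for n, u in enumerate(usages, 1)]
    return lines


# A trimmed copy of the sample response above
sample = {
    "errorCode": 0,
    "translateResult": [[{"tgt": "你好", "src": "hello"}]],
    "type": "en2zh-CHS",
    "smartResult": {"entries": ["", "int. 喂,你好\r\n", "v. 打招呼\r\n"], "type": 1},
}
for line in format_result("hello", sample):
    print(line)
```

`translate()` can then fetch the JSON and simply print each line returned by `format_result`.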

Final code

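```python
import hashlib
import random
import time

import requests


def salt_sign(e):
    """Port of the fanyi.min.js helper: returns ts, bv, salt and sign for the text e."""
    # navigator.appVersion is the User-Agent string without the leading "Mozilla/"
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))  # millisecond timestamp, like (new Date).getTime()
    i = r + str(random.randint(0, 9))  # parseInt(10 * Math.random(), 10) yields 0-9
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest(),
    }


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = salt_sign(word)
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "lts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers, timeout=10).json()
    if res.get('errorCode') != 0:
        # errorCode 0 means success; a non-zero code means the request was rejected
        print("Request failed, errorCode:", res.get('errorCode'))
        return
    print("Translation of " + word + ": " + res['translateResult'][0][0]['tgt'])
    print("Translation type: " + res['type'])
    # entries[0] is often empty and smartResult may be missing (e.g. for sentences),
    # so print every non-empty entry instead of indexing [0] and [1]
    entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
    for n, entry in enumerate(entries, 1):
        print("Usage ({}): {}".format(n, entry))


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter a word or short sentence to translate: ")
            translate(word)
        except (EOFError, KeyboardInterrupt):
            break  # stop cleanly on Ctrl+D / Ctrl+C instead of looping forever
        except Exception as e:
            print("Error:", e)
```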
