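Bypass Youdao Translate's anti-scraping measures and call its translation endpoint to translate words and short phrases.

Source code (Python implementation)

This post uses Python to get around Youdao Translate's anti-scraping measures, call its translation endpoint, and print the translation and usage notes for a word or short phrase. The result is shown in the figure.

Without further ado, here is the code.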
```python
import hashlib
import random
import time

import requests


def salt_sign(e):
    # Python port of the r() function found in fanyi.min.js.
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))
    # parseInt(10 * Math.random(), 10) yields a digit from 0 to 9.
    i = r + str(random.randint(0, 9))
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest()
    }


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = salt_sign(word)
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "lts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print("Translation of " + word + ": " + res['translateResult'][0][0]['tgt'])
    print("Translation type: " + res['type'])
    # smartResult may be missing for some inputs; the first entry can also be empty.
    entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
    for n, entry in enumerate(entries[:2], start=1):
        print("Usage (%d): %s" % (n, entry))


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter a word or short phrase to translate: ")
            translate(word)
        except Exception as e:
            print("Error:", e)
```
Implementation

Finding the endpoint

Target site: Youdao Translate (有道翻译)
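Translate any word on the site, press F12 to open the developer tools, switch to the Network tab, and filter by XHR. An endpoint shows up almost immediately.

Inspect the form data sent with the request. The parameters travel as ordinary form fields and the response comes back as JSON, so we can send the same request from Python. A first attempt is simple.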
```python
import requests


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    # salt, sign, lts and bv are copied verbatim from one captured browser request.
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": "16632096342368",
        "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
        "lts": "1663209634236",
        "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print(res)


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter the string to look up: ")
            translate(word)
        except Exception as e:
            print("Error:", e)
```
Enter any word and the request fails with an error. Why?
Defeating the anti-scraping measure

Let's take a closer look at the form data:
```python
data = {
    "i": word,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": r["salt"],
    "sign": r["sign"],
    "lts": r["ts"],
    "bv": r["bv"],
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME"
}
```
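Change the input word and watch the request: salt, sign, and lts differ every time, while bv stays the same for a given browser. sign and bv are 32-character hex strings, which looks like MD5 output, and salt is just lts with one extra digit appended. This is probably Youdao's anti-scraping measure, and it is very similar to the token scheme covered in my earlier post, "暴力破解之token绕过" (brute forcing: bypassing tokens).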
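One way to check which of these fields have the shape of an MD5 digest is to test the captured values directly. The sketch below reuses the sample values hard-coded in the first attempt above; the helper name `looks_like_md5` is made up for illustration. A 32-character hex string is consistent with MD5 but does not prove it.

```python
import re

# Values captured from one browser request (the ones hard-coded in the first attempt).
captured = {
    "salt": "16632096342368",
    "sign": "f68df14c3fd6c01e6820cd3ffd826e55",
    "lts": "1663209634236",
    "bv": "47edca4d7e6ec9bf4fca7156ea36b8ef",
}


def looks_like_md5(value):
    """Return True if value has the shape of an MD5 hex digest (32 hex chars)."""
    return re.fullmatch(r"[0-9a-f]{32}", value) is not None


for name, value in captured.items():
    print(name, len(value), looks_like_md5(value))

# Check whether salt is simply lts with one more character appended.
print(captured["salt"][:-1] == captured["lts"])
```

In this sample only sign and bv pass the shape test; salt and lts are plain digit strings.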
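Next, look through the page source of Youdao Translate to find out how these values are generated.

There is a script named fanyi.min.js, and my guess was that it produces all four values. Opening it reveals a dense wall of minified, unformatted code.

Search it for the code that deals with sign and salt. After formatting the match, the key part looks like this.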
```javascript
var r = function (e) {
    var t = n.md5(navigator.appVersion),
        r = "" + (new Date).getTime(),
        i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")
    }
};
t.recordUpdate = function (e) {
    var t = e.i,
        i = r(t);
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/bettertranslation",
        data: {
            i: e.i,
            client: "fanyideskweb",
            salt: i.salt,
            sign: i.sign,
            lts: i.ts,
            bv: i.bv,
            tgt: e.tgt,
            modifiedTgt: e.modifiedTgt,
            from: e.from,
            to: e.to
        },
        success: function (e) {
        },
        error: function (e) {
        }
    })
}, t.recordMoreResultLog_get = function (e) {
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/ctlog",
        data: {
            i: e.i,
            action: "GET_MORE_TRANSLATION",
            from: e.from,
            to: e.to
        },
        success: function (e) {
        },
        error: function (e) {
        }
    })
}, t.recordMoreResultLog_choose = function (e) {
    n.ajax({
        type: "POST",
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        url: "/ctlog",
        data: {
            i: e.i,
            tgt: e.tgt,
            systemName: e.systemName,
            pos: e.pos,
            action: "SELECT_OTHER_TRANSLATION",
            from: e.from,
            to: e.to
        },
        success: function (e) {
        },
        error: function (e) {
        }
    })
};
```
The rest is straightforward: read through this code, then summarize what each form parameter means.
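```
i: the string to translate
from: source language
to: target language
smartresult: smart result, fixed value
client: client, fixed value
salt: salt used in the signature, to be determined
sign: signature string, to be determined
lts: millisecond timestamp
bv: an unknown MD5 value, fixed
doctype: document type, fixed value
version: version, fixed value
keyfrom: key source, fixed value
action: action, fixed value
```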
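The split between fixed and per-request parameters can be made explicit in code. This is only a sketch: `FIXED_FIELDS` and `build_form` are hypothetical names, `from` and `to` are simply left at `AUTO` as in the rest of this post, and the sample values are the ones captured earlier.

```python
# Fields that do not change between requests, per the parameter list above.
FIXED_FIELDS = {
    "from": "AUTO",  # left at AUTO, as in the rest of the post
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTlME",
}


def build_form(text, salt, sign, lts, bv):
    """Merge the constant fields with the per-request ones (hypothetical helper)."""
    form = dict(FIXED_FIELDS)  # copy so the constant dict is never mutated
    form.update({"i": text, "salt": salt, "sign": sign, "lts": lts, "bv": bv})
    return form


form = build_form(
    "hello",
    salt="16632096342368",
    sign="f68df14c3fd6c01e6820cd3ffd826e55",
    lts="1663209634236",
    bv="47edca4d7e6ec9bf4fca7156ea36b8ef",
)
print(sorted(form))
```

Keeping the constants in one place makes it obvious that only five fields have to be computed for each call.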
Now work out from the source how salt, sign, lts, and bv are generated.
```javascript
// The key code
var r = function (e) {
    var t = n.md5(navigator.appVersion),
        r = "" + (new Date).getTime(),
        i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5")
    }
};
```
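salt: the current millisecond timestamp with one random digit (0 to 9) appended
sign: the MD5 of "fanyideskweb" + the text to translate (form field i) + salt + "Ygy_4c=r#e#4EX^NUGUc5"
lts (ts in the script): the current millisecond timestamp
bv: the MD5 of the browser's navigator.appVersion string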
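The rule for sign also suggests why the hard-coded request failed earlier: the text to translate is part of the hashed string, so a sign captured for one word most likely will not validate for another. A minimal sketch of that dependency, assuming the script's `n.md5` hashes the UTF-8 bytes of the string (the helper name `make_sign` is hypothetical):

```python
import hashlib

CLIENT = "fanyideskweb"
SECRET = "Ygy_4c=r#e#4EX^NUGUc5"  # constant string from the fanyi.min.js excerpt


def make_sign(text, salt):
    """MD5 hex digest of client + text + salt + secret, mirroring the JS rule.

    Assumption: the input is hashed as UTF-8 bytes.
    """
    raw = CLIENT + text + salt + SECRET
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Sample salt: a millisecond timestamp followed by one digit.
salt = "1663209634236" + "8"

print(make_sign("hello", salt))
print(make_sign("world", salt))  # a different text gives a different sign
```

With the salt held constant, changing only the text changes the digest, so both values have to be recomputed for every request.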
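So getting around the check is easy. Here is a function that produces the four values salt, sign, lts, and bv. The code is simple enough that it needs little explanation.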
```python
import hashlib
import random
import time


def salt_sign(e):
    # The browser's navigator.appVersion string; bv is its MD5.
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))
    # parseInt(10 * Math.random(), 10) yields a digit from 0 to 9.
    i = r + str(random.randint(0, 9))
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest()
    }
```
Formatting the returned JSON data

The response is JSON. Since the goal is a translation tool, it should be parsed into something easier to read.
```python
{
    'errorCode': 0,
    'translateResult': [[{'tgt': '你好', 'src': 'hello'}]],
    'type': 'en2zh-CHS',
    'smartResult': {
        'entries': [
            '',
            'int. 喂,你好(用于问候或打招呼);喂,你好(打电话时的招呼语);喂,你好(引起别人注意的招呼语);<非正式>喂,嘿 (认为别人说了蠢话或分心);<英,旧>嘿(表示惊讶)\r\n',
            'n. 招呼,问候;(Hello)(法、印、美、俄)埃洛(人名)\r\n',
            'v. 说(或大声说)“喂”;打招呼\r\n'
        ],
        'type': 1
    }
}
```
It can be parsed as follows:
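```python
res = requests.post(url=url, data=data, headers=headers).json()
print("Translation of " + word + ": " + res['translateResult'][0][0]['tgt'])
print("Translation type: " + res['type'])
# smartResult may be missing for some inputs; the first entry can also be empty.
entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
for n, entry in enumerate(entries[:2], start=1):
    print("Usage (%d): %s" % (n, entry))
```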
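The parsing step can also be pulled into a small function that works on the decoded response, which makes it testable without touching the network. This is a sketch under a few assumptions: `parse_result` is a hypothetical helper, a non-zero `errorCode` is treated as failure (the sample response only shows `0` for success), and `translateResult` is taken to be a list of paragraphs that each hold a list of segments, as the nesting in the sample suggests. The sample dict is a shortened copy of the response shown above.

```python
def parse_result(res):
    """Extract translation, language pair and usage notes from a response dict."""
    if res.get("errorCode") != 0:
        raise ValueError("unexpected errorCode: %r" % res.get("errorCode"))
    # Join every segment of every paragraph into one string.
    translation = "".join(
        seg["tgt"] for para in res["translateResult"] for seg in para
    )
    # smartResult may be absent; drop blank entries and trailing "\r\n".
    entries = (res.get("smartResult") or {}).get("entries", [])
    usages = [e.strip() for e in entries if e.strip()]
    return {"translation": translation, "type": res.get("type"), "usages": usages}


sample = {
    "errorCode": 0,
    "translateResult": [[{"tgt": "你好", "src": "hello"}]],
    "type": "en2zh-CHS",
    "smartResult": {"entries": ["", "int. 喂,你好\r\n", "n. 招呼,问候\r\n"], "type": 1},
}

result = parse_result(sample)
print(result["translation"], result["type"])
for n, usage in enumerate(result["usages"], start=1):
    print("Usage (%d): %s" % (n, usage))
```

Returning a plain dict keeps printing separate from parsing, so the same function could feed a command-line tool or another program.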
Final code

```python
import hashlib
import random
import time

import requests


def salt_sign(e):
    # Python port of the r() function found in fanyi.min.js.
    navigator_appVersion = "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    t = hashlib.md5(navigator_appVersion.encode("utf-8")).hexdigest()
    r = str(int(time.time() * 1000))
    # parseInt(10 * Math.random(), 10) yields a digit from 0 to 9.
    i = r + str(random.randint(0, 9))
    return {
        "ts": r,
        "bv": t,
        "salt": i,
        "sign": hashlib.md5(("fanyideskweb" + e + i + "Ygy_4c=r#e#4EX^NUGUc5").encode("utf-8")).hexdigest()
    }


def translate(word):
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    r = salt_sign(word)
    data = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": r["salt"],
        "sign": r["sign"],
        "lts": r["ts"],
        "bv": r["bv"],
        "doctype": "json",
        "version": "2.1",
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTlME"
    }
    headers = {
        "Cookie": "OUTFOX_SEARCH_USER_ID=-286220249@10.108.160.17;",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    }
    res = requests.post(url=url, data=data, headers=headers).json()
    print("Translation of " + word + ": " + res['translateResult'][0][0]['tgt'])
    print("Translation type: " + res['type'])
    # smartResult may be missing for some inputs; the first entry can also be empty.
    entries = [s.strip() for s in res.get('smartResult', {}).get('entries', []) if s.strip()]
    for n, entry in enumerate(entries[:2], start=1):
        print("Usage (%d): %s" % (n, entry))


if __name__ == '__main__':
    while True:
        try:
            word = input("Enter a word or short phrase to translate: ")
            translate(word)
        except Exception as e:
            print("Error:", e)
```