您好,登錄后才能下訂單哦!
2019常見反爬策略及應對方案大匯總了。如果你對反爬蟲的策略和手段還掌握的不很全面,進來學就對了!一切都是剛剛好,一切都不晚!
1 . 構(gòu)造合理的HTTP請求頭。
from fake_useragent import UserAgent ua = UserAgent() ua.ie # Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US); ua.msie # Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)' ua['Internet Explorer'] # Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US) ua.opera # Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11 ua.chrome # Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2' ua.google # Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13 ua['google chrome'] # Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11 ua.firefox # Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1 ua.ff # Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1 ua.safari # Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25 # and the best one, random via real world browser usage statistic ua.random
2 . 檢查網(wǎng)站生成的Cookie。
3 . 抓取動態(tài)內(nèi)容。
4 . 限制爬取的速度。
5 . 處理表單中的隱藏域。
6 . 處理表單中的驗證碼。
from hashlib import md5 class ChaoClient(object): def __init__(self, username, password, soft_id): self.username = username password = password.encode('utf-8') self.password = md5(password).hexdigest() self.soft_id = soft_id self.base_params = { 'user': self.username, 'pass2': self.password, 'softid': self.soft_id, } self.headers = { 'Connection': 'Keep-Alive', 'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)', } def post_pic(self, im, codetype): params = { 'codetype': codetype, } params.update(self.base_params) files = {'userfile': ('captcha.jpg', im)} r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers) return r.json() if __name__ == '__main__': client = ChaoClient('用戶名', '密碼', '軟件ID') with open('captcha.jpg', 'rb') as file: print(client.post_pic(file, 1902))
7 . 繞開“陷阱”。
8 . 隱藏身份。
yum -y install tor useradd admin -d /home/admin passwd admin chown -R admin:admin /home/admin chown -R admin:admin /var/run/tor tor
免責聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。