A Very Basic Python Code Walkthrough

Published: 2021-11-22 14:30:35 Source: Yisu Cloud (億速云) Views: 148 Author: iii Category: Programming Languages

This article walks through a very basic piece of Python code: python_spider.py, a short crawler script built on urllib. The techniques it shows are simple, quick to apply, and practical.

# _*_ coding: utf-8 _*_
"""python_spider.py by xianhu"""
import requests  # third-party package, used only in the SOCKS proxy example
import urllib.error
import urllib.parse
import urllib.request
import http.cookiejar

Part 1

# First, define the variables used below
url = "https://www.baidu.com"
headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}

# The simplest way to fetch a web page
response = urllib.request.urlopen(url, timeout=10)
html = response.read().decode("utf-8")

# Use a Request instance instead of a URL
request = urllib.request.Request(url, data=None, headers={})
response = urllib.request.urlopen(request, timeout=10)
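The snippet above hard-codes UTF-8 when decoding the body. A minimal sketch, assuming the server declares its charset in the Content-Type header, could read that value and fall back to UTF-8 when it is missing. The helper name `fetch_text` and the `data:` URL in the demo are illustrative assumptions and are not part of the original script.

```python
import urllib.request


def fetch_text(url, timeout=10, fallback="utf-8"):
    """Fetch a URL and decode the body with the charset the response declares.

    fetch_text is a hypothetical helper, not part of the original script.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # response.headers is an email.message.Message; get_content_charset()
        # returns the charset parameter of Content-Type, or None if absent.
        charset = response.headers.get_content_charset() or fallback
    # errors="replace" keeps the call from failing on a mislabeled page.
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL keeps the demo offline; a real page URL is used the same way.
    print(fetch_text("data:text/plain;charset=utf-8,hello%20world"))
```

Calling `fetch_text(url)` with the `url` defined earlier would replace the separate `urlopen` and `decode` steps.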

Part 2

# Send data by passing the data argument to Request()
data = urllib.parse.urlencode({"act": "login", "email": "xianhu@qq.com", "password": "123456"})
request1 = urllib.request.Request(url, data=data.encode("utf-8"))   # POST method; in Python 3 the body must be bytes
request2 = urllib.request.Request(url + "?%s" % data)               # GET method
response = urllib.request.urlopen(request1, timeout=10)             # request2 can be sent the same way

# Send headers by passing the headers argument to Request()
request = urllib.request.Request(url, data=data.encode("utf-8"), headers=headers)   # pass headers as an argument
request.add_header("Referer", "http://www.baidu.com")   # another way to add a header; Referer is added to get past hotlink protection
response = urllib.request.urlopen(request, timeout=10)
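To make the POST/GET distinction concrete, here is a small sketch that builds both kinds of request from the same form fields and inspects them without sending anything. The function name `build_form_requests` and the `example.com` URL are assumptions made for illustration.

```python
import urllib.parse
import urllib.request


def build_form_requests(url, fields):
    """Return (post_request, get_request) carrying the same form fields."""
    query = urllib.parse.urlencode(fields)  # a str such as "a=1&b=2"
    # POST: supplying data switches the method to POST; it must be bytes.
    post_request = urllib.request.Request(url, data=query.encode("utf-8"))
    # GET: the fields travel in the URL and the request has no body.
    get_request = urllib.request.Request(url + "?" + query)
    return post_request, get_request


if __name__ == "__main__":
    # example.com is a placeholder; nothing is sent over the network here.
    post_req, get_req = build_form_requests(
        "http://example.com/login", {"act": "login", "user": "alice"}
    )
    post_req.add_header("Referer", "http://example.com/")
    print(post_req.get_method(), post_req.data)
    print(get_req.get_method(), get_req.full_url)
    print(post_req.get_header("Referer"))
```

`Request.get_method()` reports `POST` whenever `data` is not `None`, which is why the same `Request` class covers both cases.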

Part 3

# Fetching a page can raise urllib.error.HTTPError or urllib.error.URLError; HTTPError is a subclass of URLError, so catch it first
try:
    urllib.request.urlopen(request, timeout=10)
except urllib.error.HTTPError as e:
    print(e.code, e.reason)
except urllib.error.URLError as e:
    print(e.errno, e.reason)
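The order of the two except clauses matters because of the inheritance relationship. The sketch below shows the same check with `isinstance`, using exceptions constructed by hand so it runs offline. `describe_error` is a hypothetical helper name, and the URL and messages are made-up sample values.

```python
import email.message
import io
import urllib.error


def describe_error(exc):
    """Turn a urllib exception into a short, log-friendly string."""
    # HTTPError must be tested first: it is a subclass of URLError, so
    # testing URLError first would also match HTTP status errors.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP %d: %s" % (exc.code, exc.reason)
    if isinstance(exc, urllib.error.URLError):
        return "URL error: %s" % (exc.reason,)
    raise TypeError("not a urllib error: %r" % (exc,))


if __name__ == "__main__":
    # Hand-built exceptions stand in for real network failures.
    http_exc = urllib.error.HTTPError(
        "http://example.com/missing", 404, "Not Found",
        email.message.Message(), io.BytesIO(b""),
    )
    print(describe_error(http_exc))
    print(describe_error(urllib.error.URLError("timed out")))
```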

# Use a proxy to avoid having your IP blocked or rate-limited
proxy_handler = urllib.request.ProxyHandler(proxies={"http": "111.123.76.12:8080"})
opener = urllib.request.build_opener(proxy_handler)   # create an opener instance that uses the proxy
response = opener.open(url)                           # open the URL directly with the opener
urllib.request.install_opener(opener)                 # install the opener globally, then open URLs with urlopen
response = urllib.request.urlopen(url)
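One detail worth checking is which URL schemes a `ProxyHandler` actually covers: it only handles the schemes listed in its mapping. The sketch below builds an opener and inspects the handler without opening any connection. `make_proxy_opener` is a hypothetical helper, and `127.0.0.1:8080` is a placeholder address rather than a working proxy.

```python
import urllib.request


def make_proxy_opener(proxies):
    """Build an opener that routes the given schemes through proxies.

    Unlike install_opener(), this leaves the global urlopen() untouched.
    """
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder proxy address; no request is made in this demo.
    opener = make_proxy_opener({"http": "http://127.0.0.1:8080"})
    proxy_handlers = [
        h for h in opener.handlers
        if isinstance(h, urllib.request.ProxyHandler)
    ]
    print(proxy_handlers[0].proxies)
    # Only the "http" scheme was mapped, so there is no https_open method.
    print(hasattr(proxy_handlers[0], "http_open"),
          hasattr(proxy_handlers[0], "https_open"))
```

Under this reading, a mapping that lists only `"http"` would leave `https://` URLs unproxied, so an `"https"` entry is probably needed when the target URL uses HTTPS.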

Part 4

# Use cookies and a cookiejar to cope with server-side checks
cookie_jar = http.cookiejar.CookieJar()
cookie_jar_handler = urllib.request.HTTPCookieProcessor(cookiejar=cookie_jar)
opener = urllib.request.build_opener(cookie_jar_handler)
response = opener.open(url)

# Send cookies copied from a browser, in one of two ways:
# (1) put them directly in the headers
headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
    "Cookie": "PHPSESSID=btqkg9amjrtoeev8coq0m78396; USERINFO=n6nxTHTY%2BJA39z6CpNB4eKN8f0KsYLjAQTwPe%2BhLHLruEbjaeh5ulhWAS5RysUM%2B;"
}
request = urllib.request.Request(url, headers=headers)

# (2) build a Cookie and add it to the cookiejar
cookie = http.cookiejar.Cookie(name="xx", value="xx", domain="xx", ...)
cookie_jar.set_cookie(cookie)
response = opener.open(url)
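The `Cookie(...)` call above elides most of its arguments. As a rough sketch of what a complete call can look like, the helper below fills every constructor field of `http.cookiejar.Cookie` with a neutral value. The name `make_cookie`, the defaults chosen, and the sample name, value, and domain are all assumptions for illustration, not the values the original author used.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path="/"):
    """Build a simple session cookie; the other fields get neutral values."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


if __name__ == "__main__":
    jar = http.cookiejar.CookieJar()
    jar.set_cookie(make_cookie("session", "abc123", "example.com"))
    # add_cookie_header() is what HTTPCookieProcessor calls before sending;
    # calling it directly shows the resulting header without any network use.
    request = urllib.request.Request("http://example.com/")
    jar.add_cookie_header(request)
    print(request.get_header("Cookie"))
```

The jar only attaches a cookie to requests whose host and path match it, which is the main practical difference from pasting a raw `Cookie` header into every request.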

# Use a proxy and a cookiejar together
opener = urllib.request.build_opener(cookie_jar_handler)
opener.add_handler(proxy_handler)
response = opener.open("https://www.baidu.com/")

# Download an image from a web page (this also works for other files on the web):
# right-click the image, find its address in the image properties, then save it.
response = urllib.request.urlopen("http://ww3.sinaimg.cn/large/7d742c99tw1ee7dac2766j204q04qmxq.jpg", timeout=120)
with open("test.jpg", "wb") as file_img:
    file_img.write(response.read())
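`response.read()` loads the whole file into memory before writing it. For larger downloads, one possible variation is to copy the stream in chunks. The sketch below uses `shutil.copyfileobj` and an in-memory stream so that it runs offline. `save_stream` and the 64 KiB chunk size are illustrative choices, not something the original script prescribes.

```python
import io
import os
import shutil
import tempfile


def save_stream(source, path, chunk_size=64 * 1024):
    """Copy a readable binary stream to path in fixed-size chunks."""
    with open(path, "wb") as file_out:
        shutil.copyfileobj(source, file_out, chunk_size)


if __name__ == "__main__":
    # io.BytesIO stands in for the object returned by urlopen().
    fake_response = io.BytesIO(b"\xff\xd8" + b"\x00" * 100000)
    target = os.path.join(tempfile.mkdtemp(), "test.jpg")
    save_stream(fake_response, target)
    print(os.path.getsize(target))
```

With a real download, the object returned by `urllib.request.urlopen(...)` can be passed as `source`, since it exposes the same `read()` interface.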

# HTTP authentication
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()      # create a password manager
password_mgr.add_password(realm=None, uri=url, user='username', passwd='password')    # add the username and password
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)          # create an HTTPBasicAuthHandler
opener = urllib.request.build_opener(handler)                        # create the opener
response = opener.open(url, timeout=10)                              # fetch the data
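Passing `realm=None` registers the credentials as a default for any realm under the given URI. The following sketch queries the password manager directly to show which URLs would receive the credentials. The helper name `make_password_manager`, the `example.com` URLs, and the realm string are assumptions for illustration.

```python
import urllib.request


def make_password_manager(uri, user, passwd):
    """Register one set of credentials as the default for every realm at uri."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, uri, user, passwd)
    return manager


if __name__ == "__main__":
    manager = make_password_manager(
        "http://example.com/private/", "username", "password"
    )
    # Any realm name matches, as long as the URL lies under the registered URI.
    print(manager.find_user_password("Some Realm", "http://example.com/private/report"))
    # URLs outside the registered path get no credentials.
    print(manager.find_user_password("Some Realm", "http://example.com/public/"))
```

Registering the narrowest URI that needs authentication keeps the credentials from being offered to unrelated paths on the same host.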

# Use a SOCKS proxy (the socks module comes from the third-party PySocks package)
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 1080)
socket.socket = socks.socksocket
requests.get("http://www.baidu.com/s?ie=utf-8&wd=ip")

By this point you should have a better understanding of this basic Python code. Try running the examples yourself.
