Scraping and downloading Tieba images with Python

Published: 2020-03-02 11:44:59 Source: Web Views: 401 Author: 薩瓦迪迪卡 Category: System Operations
# coding: utf-8
import urllib2
import urllib
import re
import random
import time
def get_url(url):
    # Pool of User-Agent strings; one is picked at random for each request
    Agent_list = ['Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36',
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"]
    # Pool of HTTP proxies (host:port); replace with proxies reachable from your network
    ip_list = ['123.169.165.255:9999','117.69.13.64:9999','223.198.1.147:9999']
    user_agent = random.choice(Agent_list)
    # Build an opener that sends the request through a randomly chosen proxy
    httpproxy_handler = urllib2.ProxyHandler({'http':random.choice(ip_list)})
    opener = urllib2.build_opener(httpproxy_handler)
    page = urllib2.Request(url)
    page.add_header('User-Agent',user_agent)
    # Fetch the page and return its raw HTML
    response = opener.open(page)
    html = response.read()
    return html

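def down_img(html):
    # Capture every .jpg URL (http or https) that appears in a src attribute
    reg = r'src="(https?://[^"]+?\.jpg)'
    urlre = re.compile(reg)
    imglist = urlre.findall(html)
    for img in imglist:
        # Use the last path segment of the URL as the local file name
        filename = img.split("/")[-1]
        urllib.urlretrieve(img,filename,None)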
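
while True:
    url = raw_input("Enter the Tieba URL to download images from (q to quit): ")
    if url == 'q':
        print("Stopped.")
        break
    else:
        print('Fetching page...')
        # Fetch the page once and reuse the HTML for the download step
        html = get_url(url)
        print('Page fetched.')
        print('Downloading images...')
        down_img(html)
        print('Download complete.')
        # Pause briefly before prompting for the next URL
        time.sleep(3)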
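
The script above targets Python 2 (`urllib2`, `raw_input`), which will not run on Python 3. Below is a minimal sketch of the same fetch and extract steps on Python 3, assuming the standard-library `urllib.request` module is acceptable. The names `fetch_html`, `extract_image_urls`, `filename_for`, and `download_images` are hypothetical and not part of the original post; the regular expression is also an adaptation, not the author's exact pattern.

```python
import os
import random
import re
import urllib.request
from urllib.parse import urlsplit

# Hypothetical names for illustration; none of these appear in the original script.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36",
]
# Assumed pattern: .jpg URLs (http or https) inside src attributes.
IMG_RE = re.compile(r'src="(https?://[^"]+?\.jpg)')


def fetch_html(url, proxy=None, timeout=10):
    """Fetch a page as text, optionally through an HTTP proxy ("host:port")."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_image_urls(html):
    """Return the unique .jpg URLs found in src attributes, in page order."""
    seen = set()
    urls = []
    for url in IMG_RE.findall(html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def filename_for(url):
    """Derive a local file name from the last path segment of the URL."""
    return os.path.basename(urlsplit(url).path)


def download_images(html, dest_dir="."):
    """Download every extracted image into dest_dir."""
    for url in extract_image_urls(html):
        urllib.request.urlretrieve(url, os.path.join(dest_dir, filename_for(url)))
```

A typical call would be `download_images(fetch_html(url))`, with `url` read from `input()` in place of `raw_input()`. Skipping duplicate URLs avoids downloading the same image twice when a page references it more than once, which the original loop does not guard against.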