This article shows how to scrape comic images with Python, using one comic on dmzj.com as the example. It is meant as a reference for readers who want to try the same thing.
Development environment:
Python 3.6
PyCharm
Target URL
https://www.dmzj.com/info/yaoshenji.html
Code
Import the libraries
import requests
import os
import re
from bs4 import BeautifulSoup
from contextlib import closing
from tqdm import tqdm
import time
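Get the chapter links and chapter names
# Fetch the comic's index page and parse it
r = requests.get(url=target_url)
bs = BeautifulSoup(r.text, 'lxml')
# The chapter list is the <ul class="list_con_li"> element
list_con_li = bs.find('ul', class_="list_con_li")
cartoon_list = list_con_li.find_all('a')
chapter_names = []
chapter_urls = []
for cartoon in cartoon_list:
    href = cartoon.get('href')
    name = cartoon.text
    # Insert at the front so both lists end up in the reverse of page order
    chapter_names.insert(0, name)
    chapter_urls.insert(0, href)
print(chapter_urls)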
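The extraction step above can be tried offline. The following is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The class name ChapterLinkParser, the sample HTML, and the example.com links are made up for illustration, and the sample assumes the page lists the newest chapter first, which is what the insert(0, ...) calls above suggest.

```python
from html.parser import HTMLParser


class ChapterLinkParser(HTMLParser):
    """Collect (name, href) pairs for every <a> inside <ul class="list_con_li">."""

    def __init__(self):
        super().__init__()
        self.in_list = False  # are we inside the chapter <ul>?
        self.href = None      # href of the <a> currently being read, if any
        self.text = []        # text pieces of that <a>
        self.chapters = []    # collected (name, href) pairs, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and "list_con_li" in (attrs.get("class") or "").split():
            self.in_list = True
        elif tag == "a" and self.in_list:
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.chapters.append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "ul":
            self.in_list = False


# Hypothetical sample page: newest chapter first, plus one link outside the list
SAMPLE_HTML = """
<ul class="list_con_li">
  <li><a href="https://example.com/ch3.html">Chapter 3</a></li>
  <li><a href="https://example.com/ch2.html">Chapter 2</a></li>
  <li><a href="https://example.com/ch1.html">Chapter 1</a></li>
</ul>
<a href="https://example.com/other.html">Unrelated link</a>
"""

parser = ChapterLinkParser()
parser.feed(SAMPLE_HTML)
# Reverse once at the end instead of inserting at index 0 on every step
chapters = list(reversed(parser.chapters))
chapter_names = [name for name, _ in chapters]
chapter_urls = [href for _, href in chapters]
print(chapter_names)
```

Reversing once at the end gives the same ordering as the repeated insert(0, ...) calls, and the link outside the list is ignored because it appears after the closing ul tag.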
Download the comic
for i, url in enumerate(tqdm(chapter_urls)):
    print(i, url)
    # Send the chapter page as the Referer, plus a browser User-Agent
    download_header = {
        'Referer': url,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    name = chapter_names[i]
    # Remove dots from the chapter name before using it as a directory name
    name = name.replace('.', '')
    chapter_save_dir = os.path.join(save_dir, name)
    # Only download chapters that do not have a directory yet
    if name not in os.listdir(save_dir):
        os.mkdir(chapter_save_dir)
        r = requests.get(url=url)
        html = BeautifulSoup(r.text, 'lxml')
        # The image IDs are read from the first <script> tag of the page
        script_info = html.script
        pics = re.findall(r'\d{13,14}', str(script_info))
        # Pad 13-digit IDs to 14 digits so that a numeric sort gives the page order
        for j, pic in enumerate(pics):
            if len(pic) == 13:
                pics[j] = pic + '0'
        pics = sorted(pics, key=lambda x: int(x))
        chapterpic_hou = re.findall(r'\|(\d{5})\|', str(script_info))[0]
        chapterpic_qian = re.findall(r'\|(\d{4})\|', str(script_info))[0]
        for idx, pic in enumerate(pics):
            # A trailing '0' is treated as the padding added above and dropped again
            if pic[-1] == '0':
                pic_url = 'https://images.dmzj.com/img/chapterpic/' + chapterpic_qian + '/' + chapterpic_hou + '/' + pic[:-1] + '.jpg'
            else:
                pic_url = 'https://images.dmzj.com/img/chapterpic/' + chapterpic_qian + '/' + chapterpic_hou + '/' + pic + '.jpg'
            pic_name = '%03d.jpg' % (idx + 1)
            pic_save_path = os.path.join(chapter_save_dir, pic_name)
            print(pic_url)
            response = requests.get(pic_url, headers=download_header)
            print(response)
            if response.status_code == 200:
                with open(pic_save_path, "wb") as file:
                    file.write(response.content)
            else:
                print('Link error')
        # Pause between chapters to avoid hammering the server
        time.sleep(2)
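The URL-building part of the download step can be isolated into a pure function and checked without any network access. The sketch below is one possible variant, not the article's exact code: the function name build_image_urls and the sample script text are invented for illustration. It sorts by the padded value but keeps each ID unchanged, which avoids one hazard of the pad-then-strip approach: a genuine 14-digit ID that ends in 0 would lose its last digit when the padding is stripped.

```python
import re

IMAGE_BASE = "https://images.dmzj.com/img/chapterpic/"


def build_image_urls(script_text):
    """Return the page image URLs in reading order, parsed from the script text."""
    ids = re.findall(r"\d{13,14}", script_text)
    # Sort by the value each ID would have when right-padded to 14 digits,
    # but keep the original strings so nothing has to be stripped afterwards.
    ids.sort(key=lambda s: int(s.ljust(14, "0")))
    # The two path segments: a 4-digit and a 5-digit number between pipes
    chapterpic_qian = re.findall(r"\|(\d{4})\|", script_text)[0]
    chapterpic_hou = re.findall(r"\|(\d{5})\|", script_text)[0]
    return [
        f"{IMAGE_BASE}{chapterpic_qian}/{chapterpic_hou}/{pic}.jpg"
        for pic in ids
    ]


# Hypothetical script text in the pipe-separated shape the regexes expect
sample = "eval('a|14237|3059|15684490059461|1568449005373|15684490051012|b')"
for image_url in build_image_urls(sample):
    print(image_url)
```

With this sample, the 13-digit ID 1568449005373 sorts between the two 14-digit IDs because it is compared as 15684490053730.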
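Create the save directory and set the target URL (run this first: the steps above use save_dir and target_url)
save_dir = '妖神記'
# Create the top-level directory for the comic if it does not exist yet
if save_dir not in os.listdir('./'):
    os.mkdir(save_dir)
target_url = "https://www.dmzj.com/info/yaoshenji.html"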
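The check-then-create pattern above can also be written with os.makedirs and exist_ok=True, which does nothing when the directory is already there. The following is a small sketch under that assumption. The helper name chapter_dir is hypothetical, and the demo runs in a temporary directory so it leaves nothing behind.

```python
import os
import tempfile


def chapter_dir(base_dir, chapter_name):
    """Create (if needed) and return the directory for one chapter."""
    safe_name = chapter_name.replace(".", "")  # same dot removal as the article
    path = os.path.join(base_dir, safe_name)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path


# Demo in a throwaway directory: calling twice is safe and gives the same path
with tempfile.TemporaryDirectory() as base:
    first = chapter_dir(base, "Ch.1")
    second = chapter_dir(base, "Ch.1")
    print(os.path.basename(first), first == second)
```

Note that the download loop in this article uses the existence of a chapter directory to decide whether to skip that chapter, so a helper like this should only be called once the decision to download has been made.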