How to Scrape Every Novel on a Site with Python

Published: 2021-11-25 14:30:20  Source: 億速云  Views: 208  Author: iii  Category: Big Data

This article explains how to scrape every novel on a site with Python. The walkthrough is short and easy to follow, so let's work through it step by step.

Development environment:

Version: anaconda5.2.0 (python3.6.5)
Editor: PyCharm Community Edition


Let's write the code:

1. Import the tools

import requests
import parsel

2. Imitate a browser environment

headers = {
    # "Cookie": "bcolor=; font=; size=; fontcolor=; width=; Hm_lvt_3806e321b1f2fd3d61de33e5c1302fa5=1596800365,1596800898; Hm_lpvt_3806e321b1f2fd3d61de33e5c1302fa5=1596802442",
    "Host": "www.shuquge.com",
    "Referer": "http://www.shuquge.com/txt/8659/index.html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36",
}

3. Parse the site and scrape the novels

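def download_one_chapter(url_chapter, book):
    """Download one chapter and save it to the book's text file."""
    # The request headers were worked out by inspecting the browser's traffic
    response = requests.get(url_chapter, headers=headers)
    # apparent_encoding guesses the encoding from the response body;
    # the guess is right in the vast majority of cases
    response.encoding = response.apparent_encoding
    # print(response.text)
    # --- Extract the data ---
    # Tools: bs4, parsel
    # Selector styles: xpath, css, re
    # Turn the HTML into a selector object. When a tag repeats, narrow the
    # match with an id or class, or run a second selection on the result.
    sel = parsel.Selector(response.text)
    h2 = sel.css('h2::text')
    title = h2.get()
    print(title)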

    content = sel.css('#content ::text').getall()
    # print(content)
    # text = "".join(content)
    # print(text)
    # --- Write the data ---
    # with open(title + '.txt', mode='w', encoding='utf-8') as f:
    # Mode 'a' appends; with 'w' each chapter would overwrite the previous one
    with open(book + '.txt', mode='a', encoding='utf-8') as f:
        f.write(title)
        f.write('\n')
        for line in content:
            f.write(line.strip())
            f.write('\n')
# Scraping one novel means scraping many chapters
# download_one_chapter('http://www.shuquge.com/txt/8659/2324752.html')
# download_one_chapter('http://www.shuquge.com/txt/8659/2324753.html')
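def download_one_book(book_url):
    """Download every chapter listed on a novel's table-of-contents page."""
    response = requests.get(book_url, headers=headers)
    response.encoding = response.apparent_encoding
    html = response.text
    sel = parsel.Selector(html)
    title = sel.css('h3::text').get()

    # Relative links to every chapter, taken from the table of contents
    index_s = sel.css('body > div.listmain > dl > dd > a::attr(href)').getall()
    print(index_s)
    # Chapter links are relative to the directory that holds index.html
    base_url = book_url.rsplit('/', 1)[0] + '/'
    for index in index_s:
        one_chapter_url = base_url + index
        print(one_chapter_url)
        download_one_chapter(one_chapter_url, title)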
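
Because every chapter of a book goes into the same `book + '.txt'` file, the file has to be opened in append mode; mode `'w'` truncates it on each call, so only the last chapter would survive. The following is a minimal sketch of that write step in isolation. The helper name `append_chapter` is hypothetical and not part of the original code, and skipping whitespace-only fragments is an added assumption.

```python
import os
import tempfile


def append_chapter(path, title, lines):
    """Append one chapter (title plus stripped, non-empty lines) to the file at `path`."""
    # 'a' keeps what earlier calls wrote; 'w' would start the file over each time
    with open(path, mode="a", encoding="utf-8") as f:
        f.write(title + "\n")
        for line in lines:
            text = line.strip()
            if text:  # skip whitespace-only fragments (an assumption, see above)
                f.write(text + "\n")


if __name__ == "__main__":
    # Write two chapters to a throwaway file and show that both are kept
    path = os.path.join(tempfile.mkdtemp(), "demo_book.txt")
    append_chapter(path, "Chapter 1", ["  first line ", "\n"])
    append_chapter(path, "Chapter 2", ["second line"])
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

Note that append mode also means a second run of the scraper adds the chapters again, so delete the old file before re-running.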

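1. Exceptions: the code above does not catch them yet; wrap the requests in try/except.

2. Retrying: after an error, either try again, or record the failed URL and request it again later.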
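
The two points above can be sketched as a small retry wrapper. This is only an illustration under stated assumptions: `with_retries` and its parameters (`attempts`, `delay`, `exceptions`) are hypothetical names, not part of the original code, and a fixed delay between attempts is just one possible policy.

```python
import time


def with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times; re-raise the last error if every attempt fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            time.sleep(delay)
```

With requests, a call might look like `with_retries(lambda: requests.get(url, headers=headers, timeout=10), exceptions=(requests.RequestException,))`. To record failures instead of stopping, catch the final exception in the caller and append the URL to a list for a later pass.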

What is needed to download one novel:

download_one_book('http://www.shuquge.com/txt/8659/index.html')
download_one_book('http://www.shuquge.com/txt/122230/index.html')
download_one_book('http://www.shuquge.com/txt/117456/index.html')

Each chapter is downloaded from its chapter URL, and each novel is downloaded from its table-of-contents page.
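
Going from a table-of-contents page to chapter URLs means resolving the relative links found on that page. One way to do this, shown here as a sketch, is `urllib.parse.urljoin` from the standard library, which avoids depending on the exact length of `index.html`. The helper name `chapter_urls` is hypothetical; the example URLs are the ones used in this article.

```python
from urllib.parse import urljoin


def chapter_urls(book_url, hrefs):
    """Resolve chapter links found on a table-of-contents page against that page's URL."""
    return [urljoin(book_url, href) for href in hrefs]


if __name__ == "__main__":
    urls = chapter_urls(
        "http://www.shuquge.com/txt/8659/index.html",
        ["2324752.html", "2324753.html"],
    )
    for url in urls:
        print(url)
```

`urljoin` also handles links that start with `/` or are already absolute, which plain string concatenation does not.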

To download every novel on the site: download every category -> download every listing page within each category -> download every novel on each page.
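
The article does not show the category or listing pages, so their URLs and selectors are unknown here. The sketch below therefore only captures the shape of the three nested loops and takes the page-specific parts as parameters. All four callables (`list_categories`, `list_pages`, `list_books`, `download_book`) are hypothetical; in practice `download_book` could be the `download_one_book` function above, and the other three would each fetch a page and extract links with parsel.

```python
def crawl_site(list_categories, list_pages, list_books, download_book):
    """Walk category -> listing page -> book and call download_book once per book URL."""
    seen = set()
    for category_url in list_categories():
        for page_url in list_pages(category_url):
            for book_url in list_books(page_url):
                if book_url in seen:  # a book may appear in more than one listing
                    continue
                seen.add(book_url)
                download_book(book_url)
    return len(seen)


if __name__ == "__main__":
    # Stand-in data only; a real run would fetch and parse pages instead
    count = crawl_site(
        list_categories=lambda: ["category-1", "category-2"],
        list_pages=lambda category: [category + "/page-1"],
        list_books=lambda page: [page + "/book-1", "shared-book"],
        download_book=lambda book: print("would download", book),
    )
    print(count, "books")
```

Keeping a set of seen URLs avoids downloading the same book twice when it is listed under several categories.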

Result of running the code:

[Image: How to Scrape Every Novel on a Site with Python]

Thanks for reading. That covers how to scrape every novel on a site with Python; how well it works in your own case is something you will need to verify in practice.
