This article shows how to use Python to crawl and download e-books from a Kindle e-book website. It is a practical little script, shared here as a reference.
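The script downloads the e-books listed on Kan Kindle (kankindle.com). It walks 13 listing pages starting from the home page, saves every e-book it finds into the ebook directory, and skips any book that has already been downloaded.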
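The "already downloaded" check can be sketched offline as follows. This is a minimal illustration, not the script's own code: the helper names `ebook_path` and `save_if_missing` are hypothetical, and two details are added assumptions. The first is replacing characters that are unsafe in file names. The second is writing to a temporary `.part` file first, so that an interrupted download is not later mistaken for a finished one.

```python
import os
import re


def ebook_path(title, directory="ebook"):
    """Map a book title to a .mobi path inside `directory` (hypothetical helper)."""
    # Replace characters that are unsafe in file names on common platforms.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return os.path.join(directory, safe_title + ".mobi")


def save_if_missing(path, data):
    """Write `data` to `path` unless the file exists; return True if written."""
    if os.path.exists(path):
        return False  # already downloaded, skip
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as handle:
        handle.write(data)
    # Rename only after the write succeeded, so a partial file never
    # sits under the final name.
    os.replace(tmp_path, path)
    return True
```

Calling `save_if_missing(ebook_path(title), data)` a second time for the same title returns `False` and leaves the existing file untouched, which is the behaviour the description relies on.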
#!/usr/bin/env python3
# coding=utf-8
import os
import re
import urllib.request

from bs4 import BeautifulSoup

EBOOK_DIR = 'ebook'


def download(url):
    """Open a book's detail page and save its .mobi file under EBOOK_DIR."""
    print('starting download %s' % url)
    response = urllib.request.urlopen(url, timeout=30)
    html_data = response.read()
    soup = BeautifulSoup(html_data, 'html.parser')
    print('start to analyse---------------')
    # The download link sits inside the element with class "yanshi_xiazai";
    # the first <h2> holds the book title.
    title_soup = soup.find_all(class_='yanshi_xiazai')
    name_soup = soup.find_all('h2')
    tag_a = title_soup[0].a.attrs['href']
    link_name = name_soup[0].get_text().strip()
    filename = os.path.join(EBOOK_DIR, link_name + '.mobi')
    print('filename is: %s' % filename)
    if os.path.exists(filename):
        print('already downloaded, ignore')
        return
    print('downloading with urllib %s' % tag_a)
    try:
        f = urllib.request.urlopen(tag_a, timeout=60)
        data = f.read()
        with open(filename, 'wb') as code:
            code.write(data)
    except Exception as e:
        print(e)


def get_all_link(url):
    """Download every book linked from one listing page."""
    print('Starting get all the list')
    response = urllib.request.urlopen(url, timeout=30)
    html_data = response.read()
    soup = BeautifulSoup(html_data, 'html.parser')
    link_soup = soup.find_all('a')
    for each_link in link_soup:
        href = each_link.get('href')
        # Book detail links are the ones containing "view".
        if href and re.search('view', str(each_link)):
            print(href)
            download(href)


if __name__ == '__main__':
    # Create the output directory up front so open() does not fail.
    os.makedirs(EBOOK_DIR, exist_ok=True)
    # Assumption: listing pages are numbered /simple/page/1 to /simple/page/13.
    for page in range(1, 14):
        url = 'http://kankindle.com/simple/page/' + str(page)
        print(url)
        get_all_link(url)
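To see what the link filter in get_all_link selects without touching the network, the sketch below runs the same idea over a small inline HTML snippet using only the standard library. The snippet, its URLs, and the class name `ViewLinkParser` are made up for illustration; the real markup of kankindle.com is not shown in this article. Unlike the script, which searches the whole tag text for "view", this version checks only the `href` value, which is an assumption about what the filter is meant to catch.

```python
from html.parser import HTMLParser


class ViewLinkParser(HTMLParser):
    """Collect href values of <a> tags whose URL contains 'view'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "view" in href:
            self.links.append(href)


# Hypothetical listing markup, for illustration only.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/view/101.html">Book A</a></li>
  <li><a href="http://example.com/about.html">About</a></li>
  <li><a href="http://example.com/view/102.html">Book B</a></li>
</ul>
"""

parser = ViewLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

Only the two URLs containing "view" are collected; the "About" link is ignored. A substring test like this would also accept URLs such as ".../preview/...", so a stricter pattern may be needed on a real site.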
That covers how to crawl and download e-books from the Kindle site with Python. Hopefully the script above is a useful starting point.