How to Download Audiobooks with Multithreaded Python

Published: 2022-01-11 15:24:54  Source: 億速云  Views: 197  Author: 柒染  Category: Programming Languages

This article looks at how to download audiobooks with multithreaded Python. The content is detailed and easy to follow; readers interested in the topic can work through it step by step and will hopefully find it useful.

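I bought a large-screen Note II specifically for reading PDFs, and my ears should not sit idle either. I listen to novels rather than English lessons: back in school I loved listening to the radio, especially storytelling and crosstalk, so I need a lot of audiobooks. There are plenty of them online now, but downloading them is extremely tedious. To earn more traffic and ad clicks, these sites make you open at least two pages before you reach the real download link. To cut the overall download time I wrote this small program, so that I and others can download audiobooks (and, of course, any other kind of resource) more easily.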

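To be clear, I am not trying to crawl large amounts of data; this is only for entertainment and learning. The program therefore does not aimlessly crawl every link on a site. Instead it is given a single novel: for example, to download the novel 《童年》 (Childhood), I find its home page on 我聽評書網 (www.5tps.com) and let the program download all of its mp3 audio files. The details are in the code below; all of the code lives in the module crawler5tps: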

1. Set the base URL and the directory where files are saved

    # -*- coding: GBK -*-
    import urllib, urllib2
    import re, threading, os

    baseurl = 'http://www.5tps.com'  # base url
    down2path = 'E:/enovel/'         # saving path
    save2path = ''                   # saving file name (full path)

2. Parse the download-page URLs out of the start URL

    def parseUrl(starturl):
        '''
        parse out download page from start url.
        eg. we can get 'http://www.5tps.com/down/8297_52_1_1.html' from 'http://www.5tps.com/html/8297.html'
        '''
        global save2path
        rDownloadUrl = re.compile(".*?<A href=\'(/down/\w+\.html)\'.*")  # find the link of download page
        #rTitle = re.compile("<TITILE>.{4}\s{1}(.*)\s{1}.*</TITLE>")
        #<TITLE>有聲小說 悶騷1 播音:劉濤 全集</TITLE>
        f = urllib2.urlopen(starturl)
        totalLine = f.readlines()

        # create the name of saving file; assumes the 4th line of the page is the <TITLE> line
        title = totalLine[3].split(" ")[1]
        if os.path.exists(down2path+title) is not True:
            os.mkdir(down2path+title)
        save2path = down2path+title+"/"  # set this even when the directory already exists

        downUrlLine = [line for line in totalLine if rDownloadUrl.match(line)]
        downLoadUrl = []
        for dl in downUrlLine:
            # a line may hold several links: record the first match, blank it out, then match again
            while True:
                m = rDownloadUrl.match(dl)
                if not m:
                    break
                downUrl = m.group(1)
                downLoadUrl.append(downUrl.strip())
                dl = dl.replace(downUrl, '')
        return downLoadUrl
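The link-extraction step can also be written with `re.findall`, which collects every match in a string without the replace-and-retry loop. This is a minimal Python 3 sketch, not the author's code. It assumes the listing page uses the `<A href='/down/....html'>` markup targeted by the regex above; the name `find_download_pages` and the sample HTML are hypothetical.

```python
import re

# Same capture group as rDownloadUrl above, without the leading/trailing ".*",
# because findall scans the whole string by itself.
DOWN_PAGE_RE = re.compile(r"<A href='(/down/\w+\.html)'")


def find_download_pages(html):
    """Return every /down/*.html path found in html, in document order."""
    return DOWN_PAGE_RE.findall(html)


# Hypothetical sample markup, for illustration only.
sample = (
    "<li><A href='/down/8297_52_1_1.html'>001</A></li>"
    "<li><A href='/down/8297_52_1_2.html'>002</A></li>"
)
print(find_download_pages(sample))
```

Because `findall` returns all non-overlapping matches, two links on the same line are handled without mutating the line.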

3. Parse the real download links out of the download pages

    def getDownlaodLink(starturl):
        '''
        find out the real download link from download page.
        eg. we can get the download link 'http://180j-d.ysts8.com:8000/人物紀實/童年/001.mp3?\
        1251746750178x1356330062x1251747362932-3492f04cf54428055a110a176297d95a' from \
        'http://www.5tps.com/down/8297_52_1_1.html'
        '''
        downUrl = []
        gbk_ClickWord = '點此下載'  # link text ("click here to download"); must match the page text in its GBK encoding
        downloadUrl = parseUrl(starturl)
        rDownUrl = re.compile('<a href=\"(.*)\"><font color=\"blue\">'+gbk_ClickWord+'.*</a>')  # find the real download link
        for url in downloadUrl:
            realurl = baseurl+url
            print realurl
            for line in urllib2.urlopen(realurl).readlines():
                m = rDownUrl.match(line)  # match() anchors at the start of the line
                if m:
                    downUrl.append(m.group(1))
        return downUrl
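The second parsing step reduces to finding one anchor whose label is the "click to download" text. The following Python 3 sketch shows that idea on a hard-coded string. The function name `find_real_link`, the sample HTML, and the example.com URL are hypothetical, and the real page markup may differ from what is assumed here.

```python
import re


def find_real_link(html, click_word):
    """Return the href of the link whose blue label starts with click_word, or None."""
    pattern = re.compile(
        r'<a href="([^"]*)"><font color="blue">' + re.escape(click_word)
    )
    # search() tolerates leading whitespace or other markup before the anchor;
    # match() would only succeed if the anchor started the string.
    m = pattern.search(html)
    return m.group(1) if m else None


# Hypothetical sample markup, for illustration only.
sample = (
    '  <a href="http://example.com/audio/001.mp3?token=abc">'
    '<font color="blue">點此下載</font></a>'
)
print(find_real_link(sample, '點此下載'))
```

Using `[^"]*` instead of `.*` keeps the captured URL from running past the closing quote when a line contains more than one attribute.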

4. Define the download function

    def download(url, filename):
        ''' download mp3 file '''
        print url
        urllib.urlretrieve(url, filename)
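In Python 3 the same helper lives in `urllib.request`. This is a minimal sketch under that assumption, and it is not part of the original module. To stay runnable without network access, the demonstration fetches a local `file://` URL created in a temporary directory; the file names are hypothetical.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download(url, filename):
    """Fetch url, write it to filename, and return the saved path."""
    saved_path, _headers = urllib.request.urlretrieve(url, filename)
    return saved_path


# Demonstrate with a local file:// URL so the example needs no network access.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.mp3"
    src.write_bytes(b"fake mp3 bytes")
    dst = os.path.join(tmp, "001.mp3")
    download(src.as_uri(), dst)
    copied = Path(dst).read_bytes()

print(copied == b"fake mp3 bytes")
```

The Python documentation describes `urlretrieve` as a legacy interface, so longer-lived code may prefer `urllib.request.urlopen` and an explicit copy to disk.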

5. Create the thread class used to download files

    class DownloadThread(threading.Thread):
        ''' download thread class '''
        def __init__(self, func, savePath):
            threading.Thread.__init__(self)
            self.function = func      # despite the name, this holds the download url
            self.savePath = savePath  # full path of the mp3 file to write

        def run(self):
            download(self.function, self.savePath)

6. Start downloading

    if __name__ == '__main__':
        starturl = 'http://www.5tps.com/html/8297.html'
        downUrl = getDownlaodLink(starturl)
        aliveThreadDict = {}        # alive thread
        downloadingUrlDict = {}     # downloading link

        i = 0
        while i < len(downUrl):
            # Note: 我聽評書網 allows at most three threads to download the same novel at once.
            # To allow for network problems and make sure real mp3 files are downloaded,
            # the number of threads is set to 2 here.
            while len(downloadingUrlDict) < 2 and i < len(downUrl):  # never index past the last url
                downloadingUrlDict[i] = i
                i += 1
            for urlIndex in downloadingUrlDict.values():
                #argsTuple = (downUrl[urlIndex],save2path+str(urlIndex+1)+'.mp3')
                if urlIndex not in aliveThreadDict.values():
                    t = DownloadThread(downUrl[urlIndex], save2path+str(urlIndex+1)+'.mp3')
                    t.start()
                    aliveThreadDict[t] = urlIndex
            for (th, urlIndex) in aliveThreadDict.items():
                if th.isAlive() is not True:
                    del aliveThreadDict[th]           # delete the thread slot
                    del downloadingUrlDict[urlIndex]  # delete the url from url list needed to download

        for th in aliveThreadDict.keys():
            th.join()  # wait for the last downloads before reporting completion
        print 'Completed Download Work'
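The bookkeeping in the main loop above (two dictionaries tracking live threads and pending indices) exists only to cap concurrency at two downloads. In Python 3 that cap can be delegated to `concurrent.futures.ThreadPoolExecutor`. This sketch swaps in that technique and is not the author's approach. The names `download_all` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real download function so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, save_dir, fetch, max_workers=2):
    """Run fetch(url, path) for every URL using at most max_workers threads."""
    # Files are numbered from 1, as in the original: 1.mp3, 2.mp3, ...
    jobs = [(url, "%s%d.mp3" % (save_dir, n)) for n, url in enumerate(urls, 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url, path) for url, path in jobs]
        # result() blocks until the job finishes and re-raises worker exceptions.
        return [f.result() for f in futures]


def fake_fetch(url, path):
    """Hypothetical stand-in for a real download function; returns the target path."""
    return path


print(download_all(["u1", "u2", "u3"], "out/", fake_fetch))
```

The pool queues the remaining jobs itself and blocks instead of spinning, so the polling loop and the final wait both disappear.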

That is all there is to it. Let it download to its heart's content; I have other projects to code. Sigh.


