An Example of Scraping Bilibili Videos with Python

Published: 2020-12-10 13:46:04 Source: Yisu Cloud Views: 213 Author: Xiaoxin Category: Programming Languages

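This article walks through an example of scraping Bilibili videos with Python. Many readers are probably not yet familiar with how this works, so it is shared here for reference. Let's go through it step by step.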

1. Environment Setup

The environment used here is listed below, for reference only:
Development tool: PyCharm
Python version: python-3.8.0
Required modules: shutil, os, re, json, random (for choice), requests, lxml

2. Page Analysis

As an example, I will use the video of Teacher Ma that went viral a while ago.
Video link: https://www.bilibili.com/video/BV1Ef4y1i78b?from=search&seid=12072538764197074893

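  1. Parsing the video link: we only need BV1Ef4y1i78b, the part after video/ and before the ? sign.
  2. Capturing the traffic: Bilibili splits a video into many small segments. After reading the page source, we find that the content of the script block ending at </script><script> is a JSON string; parsing it gives us the data we want.
  3. Analyzing the specific contents of the returned JSON.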
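
The first step above can be sketched with the standard library. This is only an illustration: the function name extract_bvid is hypothetical and does not appear in the original code, which is simply given the ID as an argument.

```python
from urllib.parse import urlparse


def extract_bvid(video_url):
    """Return the path segment that follows /video/ in a Bilibili URL."""
    path = urlparse(video_url).path  # the ?query part is not included in .path
    parts = [p for p in path.split('/') if p]
    # A video page path is expected to look like ['video', 'BV1Ef4y1i78b']
    if len(parts) >= 2 and parts[0] == 'video':
        return parts[1]
    raise ValueError('not a video URL: %s' % video_url)


url = 'https://www.bilibili.com/video/BV1Ef4y1i78b?from=search&seid=12072538764197074893'
print(extract_bvid(url))  # BV1Ef4y1i78b
```

Using urlparse rather than manual string slicing means the query string never has to be handled explicitly.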

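The JSON returned to us looks as follows; the information that is actually useful is under data.
Under data we can clearly see what we are after, such as the available video qualities and the video address. Note: if you take the address and open it directly, the request fails. Bilibili checks the Referer header, and a direct visit from the browser carries no Referer, so the page cannot be found.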
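
A minimal sketch of the Referer point, assuming the video page URL is used as the Referer value (which is what the download code later in this article does). The helper name media_headers is hypothetical.

```python
def media_headers(base_headers, page_url):
    """Return a copy of base_headers with Referer set to the video page URL."""
    merged = dict(base_headers)  # copy, so the shared dict is left untouched
    merged['Referer'] = page_url
    return merged


base = {'Accept': '*/*', 'User-Agent': 'Mozilla/5.0'}
page = 'https://www.bilibili.com/video/BV1Ef4y1i78b'
print(media_headers(base, page)['Referer'])
```

Building a fresh dict for each media request keeps the Referer from leaking into unrelated requests that reuse the base headers.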
The items we need to handle are:

  1. The video duration
  2. The video quality
  3. The video URL
  4. The audio URL
  5. Merging the audio and video
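
The sketch below pulls items 1 to 4 out of a dictionary shaped like the returned JSON. The key names (data, accept_description, dash, duration, video, audio, baseUrl) are the ones used by the code in section 3; every value in sample is made up for illustration, and the function name summarize is hypothetical.

```python
# Made-up sample with the same key layout as the JSON described above
sample = {
    'data': {
        'accept_description': ['HD 1080P', 'SD 480P'],
        'dash': {
            'duration': 125,
            'video': [{'baseUrl': 'https://example.com/v0.m4s'},
                      {'baseUrl': 'https://example.com/v1.m4s'}],
            'audio': [{'baseUrl': 'https://example.com/a0.m4s'}],
        },
    }
}


def summarize(playinfo, index=0):
    """Collect quality, duration, video URL and audio URL for one quality index."""
    data = playinfo['data']
    dash = data['dash']
    minutes, seconds = divmod(int(dash['duration']), 60)
    return {
        'quality': data['accept_description'][index],
        'duration': '%d min %d s' % (minutes, seconds),
        'video_url': dash['video'][index]['baseUrl'],
        'audio_url': dash['audio'][0]['baseUrl'],  # first audio track
    }


print(summarize(sample))
```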

3. Hands-on Code

3.1 Preparation

Required imports

import json
import os
import re
import shutil
import ssl
import time
import requests
from concurrent.futures import ThreadPoolExecutor
from random import choice
from lxml import etree

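Adding request headers and a random user agent

# Set the request headers and related parameters so the request is less likely to be blocked as a crawler
headers = {
   'Accept': '*/*',
   'Accept-Language': 'en-US,en;q=0.5',
   'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36'
}


def get_user_agent():
   '''Return a random user agent.'''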
   user_agents = [
       "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
       "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
       "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
       "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
       "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
       "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
       "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
       "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
       "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
       "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
       "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
       "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
       "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
       "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
       "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
       "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
       "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
       "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
       "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
       "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
       "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
       "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
       "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
       "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
       "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
       "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
       "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
       "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
       "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
       "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
       "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
       "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
       "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
       "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
       "MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
       "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1",
       "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 6 Build/LYZ28E) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.23 Mobile Safari/537.36",
       "Mozilla/5.0 (iPod; U; CPU iPhone OS 2_1 like Mac OS X; ja-jp) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5F137 Safari/525.20",
       "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
       "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
   ]
   # Pick one entry from the user_agents list at random to act as the simulated browser
   user_agent = choice(user_agents)
   return user_agent
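
The article defines get_user_agent but the later code never calls it. One way it could be wired into the request headers is sketched below. This is an assumption about the intended use, not the author's code; SAMPLE_USER_AGENTS is a shortened stand-in for the full list above and headers_with_random_agent is a hypothetical name.

```python
from random import choice

# Shortened stand-in for the user_agents list defined above
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
]


def headers_with_random_agent(base_headers, user_agents=SAMPLE_USER_AGENTS):
    """Return a copy of base_headers whose User-Agent is chosen at random."""
    merged = dict(base_headers)  # do not modify the caller's dict
    merged['User-Agent'] = choice(user_agents)
    return merged


base_headers = {'Accept': '*/*', 'User-Agent': 'placeholder'}
print(headers_with_random_agent(base_headers)['User-Agent'])
```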

3.2 Writing the code that fetches the video information

def single_download(aid, acc_quality):
    '''Download a single video. aid is the BV ID, acc_quality the quality index.'''
    # Request the video page and read its information
    origin_video_url = 'https://www.bilibili.com/video/' + aid
    res = requests.get(origin_video_url, headers=headers)
    html = etree.HTML(res.text)
    title = html.xpath('//*[@id="viewbox_report"]/h2/span/text()')[0]
    print('Now downloading:', title)

    # re_video_info is a helper that this article does not list; it is expected
    # to return the JSON captured by the pattern as a dict
    video_info_temp = re_video_info(res.text, '__playinfo__=(.*?)</script><script>')
    video_info = {}
    # Video quality
    quality = video_info_temp['data']['accept_description'][acc_quality]
    # Video duration
    video_info['duration'] = video_info_temp['data']['dash']['duration']
    # Video URL
    video_url = video_info_temp['data']['dash']['video'][acc_quality]['baseUrl']
    # Audio URL; the audio list may be shorter than the video list, so clamp the index
    audio_list = video_info_temp['data']['dash']['audio']
    audio_url = audio_list[min(acc_quality, len(audio_list) - 1)]['baseUrl']
    # Convert the duration into minutes and seconds
    video_time = int(video_info.get('duration', 0))
    video_minute = video_time // 60
    video_second = video_time % 60
    print('Current video quality: {}, duration: {} min {} s'.format(quality, video_minute, video_second))
    # Call the function that downloads and saves the video
    download_video_single(origin_video_url, video_url, audio_url, title)
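
single_download calls re_video_info, which the article never defines. Judging from how it is called and how its result is indexed, it probably runs the regular expression against the page source and parses the captured group as JSON. The version below is a sketch under that assumption, not the author's implementation; the page string is a made-up stand-in for real page source.

```python
import json
import re


def re_video_info(text, pattern):
    """Find the first match of pattern in text and parse its first group as JSON."""
    match = re.search(pattern, text, re.S)  # re.S lets '.' span line breaks
    if match is None:
        raise ValueError('playinfo JSON not found in the page source')
    return json.loads(match.group(1))


# Made-up page fragment for illustration
page = ('<script>window.__playinfo__={"data": {"dash": {"duration": 125}}}'
        '</script><script>var x = 1;</script>')
info = re_video_info(page, '__playinfo__=(.*?)</script><script>')
print(info['data']['dash']['duration'])  # 125
```

Raising an explicit error when nothing matches makes a changed page layout easier to diagnose than an IndexError further down.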

3.3 Writing the download code

def download_video_single(referer_url, video_url, audio_url, video_name):
    '''Download the video and audio streams of a single video.'''
    # Add the Referer to the request headers; without it the media request is rejected
    headers.update({"Referer": referer_url})
    print("Video download started: %s" % video_name)
    # Download and save the video stream
    video_content = requests.get(video_url, headers=headers)
    print('%s\tvideo size:' % video_name, round(int(video_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    # Write the response fetched above instead of requesting the file a second time;
    # 'wb' avoids appending to a file left over from an earlier run
    with open('%s_video.mp4' % video_name, 'wb') as output:
        output.write(video_content.content)
    # Download and save the audio stream
    audio_content = requests.get(audio_url, headers=headers)
    print('%s\taudio size:' % video_name, round(int(audio_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    with open('%s_audio.mp4' % video_name, 'wb') as output:
        output.write(audio_content.content)
    print("Video download finished: %s" % video_name)
    video_audio_merge_single(video_name)
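
The download code uses the page title directly as the file name. Titles can contain characters that are not allowed in file names on some systems, so it may be worth cleaning the title first. The helper below is a hypothetical addition, not part of the original code.

```python
import re


def safe_filename(title, replacement='_'):
    """Replace characters that are commonly invalid in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title).strip()
    return cleaned or 'untitled'  # fall back if nothing usable is left


print(safe_filename('a/b: c?'))  # a_b_ c_
```

A cleaned title could then be passed as video_name in place of the raw title.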

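3.4 Merging the downloaded audio and video

def video_audio_merge_single(video_name):
    '''Merge the video and audio of a single video with ffmpeg.'''
    print("Video merge started: %s" % video_name)
    import subprocess
    # Pass the arguments as a list so that titles containing spaces are not split by the shell
    command = ['ffmpeg', '-y', '-loglevel', 'quiet',
               '-i', '%s_video.mp4' % video_name,
               '-i', '%s_audio.mp4' % video_name,
               '-c', 'copy', '%s.mp4' % video_name]
    # run() waits for ffmpeg to exit, so the message below is printed only after the merge is done
    subprocess.run(command)
    print("Video merge finished: %s" % video_name)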
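
The merge step assumes that ffmpeg is installed and on the PATH. A slightly more defensive variant is sketched below: it checks for ffmpeg first and reports whether the merge succeeded. The function names build_merge_command and merge are hypothetical; the ffmpeg options are the same ones used above.

```python
import shutil
import subprocess


def build_merge_command(video_name):
    """Build the ffmpeg argument list that copies both streams into one file."""
    return ['ffmpeg', '-y', '-loglevel', 'quiet',
            '-i', '%s_video.mp4' % video_name,
            '-i', '%s_audio.mp4' % video_name,
            '-c', 'copy', '%s.mp4' % video_name]


def merge(video_name):
    """Run ffmpeg if it is available; return True when it exits with code 0."""
    if shutil.which('ffmpeg') is None:
        print('ffmpeg was not found on PATH')
        return False
    result = subprocess.run(build_merge_command(video_name))
    return result.returncode == 0


print(build_merge_command('my video'))
```

Keeping the command builder separate from the call makes it easy to inspect the exact arguments without running ffmpeg.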

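3.5 Test run

To try it out, call single_download with the BV ID from section 2 and a quality index, for example single_download('BV1Ef4y1i78b', 0). The merged file is written to the working directory under the video's title.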

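That is all for "An Example of Scraping Bilibili Videos with Python". Thank you for reading! I hope the walkthrough has given you a working understanding of the process. To learn more, follow the Yisu Cloud industry news channel.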
