溫馨提示×

溫馨提示×

您好，登錄后才能下訂單哦！

密碼登錄×

忘記密碼？

登錄注冊×

獲取短信驗證碼

其他方式登錄

點(diǎn)擊登錄注冊即表示同意《億速云用戶服務(wù)條款》

用戶登錄×

賬戶密碼登錄

請使用微信掃描上方二維碼

使用幫助

請求超時！

請點(diǎn)擊重新獲取二維碼

如何使用Python實現(xiàn)爬蟲抓取與讀寫、追加到excel文件操作

發(fā)布時間：2021-04-12 13:53:44 來源：億速云閱讀：202 作者：小新欄目：開發(fā)技術(shù)

這篇文章給大家分享的是有關(guān)如何使用Python實現(xiàn)爬蟲抓取與讀寫、追加到excel文件操作的內(nèi)容。小編覺得挺實用的，因此分享給大家做個參考，一起跟隨小編過來看看吧。

具體如下：

爬取糗事百科熱門

安裝讀寫excel 依賴 pip install xlwt安裝追加excel文件內(nèi)容依賴 pip install xlutils安裝 lxml

Python示例：

import csv
import requests
from lxml import etree
import time
import xlwt
import os
from xlutils.copy import copy
import xlrd
data_infos_list = []
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'}
# f = open('C:\\Users\\Administrator\\Desktop\\qiubaibook.csv', 'a+', newline='', encoding='utf-8')
# writer = csv.writer(f)
# writer.writerow(('author', 'sex', 'rank', 'content', 'great', 'comment', 'time'))
filename = 'C:\\Users\\Administrator\\Desktop\\qiubaibook.xls'
def get_info(url):
  res = requests.get(url, headers=headers)
  selector = etree.HTML(res.text)
  # print(res.text)
  htmls = selector.xpath('//div[contains(@class,"article block untagged mb15")]')
  # // *[ @ id = "qiushi_tag_120024357"] / a[1] / div / span 內(nèi)容
  # //*[@id="qiushi_tag_120024357"]/div[2]/span[1]/i 好笑
  # //*[@id="c-120024357"]/i 評論
  # //*[@id="qiushi_tag_120024357"]/div[1]/a[2]/h3 作者
  # //*[@id="qiushi_tag_120024357"]/div[1]/div 等級
  # // womenIcon manIcon 性別
  for html in htmls:
    author = html.xpath('div[1]/a[2]/h3/text()')
    if len(author) == 0:
      author = html.xpath('div[1]/span[2]/h3/text()')
    rank = html.xpath('div[1]/div/text()')
    sex = html.xpath('div[1]/div/@class')
    if len(sex) == 0:
      sex = '未知'
    elif 'manIcon' in sex[0]:
      sex = '男'
    elif 'womenIcon' in sex[0]:
      sex = '女'
    if len(rank) == 0:
      rank = '-1'
    contents = html.xpath('a[1]/div/span/text()')
    great = html.xpath('div[2]/span[1]/i/text()') # //*[@id="qiushi_tag_112746244"]/div[3]/span[1]/i
    if len(great) == 0:
      great = html.xpath('div[3]/span[1]/i/text()')
    comment = html.xpath('div[2]/span[2]/a/i/text()') # //*[@id="c-112746244"]/i
    if len(comment) == 0:
      comment = html.xpath('div[3]/span[2]/a/i/text()')
    # classes = html.xpath('a[1]/@class')
    # writer.writerow((author[0].strip(), sex, rank[0].strip(), contents[0].strip(), great[0].strip(),
    #         comment[0].strip(), time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))))
    data_infos = [author[0].strip(), sex, rank[0].strip(), contents[0].strip(), great[0].strip(),
           comment[0].strip(), time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))]
    data_infos_list.append(data_infos)
def write_data(sheet, row):
  for data_infos in data_infos_list:
    j = 0
    for data in data_infos:
      sheet.write(row, j, data)
      j += 1
    row += 1
if __name__ == '__main__':
  urls = ['https://www.qiushibaike.com/8hr/page/{}/'.format(num) for num in range(1, 14)]
  for url in urls:
    print(url)
    get_info(url)
    time.sleep(2)
  # 如果文件存在，則追加。如果文件不存在，則新建
  if os.path.exists(filename):
    # 打開excel
    rb = xlrd.open_workbook(filename, formatting_info=True) # formatting_info=True 保留原有字體顏色等樣式
    # 用 xlrd 提供的方法獲得現(xiàn)在已有的行數(shù)
    rn = rb.sheets()[0].nrows
    # 復(fù)制excel
    wb = copy(rb)
    # 從復(fù)制的excel文件中得到第一個sheet
    sheet = wb.get_sheet(0)
    # 向sheet中寫入文件
    write_data(sheet, rn)
    # 刪除原先的文件
    os.remove(filename)
    # 保存
    wb.save(filename)
  else:
    header = ['author', 'sex', 'rank', 'content', 'great', 'comment', 'time']
    book = xlwt.Workbook(encoding='utf-8')
    sheet = book.add_sheet('糗百')
    # 向 excel 中寫入表頭
    for h in range(len(header)):
      sheet.write(0, h, header[h])
    # 向sheet中寫入內(nèi)容
    write_data(sheet, 1)
    book.save(filename)

感謝各位的閱讀！關(guān)于“如何使用Python實現(xiàn)爬蟲抓取與讀寫、追加到excel文件操作”這篇文章就分享到這里了，希望以上內(nèi)容可以對大家有一定的幫助，讓大家可以學(xué)到更多知識，如果覺得文章不錯，可以把它分享出去讓更多的人看到吧！

向AI問一下細(xì)節(jié)

推薦閱讀：

免責(zé)聲明：本站發(fā)布的內(nèi)容（圖片、視頻和文字）以原創(chuàng)、轉(zhuǎn)載和分享為主，文章觀點(diǎn)不代表本網(wǎng)站立場，如果涉及侵權(quán)請聯(lián)系站長郵箱：is@yisu.com進(jìn)行舉報，并提供相關(guān)證據(jù)，一經(jīng)查實，將立刻刪除涉嫌侵權(quán)內(nèi)容。

上一篇新聞：
如何使用python實現(xiàn)爬取圖書封面
下一篇新聞：
Python如何實現(xiàn)聊天機(jī)器人

猜你喜歡

AI
助
手

產(chǎn)品服務(wù)

地區(qū)劃分

專題活動

幫助支持

關(guān)于我們

售后咨詢

7*24小時在線電話：400-100-2938

7*24小時在線 QQ：800811969

關(guān)注億速云

億速云公眾號

手機(jī)網(wǎng)站二維碼