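This article shows how to use Python to scrape information on all the data analysis books listed on Dangdang (dangdang.com). Many readers may not be familiar with the process, so the code is shared here for reference. Let's walk through it.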
import requests
from lxml import etree

# The original listing does not define headers; this User-Agent is a placeholder.
headers = {'User-Agent': 'Mozilla/5.0'}

# Search result pages 1 to 100; key is the keyword 数据分析 (data analysis), percent-encoded as GBK.
urls = ['http://search.dangdang.com/?key=%CA%FD%BE%DD%B7%D6%CE%F6&act=input&page_index={}'.format(i) for i in range(1, 101)]

for url in urls:
    html = requests.get(url, headers=headers)
    # html.encoding = "utf-8"
    # print('First-level request returned:', html)
    html.encoding = html.apparent_encoding  # use the detected encoding so the text is not garbled
    selector = etree.HTML(html.text)
    # print(selector)
    datas = selector.xpath('//div[@class="con shoplist"]')
    # print(datas)
    for data in datas:
        Classs = data.xpath('div/ul/li/@class')  # li class, line1 to line60
        IDDs = data.xpath('div/ul/li/@id')  # id
        titles = data.xpath('div/ul/li/a/@title')  # title
        prices = data.xpath('div/ul/li/p[3]/span[1]/text()')  # current price
        source_prices = data.xpath('div/ul/li/p[3]/span[2]/text()')  # list price
        discounts = data.xpath('div/ul/li/p[3]/span[3]/text()')  # discount
        # dian_prices = data.xpath('div/ul/li/p[3]/a[2]/i/text()')  # e-book price
        authors = data.xpath('div/ul/li/p[5]/span[1]/a[1]/@title')  # author
        publish_times = data.xpath('div/ul/li/p[5]/span[2]/text()')  # publication date
        publishs = data.xpath('div/ul/li/p[5]/span[3]/a/text()')  # publisher
        comments = data.xpath('div/ul/li/p[4]/a/text()')  # number of reviews
        detail_urls = data.xpath('div/ul/li/a/@href')  # detail-page links; renamed so the page list `urls` is not shadowed
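The `key` value in the search URL above is the keyword 数据分析 (data analysis) percent-encoded as GBK bytes. As a minimal sketch, the same page URLs could be built from a readable keyword with the standard library. The helper name `build_search_urls` and its parameters are illustrative and not part of the original code.

```python
from urllib.parse import quote

BASE_URL = 'http://search.dangdang.com/?key={}&act=input&page_index={}'


def build_search_urls(keyword, first_page=1, last_page=100):
    """Return one search-result URL per page for `keyword` (hypothetical helper)."""
    # The URL in the listing encodes the keyword as GBK rather than UTF-8.
    encoded = quote(keyword, encoding='gbk')
    return [BASE_URL.format(encoded, page) for page in range(first_page, last_page + 1)]


urls = build_search_urls('数据分析')
print(urls[0])
```

Changing the keyword argument would produce the page list for a different search term without hand-encoding the query string.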
import pymysql

# Replace the password placeholder with your own; Learn_data is the database name.
db = pymysql.connect(host='localhost', user='root', passwd='your_password', db='Learn_data', port=3306, charset='utf8')
print("Database connected")
cursor = db.cursor()
# Drop any earlier copy of the table, then recreate it.
cursor.execute("DROP TABLE IF EXISTS Learn_data.dangdangweb_info_detail")
sql = """CREATE TABLE IF NOT EXISTS Learn_data.dangdangweb_info_detail (
    id int auto_increment primary key,
    Class CHAR(100),
    IDD CHAR(100),
    title CHAR(100),
    price CHAR(100),
    source_price CHAR(100),
    discount CHAR(100),
    author CHAR(100),
    publish_time CHAR(100),
    publish CHAR(100),
    comment CHAR(100),
    dian_price CHAR(100)
) DEFAULT CHARSET=utf8"""
cursor.execute(sql)
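The connection call above keeps the password in the script itself. One common alternative, sketched here under assumptions, reads the settings from environment variables. The variable names (`DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `DB_PORT`) and the helper `load_db_config` are hypothetical; `password` and `database` are the current PyMySQL names for the `passwd` and `db` arguments.

```python
import os


def load_db_config(env=os.environ):
    """Build keyword arguments for pymysql.connect() from environment variables."""
    return {
        'host': env.get('DB_HOST', 'localhost'),
        'user': env.get('DB_USER', 'root'),
        'password': env.get('DB_PASSWORD', ''),
        'database': env.get('DB_NAME', 'Learn_data'),
        'port': int(env.get('DB_PORT', '3306')),
        'charset': 'utf8',
    }


# A plain dict stands in for the real environment in this demonstration.
config = load_db_config({'DB_PASSWORD': 'secret', 'DB_PORT': '3307'})
print(config['database'], config['port'])
# With PyMySQL installed, the connection would be: db = pymysql.connect(**config)
```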
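# Inside the `for data in datas:` loop: one row per book (zip assumes the lists have equal length).
for Class, IDD, title, price, source_price, discount, author, publish_time, publish, comment in zip(
        Classs, IDDs, titles, prices, source_prices, discounts, authors, publish_times, publishs, comments):
    # dian_price is stored as NULL because its XPath is commented out in the scraping code.
    cursor.execute(
        "insert into dangdangweb_info_detail (Class,IDD,title,price,source_price,discount,author,publish_time,publish,comment,dian_price) "
        "values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",
        (str(Class), str(IDD), str(title), str(price), str(source_price), str(discount), str(author),
         str(publish_time), str(publish), str(comment), None))
db.commit()  # pymysql does not autocommit by default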
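To try the insert logic without a MySQL server, the sketch below swaps in `sqlite3` from the Python standard library as a stand-in for PyMySQL. This is a substitution, not the article's setup: SQLite uses `?` placeholders where PyMySQL uses `%s`, the table is trimmed to three of the columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE dangdangweb_info_detail ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT, price TEXT, author TEXT)"
)

# Made-up rows standing in for scraped values.
books = [
    ('Example Book A', '39.50', 'Author A'),
    ('Example Book B', '58.00', 'Author B'),
]

# Parameterized insert: the driver handles quoting of the values.
cursor.executemany(
    "INSERT INTO dangdangweb_info_detail (title, price, author) VALUES (?, ?, ?)",
    books,
)
db.commit()  # make the inserted rows permanent

cursor.execute("SELECT COUNT(*) FROM dangdangweb_info_detail")
row_count = cursor.fetchone()[0]
print(row_count)
db.close()
```

Passing values as parameters, as the original `cursor.execute` call also does, avoids building SQL strings by hand from scraped text.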
That is all for "How to use Python to scrape information on all data analysis books from Dangdang". Thanks for reading, and hopefully the walkthrough is useful. For more, follow the Yisu Cloud industry news channel.