Python Web Scraping: Basics of the bs4 Data-Parsing Module

Published: 2020-08-02 16:58:49 | Source: Internet | Views: 443 | Author: insist_way | Category: Programming Languages

Introduction:

I have recently been learning Python web scraping. These are my study notes on the bs4 data-parsing module.

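Purpose:

bs4 (Beautiful Soup 4) parses HTML and XML documents into a tree that can be navigated and searched. HTML and XML are related markup languages, but HTML is not strictly a kind of XML: HTML parsers tolerate things such as unclosed tags that a strict XML parser rejects.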
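
To make "parsing" concrete, here is a minimal sketch that feeds a deliberately unclosed HTML fragment to bs4 and reads the resulting tree. The fragment and the variable names are made up for illustration; `html.parser` is the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the <b> and <p> tags are never closed.
fragment = "<p>Hello <b>world"

# html.parser is part of the standard library, so no extra parser package is needed.
soup = BeautifulSoup(fragment, "html.parser")

# bs4 still builds a tree; tags left open are closed when the input ends.
bold_text = soup.b.get_text()
paragraph_text = soup.p.get_text()

print(bold_text)
print(paragraph_text)
```

The same lenient behaviour is what lets bs4 cope with the messy markup found on real pages.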

Official bs4 documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Study notes:

from bs4 import BeautifulSoup

# Sample document from the official bs4 documentation.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

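# Create a BeautifulSoup object with an explicit parser. If the parser argument
# is omitted, bs4 picks the best parser that is installed and emits a warning,
# so the result can differ between environments.
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())    # pretty-print the parsed document

# Print all the text in the document with the tags stripped. The line breaks in
# the output come from the newlines in html_doc.
print(soup.get_text())

print('')

print(type(soup.title))    # soup.title is a bs4.element.Tag
print(dir(soup.title))     # list the attributes and methods available on a Tag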
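
A side note on `get_text()`: it also accepts a separator and a `strip` flag, which helps when adjacent tags have no whitespace between them. The sketch below uses a made-up two-paragraph snippet (not the document above) to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with no whitespace between the two <p> tags.
html = "<p>Once upon a time <b>three</b> sisters</p><p>lived in a well.</p>"
soup = BeautifulSoup(html, "html.parser")

# Default: text nodes are concatenated exactly as they appear.
raw = soup.get_text()

# separator is placed between text nodes; strip=True trims each node first.
clean = soup.get_text(separator=" ", strip=True)

print(repr(raw))
print(repr(clean))
```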

print(soup.title)    # the <title> tag
# <title>The Dormouse's story</title>

print(soup.title.text)    # the text inside the <title> tag
# The Dormouse's story

print(soup.a)    # the first <a> tag
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

print(soup.a.attrs)    # all attributes of the first <a> tag, as a dict
# {'href': 'http://example.com/elsie', 'class': ['sister'], 'id': 'link1'}

print(soup.a.attrs['href'])    # the href attribute of the first <a> tag
# http://example.com/elsie

print(soup.a.has_attr('class'))    # whether the class attribute exists
# True
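
Indexing `attrs` directly raises `KeyError` when an attribute is missing, which is easy to hit on real pages. The sketch below shows the safer `Tag.get()` lookup. The markup, including the link without an `href` and the extra `big` class, is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link has no href attribute.
html = (
    '<a href="http://example.com/elsie" class="sister big" id="link1">Elsie</a>'
    '<a id="link2">Lacie</a>'
)
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

# tag["name"] is shorthand for tag.attrs["name"]; it raises KeyError if missing.
href_first = first["href"]

# tag.get("name") returns None, or a supplied default, instead of raising.
href_second = second.get("href")
href_default = second.get("href", "#")

# class is a multi-valued attribute, so bs4 returns a list of class names.
classes = first["class"]

print(href_first, href_second, href_default, classes)
```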

print(soup.p)    # the first <p> tag
# <p class="title"><b>The Dormouse's story</b></p>

print(soup.p.children)    # an iterator over the direct children of the first <p> tag
# <list_iterator object at 0x7fe8185261d0>    (the address varies from run to run)

print(list(soup.p.children))
# [<b>The Dormouse's story</b>]

print(list(soup.p.children)[0])
# <b>The Dormouse's story</b>

print(list(soup.p.children)[0].text)
# The Dormouse's story
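
`.children` only covers direct children. For the whole subtree there is `.descendants`, and for just the text there is `.stripped_strings`. A small sketch, using a variation of the title paragraph with an extra nested `<i>` tag that I added for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical variation of the title paragraph with one more nesting level.
html = '<p class="title"><b>The Dormouse\'s <i>story</i></b></p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.p

# .children yields only the direct children of the tag.
child_tags = [c.name for c in p.children if isinstance(c, Tag)]

# .descendants walks the whole subtree; text nodes are filtered out here.
descendant_tags = [n.name for n in p.descendants if isinstance(n, Tag)]

# .stripped_strings yields every text node with surrounding whitespace removed.
texts = list(p.stripped_strings)

print(child_tags, descendant_tags, texts)
```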

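print(soup.find_all('a'))    # all <a> tags, as a list
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

for a in soup.find_all('a'):    # iterate over all <a> tags
    print(a.attrs['href'])

print(soup.find(id='link3'))    # the first tag with id="link3"
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>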
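
`find_all()` and `find()` take more filters than a tag name or an id. The sketch below tries `class_`, `limit`, and a search that matches nothing. The markup is a trimmed copy of the sample, with the third link's class changed to a hypothetical `brother` so the class filter has something to exclude.

```python
from bs4 import BeautifulSoup

# Trimmed sample; the "brother" class is an assumption made for this example.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="brother" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# class is a Python keyword, so bs4 uses the class_ keyword argument instead.
sisters = soup.find_all("a", class_="sister")

# limit stops the search after the given number of matches.
first_two = soup.find_all("a", limit=2)

# find() returns the first match, or None when nothing matches.
missing = soup.find("a", id="link9")

sister_ids = [a["id"] for a in sisters]
first_two_ids = [a["id"] for a in first_two]
print(sister_ids, first_two_ids, missing)
```

Because `find()` can return `None`, it is worth checking the result before accessing attributes on it.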

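print('#' * 150)

# CSS selectors are also supported.

# Find the nodes whose class is "story"
print(soup.select('.story'))

print('')

# Find the <a> tags nested inside nodes whose class is "story"
print(soup.select('.story a'))

print('')

# Find the node with id="link1"
print(soup.select('#link1'))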
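
Note that `select()` always returns a list, even for an id selector. When only the first match is wanted, `select_one()` returns the tag itself, or `None`. A short sketch on a trimmed copy of the sample; the `#nope` id is a made-up selector that matches nothing.

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list, even when the selector can match only one node.
by_id = soup.select("#link1")

# select_one() returns the first matching tag directly.
first_link = soup.select_one("p.story > a")

# Attribute selectors work too: here, links whose href ends with "lacie".
lacie_names = [a.get_text() for a in soup.select('a[href$="lacie"]')]

# A selector with no match gives None from select_one().
nothing = soup.select_one("#nope")

print(len(by_id), first_link["id"], lacie_names, nothing)
```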
