How to Get the Full Source Code of a Web Page in Python

Published: 2020-08-03 09:40:40  Source: 億速云  Views: 266  Author: 小豬  Category: Development

This article shows how to fetch the complete source code of a web page in Python. We hope you find it useful.

1. Fetching the source of an entire page in Python:

import requests
res = requests.get('https://blog.csdn.net/yirexiao/article/details/79092355')
res.encoding = 'utf-8'
print(res.text)
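
The snippet above sets res.encoding to 'utf-8' by hand, which only works when the page really is UTF-8. As a rough sketch of an alternative, the following uses only the standard library and prefers the charset the server declares, falling back to UTF-8 otherwise. The names decode_page and fetch_source, the 10-second timeout, and the fallback policy are assumptions made for illustration; they are not part of the original article.

```python
import urllib.request


def decode_page(raw, declared_charset=None, fallback="utf-8"):
    """Decode response bytes, preferring the charset the server declared."""
    charset = declared_charset or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: decode with the fallback instead,
        # replacing any bytes that are invalid in it
        return raw.decode(fallback, errors="replace")


def fetch_source(url, timeout=10):
    """Download url and return its source as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # get_content_charset() reads the charset from the Content-Type header
        return decode_page(resp.read(), resp.headers.get_content_charset())
```

Calling print(fetch_source('https://blog.csdn.net/yirexiao/article/details/79092355')) should print the same kind of output as the requests version, provided the site accepts urllib's default user agent, which is not guaranteed.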

2. Output: running the script prints the page's HTML source to the console.

Extended example:

import time
import urllib.request

from bs4 import BeautifulSoup

# URLs to exclude from the scan (left empty here, as in the original)
websiteurls = {}


def scanpage(url):
    """Fetch url, collect every link that contains it, and print each link's HTTP status."""
    start = time.time()
    n = 0
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    # Keep each unique href that contains the site URL and is not excluded
    upageurls = {}
    for link in soup.find_all("a", href=True):
        href = link.get("href")
        if url in href and href not in upageurls and href not in websiteurls:
            upageurls[href] = 0
    for href in upageurls:
        t2 = time.time()
        try:
            # Request each link once and record its status code
            upageurls[href] = urllib.request.urlopen(href).getcode()
        except Exception:
            print("connect failed")
        else:
            # Index, link, status code, then the time the request took
            print(n, href, upageurls[href])
            print(time.time() - t2)
        n += 1
    print("total is " + repr(n) + " links")
    print(time.time() - start)


scanpage("http://news.163.com/")
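
The scan above decides whether a link belongs to the site by checking whether the site URL appears as a substring of the href, so relative links such as /a.html are skipped. A minimal sketch of a stricter filter follows. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, resolves relative links with urljoin, and compares hosts instead of substrings. LinkCollector and same_site_links are hypothetical names introduced for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return unique absolute links in html that share base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page URL
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

The returned list could then be passed to the status-check loop in place of the substring-filtered dictionary.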

If you found this article on getting the full source code of a web page in Python useful, feel free to share it with others.
