How to Set Element Waits in Selenium with Python

Published: 2021-05-10 13:45:41  Source: 億速云  Views: 176  Author: 小新  Column: Development Technology

This article explains in detail how to set element waits in Selenium with Python. I found it quite practical, so I am sharing it here as a reference and hope you get something out of it.

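Advantages of Python

1. Simple and easy to use: compared with traditional languages such as C/C++, Java, and C#, Python's syntax has fewer formal requirements.
2. Open source: anyone can read the source code, and it has been ported to many platforms.
3. Multi-paradigm: Python supports both procedural and object-oriented programming.
4. Interpreted: a Python program does not need to be compiled to a binary first and can be run directly from source.
5. Powerful: Python has a large number of modules that cover most common tasks.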

Three ways to set element waits in Selenium

1. sleep: forced wait
2. implicitly_wait(): implicit wait
3. WebDriverWait(): explicit wait

Pros and cons of the three approaches

1. sleep: forced wait

from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.get('http://www.baidu.com')
sleep(2)    # pause for a fixed 2 seconds so the page content can load

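Pros:
The code is short and easy to understand.

Cons:
If the sleep is too short, the element has not loaded yet and the script raises an error. If the sleep is too long, the element has been there for a while and the script is still waiting, which wastes time and slows down the whole run.

Personal opinion:
Simple and crude. Choose a reasonable sleep time based on how fast the site responds and how fast your own connection is.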
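The trade-off described above can be made concrete with a small simulation. This is only a sketch: the load times are made-up numbers, and `fixed_sleep_outcome` is a hypothetical helper, not part of Selenium.

```python
def fixed_sleep_outcome(load_times, sleep_seconds):
    """Summarize what a fixed sleep costs for a list of element load times.

    Returns (failures, wasted_seconds): how many elements were still missing
    when the sleep ended, and how much time was spent idle after an element
    had already loaded.
    """
    failures = 0
    wasted = 0.0
    for load_time in load_times:
        if load_time > sleep_seconds:
            failures += 1  # sleep too short: the lookup would raise an error
        else:
            wasted += sleep_seconds - load_time  # sleep too long: idle time
    return failures, wasted


# Hypothetical load times (seconds) for the same element on five page visits.
load_times = [0.5, 1.0, 1.5, 2.5, 4.0]

print(fixed_sleep_outcome(load_times, 2))  # (2, 3.0)
print(fixed_sleep_outcome(load_times, 5))  # (0, 15.5)
```

With a 2-second sleep two of the five visits fail; with a 5-second sleep nothing fails, but 15.5 seconds are spent waiting for elements that were already there.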

2. implicitly_wait(): implicit wait

from selenium import webdriver
driver = webdriver.Chrome()
driver.implicitly_wait(20) # wait up to 20 seconds whenever an element is looked up
driver.get('http://www.baidu.com')

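Pros:
   1. The code is concise.
   2. Calling implicitly_wait(20) once, right after the driver is created, is enough. The setting is global: every element lookup later in the script waits up to that long for the element to appear.
   3. If the element is not found within the timeout, a NoSuchElementException is raised. If the element shows up earlier, say at second 10, the script continues immediately instead of waiting the full 20 seconds.

Cons:
1. The wait only checks that the element exists in the DOM, and the same timeout applies to every lookup. It cannot wait for a specific condition, such as the element being visible or clickable, or its content having been refreshed, so it is less precise than waiting on exactly the element and state you need.

Personal opinion:
1. It is a poor fit for sites that load their data through AJAX, for example paginated lists. The list element is always present while only its data changes, so the lookup succeeds as soon as the first page has rendered. After clicking "next page", the script does not wait for the AJAX response, and every later page ends up with the same data as the first page.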
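One way around the pagination problem is to wait for the content to change rather than for the element to exist. The sketch below assumes the explicit-wait mechanism covered in the next section, which accepts any callable that takes the driver. The class name `first_item_text_changed`, the XPath, and the fake driver are all hypothetical and exist only so the example runs without a browser.

```python
class first_item_text_changed:
    """Wait condition: the first matching element's text differs from old_text.

    Returning a falsy value means "not ready yet, keep waiting".
    """

    def __init__(self, xpath, old_text):
        self.xpath = xpath
        self.old_text = old_text

    def __call__(self, driver):
        # "xpath" is the string value of Selenium's By.XPATH constant.
        element = driver.find_element("xpath", self.xpath)
        text = element.text
        return text if text != self.old_text else False


class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in for a WebDriver whose list content changes on the third lookup."""

    def __init__(self):
        self.calls = 0

    def find_element(self, by, value):
        self.calls += 1
        return FakeElement("page 1 title" if self.calls < 3 else "page 2 title")


driver = FakeDriver()
condition = first_item_text_changed('//div[@class="item"]//a', "page 1 title")
print(condition(driver))  # False
print(condition(driver))  # False
print(condition(driver))  # page 2 title
```

With a real driver the condition would be used as `WebDriverWait(driver, 5).until(first_item_text_changed(xpath, old_text))`, where `old_text` is read before clicking the next-page button. Selenium's `expected_conditions.staleness_of(element)` is a built-in alternative when the old element is removed from the DOM on page change.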

3. WebDriverWait(): explicit wait

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait    # note the capitalization of WebDriverWait
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('http://www.baidu.com')
try:
  # Wait up to 10 seconds for the search box to be present in the DOM
  element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'kw')))
  element.send_keys('123')
  driver.find_element(By.ID, 'su').click()
except Exception as message:
  print('Failed to locate element: %s' % message)
finally:
  pass

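Pros:
The code runs efficiently. The wait targets only the element you want to locate and returns as soon as the condition on that element is met, so the script continues without any extra delay. It is the smartest of the three ways to wait for an element.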
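The polling idea behind an explicit wait can be sketched in plain Python. This is not Selenium's implementation: `wait_until` and `WaitTimeout` are hypothetical names, and `LookupError` stands in for the NoSuchElementException that WebDriverWait ignores by default. WebDriverWait polls every 0.5 seconds by default; the demo uses a much shorter interval so it finishes quickly.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout, poll_frequency=0.5, ignored=(LookupError,)):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # ready: return immediately, no extra waiting
        except ignored:
            pass  # element not there yet, keep polling
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulated element lookup that only succeeds on the third attempt.
attempts = {"count": 0}


def find_element():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not found")
    return "element"


result = wait_until(find_element, timeout=2, poll_frequency=0.01)
print(result, attempts["count"])  # element 3
```

The loop returns on the first successful check, which is why an explicit wait costs no more time than the element actually needs to appear.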

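Cons:
1. Three imports are required:

 from selenium.webdriver.support import expected_conditions as EC
 from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.common.by import By

All three are mandatory, and the import paths are long and tedious to type.
2. The wait itself is also verbose and takes a few more steps:

element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'kw')))
element.send_keys('123')

Personal opinion: compared with the other two, this approach is the best, but it is cumbersome and takes a lot of code. It can be combined with the first approach, sleep. Still, I prefer sleep. I only reach for Selenium when I cannot work out a site's requests directly, or when Selenium really is a better choice than reverse-engineering them. I avoid it whenever I can, because scraping with it is too slow.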

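Below is the code I used to scrape one site. The authors' publication lists ("成果") on this site could not be fetched any other way, so I had to use this approach: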

from selenium import webdriver
import time
from lxml import etree
import copy
import json
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
 
def getAuthors():
  # All author records, one JSON object per line
  with open('Author.json', 'r', encoding='utf-8') as f:
    j1 = set(f.read().split('\n'))
  print('j1= ', len(j1))
  # Records that were already scraped in an earlier run
  with open('yzq.json', 'r', encoding='utf-8') as f1:
    j2 = set(f1.read().split('\n'))
  print('j2= ', len(j2))
  # Only the authors that still need to be scraped
  countSet = j1 - j2
  print('countset= ', len(countSet))
  AuthorsData = []
  for dt in countSet:
    if not dt.strip():
      continue  # skip blank lines, which json.loads cannot parse
    dt_json = json.loads(dt)
    if int(dt_json["成果"]) > 0:
      AuthorsData.append(dt_json)
  # Example record:
  # dt = {'img': 'https://cache.yisu.com/upload/information/20200622/113/8258.jpg', 'name': '吳偉',
  #    'url': 'https://www.scholarmate.com/P/aeiUZr', 'org': '復旦大學, 教授', '項目': 20, '成果': 234, 'H指數': '24'}
  print('AuthorData= ', len(AuthorsData))
  return AuthorsData
 
def parseHtml(html, i):
  temp_list = []
  html_data = etree.HTML(html)
  project_html = html_data.xpath('//div[@class="pub-idx__main"]')
  for p in project_html:
    # pro_name = p.xpath('./div[@class="pub-idx__main_title"]/a/@title')[0]
    pro_name = p.xpath('.//a/@title')[0].strip().replace('\xa0', '')  # '\xa0' must not be a raw string, or the non-breaking space is never matched
    # pro_url = p.xpath('./div[@class="pub-idx__main_title"]/a/@href')[0]
    pro_url = p.xpath('.//a/@href')[0]
    pro_author = p.xpath('./div[2]/@title')[0].strip().replace('\xa0', '')
    # pro_author = p.xpath('.//div[@class="pub-idx__main_author"]/@title')
    pro_inst = p.xpath('./div[3]/@title')[0]
    temp_dict = {
      'num': i,
      'pro_name': pro_name,
      'pro_url': pro_url,
      'pro_author': pro_author,
      'pro_inst': pro_inst
    }
    temp_list.append(copy.deepcopy(temp_dict))
  return temp_list 
 
def parseData(author_data):
  try:
    url = author_data['url']
    ach_num = int(author_data['成果'])
    pages = ach_num // 10
    pages_ys = ach_num % 10
    if pages_ys > 0:
      pages += 1
    driver = webdriver.Chrome()
    # driver.implicitly_wait(10)
    driver.get(url)
    psn_data = []
    for i in range(1, pages+1):
      if i == 1:
        # Guard against the page not responding partway through the scrape; if that happens, this page's data is simply dropped
        try:
          # time.sleep(2)
          driver.find_element(By.XPATH, '//*[@id="pubTab"]').click()
          # time.sleep(3)
          # Other locators that were tried:
          # WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.ID, 'pub-idx__main')))
          # WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CLASS_NAME, 'pub-idx__main')))
          # WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, './/pub-idx__main')))
          # This wait does not really suit this site either: duplicate pages still get scraped
          WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//div[@class="pub-idx__main"]')))
          html = driver.page_source
          temp_dict = parseHtml(html, i)
          psn_data.append(copy.deepcopy(temp_dict))
        except Exception:
          import traceback
          traceback.print_exc()  # print_exc() prints the traceback itself and returns None
      else:
        # driver.find_element_by_xpath('//*[@id="pubTab"]').click()
        # Scroll the page to the bottom
        try:
          js = "var q=document.documentElement.scrollTop=100000"
          driver.execute_script(js)
          # time.sleep(1)
          driver.find_element(By.XPATH, '//div[@class="pagination__pages_next"]').click()
          # time.sleep(2)
          WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, '//div[@class="pub-idx__main"]')))
          html = driver.page_source
          temp_dict = parseHtml(html, i)
          psn_data.append(copy.deepcopy(temp_dict))
        except Exception:
          pass
    driver.close()
    psn_data = {
      'init_data': author_data,
      'psn_data': psn_data
    }
    print(psn_data)
    psn_data_string = json.dumps(psn_data, ensure_ascii=False)
    with open('data.json', 'a+', encoding='utf-8') as f:
      f.write('{}\n'.format(psn_data_string))
 
    author_data_string = json.dumps(author_data, ensure_ascii=False)
    with open('yzq.json', 'a+', encoding='utf-8') as f:
      f.write('{}\n'.format(author_data_string))
 
  except:
    pass
    # import traceback
    # print(traceback.print_exc())
    # au_strign = json.dumps(author_data, ensure_ascii=False)
    # author_data_string = json.dumps(au_strign, ensure_ascii=False)
    # with open('error.json', 'a+', encoding='utf-8') as f:
    #   f.write('{}\n'.format(author_data_string))
 
def main():
  # Sample values of authors (three records):
  # {"img": "https://cache.yisu.com/upload/information/20200622/113/8259.png?A=DMkT", "name": "胡婷",
  # "url": "https://www.scholarmate.com/P/QFFbae", "org": "四川大學, 主治醫師", "項目": "0", "成果": "11", "H指數": "0"}
  # {"img": "https://cache.yisu.com/upload/information/20200622/113/8260.png?A=DVUy", "name": "白曉涓",
  # "url": "https://www.scholarmate.com/P/73me22", "org": "", "項目": "6", "成果": "8", "H指數": "0"}
  # {"img": "https://cache.yisu.com/upload/information/20200622/113/8261.png?A=D65r", "name": "原鵬飛",
  # "url": "https://www.scholarmate.com/P/77nIFr", "org": "國家統計局統計科學研究所, 副研究員", "項目": "0", "成果": "90", "H指數": "0"}
 
  AuthorsData = getAuthors()
  for authors in AuthorsData:
    print('author= ', authors)
    parseData(authors)
 
if __name__ == '__main__':
  main()
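The page-count logic in parseData (integer division plus one extra page when there is a remainder) is a ceiling division. A minimal sketch, assuming 10 results per page as the script does; `page_count` and `PAGE_SIZE` are hypothetical names:

```python
PAGE_SIZE = 10  # results per page, as assumed by the script above


def page_count(total_items, page_size=PAGE_SIZE):
    """Number of pages needed to show total_items, rounding up."""
    return (total_items + page_size - 1) // page_size


for total in (0, 8, 10, 11, 90, 234):
    print(total, page_count(total))
# 0 0
# 8 1
# 10 1
# 11 2
# 90 9
# 234 24
```

The sample totals 8, 90, and 234 are the "成果" counts from the example records in the script's comments.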

That is all for this article on how to set element waits in Selenium with Python. I hope it has been helpful.
