How to Handle CAPTCHAs in a Python Crawler

Published: 2021-10-25 17:07:19 | Source: Yisu Cloud (億速云) | Views: 180 | Author: iii | Category: Programming Languages

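This article looks at ways to handle the CAPTCHAs a Python crawler runs into. Many people have questions about this in day-to-day work, so the simple, practical approaches below were collected from various sources: Baidu's AIP OCR API, the muggle_ocr package, and a small wrapper class that combines the two. Let's walk through them.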

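Learning to call Baidu's AIP API:

1. First, register an account:
https://login.bce.baidu.com/

Log in once registration is complete.

2. Create a project

Among the listed technologies, find Text Recognition (OCR), then click to create a project.

Once it has been created:

The AppID, API Key, and Secret Key shown in the image will be needed shortly.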
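
Rather than pasting these three values straight into source code, one option is to read them from environment variables. The sketch below is only an illustration: the variable names BAIDU_APP_ID, BAIDU_API_KEY, and BAIDU_SECRET_KEY and the helper load_baidu_credentials are assumptions made for this example, not something the Baidu SDK defines.

```python
import os


def load_baidu_credentials(env=None):
    """Return (app_id, api_key, secret_key) read from environment variables.

    The variable names are hypothetical; pick whatever fits your project.
    Pass a dict as env to override os.environ (useful in tests).
    """
    env = os.environ if env is None else env
    names = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The returned tuple can then be unpacked into the client constructor, for example AipOcr(*load_baidu_credentials()).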

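Next, you can consult the official documentation, or simply use the code I wrote.

3. Install the dependency: pip install baidu-aip

The method below is only the interface; it relies on the setup described above.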

def return_ocr_by_baidu(self, test_image):
    """
    Recognize a CAPTCHA image with Baidu OCR.

    Note: set your own baidu-aip credentials in __init__ first.
    This uses the high-accuracy endpoint. If it is too slow, switch back
    to the general one:
        self.client.basicGeneral(image, options)
    Reference:
        https://cloud.baidu.com/doc/OCR/s/3k3h7yeqa
    :param test_image: file name of the image to recognize
    :return: the recognized text; if it is wrong, call again
    """
    image = self.return_image_content(test_image=self.return_path(test_image))

    # Optional parameters; the full list is at the URL above
    options = {}
    options["detect_direction"] = "true"
    options["probability"] = "true"

    # Call general text recognition (high-accuracy version)
    result = self.client.basicAccurate(image, options)
    # An error response or an image without text has no usable 'words_result'
    words_result = result.get('words_result') or []
    result_s = words_result[0]['words'] if words_result else ''
    # Remove this print if you do not want the output
    print(result_s)
    if result_s:
        return result_s.strip()
    else:
        raise Exception("The result is empty, try again!")
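
Indexing result['words_result'][0]['words'] directly raises KeyError or IndexError when the service reports an error or finds no text. Below is a sketch of a more defensive parser for the response dict. It assumes that a failed request comes back as a dict carrying error_code and error_msg fields, which should be checked against the official documentation, and it joins every recognized line rather than reading only the first one. extract_words is a hypothetical helper name.

```python
def extract_words(result):
    """Join the text of every recognized line in an OCR response dict.

    Raises ValueError when the response reports an error or contains no text.
    """
    if "error_code" in result:
        # Assumed error shape: {"error_code": ..., "error_msg": ...}
        raise ValueError(
            "OCR request failed: {} {}".format(
                result["error_code"], result.get("error_msg", "")
            )
        )
    # Each entry is expected to look like {"words": "..."}.
    lines = [item.get("words", "") for item in result.get("words_result", [])]
    text = "".join(lines).strip()
    if not text:
        raise ValueError("OCR returned no text")
    return text
```

Raising a specific exception lets the caller decide whether to fetch a new CAPTCHA or give up.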

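Extension: Baidu's pornography detection API:

Writing code ought to be a little fun, after all; it can't all be this dry, can it?

The pornography detection API sits under Content Moderation; look around there and you will find it.

Source code for calling it: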

# -*- coding: utf-8 -*-
# @Time      :  2020/10/22  17:30
# @author    :  沙漏在下雨
# @Software  :  PyCharm
# @CSDN      :  https://me.csdn.net/qq_45906219

from aip import AipContentCensor
from ocr import MyOrc


class Auditing(MyOrc):
    """
    Calls Baidu's content moderation AIP API.
    Mainly used to screen for pornography, terrorism-related
    and disgusting content.
    URL: https://ai.baidu.com/ai-doc/ANTIPORN/tk3h7xgkn
    """

    def __init__(self):
        # super().__init__() is skipped on purpose: this class needs a
        # content moderation client instead of the OCR client
        APP_ID = 'your ID'
        API_KEY = 'your KEY'
        SECRET_KEY = 'your SECRET_KEY'

        self.client = AipContentCensor(APP_ID, API_KEY, SECRET_KEY)

    def return_path(self, test_image):
        return super().return_path(test_image)

    def return_image_content(self, test_image):
        return super().return_image_content(test_image)

    def return_Content_by_baidu_of_image(self, test_image, mode=0):
        """
        Reuses some methods from ocr, since the files live together;
        it saves a little code.
        Content moderation: checks whether an image contains illegal
        or harmful content.
        The service can also moderate text. I found that less useful,
        so it is not wrapped here.
        url: https://ai.baidu.com/ai-doc/ANTIPORN/Wk3h7xg56
        :param test_image: image to check, either a local file or a URL
        :param mode: 0 (default) means a local file, 1 means an image URL
        :return: the moderation result
        """
        if mode == 0:
            # Local file: send the raw image bytes
            image = self.return_image_content(self.return_path(test_image=test_image))
        elif mode == 1:
            # Image URL: pass it through unchanged, for example
            # self.client.imageCensorUserDefined('http://www.example.com/image.jpg')
            image = test_image
        else:
            raise Exception("The mode must be 0 or 1, but your mode is {}".format(mode))
        # Call the pornography detection API
        result = self.client.imageCensorUserDefined(image)
        print(result)
        return result


if __name__ == "__main__":
    a = Auditing()
    a.return_Content_by_baidu_of_image("test_image/2.jpg", mode=0)
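
The mode argument asks the caller to say whether test_image is a local file or a URL. As an alternative sketch, that decision could be inferred from the string itself with the standard library's urllib.parse. looks_like_url is a hypothetical helper, not part of the original class or of baidu-aip.

```python
from urllib.parse import urlparse


def looks_like_url(test_image):
    """Return True when test_image is an http(s) URL rather than a local path."""
    parsed = urlparse(test_image)
    # Require both a web scheme and a host, so that relative paths and
    # Windows drive letters such as "C:\..." are treated as local files.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With it, the method could choose the mode itself, for example mode = 1 if looks_like_url(test_image) else 0.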

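Learning the muggle_ocr recognition API:

This package has become popular recently. It is simple to use and exposes very few functions.

Install: pip install muggle-ocr. The download is somewhat slow, so a mobile hotspot may work better. At the time of writing, the mirror sites (Tsinghua/Aliyun) had not yet picked up this package, because it is a very new OCR model.
Calling the API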

def return_ocr_by_muggle(self, test_image, mode=1):
    """
    Recognize an image with muggle_ocr.

    :param test_image: file name of the image to recognize, preferably an absolute path
    :param mode: model to use. mode = 0 is ModelType.OCR, for ordinary printed text;
                 mode = 1 (default) is ModelType.Captcha, for simple 4-6 character
                 alphanumeric CAPTCHAs
    Official site: https://pypi.org/project/muggle-ocr/
    :return: the recognized text; if it is wrong, call again
    """
    # Choose the model
    if mode == 1:
        sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)
    elif mode == 0:
        sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.OCR)
    else:
        raise Exception("The mode must be 0 or 1, but your mode == {}".format(mode))

    filepath = self.return_path(test_image=test_image)

    with open(filepath, 'rb') as fr:
        captcha_bytes = fr.read()
    result = sdk.predict(image_bytes=captcha_bytes)
    # Remove this print if you do not want the output
    print(result)
    return result.strip()
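
The docstring suggests calling again when the result is wrong. The recognizer cannot know the true answer, so a retry loop can only check that the output has a plausible shape. The sketch below assumes a CAPTCHA of 4 to 6 ASCII letters or digits, matching the description of the Captcha model, and accepts any zero-argument callable as the recognizer. recognize_with_retry and its parameters are hypothetical names.

```python
import re

# Plausible shape for a simple CAPTCHA: 4 to 6 ASCII letters or digits.
CAPTCHA_PATTERN = re.compile(r"[A-Za-z0-9]{4,6}")


def recognize_with_retry(recognize, max_attempts=3, pattern=CAPTCHA_PATTERN):
    """Call recognize() until its result fully matches pattern.

    recognize is any zero-argument callable returning a string.
    Returns the first plausible result, or None after max_attempts tries.
    """
    for _ in range(max_attempts):
        try:
            text = recognize().strip()
        except Exception:
            # Treat a failed call like an implausible result and try again.
            continue
        if pattern.fullmatch(text):
            return text
    return None
```

In practice each attempt would usually download a fresh CAPTCHA image first, because running the same model on the same image tends to give the same answer.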

Wrapped source code:

# -*- coding: utf-8 -*-
# @Time      :  2020/10/22  14:12
# @author    :  沙漏在下雨
# @Software  :  PyCharm
# @CSDN      :  https://me.csdn.net/qq_45906219

import os

import muggle_ocr
from aip import AipOcr

"""
    PS: This module wraps two common image/CAPTCHA recognition approaches
    in one place. Which one to use is up to you.

    API 1: muggle_ocr
        pip install muggle-ocr  (the download is slow; a mobile hotspot may help)
        The mirror sites (Tsinghua/Aliyun) have not picked up this package yet,
        because it is a very new OCR model

    API 2: baidu-aip
        pip install baidu-aip
        Plenty of people already know this one, but I find the new muggle
        package remarkably strong
        For usage, see the official docs: https://cloud.baidu.com/doc/OCR/index.html
        or follow the approach below; either works
    :param image_path  path of the image to recognize; for deeply nested
                       directories an absolute path is recommended
"""


class MyOrc:
    def __init__(self):
        # Required settings: use your own Baidu AIP credentials
        APP_ID = 'your ID'
        API_KEY = 'your KEY'
        SECRET_KEY = 'your SECRET_KEY'

        self.client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

    def return_path(self, test_image):
        """:return abs image_path"""
        # Resolve the path
        if os.path.isabs(test_image):
            filepath = test_image
        else:
            filepath = os.path.abspath(test_image)
        return filepath

    def return_image_content(self, test_image):
        """:return the image content """
        with open(test_image, 'rb') as fr:
            return fr.read()

    def return_ocr_by_baidu(self, test_image):
        """
        Recognize a CAPTCHA image with Baidu OCR.

        Note: set your own baidu-aip credentials in __init__ first.
        This uses the high-accuracy endpoint. If it is too slow, switch back
        to the general one:
            self.client.basicGeneral(image, options)
        Reference:
            https://cloud.baidu.com/doc/OCR/s/3k3h7yeqa
        :param test_image: file name of the image to recognize
        :return: the recognized text; if it is wrong, call again
        """
        image = self.return_image_content(test_image=self.return_path(test_image))

        # Optional parameters; the full list is at the URL above
        options = {}
        options["detect_direction"] = "true"
        options["probability"] = "true"

        # Call general text recognition (high-accuracy version)
        result = self.client.basicAccurate(image, options)
        # An error response or an image without text has no usable 'words_result'
        words_result = result.get('words_result') or []
        result_s = words_result[0]['words'] if words_result else ''
        # Remove this print if you do not want the output
        print(result_s)
        if result_s:
            return result_s.strip()
        else:
            raise Exception("The result is empty, try again!")

    def return_ocr_by_muggle(self, test_image, mode=1):
        """
        Recognize an image with muggle_ocr.

        :param test_image: file name of the image to recognize, preferably an absolute path
        :param mode: model to use. mode = 0 is ModelType.OCR, for ordinary printed text;
                     mode = 1 (default) is ModelType.Captcha, for simple 4-6 character
                     alphanumeric CAPTCHAs
        Official site: https://pypi.org/project/muggle-ocr/
        :return: the recognized text; if it is wrong, call again
        """
        # Choose the model
        if mode == 1:
            sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)
        elif mode == 0:
            sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.OCR)
        else:
            raise Exception("The mode must be 0 or 1, but your mode == {}".format(mode))

        filepath = self.return_path(test_image=test_image)

        with open(filepath, 'rb') as fr:
            captcha_bytes = fr.read()
        result = sdk.predict(image_bytes=captcha_bytes)
        # Remove this print if you do not want the output
        print(result)
        return result.strip()


# a = MyOrc()
# a.return_ocr_by_baidu(test_image='test_image/digit_img_1.png')
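
Because both recognizers live in one class, they can also be chained so that the second is tried only when the first fails. This is a sketch of one possible policy, not the author's design. first_successful is a hypothetical helper that works on any list of named zero-argument callables.

```python
def first_successful(recognizers):
    """Return (name, text) from the first recognizer that yields non-empty text.

    recognizers is a list of (name, callable) pairs, tried in order.
    Raises RuntimeError listing each failure if none of them succeeds.
    """
    errors = []
    for name, recognize in recognizers:
        try:
            text = recognize()
        except Exception as exc:
            # Remember why this recognizer failed, then move on to the next.
            errors.append("{}: {!r}".format(name, exc))
            continue
        if text:
            return name, text
        errors.append("{}: empty result".format(name))
    raise RuntimeError("All recognizers failed: " + "; ".join(errors))
```

With an instance ocr = MyOrc() and an image path in path, the list could be [("muggle", lambda: ocr.return_ocr_by_muggle(path)), ("baidu", lambda: ocr.return_ocr_by_baidu(path))], which tries the local model first and only then spends a Baidu API call.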

That concludes this look at handling CAPTCHAs in a Python crawler. Theory sticks best when it is paired with practice, so go and try it out.
