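Preface
While submitting resumes on 51job (Qiancheng Wuyou) I noticed a competitiveness analysis feature. The free tier shows a match evaluation and an overall competitiveness score for each job, which makes a handy reference when deciding where to apply.
Approach
The higher the overall competitiveness score and the more favorable the match evaluation, the better.
Scrape every search result for a job keyword, fetch the overall competitiveness score and match evaluation for each job, then filter on both and automatically apply to the jobs that fit.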
Log in and get the cookies
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from time import sleep
    import re
    from lxml import etree
    import requests
    import os
    import json

    chrome_options = Options()
    # chrome_options.add_argument('--headless')
    # Selenium 3 style arguments, as in the original script; raw string for the Windows path.
    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'D:\python\chromedriver.exe')
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
    # Open the job search page so that cookies can later be added for this domain.
    driver.get("https://search.51job.com/list/020000,000000,0000,00,9,99,%2520,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=")
WebDriver can only write a cookie for the domain of the page it currently has open, so the browser goes to the job search page first.
    def get_cookie():
        # Log in with a phone number and SMS code, then save the cookies to cookie.json.
        driver.get("https://login.51job.com/login.php?loginway=1&lang=c&url=")
        sleep(2)
        phone = input("Phone number: ")
        driver.find_element_by_id("loginname").send_keys(phone)
        driver.find_element_by_id("btn7").click()  # request the SMS code
        sleep(1)
        code = input("SMS code: ")
        driver.find_element_by_id("phonecode").send_keys(code)
        driver.find_element_by_id("login_btn").click()
        sleep(2)
        cookies = driver.get_cookies()
        with open("cookie.json", "w") as f:
            f.write(json.dumps(cookies))
Check whether the cookie file exists; if it does not, run get_cookie to write the cookies to the file. It is best not to use headless mode when logging in, because a slider CAPTCHA occasionally appears.
51job only sends three SMS codes per day, so the cookies are saved and reused for the next login.
    def get_job():
        # Search for a job keyword and return the result URL and page source.
        driver.get("https://search.51job.com/list/020000,000000,0000,00,9,99,%2520,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=")
        sleep(2)
        job = input("Job keyword: ")
        driver.find_element_by_id("kwdselectid").send_keys(job)
        driver.find_element_by_xpath('//button[@class="p_but"]').click()
        url = driver.current_url
        page = driver.page_source
        return url, page
This runs the search on the job search page and returns both the page source and the URL of the first result page.
Looking at the URL structure, the number just before .html is the page number. The total number of pages comes from the "共XX页" ("XX pages in total") text on the page.
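    def get_pages(url, page):
        # Build the URL of every result page from the first page's URL and source.
        tree = etree.HTML(page)
        href = []
        x = tree.xpath('//span[@class="td"]/text()')[0]  # the "共XX页" text
        total_page = int(re.findall(r"(\d+)", x)[0])
        for i in range(1, total_page + 1):
            # Swap the page number that sits directly before ".html".
            href.append(re.sub(r"\d+\.html", f'{i}.html', url))
        return href
Get the URLs of all result pages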
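The page-number substitution can be tried on its own, without a browser. The following is a minimal sketch under assumptions: the function name build_page_urls, the sample URL with the keyword python, and the sample pager text are illustrative and not taken from the site.

```python
import re

def build_page_urls(first_page_url, pager_text):
    """Return one URL per result page, given the first page's URL and the pager text."""
    match = re.search(r"\d+", pager_text)
    if match is None:
        raise ValueError(f"no page count found in {pager_text!r}")
    total_pages = int(match.group())
    # Replace only the page number that sits directly before ".html".
    return [
        re.sub(r"\d+\.html", f"{page}.html", first_page_url, count=1)
        for page in range(1, total_pages + 1)
    ]

# Hypothetical inputs for illustration.
urls = build_page_urls(
    "https://search.51job.com/list/020000,000000,0000,00,9,99,python,2,1.html?lang=c",
    "共3页,到第",
)
for u in urls:
    print(u)
```

Using \d+ and an escaped dot keeps the substitution correct even if the starting URL is not page 1 or has a page number with several digits.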
    def get_job_code(url):
        # Return the job IDs linked from one result page (uses the global requests session).
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
        r = session.get(url, headers=headers)
        tree = etree.HTML(r.text)
        divs = tree.xpath('//div[@class="el"]/p/span/a/@href')
        job = str(divs)
        job_id = re.findall(r"/(\d+)\.html", job)
        return job_id
Get the job IDs
Put each ID into the URL of the competitiveness analysis page
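    def get_info(job_id):
        # Fetch the competitiveness analysis page for one job and parse it into a dict.
        href = f"https://i.51job.com/userset/bounce_window_redirect.php?jobid={job_id}&redirect_type=2"
        r = session.get(href, headers=headers)
        r.encoding = r.apparent_encoding
        tree = etree.HTML(r.text)
        pingjia = tree.xpath('//div[@class="warn w1"]//text()')[0].strip()  # match evaluation
        gongsi = []  # job title and company name
        for i in tree.xpath('//div[@class="lf"]//text()'):
            if i.strip():
                gongsi.append(i.strip())
        fenshu = []  # score labels and values
        for i in tree.xpath('//ul[@class="rt"]//text()'):
            if i.strip():
                fenshu.append(i.strip())
        url = f"https://jobs.51job.com/shanghai/{job_id}.html?s=03&t=0"
        # Keys: 公司 = company, 职位 = job title, 匹配度 = match evaluation, 链接 = link.
        # fenshu[3] is the score label scraped from the page (the later query expects "综合竞争力得分").
        return {"公司": gongsi[1], "职位": gongsi[0], "匹配度": pingjia, fenshu[3]: fenshu[2], "链接": url, "_id": job_id}
Scrape the competitiveness analysis page and return a dict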
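The two steps above, pulling the numeric ID out of each job link and building the analysis URL from it, can be sketched without any network access. The helper names extract_job_ids and analysis_url and the sample links are assumptions for illustration. Dropping repeated IDs is an addition that the original code does not make.

```python
import re

ANALYSIS_URL = "https://i.51job.com/userset/bounce_window_redirect.php?jobid={job_id}&redirect_type=2"

def extract_job_ids(hrefs):
    """Return the numeric job IDs found in a list of job links, in order, without repeats."""
    ids = []
    for href in hrefs:
        match = re.search(r"/(\d+)\.html", href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids

def analysis_url(job_id):
    """Build the competitiveness analysis URL for one job ID."""
    return ANALYSIS_URL.format(job_id=job_id)

# Hypothetical links for illustration.
links = [
    "https://jobs.51job.com/shanghai/112233445.html?s=01&t=0",
    "https://jobs.51job.com/shanghai/998877665.html?s=01&t=0",
    "https://jobs.51job.com/shanghai/112233445.html?s=01&t=0",
]
job_ids = extract_job_ids(links)
print(job_ids)
print(analysis_url(job_ids[0]))
```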
Main program
    if not os.path.exists("cookie.json"):
        get_cookie()
    with open("cookie.json", "r") as f:
        cookies = json.loads(f.read())
Check for the cookie file and load the cookies from it. If the file does not exist, get_cookie() runs first and saves the cookies to the file.
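The load-or-log-in pattern can be isolated into one function. This is a small sketch, not the article's code: load_or_create_cookies and fake_login are hypothetical names, and the login callback here returns the cookies instead of writing the file itself as get_cookie() does.

```python
import json
import os
import tempfile

def load_or_create_cookies(path, login):
    """Return cookies from `path`; call `login()` and save its result only if the file is missing."""
    if not os.path.exists(path):
        cookies = login()
        with open(path, "w", encoding="utf-8") as f:
            json.dump(cookies, f)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Stand-in for the interactive SMS login, counting how often it is called.
calls = []
def fake_login():
    calls.append(1)
    return [{"name": "sid", "value": "abc"}]

with tempfile.TemporaryDirectory() as tmp:
    cookie_path = os.path.join(tmp, "cookie.json")
    first = load_or_create_cookies(cookie_path, fake_login)
    second = load_or_create_cookies(cookie_path, fake_login)

print(first, len(calls))
```

The second call reads the saved file, so the login (and its SMS code) is only needed once for as long as the cookies stay valid.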
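    session = requests.Session()
    for cookie in cookies:
        driver.add_cookie(cookie)  # log the browser in
        session.cookies.set(cookie['name'], cookie['value'])  # log the requests session in
    url, page = get_job()
    driver.close()
Writing the cookies into both the session and the WebDriver logs both of them in.
Once the first result page and its URL have been fetched, the WebDriver can be closed.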
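For reference, driver.get_cookies() returns a list of dicts that carry name and value along with fields such as domain and path, and the loop above copies only name and value into the session. A minimal sketch of that reduction follows; the function names and the sample cookie names are made up for illustration.

```python
def cookies_to_dict(selenium_cookies):
    """Collapse Selenium-style cookie dicts into a plain name -> value mapping."""
    return {c["name"]: c["value"] for c in selenium_cookies}

def cookie_header(selenium_cookies):
    """Render the same cookies as the value of an HTTP Cookie header."""
    pairs = cookies_to_dict(selenium_cookies).items()
    return "; ".join(f"{name}={value}" for name, value in pairs)

# Hypothetical cookies in the shape Selenium returns.
sample = [
    {"name": "guid", "value": "abc123", "domain": ".51job.com", "path": "/"},
    {"name": "slife", "value": "xyz", "domain": ".51job.com", "path": "/"},
]
print(cookies_to_dict(sample))
print(cookie_header(sample))
```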
    code = []
    for i in get_pages(url, page):
        code = code + get_job_code(i)
The job IDs from every result page are added to one list.
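    import pymongo

    client = pymongo.MongoClient("localhost", 27017)
    db = client["job_he"]
    job_info = db["job_info"]
    for i in code:
        try:
            # Skip jobs that are already stored, so no request is made for them.
            if not job_info.find_one({"_id": i}):
                info = get_info(i)
                sleep(1)
                job_info.insert_one(info)
                print(info, "inserted")
        except Exception as e:
            # Report the job ID that failed and carry on with the next one.
            print(i, e)
The crawl is deliberately slow. Results are stored in MongoDB with the job ID as the document _id, and checking whether the ID already exists before fetching gives simple deduplication and fewer requests.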
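The skip-if-stored loop can be exercised without a running MongoDB. In the sketch below, MemoryCollection is a hypothetical in-memory stand-in that offers only the find_one and insert_one calls used above, and crawl_new and fake_fetch are illustrative names. Unlike the original loop, failures are collected per job ID instead of printed.

```python
import time

class MemoryCollection:
    """Tiny in-memory stand-in exposing the two collection methods used above."""
    def __init__(self):
        self.docs = {}

    def find_one(self, query):
        return self.docs.get(query["_id"])

    def insert_one(self, doc):
        self.docs[doc["_id"]] = doc

def crawl_new(job_ids, collection, fetch, delay=0.0):
    """Fetch and store only the jobs that are not stored yet; return (inserted, failed)."""
    inserted, failed = [], []
    for job_id in job_ids:
        if collection.find_one({"_id": job_id}):
            continue  # already stored: no request needed
        try:
            collection.insert_one(fetch(job_id))
            inserted.append(job_id)
        except Exception as exc:
            failed.append((job_id, exc))
        time.sleep(delay)  # throttle between requests
    return inserted, failed

# Stand-in for get_info that fails for one job.
def fake_fetch(job_id):
    if job_id == "3":
        raise IndexError("page layout changed")
    return {"_id": job_id, "score": "80"}

store = MemoryCollection()
store.insert_one({"_id": "1", "score": "75"})
inserted, failed = crawl_new(["1", "2", "3", "2"], store, fake_fetch)
print(inserted, [job_id for job_id, _ in failed])
```

With pymongo, passing the real collection and get_info in place of the stand-ins should behave the same way, since only find_one and insert_one are used.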
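By the time I had finished dinner the crawler had collected 8000 jobs. Filtering them left 127 with a good match, so it was time to apply in bulk.
Clicking "apply" requires being logged in, so this part is done with WebDriver.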
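    # Needs a logged-in WebDriver; the one used earlier was closed, so start a new one and add the cookies again.
    # Scores are stored as strings, so $gte here is a string comparison.
    for i in job_info.find({"匹配度": {"$regex": "排名很好"}, "综合竞争力得分": {"$gte": "80"}}):
        print(i)
        try:
            driver.get(i["链接"])  # open the job page stored under the link key
            driver.find_element_by_id("app_ck").click()  # the apply button
            sleep(2)
        except:
            pass
Log in with the cookies and apply in a simple for loop. The jobs come from a MongoDB query that filters the match evaluation with a regular expression and the competitiveness score with a threshold.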
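Because get_info stores the score as scraped text, a threshold such as {"$gte": "80"} compares strings character by character. A Python-side filter that converts the score to a number avoids that. This is a sketch under assumptions: is_good_match is a hypothetical helper, the sample documents are invented, and the score label is assumed to be "综合竞争力得分" as in the query.

```python
def is_good_match(doc, score_key="综合竞争力得分", min_score=80, phrase="排名很好"):
    """True if the match evaluation contains `phrase` and the numeric score is at least `min_score`."""
    try:
        score = int(doc.get(score_key, ""))
    except (TypeError, ValueError):
        return False  # missing or non-numeric score
    return phrase in doc.get("匹配度", "") and score >= min_score

# String comparison orders "100" before "80", which is why the conversion matters.
print("100" >= "80", 100 >= 80)

# Invented documents for illustration.
docs = [
    {"_id": "1", "匹配度": "排名很好", "综合竞争力得分": "100"},
    {"_id": "2", "匹配度": "排名很好", "综合竞争力得分": "9"},
    {"_id": "3", "匹配度": "排名一般", "综合竞争力得分": "95"},
    {"_id": "4", "匹配度": "排名很好"},
]
selected = [d["_id"] for d in docs if is_good_match(d)]
print(selected)
```

An alternative would be to store the score as an integer at insert time, so that the MongoDB query itself compares numbers.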
The applications went through.
Code
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from time import sleep
    import re
    from lxml import etree
    import requests
    import os
    import json
    import pymongo

    chrome_options = Options()
    # chrome_options.add_argument('--headless')
    # Selenium 3 style arguments, as in the original script; raw string for the Windows path.
    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'D:\python\chromedriver.exe')
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
    # Open the job search page so that cookies can later be added for this domain.
    driver.get("https://search.51job.com/list/020000,000000,0000,00,9,99,%2520,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=")

    def get_cookie():
        # Log in with a phone number and SMS code, then save the cookies to cookie.json.
        driver.get("https://login.51job.com/login.php?loginway=1&lang=c&url=")
        sleep(2)
        phone = input("Phone number: ")
        driver.find_element_by_id("loginname").send_keys(phone)
        driver.find_element_by_id("btn7").click()
        sleep(1)
        code = input("SMS code: ")
        driver.find_element_by_id("phonecode").send_keys(code)
        driver.find_element_by_id("login_btn").click()
        sleep(2)
        cookies = driver.get_cookies()
        with open("cookie.json", "w") as f:
            f.write(json.dumps(cookies))

    def get_job():
        # Search for a job keyword and return the result URL and page source.
        driver.get("https://search.51job.com/list/020000,000000,0000,00,9,99,%2520,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=")
        sleep(2)
        job = input("Job keyword: ")
        driver.find_element_by_id("kwdselectid").send_keys(job)
        driver.find_element_by_xpath('//button[@class="p_but"]').click()
        url = driver.current_url
        page = driver.page_source
        return url, page

    def close_driver():
        driver.close()

    def get_pages(url, page):
        # Build the URL of every result page from the first page's URL and source.
        tree = etree.HTML(page)
        href = []
        x = tree.xpath('//span[@class="td"]/text()')[0]
        total_page = int(re.findall(r"(\d+)", x)[0])
        for i in range(1, total_page + 1):
            href.append(re.sub(r"\d+\.html", f'{i}.html', url))
        return href

    def get_job_code(url):
        # Return the job IDs linked from one result page.
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
        r = session.get(url, headers=headers)
        tree = etree.HTML(r.text)
        divs = tree.xpath('//div[@class="el"]/p/span/a/@href')
        job = str(divs)
        job_id = re.findall(r"/(\d+)\.html", job)
        return job_id

    def get_info(job_id):
        # Fetch the competitiveness analysis page for one job and parse it into a dict.
        href = f"https://i.51job.com/userset/bounce_window_redirect.php?jobid={job_id}&redirect_type=2"
        r = session.get(href, headers=headers)
        r.encoding = r.apparent_encoding
        tree = etree.HTML(r.text)
        pingjia = tree.xpath('//div[@class="warn w1"]//text()')[0].strip()
        gongsi = []
        for i in tree.xpath('//div[@class="lf"]//text()'):
            if i.strip():
                gongsi.append(i.strip())
        fenshu = []
        for i in tree.xpath('//ul[@class="rt"]//text()'):
            if i.strip():
                fenshu.append(i.strip())
        url = f"https://jobs.51job.com/shanghai/{job_id}.html?s=03&t=0"
        # Keys: 公司 = company, 职位 = job title, 匹配度 = match evaluation, 链接 = link.
        return {"公司": gongsi[1], "职位": gongsi[0], "匹配度": pingjia, fenshu[3]: fenshu[2], "链接": url, "_id": job_id}

    if not os.path.exists("cookie.json"):
        get_cookie()
    with open("cookie.json", "r") as f:
        cookies = json.loads(f.read())

    session = requests.Session()
    for cookie in cookies:
        driver.add_cookie(cookie)
        session.cookies.set(cookie['name'], cookie['value'])
    url, page = get_job()
    driver.close()

    code = []
    for i in get_pages(url, page):
        code = code + get_job_code(i)

    client = pymongo.MongoClient("localhost", 27017)
    db = client["job_he"]
    job_info = db["job_info"]
    for i in code:
        try:
            # Skip jobs that are already stored, so no request is made for them.
            if not job_info.find_one({"_id": i}):
                info = get_info(i)
                sleep(1)
                job_info.insert_one(info)
                print(info)
                print("inserted")
        except Exception as e:
            # Report the job ID that failed and carry on with the next one.
            print(i, e)
Summary
That is all for this article. I hope it is a useful reference for your study or work. Thank you for supporting Yisu Cloud.