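Web access logs record the IP address of every visitor. By looking up where each IP is located, you can tally how visits are distributed across regions, down to the province and city level.
Taobao API address: http://ip.taobao.com/service/getIpInfo.php?ip=14.215.177.38 (replace the trailing IP as needed)
For example, querying the address 14.215.177.38 returns the following: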
{"code":0,"data":
{"country":"\u4e2d\u56fd",
"country_id":"CN",
"area":"\u534e\u5357",
"area_id":"800000",
"region":"\u5e7f\u4e1c\u7701",
"region_id":"440000",
"city":"\u5e7f\u5dde\u5e02",
"city_id":"440100",
"county":"",
"county_id":"-1",
"isp":"\u7535\u4fe1",
"isp_id":"100017",
"ip":"14.215.177.38"}
}
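The response is structured as a dictionary. The code field gives the query status (0 for success, 1 for failure), and the data fields cover the country, area, province, city, and ISP. Because the text is Unicode-escaped, Chinese characters appear as \u4e2d and so on; any Unicode-to-Chinese converter will display the content.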
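The escaped fields can also be read directly in Python, since the standard json module decodes \uXXXX escapes while parsing. The following is a minimal Python 3 sketch that works on an abridged copy of the sample response above and needs no network access. The helper name describe is hypothetical and not part of the Taobao API.

```python
import json

# Abridged copy of the sample response shown above (raw string keeps the escapes)
sample = (
    r'{"code":0,"data":{"country":"\u4e2d\u56fd","country_id":"CN",'
    r'"region":"\u5e7f\u4e1c\u7701","city":"\u5e7f\u5dde\u5e02",'
    r'"isp":"\u7535\u4fe1","ip":"14.215.177.38"}}'
)


def describe(raw):
    """Return (region, city, isp) from a raw JSON reply, or None on failure."""
    result = json.loads(raw)  # json.loads decodes the \uXXXX escapes
    if result["code"] != 0:
        return None
    data = result["data"]
    return data["region"], data["city"], data["isp"]


print(describe(sample))
```

Parsing with json.loads rather than eval avoids executing whatever text the remote server happens to return.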
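Requirement: determine which province each visiting IP belongs to (all foreign IPs are grouped together) and work out each province's share of the visits. The IPs in the log are first preprocessed and saved in a "count IP" format, one pair per line, which is what the script below reads.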
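One way to produce that "count IP" file is with collections.Counter. This is a minimal Python 3 sketch, assuming the client IP is the first whitespace-separated field of each log line, as in the usual Nginx/Apache access log layout. The function name count_ips and the sample log lines are made up for illustration.

```python
from collections import Counter


def count_ips(log_lines):
    """Return 'count IP' strings, most frequent IP first."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return ["%d %s" % (n, ip) for ip, n in counts.most_common()]


# Hypothetical log lines, used only to exercise the function
sample_log = [
    '14.215.177.38 - - [01/Jan/2018:00:00:01 +0800] "GET / HTTP/1.1" 200 612',
    '8.8.8.8 - - [01/Jan/2018:00:00:02 +0800] "GET / HTTP/1.1" 200 612',
    '14.215.177.38 - - [01/Jan/2018:00:00:03 +0800] "GET /a HTTP/1.1" 200 98',
]
for entry in count_ips(sample_log):
    print(entry)
```

Writing the returned strings to ips.txt, one per line, gives input in the shape the script expects.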
The code is as follows:
#!/usr/bin/env python3
# coding: utf-8
import json
import urllib.request

# Taobao lookup endpoint quoted earlier in this article; the IP is appended to it
bs_url = "http://ip.taobao.com/service/getIpInfo.php?ip="

# Global dictionary holding the final statistics,
# in the form {'province': {'IP': count, ...}, ...}
region_dic = {}


# Look up one IP and record it in the dictionary above
def get_data(IP, WIGHT=1):
    response = urllib.request.urlopen(bs_url + IP)
    # Parse the body as JSON; this also decodes the \uXXXX escapes
    result = json.loads(response.read().decode('utf-8'))
    # Check the status first: on failure 'data' is not a dictionary of fields
    if result['code'] != 0:
        print("request error")
        return
    data = result['data']
    if data['country_id'] == 'CN':
        region = data['region']
    else:
        region = "Overseas"  # all foreign IPs share one bucket
    region_dic.setdefault(region, {})[IP] = int(WIGHT)


if __name__ == '__main__':
    count = 0
    # The IPs to analyse are stored in a file, one "count IP" pair per line
    with open('ips.txt', 'r') as fo:
        for line in fo:
            wight, ip = line.strip().split()
            get_data(ip, wight)
            count += int(wight)
    print('Total:')
    for regions, stats in region_dic.items():
        times = sum(stats.values())
        # Multiply by 100 so the value matches the percent sign
        print("%s: %.2f %%" % (regions, times / count * 100))
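The percentage step at the end of the script can be checked on its own, without calling the API. This is a minimal Python 3 sketch, assuming a dictionary shaped like region_dic. The function name region_shares and the sample counts are made up for illustration.

```python
def region_shares(region_dic):
    """Map each region to its share of all visits, as a percentage."""
    totals = {region: sum(stats.values()) for region, stats in region_dic.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing counted, so there are no shares to report
    return {region: 100.0 * n / grand_total for region, n in totals.items()}


# Made-up counts, shaped like region_dic
demo = {
    "Guangdong": {"14.215.177.38": 3},
    "Overseas": {"8.8.8.8": 1},
}
for region, share in sorted(region_shares(demo).items()):
    print("%s: %.2f %%" % (region, share))
```

Deriving the grand total from the recorded counts, rather than from a separate running counter, keeps the shares summing to 100 even if some lookups failed and were never recorded.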
Output:
Note: other IP database APIs that can be used:
Sina API: http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=js&ip=14.215.177.38