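A SQL query had recently been taking more than two hours to run, so I set out to optimize it.
I started with the counters of the job generated by the Hive SQL and found:
Total CPU time spent was too high, an estimated 100.4319973 hours.
CPU time spent per map:
the top map task took 2.0540889 hours. A single map consuming about two CPU-hours lines up with the two-hour run time, which suggests each map was handling too much input.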
Suggested settings:
1. mapreduce.input.fileinputformat.split.maxsize is currently 256000000; lowering it increases the number of maps. (This paid off immediately: setting it to 32000000 produced 500+ maps, and the job went from the original 2 hours down to 47 minutes.)
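To make the effect of the split size concrete, here is a minimal sketch of the arithmetic, assuming the map count is roughly the input size divided by the maximum split size. The class name SplitEstimate and the 16 GB input size are hypothetical and are not taken from the job above; real split planning also depends on file boundaries, the HDFS block size, the minimum split size, and the input format in use.

```java
// Rough estimate only: real split planning also depends on file boundaries,
// HDFS block size, the minimum split size, and the input format in use.
final class SplitEstimate {

    // Ceiling division: how many splits of at most maxSplitBytes cover totalBytes.
    static long estimateMapCount(long totalBytes, long maxSplitBytes) {
        if (totalBytes < 0 || maxSplitBytes <= 0) {
            throw new IllegalArgumentException(
                "totalBytes must be >= 0 and maxSplitBytes must be > 0");
        }
        return (totalBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    public static void main(String[] args) {
        // Hypothetical input size, chosen for illustration only.
        long totalBytes = 16_000_000_000L;
        System.out.println("maxsize=256000000 -> about "
            + estimateMapCount(totalBytes, 256_000_000L) + " maps");
        System.out.println("maxsize=32000000  -> about "
            + estimateMapCount(totalBytes, 32_000_000L) + " maps");
    }
}
```

Under that assumption, cutting the split size by a factor of eight gives roughly eight times as many maps, each with about one eighth of the work, which is why the slowest map stops dominating the run time.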
2. Optimize the UDFs getPageID, getSiteId, and getPageValue (these methods do a lot of regular-expression text matching).
2.1 For regular-expression optimization, see:
http://www.fasterj.com/articles/regex1.shtml
http://www.fasterj.com/articles/regex2.shtml
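One common regular-expression optimization is to compile the pattern once and reuse it, instead of recompiling it for every row. The sketch below illustrates this under assumptions: the class PageIdExtractor, the method extractPageId, and the /page/<digits> URL layout are all hypothetical, since the real logic of getPageID is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a UDF helper; the real getPageID logic is unknown.
final class PageIdExtractor {

    // Compiled once when the class is loaded, not once per input row.
    // The "/page/<digits>" layout is an assumption made for this example.
    private static final Pattern PAGE_ID = Pattern.compile("/page/(\\d+)");

    // Returns the first page id found in the URL, or null if there is none.
    static String extractPageId(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = PAGE_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractPageId("http://example.com/page/123?x=1"));
    }
}
```

Calls such as String.matches or String.replaceAll compile their pattern on every invocation, so in a method that runs once per row, hoisting the Pattern into a static final field is usually the first thing worth trying.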
2.2 For UDF optimization, see:
1 Also, you should use class-level private members to save on object instantiation and garbage collection.
2 You also get benefits by matching the args with what you would normally expect from upstream. Hive converts Text to String when needed, but if the data normally coming into the method is Text, you could try to match the argument type and see if it is any faster.
Example:
Before optimization:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import java.net.URLDecoder;

    public final class urldecode extends UDF {

        // Takes and returns String, so Hive converts from Text on every call.
        public String evaluate(final String s) {
            if (s == null) { return null; }
            return getString(s);
        }

        public static String getString(String s) {
            String a;
            try {
                a = URLDecoder.decode(s);
            } catch (Exception e) {
                a = "";
            }
            return a;
        }

        public static void main(String args[]) {
            String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
            System.out.println(getString(t));
        }
    }
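After optimization:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;
    import java.net.URLDecoder;

    public final class urldecode extends UDF {

        // Reused across calls to avoid allocating a new Text per row.
        private final Text t = new Text();

        // Takes and returns Text, matching what normally comes from upstream.
        public Text evaluate(Text s) {
            if (s == null) { return null; }
            try {
                t.set(URLDecoder.decode(s.toString(), "UTF-8"));
                return t;
            } catch (Exception e) {
                return null;
            }
        }

        //public static void main(String args[]) {
        //String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        //System.out.println( getString(t) );
        //}
    }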
3 Implement the function by extending GenericUDF.
3. On Hive 0.14+, you can enable the UDF cache feature, hive.cache.expr.evaluation.
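The idea behind caching an expression's result can be illustrated outside Hive. The sketch below is a conceptual illustration only and is not Hive's implementation: the class CachedExpr is hypothetical, and it simply remembers the last input and result of a deterministic function, so a second reference with the same input does not evaluate it again.

```java
import java.util.Objects;
import java.util.function.Function;

// Conceptual illustration only; this is not how Hive implements its cache.
final class CachedExpr<T, R> {
    private final Function<T, R> expr; // must be deterministic
    private boolean hasValue;
    private T lastInput;
    private R lastResult;
    private int evaluations;

    CachedExpr(Function<T, R> expr) {
        this.expr = expr;
    }

    // Re-evaluates only when the input differs from the previous call.
    R eval(T input) {
        if (!hasValue || !Objects.equals(lastInput, input)) {
            lastResult = expr.apply(input);
            lastInput = input;
            hasValue = true;
            evaluations++;
        }
        return lastResult;
    }

    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        CachedExpr<Integer, Integer> keyPlusTen = new CachedExpr<>(key -> key + 10);
        int key = -10;
        // The same expression is referenced twice in one filter condition.
        boolean keep = keyPlusTen.eval(key) > 10 || keyPlusTen.eval(key) == 0;
        System.out.println("keep=" + keep
            + ", evaluations=" + keyPlusTen.evaluations());
    }
}
```

This kind of caching is only safe for deterministic expressions, and it pays off most when the expression is an expensive UDF call, such as the regex-heavy ones above, that is referenced more than once in the same query.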