If Hadoop MR output needs to be imported into HBase, it is best to write it out as HFiles first and then bulk load them into HBase. Because HFile is HBase's internal storage format, importing this way is very efficient. Below is an example.
1. Create the HBase table t1
- hbase(main):157:0* create 't1','f1'
- 0 row(s) in 1.3280 seconds
- hbase(main):158:0> scan 't1'
- ROW COLUMN+CELL
- 0 row(s) in 1.2770 seconds
2. Write the MR job
HBaseHFileMapper.java
- package com.test.hfile;
- import java.io.IOException;
- import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
- import org.apache.hadoop.hbase.util.Bytes;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Mapper;
- public class HBaseHFileMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Text> {
- private ImmutableBytesWritable immutableBytesWritable = new ImmutableBytesWritable();
- @Override
- protected void map(LongWritable key, Text value, Context context)
- throws IOException, InterruptedException {
- immutableBytesWritable.set(Bytes.toBytes(key.get()));
- context.write(immutableBytesWritable, value);
- }
- }
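The mapper emits the line's byte offset as the map output key. HFileOutputFormat needs its input sorted by row key, and this works because Bytes.toBytes(long) produces a big-endian encoding: for non-negative longs such as file offsets, lexicographic byte order matches numeric order. A minimal standalone sketch of that property, with the encoding simulated via ByteBuffer so it runs without HBase on the classpath:

```java
import java.nio.ByteBuffer;

public class OffsetKeyOrder {
    // Big-endian encoding of a long, equivalent to HBase's Bytes.toBytes(long).
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    // Unsigned lexicographic comparison, the way HBase compares row keys.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // File offsets grow as the mapper reads the input, so the encoded
        // keys already arrive at the reducer in sorted order.
        byte[] k1 = toBytes(0L), k2 = toBytes(17L), k3 = toBytes(4096L);
        System.out.println(compare(k1, k2) < 0 && compare(k2, k3) < 0); // prints true
    }
}
```

One caveat: for negative longs the sign bit breaks this ordering, but file offsets are never negative, so the pattern is safe here.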
HBaseHFileReducer.java
- package com.test.hfile;
- import java.io.IOException;
- import org.apache.hadoop.hbase.KeyValue;
- import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
- import org.apache.hadoop.hbase.util.Bytes;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Reducer;
- public class HBaseHFileReducer extends Reducer<ImmutableBytesWritable, Text, ImmutableBytesWritable, KeyValue> {
- protected void reduce(ImmutableBytesWritable key, Iterable<Text> values,
- Context context)
- throws IOException, InterruptedException {
- String value;
- // Iterate with for-each; calling values.iterator() on every loop pass
- // is fragile and would loop forever on a standard Iterable.
- for (Text text : values) {
-     value = text.toString();
-     if (!"".equals(value)) {
-         KeyValue kv = createKeyValue(value);
-         if (kv != null)
-             context.write(key, kv);
-     }
- }
- }
// str format: row:family:qualifier:value (a simple simulation)
- private KeyValue createKeyValue(String str)
- {
- String[] strs = str.split(":");
- if(strs.length<4)
- return null;
- String row=strs[0];
- String family=strs[1];
- String qualifier=strs[2];
- String value=strs[3];
- return new KeyValue(Bytes.toBytes(row),Bytes.toBytes(family),Bytes.toBytes(qualifier),System.currentTimeMillis(), Bytes.toBytes(value));
- }
- }
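The reducer assumes each input line has the row:family:qualifier:value shape. That parsing step can be sketched and checked in isolation; ParsedLine below is a hypothetical holder standing in for the KeyValue the real reducer builds:

```java
// Standalone sketch of the reducer's line parsing (row:family:qualifier:value).
// ParsedLine is illustrative only; the real code constructs an HBase KeyValue.
public class ParsedLine {
    final String row, family, qualifier, value;

    ParsedLine(String row, String family, String qualifier, String value) {
        this.row = row; this.family = family;
        this.qualifier = qualifier; this.value = value;
    }

    // Mirrors createKeyValue(): returns null for malformed lines.
    static ParsedLine parse(String str) {
        String[] strs = str.split(":");
        if (strs.length < 4) return null;
        return new ParsedLine(strs[0], strs[1], strs[2], strs[3]);
    }

    public static void main(String[] args) {
        ParsedLine p = parse("r1:f1:c1:value1");
        System.out.println(p.row + " " + p.family + " " + p.qualifier + " " + p.value);
        // prints: r1 f1 c1 value1
        System.out.println(parse("bad:line") == null); // prints: true
    }
}
```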
HbaseHFileDriver.java
- package com.test.hfile;
- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.hbase.HBaseConfiguration;
- import org.apache.hadoop.hbase.client.HTable;
- import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
- import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
- import org.apache.hadoop.util.GenericOptionsParser;
- public class HbaseHFileDriver {
- public static void main(String[] args) throws IOException,
- InterruptedException, ClassNotFoundException {
- Configuration conf = new Configuration();
- String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
- Job job = new Job(conf, "testhbasehfile");
- job.setJarByClass(HbaseHFileDriver.class);
- job.setMapperClass(com.test.hfile.HBaseHFileMapper.class);
- job.setReducerClass(com.test.hfile.HBaseHFileReducer.class);
- job.setMapOutputKeyClass(ImmutableBytesWritable.class);
- job.setMapOutputValueClass(Text.class);
- // Lazily hardcoded here; in real use these paths should come from the command line (otherArgs)
- FileInputFormat.addInputPath(job, new Path("/home/yinjie/input"));
- FileOutputFormat.setOutputPath(job, new Path("/home/yinjie/output"));
- Configuration HBASE_CONFIG = new Configuration();
- HBASE_CONFIG.set("hbase.zookeeper.quorum", "localhost");
- HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181");
- HBaseConfiguration cfg = new HBaseConfiguration(HBASE_CONFIG);
- String tableName = "t1";
- HTable htable = new HTable(cfg, tableName);
- HFileOutputFormat.configureIncrementalLoad(job, htable);
- System.exit(job.waitForCompletion(true) ? 0 : 1);
- }
- }
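configureIncrementalLoad inspects the target table's region boundaries and wires up the job's output format and a total-order partitioner, so that each reduce task writes HFiles covering a single region. A rough standalone sketch of that routing idea; the names here are illustrative, not the HBase API:

```java
import java.util.Arrays;

// Sketch of the total-order partitioning that configureIncrementalLoad
// sets up: route each row key to the reducer owning the region containing it.
public class RegionPartitioner {
    // regionStartKeys must be sorted; region i covers [start[i], start[i+1]).
    static int partition(String row, String[] regionStartKeys) {
        int idx = Arrays.binarySearch(regionStartKeys, row);
        // Exact match: the row is a region start key. Otherwise the row
        // belongs to the region just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        String[] starts = {"", "m"}; // two regions: ["", "m") and ["m", +inf)
        System.out.println(partition("a", starts));  // prints 0
        System.out.println(partition("r1", starts)); // prints 1
    }
}
```

With only one region (as for the freshly created t1), everything lands in a single reducer, which is why a single f1 directory of HFiles appears in the output.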
The /home/yinjie/input directory contains a file hbasedata.txt with the following content:
- [root@localhost input]# cat hbasedata.txt
- r1:f1:c1:value1
- r2:f1:c2:value2
- r3:f1:c3:value3
Package the job into a jar; my exported jar path is /home/yinjie/job/hbasetest.jar.
Submit the job to Hadoop:
- [root@localhost job]# hadoop jar /home/yinjie/job/hbasetest.jar com.test.hfile.HbaseHFileDriver -libjars /home/yinjie/hbase-0.90.3/hbase-0.90.3.jar
After the job finishes, check the output directory:
- [root@localhost input]# hadoop fs -ls /home/yinjie/output
- Found 2 items
- drwxr-xr-x - root supergroup 0 2011-08-28 21:02 /home/yinjie/output/_logs
- drwxr-xr-x - root supergroup 0 2011-08-28 21:03 /home/yinjie/output/f1
OK, a folder named after the column family f1 has been generated.
Next, use bulk load to import the data into HBase:
- [root@localhost job]# hadoop jar /home/yinjie/hbase-0.90.3/hbase-0.90.3.jar completebulkload /home/yinjie/output t1
導(dǎo)入完畢,查詢hbase表t1進行驗證
- hbase(main):166:0> scan 't1'
- ROW COLUMN+CELL
- r1 column=f1:c1, timestamp=1314591150788, value=value1
- r2 column=f1:c2, timestamp=1314591150814, value=value2
- r3 column=f1:c3, timestamp=1314591150815, value=value3
- 3 row(s) in 0.0210 seconds
The data has been imported!