Writing ORC files with Spark

Published: 2020-07-24 12:11:01 | Source: web | Views: 8215 | Author: xiaobin0303 | Category: Big Data

Create a table in Hive that uses ORC as its storage format:

    create table `user`(id int, name string) stored as orc;

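If you would rather issue this DDL from Spark itself and load a few rows into the table, the following is a minimal sketch. It assumes Spark 2.x or later (a `SparkSession` with Hive support, rather than the `HiveContext` used in the article's code) and the spark-hive module on the classpath. The object name `OrcHiveTableSketch`, the helper `createAndLoad`, and the sample rows are hypothetical. The table name is backquoted because USER is a reserved keyword in newer Hive releases.

```scala
import org.apache.spark.sql.SparkSession

object OrcHiveTableSketch {
  // Hypothetical helper: runs the ORC DDL through Spark SQL, then replaces
  // the table contents with two sample rows and returns the row count.
  def createAndLoad(spark: SparkSession): Long = {
    import spark.implicits._
    spark.sql("CREATE TABLE IF NOT EXISTS `user` (id INT, name STRING) STORED AS ORC")
    // insertInto matches columns by position, not by name;
    // overwrite mode makes reruns replace rather than append
    Seq((1, "alice"), (2, "bob")).toDF("id", "name")
      .write.mode("overwrite").insertInto("`user`")
    spark.table("`user`").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-hive-table-sketch")
      .master("local[*]")
      .enableHiveSupport() // requires spark-hive on the classpath
      .getOrCreate()
    println(createAndLoad(spark))
    spark.stop()
  }
}
```

Because `insertInto` writes by column position, the DataFrame's column order has to match the table definition (`id`, then `name`).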
Write the file with Spark:

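    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
    import net.sf.json.JSONObject // json-lib, used to parse each input line

    // Input file: one JSON object per line, e.g. {"id":1,"name":"Tom"}
    val jsons = "hdfs://localhost:9000/test/artist_orc.json"
    val people = sc.textFile(jsons)

    // Build the schema: "name" is a string column, every other field an int column
    val schemaString = "id name"
    val schema = StructType(schemaString.split(" ").map { fieldName =>
      if (fieldName == "name") StructField(fieldName, StringType, nullable = true)
      else StructField(fieldName, IntegerType, nullable = true)
    })

    // Parse each line and build a Row whose values match the schema types
    val rowRDD = people.map { line =>
      val p = JSONObject.fromObject(line)
      Row(p.get("id").toString.toInt, p.getString("name"))
    }

    // In Spark 1.x the ORC data source is only available through HiveContext
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val peopleSchemaRDD = hiveContext.createDataFrame(rowRDD, schema)
    // save() fails if the target directory already exists (default SaveMode.ErrorIfExists)
    peopleSchemaRDD.write.format("orc").save("hdfs://localhost:9000/user/xb/warehouse/artist_orc/adf")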
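On newer Spark versions the same write can be done without `HiveContext`. The sketch below assumes Spark 2.4 or later, where the native ORC reader/writer works without Hive support. It builds a DataFrame from `Row`s with the same two-column schema, writes it as ORC, and reads it back to check the result. A local temporary directory stands in for the HDFS path above, and the names `OrcWriteSketch` and `writeAndReadBack` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object OrcWriteSketch {
  // Same schema as the article: id (int), name (string)
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true)))

  // Writes the rows as ORC under `path` (replacing any existing output)
  // and returns a DataFrame that reads the files back.
  def writeAndReadBack(spark: SparkSession, rows: Seq[Row], path: String): DataFrame = {
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.write.mode("overwrite").orc(path) // shorthand for format("orc").save(path)
    spark.read.orc(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-write-sketch")
      .master("local[*]")
      .getOrCreate()
    val path = java.nio.file.Files.createTempDirectory("orc-sketch")
      .resolve("artist_orc").toString
    val back = writeAndReadBack(spark, Seq(Row(1, "alice"), Row(2, "bob")), path)
    back.printSchema()
    back.show()
    spark.stop()
  }
}
```

Using `mode("overwrite")` avoids the "path already exists" failure on reruns. Drop it if you would rather have the job fail than replace earlier output.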
