This article walks through, in some detail, a worked example of the pitfalls of retrieving values from DataFrame and SparkSQL. The editor finds it quite practical and shares it here for reference; hopefully you will take something away after reading it.
1. A DataFrame query does not return a plain object.
2. Querying a DataFrame returns another DataFrame dataset, not the rows themselves.
3. A DataFrame executes only when it meets an action operator; transformations alone run nothing.
4. A SparkSQL query likewise returns a DataFrame dataset.
The short sketch after this list illustrates all four points.
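A minimal sketch (Spark 1.x API; df and the column names come from the parquet schema shown next, while the filter value 'China' is purely hypothetical): every transformation hands back another DataFrame, and nothing runs until an action fires.

val selected = df.select("timestamp", "country", "area") // still a DataFrame: no job has run
val filtered = selected.filter("country = 'China'")      // another lazy DataFrame ('China' is a made-up value)
filtered.show(5)                                         // show() is an action: only here does Spark execute the query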
Original data
scala> val parquetDF = sqlContext.read.parquet("hdfs://hadoop14:9000/yuhui/parquet/part-r-00004.gz.parquet")
parquetDF: org.apache.spark.sql.DataFrame = [timestamp: string, appkey: string, app_version: string, channel: string, lang: string, os_type: string, os_version: string, display: string, device_type: string, mac: string, network: string, nettype: string, suuid: string, register_days: int, country: string, area: string, province: string, city: string, event: string, use_interval_cat: string, use_duration_cat: string, use_interval: bigint, use_duration: bigint, os_upgrade_from: string, app_upgrade_from: string, page_name: string, event_name: string, error_type: string]
Code
package DataFrame

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by yuhui on 2016/6/14.
 */
object DataFrameTest {

  def main(args: Array[String]) {
    DataFrameInto()
  }

  def DataFrameInto() {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.parquet("hdfs://hadoop14:9000/yuhui/parquet")

    //df.map(line => printinfo(line.getString(0)))
    //df.foreach(line => printinfo(line.getString(0)+" , "+line.getString(14)+" , "+line.getString(15)))
    //df.select("timestamp","country","area").foreach(line=>printinfo(line.toString))

    df.registerTempTable("infotable")

    sqlContext.sql("SELECT timestamp , country , area from infotable").foreach(line => printinfo(line.toString))
  }

  def printinfo(msg: String) {
    println("printinfo function --> " + msg)
  }
}
Code walkthrough
1. df.map(line => printinfo(line.getString(0)))
This line never executes the printinfo() function: map is only a transformation, and with no action operator in the chain, Spark records the lineage but runs nothing.
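A hedged fix (a sketch, not from the original post): appending any action, for example count(), forces the lazy map to run. Note that in Spark 1.x df.map yields an RDD, and printinfo then executes on the executors, so on a real cluster the printed lines land in the executor logs rather than the driver console.

val mapped = df.map(line => printinfo(line.getString(0))) // still only a transformation: nothing happens yet
mapped.count()                                            // count() is an action, so the job finally runs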
2. df.foreach(line => printinfo(line.getString(0)+" , "+line.getString(14)+" , "+line.getString(15)))
foreach is an action operator, so Spark actually pulls the data and runs the operation: printinfo prints each row's timestamp, country (index 14), and area (index 15) to the console.
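A related value-retrieval tip (a sketch under the same schema, assuming Spark 1.4+ where Row.getAs by field name is available; it is not part of the article's code): positional indexes such as getString(14) silently break when the schema changes, whereas getAs[T]("fieldName") reads the same values by column name.

df.foreach(line => printinfo(
  line.getAs[String]("timestamp") + " , " +
  line.getAs[String]("country") + " , " +  // same value as getString(14), but robust to schema changes
  line.getAs[String]("area")))             // same value as getString(15)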
3. df.select("timestamp","country","area").foreach(line=>printinfo(line.toString))
Here the columns are first narrowed through the DataFrame API; select returns yet another DataFrame, and the foreach action then prints each of its rows.
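If you would rather have those rows back on the driver (a hedged sketch, not in the original code), collect() is the action to use; it returns a local Array[Row]. Because collect() pulls everything into driver memory, it only suits small results.

val rows = df.select("timestamp", "country", "area").collect() // action: the job runs, Array[Row] returns to the driver
rows.take(3).foreach(row => printinfo(row.toString))           // ordinary local iteration, no further Spark job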
4. sqlContext.sql("SELECT timestamp , country , area from infotable").foreach(line=>printinfo(line.toString))
Because df was registered as the temporary table infotable, this SQL query returns another DataFrame (point 4 above), and foreach again supplies the action that prints each row.
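Since the SQL result is an ordinary DataFrame, it can also be held in a variable and queried further; a small sketch reusing the temp table registered above:

val result = sqlContext.sql("SELECT timestamp, country, area FROM infotable") // still lazy: no job yet
val first = result.first()                                 // first() is an action: one Row comes back to the driver
printinfo(first.getString(0) + " , " + first.getString(1)) // positional access into the returned Row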
That is all for this example analysis of the pitfalls of retrieving values from DataFrame and SparkSQL. Hopefully the content above is of some help and lets you learn a bit more; if you think the article is good, please share it so more people can see it.