How do you convert an RDD to a DataFrame in Spark 2.2.0? Many developers new to Spark get stuck on this, so this article summarizes the approach and walks through a working example.
Spark SQL can convert existing RDDs into Datasets.
Method: use reflection to infer the schema of an RDD that contains objects of a specific type. This reflection-based approach keeps the code concise, and it works well when you already know the schema at the time you write your Spark application.
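To make "inferring the schema by reflection" more concrete, here is a minimal sketch that uses only the JDK's `java.beans.Introspector` to list the readable properties of a bean. It does not call Spark; the class `BeanSchemaSketch`, the nested `Student` bean, and the `describe` helper are hypothetical names used for illustration. Spark's handling of Java beans is understood to rely on the same kind of getter-based introspection, so the properties found here roughly correspond to the columns of the resulting DataFrame.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanSchemaSketch {

    // Hypothetical bean for illustration; it mirrors the id/age/name fields used in this article.
    public static class Student {
        private int id;
        private int age;
        private String name;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Returns one "name:type" entry per readable bean property,
    // skipping the implicit "class" property inherited from Object.
    public static List<String> describe(Class<?> beanClass) {
        List<String> columns = new ArrayList<>();
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() == null || "class".equals(pd.getName())) {
                    continue;
                }
                columns.add(pd.getName() + ":" + pd.getPropertyType().getSimpleName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Prints one line per discovered property.
        describe(Student.class).forEach(System.out::println);
    }
}
```

On common JDKs the properties tend to come back sorted by name (`age`, `id`, `name`) rather than in declaration order, although that ordering is not a documented guarantee. For that reason it is safer to read columns of a bean-derived DataFrame by name than by position.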
Example code for the first method (Java version):
Data preparation: studentData.txt
1001,20,zhangsan
1002,17,lisi
1003,24,wangwu
1004,16,zhaogang
Local-mode implementation:
package com.unicom.ljs.spark220.study;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

/**
 * @author: Created By lujisen
 * @company ChinaUnicom Software JiNan
 * @date: 2020-01-20 08:58
 * @version: v1.0
 * @description: com.unicom.ljs.spark220.study
 */
public class RDD2DataFrameReflect {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setMaster("local[*]").setAppName("RDD2DataFrameReflect");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sc);

        // Read the input file and map each "id,age,name" line to a Student2 bean.
        JavaRDD<String> lines = sc.textFile("C:\\Users\\Administrator\\Desktop\\studentData.txt");
        JavaRDD<Student2> studentRDD = lines.map(new Function<String, Student2>() {
            @Override
            public Student2 call(String line) throws Exception {
                String[] split = line.split(",");
                Student2 student = new Student2();
                student.setId(Integer.valueOf(split[0]));
                student.setAge(Integer.valueOf(split[1]));
                student.setName(split[2]);
                return student;
            }
        });

        // Convert the RDD to a DataFrame by reflection:
        // passing Student2.class lets Spark infer the schema from the bean's getters.
        Dataset<Row> dataFrame = sqlContext.createDataFrame(studentRDD, Student2.class);

        // Register the DataFrame as a temporary view so it can be queried with SQL.
        // (registerTempTable is deprecated in Spark 2.x.)
        dataFrame.createOrReplaceTempView("studentTable");

        // Query the temporary view for students younger than 18.
        Dataset<Row> dataset = sqlContext.sql("select * from studentTable where age < 18");

        JavaRDD<Row> rowJavaRDD = dataset.toJavaRDD();
        JavaRDD<Student2> ageRDD = rowJavaRDD.map(new Function<Row, Student2>() {
            @Override
            public Student2 call(Row row) throws Exception {
                // Look columns up by name: the bean-derived schema does not necessarily
                // follow the field declaration order (it is typically age, id, name).
                Student2 student = new Student2();
                student.setId(row.getInt(row.fieldIndex("id")));
                student.setAge(row.getInt(row.fieldIndex("age")));
                student.setName(row.getString(row.fieldIndex("name")));
                return student;
            }
        });

        ageRDD.foreach(new VoidFunction<Student2>() {
            @Override
            public void call(Student2 student) throws Exception {
                System.out.println(student.toString());
            }
        });

        sc.stop();
    }
}
The Student2 class:
package com.unicom.ljs.spark220.study;

import java.io.Serializable;

/**
 * @author: Created By lujisen
 * @company ChinaUnicom Software JiNan
 * @date: 2020-01-20 08:57
 * @version: v1.0
 * @description: com.unicom.ljs.spark220.study
 */
// A public, serializable JavaBean: the schema is inferred from its getters.
public class Student2 implements Serializable {
    private int id;
    private int age;
    private String name;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "Student2{" +
                "id=" + id +
                ", age=" + age +
                ", name='" + name + '\'' +
                '}';
    }
}
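As a quick sanity check on what the Spark job should print, the sketch below reproduces the same parse-and-filter logic in plain Java, without Spark. The class `ExpectedResultSketch` and the helper `idsYoungerThan` are hypothetical names introduced here; the input lines are the sample rows from studentData.txt. It is only meant to show which rows the `age < 18` query is expected to keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExpectedResultSketch {

    // Parses "id,age,name" lines and returns the ids of students younger than maxAge.
    public static List<Integer> idsYoungerThan(List<String> lines, int maxAge) {
        List<Integer> ids = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            int id = Integer.parseInt(parts[0].trim());
            int age = Integer.parseInt(parts[1].trim());
            if (age < maxAge) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1001,20,zhangsan",
                "1002,17,lisi",
                "1003,24,wangwu",
                "1004,16,zhaogang");
        System.out.println(idsYoungerThan(lines, 18)); // prints [1002, 1004]
    }
}
```

If the Spark job prints students other than 1002 (lisi) and 1004 (zhaogang), or prints them with id and age swapped, the column lookup in the Row-to-bean mapping is the first place to check.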
Key dependencies in pom.xml:
<spark.version>2.2.0</spark.version>
<scala.version>2.11.8</scala.version>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>