10. spark-shell
● Besides sc, spark-shell also starts a SQL Context:
Spark context available as sc.
15/03/22 02:09:11 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
11. JAR
● In a standalone application (built as a JAR), create the SQLContext yourself from an existing SparkContext:
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
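The `import sqlContext.implicits._` line works through Scala's ordinary enrich-my-library pattern: the import brings an implicit conversion into scope that adds a `toDF()` method to existing collections. A minimal standalone sketch of that mechanism (no Spark; `FakeDF` and `ToDFOps` are made-up names for illustration only):

```scala
// Stand-in for a DataFrame, just to show the conversion pattern.
case class FakeDF(rows: Seq[Product]) {
  def count: Long = rows.size.toLong
}

object Implicits {
  // Implicit class "enriches" any Seq of case-class rows with a toDF() method,
  // the same way sqlContext.implicits._ enriches RDDs.
  implicit class ToDFOps(rows: Seq[Product]) {
    def toDF(): FakeDF = FakeDF(rows)
  }
}

import Implicits._

case class Rating(userId: Int, itemID: Int)
// toDF() is not defined on Seq; it comes from the imported implicit class.
val df = Seq(Rating(1, 242), Rating(2, 307)).toDF()
```

Without the import, `toDF()` does not compile; with it, the compiler inserts the wrapper automatically — which is exactly why the Spark import must appear before converting an RDD.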
12. DF from RDD
● First load the data as an RDD
scala> val data = sc.textFile("hdfs://localhost:54310/user/hadoop/ml-100k/u.data")
● Define a case class
case class Ratings(userId: Int, itemID: Int, rating: Int, timestamp: String)
● Convert to a DataFrame (u.data is tab-separated, so split on "\t")
scala> val ratings = data.map(_.split("\t")).map(p => Ratings(p(0).trim.toInt,
p(1).trim.toInt, p(2).trim.toInt, p(3))).toDF()
ratings: org.apache.spark.sql.DataFrame = [userId: int, itemID: int, rating: int, timestamp: string]
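The map-and-split step can be checked on a single record without a cluster; a plain-Scala sketch (the sample line below is made up in u.data's tab-separated user/item/rating/timestamp layout):

```scala
case class Ratings(userId: Int, itemID: Int, rating: Int, timestamp: String)

// One tab-separated line in the same layout as u.data
val line = "196\t242\t3\t881250949"

// Same parsing logic as the RDD map above, applied to a single record
val p = line.split("\t")
val r = Ratings(p(0).trim.toInt, p(1).trim.toInt, p(2).trim.toInt, p(3))
```

If the delimiter were wrong (e.g. splitting on "t"), `p(1)` would not be a number and `toInt` would throw — which is why the tab escape matters.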
13. DF from json
● Format (one JSON object per line)
{"movieID":242,"name":"test1"}
{"movieID":307,"name":"test2"}
● It can be loaded directly:
scala> val movie = sqlContext.jsonFile("hdfs://localhost:54310/user/hadoop/ml-100k/movies.json")
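Each input line is a complete JSON object that becomes one row. A plain-Scala sketch of that line-per-record layout, using a regex purely for illustration (no Spark or JSON library; a real reader like jsonFile infers the schema instead):

```scala
// Two records in the same one-object-per-line layout as movies.json
val lines = Seq(
  """{"movieID":242,"name":"test1"}""",
  """{"movieID":307,"name":"test2"}"""
)

// Crude field extraction, just to show that each line maps to one (movieID, name) row
val pattern = """\{"movieID":(\d+),"name":"([^"]+)"\}""".r
val rows = lines.map { case pattern(id, name) => (id.toInt, name) }
```

This one-object-per-line convention is why the file is not a single JSON array: each line can be parsed independently and in parallel.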
20. GROUP BY
● Other aggregate functions
o avg
o max
o min
o mean
o sum
● More functions:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
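What groupBy plus one of these aggregates computes can be mirrored with plain Scala collections; a minimal sketch (no Spark) grouping made-up (userId, rating) pairs and averaging, analogous to `df.groupBy("userId").avg("rating")`:

```scala
// (userId, rating) pairs, analogous to two columns of the ratings DataFrame
val ratings = Seq((1, 3), (1, 5), (2, 4))

// Group rows by the key column, then apply the aggregate within each group
val avgByUser: Map[Int, Double] = ratings
  .groupBy(_._1)
  .map { case (user, rs) => user -> rs.map(_._2).sum.toDouble / rs.size }
```

Swapping the averaging expression for `rs.map(_._2).max`, `min`, or `sum` mirrors the other aggregates listed above.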