Using spark-shell for Word Count
Step 1: Upload word.txt to HDFS
$ hadoop fs -put word.txt
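Here word.txt is assumed to be a small plain-text file of space-separated words (judging from the output in Step 3, something containing aaa, bbb, ccc, and eee). To confirm the upload landed in your HDFS home directory, an optional check:
$ hadoop fs -ls word.txt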
Step 2: Start spark-shell
$ spark-shell
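By default spark-shell starts with whatever master the installation is configured for. If your cluster requires it, a master can be passed explicitly; the yarn value below is an assumption about the setup, not part of the original steps:
$ spark-shell --master yarn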
Step 3: Run word count in spark-shell
scala> val textFile = sc.textFile("word.txt")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
Here flatMap splits each line into individual words, map turns each word into a (word, 1) pair, and reduceByKey sums the 1s for every distinct word.
scala> wordCounts.collect()
res6: Array[(String, Int)] = Array((bbb,2), (eee,1), (ccc,1), (aaa,2))
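The same pipeline can also be packaged as a standalone application instead of being typed into the shell. A minimal sketch using the standard Spark Scala API; the object name WordCount and the word_counts output path are illustrative, not from the original:

import org.apache.spark.{SparkConf, SparkContext}

// Standalone version of the word count above (a sketch; names are illustrative).
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    val textFile = sc.textFile("word.txt")       // same input file on HDFS
    val wordCounts = textFile
      .flatMap(line => line.split(" "))          // split each line into words
      .map(word => (word, 1))                    // pair each word with a 1
      .reduceByKey((a, b) => a + b)              // add up the 1s per word

    wordCounts.saveAsTextFile("word_counts")     // write the results back to HDFS
    sc.stop()
  }
}

After being built into a jar, an application like this would be launched with spark-submit rather than spark-shell.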