Running word count with spark-submit
Step 1: Upload word.txt to HDFS
$ hadoop fs -put word.txt
Step 2: Download the sample program
$ git clone https://github.com/ogre0403/Spark-101.git
Initialized empty Git repository in /home/ogre/Spark-101/.git/
remote: Counting objects: 9, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 9 (delta 0), reused 9 (delta 0), pack-reused 0
Unpacking objects: 100% (9/9), done.
Step 3: Build the Spark program
$ mvn clean package
…
[INFO] Building jar: /home/ogre/Spark-101/target/spark-sample-0.0.1.jar
[INFO] --------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] --------------------------------------------------------------------
[INFO] Total time: 1.700 s
[INFO] Finished at: 2016-03-30T13:24:08+08:00
[INFO] Final Memory: 39M/1556M
[INFO] --------------------------------------------------------------------
Step 4: Run the job
$ spark-submit --master yarn \
--deploy-mode cluster \
--class WordCount \
spark-sample-0.0.1.jar word.txt output
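Conceptually, the WordCount class splits each input line into words, pairs each word with a count of 1, and sums the counts per word. A minimal local sketch of that logic in plain Java (the actual Spark-101 source would use Spark's RDD API — flatMap, mapToPair, reduceByKey — and may differ in detail):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Split text on whitespace and tally each word, mirroring the
    // flatMap -> mapToPair -> reduceByKey pipeline of a Spark word count.
    public static Map<String, Long> count(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical input; the real word.txt content is not shown here.
        System.out.println(count("aaa bbb aaa"));
    }
}
```

In the real job, the input comes from the HDFS path given as the first argument (word.txt) and the per-word counts are written to the second (output).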
Step 5: Check the output
$ hadoop fs -ls output
Found 3 items
-rw-r--r-- 3 ogre supergroup 0 2016-03-30 13:31 output/_SUCCESS
-rw-r--r-- 3 ogre supergroup 16 2016-03-30 13:31 output/part-00000
-rw-r--r-- 3 ogre supergroup 16 2016-03-30 13:31 output/part-00001
$ hadoop fs -cat output/part-00000
(bbb,1)
(ddd,2)
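Each line in the part files is a (word,count) tuple rendered as text, split across one part file per output partition. If you need the counts programmatically, a small helper (hypothetical, for illustration) can parse such a line back into its fields:

```java
public class TupleParse {
    // Parse a text-output tuple line like "(ddd,2)" into {word, count}.
    public static String[] parse(String line) {
        String body = line.substring(1, line.length() - 1); // strip "(" and ")"
        int comma = body.lastIndexOf(',');                  // split at the last comma
        return new String[] { body.substring(0, comma), body.substring(comma + 1) };
    }

    public static void main(String[] args) {
        String[] p = parse("(ddd,2)");
        System.out.println(p[0] + " occurs " + p[1] + " times");
    }
}
```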